Showing posts with label LHCb. Show all posts

03 April 2017

Current storage scope within GridPP

With a new project year upon us, I decided to review which sites' storage is used by the WLCG VOs, which SRMs are used, and which file systems are used. On that last point, we now have not just file systems but also object stores, with the usage of CEPH. The file systems are XFS, ZFS, HDFS, Spectrum Scale (or the artist formerly known as GPFS), and Lustre.

In terms of storage elements/systems we have DPM, dCache, Castor, classic SE, standalone xrootd, and standalone gsiftp services. When it comes to the regional T2s and who they are used by, the following helps.


I thought about embedding SE system into the font used for each site, but thought that was too much overlay of information.

20 April 2016

ZFS compression for LHC experiments data

An interesting feature of ZFS is that it supports transparent compression. Unlike typical file compression, ZFS compression works on the record size/block size that it writes (which is variable in ZFS, depending on the data and the file size itself). Since a fast compression/decompression algorithm is needed to keep the overhead low compared to file access without compression, it cannot be expected to achieve compression results similar to, for example, bzip at its highest compression level. Also, the data files of the LHC experiments are ROOT files, which already store data in a compressed format.
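To illustrate why per-record compression gains little on data that is already compressed, here is a minimal Python sketch. It is not how ZFS works internally: `zlib` at its fastest level stands in for LZ4 (which is not in the Python standard library), the fixed 128 KiB record size is an assumption, and pseudo-random bytes stand in for an already compressed ROOT file.

```python
import random
import zlib

RECORD_SIZE = 128 * 1024  # assumed fixed record size; ZFS records vary in practice


def blockwise_ratio(data: bytes, record_size: int = RECORD_SIZE) -> float:
    """Compress each record independently and return original/compressed size."""
    compressed = 0
    for start in range(0, len(data), record_size):
        record = data[start:start + record_size]
        packed = zlib.compress(record, 1)  # fast level, a stand-in for LZ4
        # Simplification: keep the record as-is if compression does not shrink it
        compressed += min(len(packed), len(record))
    return len(data) / compressed


rng = random.Random(0)
n_bytes = 512 * 1024
# Pseudo-random bytes stand in for an already compressed ROOT file
already_compressed = rng.getrandbits(8 * n_bytes).to_bytes(n_bytes, "little")
# Repetitive text stands in for an uncompressed file format
plain_text = b"run=1234 event=5678 pt=12.5 eta=0.3\n" * 15000

print(f"already compressed: {blockwise_ratio(already_compressed):.2f}x")
print(f"plain text:         {blockwise_ratio(plain_text):.2f}x")
```

The first ratio stays at essentially 1x while the second is far higher, which is the behaviour one would expect when comparing ROOT files with uncompressed formats.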

Therefore, I was not expecting any benefit from enabling compression on our servers, but since the newly implemented LZ4 algorithm has nearly no overhead even for non-compressible data, it shouldn't hurt to enable it, especially since our storage servers have dual CPUs with 12 cores each that are idle most of the time.

After enabling the default lz4 compression on 4 machines that had already been migrated to ZFS and copying data onto them, the first compression results look like this:


NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank-2TB  32.5T  8.73T  23.8T         -    15%    26%  1.00x  ONLINE  -
tank-8TB   116T  24.0T  92.0T         -    10%    20%  1.00x  ONLINE  -

NAME                    PROPERTY       VALUE  SOURCE
tank-2TB                compressratio  1.03x  -
tank-2TB/gridstorage01  compressratio  1.03x  -
tank-2TB/gridstorage02  compressratio  1.03x  -
tank-2TB/gridstorage03  compressratio  1.03x  -
tank-2TB/gridstorage04  compressratio  1.03x  -
tank-8TB                compressratio  1.03x  -
tank-8TB/gridstorage01  compressratio  1.03x  -
tank-8TB/gridstorage02  compressratio  1.03x  -
tank-8TB/gridstorage03  compressratio  1.03x  -
tank-8TB/gridstorage04  compressratio  1.03x  -
tank-8TB/gridstorage05  compressratio  1.04x  -
tank-8TB/gridstorage06  compressratio  1.03x  -
tank-8TB/gridstorage07  compressratio  1.03x  -
tank-8TB/gridstorage08  compressratio  1.03x  -
tank-8TB/gridstorage09  compressratio  1.03x  -
tank-8TB/gridstorage10  compressratio  1.03x  -
tank-8TB/gridstorage11  compressratio  1.03x  -



NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank-2TB  32.5T  8.45T  24.0T         -    11%    26%  1.00x  ONLINE  -
tank-8TB   116T  24.1T  91.9T         -     7%    20%  1.00x  ONLINE  -

NAME                    PROPERTY       VALUE  SOURCE
tank-2TB                compressratio  1.03x  -
tank-2TB/gridstorage01  compressratio  1.03x  -
tank-2TB/gridstorage02  compressratio  1.03x  -
tank-2TB/gridstorage03  compressratio  1.03x  -
tank-2TB/gridstorage04  compressratio  1.04x  -
tank-8TB                compressratio  1.03x  -
tank-8TB/gridstorage01  compressratio  1.03x  -
tank-8TB/gridstorage02  compressratio  1.03x  -
tank-8TB/gridstorage03  compressratio  1.03x  -
tank-8TB/gridstorage04  compressratio  1.03x  -
tank-8TB/gridstorage05  compressratio  1.03x  -
tank-8TB/gridstorage06  compressratio  1.03x  -
tank-8TB/gridstorage07  compressratio  1.03x  -
tank-8TB/gridstorage08  compressratio  1.03x  -
tank-8TB/gridstorage09  compressratio  1.03x  -
tank-8TB/gridstorage10  compressratio  1.03x  -
tank-8TB/gridstorage11  compressratio  1.03x  -


NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank-4TB   127T  9.05T   118T         -     3%     7%  1.00x  ONLINE  -

NAME                    PROPERTY       VALUE  SOURCE
tank-4TB                compressratio  1.03x  -
tank-4TB/gridstorage01  compressratio  1.03x  -
tank-4TB/gridstorage02  compressratio  1.03x  -
tank-4TB/gridstorage03  compressratio  1.04x  -
tank-4TB/gridstorage04  compressratio  1.02x  -
tank-4TB/gridstorage05  compressratio  1.03x  -
tank-4TB/gridstorage06  compressratio  1.03x  -
tank-4TB/gridstorage07  compressratio  1.03x  -
tank-4TB/gridstorage08  compressratio  1.03x  -
tank-4TB/gridstorage09  compressratio  1.03x  -
tank-4TB/gridstorage10  compressratio  1.04x  -
tank-4TB/gridstorage11  compressratio  1.03x  -
tank-4TB/gridstorage12  compressratio  1.02x  -
tank-4TB/gridstorage13  compressratio  1.03x  -
tank-4TB/gridstorage14  compressratio  1.03x  -



NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank-2TB  63.5T  15.4T  48.1T         -    11%    24%  1.00x  ONLINE  -

NAME                    PROPERTY       VALUE  SOURCE
tank-2TB                compressratio  1.03x  -
tank-2TB/gridstorage01  compressratio  1.03x  -
tank-2TB/gridstorage02  compressratio  1.04x  -
tank-2TB/gridstorage03  compressratio  1.03x  -
tank-2TB/gridstorage04  compressratio  1.03x  -
tank-2TB/gridstorage05  compressratio  1.03x  -
tank-2TB/gridstorage06  compressratio  1.03x  -
tank-2TB/gridstorage07  compressratio  1.03x  -


Although there is not much data stored so far on each of the machines, this means we can still reduce the used disk space by a few percent, 2-4% here depending on the file system and the data on it.
We have a bit more than 1PB of disk storage in total at our site, and the servers with 2TB disks provide about 50TB of usable storage each. If we can get 4% compression for all the data, that would mean we could get nearly the space provided by one of the 2TB-disk servers for free, without the cost of a new machine, power, extra disks, and so on! And that's just with the default compression, while the compression level could also be tuned in ZFS...
This saving could be even bigger if we consider that in the future sites will also store more non-LHC data, such as for LSST, which use a different and maybe uncompressed file format.
Another positive aspect of compression is that it reduces disk I/O, since fewer data blocks need to be read from disk.
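The back-of-the-envelope estimate above can be written out as a short Python sketch. The constants are assumptions taken from the rounded figures in the text ("a bit more than 1PB" is treated as 1000 TB), and the saving is treated as a plain fraction of the stored data; a ratio of 1.04x actually corresponds to slightly less than 4% saved.

```python
TOTAL_STORAGE_TB = 1000.0   # "a bit more than 1PB", rounded to 1000 TB (assumption)
SERVER_USABLE_TB = 50.0     # usable space of one server with 2TB disks


def space_saved_tb(stored_tb: float, saving_fraction: float) -> float:
    """Disk space freed when the stored data shrinks by the given fraction."""
    return stored_tb * saving_fraction


for saving in (0.02, 0.03, 0.04):
    freed = space_saved_tb(TOTAL_STORAGE_TB, saving)
    print(f"{saving:.0%} saving: {freed:.0f} TB freed, "
          f"{freed / SERVER_USABLE_TB:.1f} servers' worth")
```

At a 4% saving this comes to about 40 TB, or roughly 0.8 of a 50TB server, in line with the "nearly one server" estimate.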

It will be interesting to see what the compression ratio will be after all our servers have been switched over to ZFS.





21 August 2014

Updated data models from experiments

At the GridPP meeting in Ambleside, ATLAS announced having lifetimes on their files: not quite like the SRM implementation, where a file could have a finite lifetime when created, but more like a timer which counts after each access. Unlike with SRM, when the file has not been accessed for the set length of time, it will be automatically deleted. Also notable is that files can now belong to multiple datasets, and they are set with automatic replication policies (well, basically how many replicas at T1s are required). Now with extra AOD visualisation goodness.
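As a rough illustration of an access-based lifetime, here is a minimal Python sketch. It is not ATLAS's implementation: the class `TrackedFile`, the `sweep` function, the file names and the 30-day lifetime are all hypothetical, chosen only to show a timer that restarts on each access.

```python
from dataclasses import dataclass


@dataclass
class TrackedFile:
    name: str
    last_access: float  # seconds since an arbitrary epoch
    lifetime: float     # allowed idle time in seconds

    def touch(self, now: float) -> None:
        """An access restarts the timer."""
        self.last_access = now

    def expired(self, now: float) -> bool:
        """True once the file has been idle for longer than its lifetime."""
        return now - self.last_access > self.lifetime


def sweep(files: list, now: float) -> list:
    """Return the files that survive; expired ones would be deleted."""
    return [f for f in files if not f.expired(now)]


DAY = 86400.0
files = [
    TrackedFile("AOD.hot.root", last_access=0.0, lifetime=30 * DAY),
    TrackedFile("AOD.cold.root", last_access=0.0, lifetime=30 * DAY),
]
files[0].touch(now=20 * DAY)            # accessed on day 20, so the timer restarts
survivors = sweep(files, now=40 * DAY)  # on day 40 only the accessed file is left
print([f.name for f in survivors])
```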

Also interesting updates from LHCb: they are continuing to use SRM to stage files from tape, but could be looking into FTS3 for this. Also discussed the DIRAC integrity checking with Sam over breakfast. In order to confuse the enemy, they are not using their own git but code from various places: both LHCb and DIRAC have their own repositories, and some code is marked as "abandonware," so determining which code is being used in practice requires asking. This correspondent would have naïvely assumed that whatever comes out of git is being used... perhaps that's just for high energy physics...

CMS to speak later.

31 March 2014

Highlights of ISGC 2014

ISGC 2014 is over. Lots of interesting discussions - on the infrastructure end, ASGC developing fanless machine room, interest in (and results on) CEPH and GLUSTER, dCache tutorial, and an hour of code with the DIRAC tutorial.

All countries and regions presented overviews of their work in e-/cyber-Infrastructure.

Interestingly, although this wasn't a HEP conference, practically everyone is doing >0 on LHC, so the LHC really is binding countries and researchers (well, at least physicists and infrastructureists) and e-Infrastructures together (and NRENs). When, one day, someone sits down to tally up the benefit and impact of the LHC, this ought to be one of the top ones: the ability to work together and to (mostly) be able to move data to each other, and to trust each other's CAs.

Regarding the DIRAC tutorial, I was there and went through as much as I could ("I am not doing that to my private key"). Something to play with a bit more when I have time - an hour (of code) is not much time; there are always compromises between getting stuff done realistically and cheating in tutorials, but that's fine as long as there's something you can take away and play with later. As regards the key shenanigans, DIRAC say they will be working with EGI on SSO, so that's promising. Got the T-shirt, too. "Interware," though?

On the security side, OSG have been interfacing to DigiCert, following the planned termination of the ESNET CA. Once again grids have demands that are not seen in the commercial world, such as the need for bulk certificates (particularly cost-effective ones - something a traditional Classic IGTF can do fairly well). Other security questions (techie acronym alert, until end of paragraph) include how Argus and XACML compare for implementing security policies, and the EMI STS - CERN looking at linking with ADFS. And Malaysia are trialling an online CA based on a FIPS level three token with a Raspberry π.

EGI federated cloud got mentioned quite a few times - KISTI interested in offering IaaS, also Australia interested in joining. Philippines providing resources. EGI have a strategy for engagement. Interesting the extent to which they are driving the adoption of CDMI.

I should mention Shaun gave a talk on "federated" access to data, comparing the protocols - which I missed - the talk, I mean - being in another session, but I understand it was well received and there was a lot of interest.

Software development - interesting experiences from the dCache team and building user communities with (for) DIRAC. How are people taught to develop code? The closing session was by Adam Lyon from Fermilab who talked about the lessons learned - the HEP vision of big data being different from the industry one. And yet HEP needs a culture shift to move away from the not-invented-here.

ISGC really had a great mix of Asian and European countries, as well as the US and Australia. This post was just a quick look through my notes; there'll be much more to pick up and ponder over the coming months. And I haven't even mentioned the actual science stuff ...

24 March 2008

Grid storage not working?

Well, going by what I heard last week at LHCb software week, I think the answer to this question is "No". The majority of the week focussed on all the cool new changes to the core LHCb software and improvements to the HLT, but there was an interesting session on Wednesday afternoon covering CCRC and more general LHCb computing operations. The point was made in 3 (yes, 3!) separate talks that LHCb continue to be plagued with storage problems which prevent their production and reconstruction jobs from successfully completing. The main issue is the instability of using local POSIX-like protocols to remotely open files on the grid SE from jobs running on the site WNs. From my understanding, this issue could broadly be separated into two categories:

1. Many of the servers being used have been configured in such a way that if a job held a file in an open state for longer than (say) 1 day, the connection was being dropped, causing the entire job to fail.

2. Sites have been running POSIX-like access services on the same hosts that are providing the SRM. This isn't wrong, but is definitely not recommended due to the load on the system. Anyway, the real problem comes when the SRM has to be restarted for some reason (most likely an upgrade) and the site(s) appear to have just been restarting all services on the node, which again resulted in any open file connections being dropped and jobs subsequently failing. I thought it was basic knowledge that everyone knew about, but apparently I was wrong.

LHCb seem to be particularly vulnerable as they have long-running reconstruction jobs (>33 hours), resulting in low job efficiency when the above problems rear their ugly heads. I would be interested in comments from other experiments on these observations. Anyway, the upshot of this is that LHCb are now considering copying data files locally prior to starting their reconstruction jobs. This won't be possible for user analysis jobs, which will be accessing events from a large number of files. Copying all of these locally isn't all that efficient, nor do you know a priori how much local space the WN has available.
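On the question of local space, a job wrapper could at least check the scratch area before deciding to copy. The following Python sketch is purely illustrative and not anything LHCb described: the function name `can_stage_locally` and the 10% safety reserve are assumptions.

```python
import shutil
import tempfile


def can_stage_locally(file_sizes, scratch_dir, reserve_fraction=0.1):
    """Return True if all input files fit in scratch, leaving a safety reserve.

    file_sizes are in bytes; reserve_fraction is the share of the whole disk
    that should stay free after staging (an arbitrary illustrative default).
    """
    usage = shutil.disk_usage(scratch_dir)
    budget = usage.free - reserve_fraction * usage.total
    return sum(file_sizes) <= budget


scratch = tempfile.gettempdir()   # stand-in for the worker node's scratch area
inputs = [2 * 1024**3] * 5        # five hypothetical 2 GiB input files
if can_stage_locally(inputs, scratch):
    print("enough space: copy the files locally before starting")
else:
    print("not enough space: fall back to remote access")
```

A check like this only helps for jobs whose input list is known up front, which is the case for reconstruction but not for the analysis jobs mentioned above.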

xrootd was also proposed as an alternative solution. Certainly dCache, CASTOR and DPM all now provide an implementation of the xrootd protocol in addition to native dcap/rfio, so getting it deployed at sites would be relatively trivial (some places already have it available for ALICE). I don't know enough about xrootd to comment, but I'm sure if properly configured it would be able to deal with case 1 above. Case 2 is a different matter entirely... It should be noted (perhaps celebrated?) that none of the above problems have to do with SRM2.2.

Of course, LHCb only require disk at Tier-1s, so none of this applies to Tier-2 sites. Also, they reported that they saw no problems at RAL: well done guys!

In addition, the computing team have completed a large part of the stripping that the physics planning group have asked for (but this isn't really storage related).