GridPP storage news: LHCb

03 April 2017

Current storage scope within GridPP

With a new project year upon us, I decided to review which sites' storage is used by the WLCG VOs, which SRMs are in use, and which file systems sit underneath. On that last point, we no longer have just filesystems but also object stores, with the use of CEPH. The other filesystems are XFS, ZFS, HDFS, Spectrum Scale (or the artist formerly known as GPFS), and Lustre.

In terms of Storage elements/systems we have DPM, dCache, Castor, classic SE, stand alone xrootd, and stand alone gsiftp services. When it comes to the regional T2s and who they are used by, the following helps.

I thought about embedding SE system into the font used for each site, but thought that was too much overlay of information.

20 April 2016

ZFS compression for LHC experiments data

An interesting feature of ZFS is that it supports transparent compression. Unlike typical file compression, ZFS compression works on the record size/block size that it writes (which is variable in ZFS, depending on the data and the file size itself). Because the compression/decompression algorithm has to be fast to keep the overhead low compared to file access without compression, one cannot expect compression results similar to, for example, bzip at its highest compression level. Also, the data files of the LHC experiments are ROOT files, which already store data in a compressed format.

Therefore, I was not expecting any benefit from enabling compression on our servers, but since the newly implemented LZ4 algorithm has nearly no overhead even for non-compressible data, it shouldn't hurt to enable it, especially since our storage servers have dual CPUs with 12 cores each that sit idle most of the time.

After enabling the default lz4 compression on 4 machines that were already migrated to ZFS and copying data onto them, the first compression results look like this:

NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank-2TB 32.5T 8.73T 23.8T - 15% 26% 1.00x ONLINE -
tank-8TB 116T 24.0T 92.0T - 10% 20% 1.00x ONLINE -

NAME PROPERTY VALUE SOURCE
tank-2TB compressratio 1.03x -
tank-2TB/gridstorage01 compressratio 1.03x -
tank-2TB/gridstorage02 compressratio 1.03x -
tank-2TB/gridstorage03 compressratio 1.03x -
tank-2TB/gridstorage04 compressratio 1.03x -
tank-8TB compressratio 1.03x -
tank-8TB/gridstorage01 compressratio 1.03x -
tank-8TB/gridstorage02 compressratio 1.03x -
tank-8TB/gridstorage03 compressratio 1.03x -
tank-8TB/gridstorage04 compressratio 1.03x -
tank-8TB/gridstorage05 compressratio 1.04x -
tank-8TB/gridstorage06 compressratio 1.03x -
tank-8TB/gridstorage07 compressratio 1.03x -
tank-8TB/gridstorage08 compressratio 1.03x -
tank-8TB/gridstorage09 compressratio 1.03x -
tank-8TB/gridstorage10 compressratio 1.03x -
tank-8TB/gridstorage11 compressratio 1.03x -

NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank-2TB 32.5T 8.45T 24.0T - 11% 26% 1.00x ONLINE -
tank-8TB 116T 24.1T 91.9T - 7% 20% 1.00x ONLINE -

NAME PROPERTY VALUE SOURCE
tank-2TB compressratio 1.03x -
tank-2TB/gridstorage01 compressratio 1.03x -
tank-2TB/gridstorage02 compressratio 1.03x -
tank-2TB/gridstorage03 compressratio 1.03x -
tank-2TB/gridstorage04 compressratio 1.04x -
tank-8TB compressratio 1.03x -
tank-8TB/gridstorage01 compressratio 1.03x -
tank-8TB/gridstorage02 compressratio 1.03x -
tank-8TB/gridstorage03 compressratio 1.03x -
tank-8TB/gridstorage04 compressratio 1.03x -
tank-8TB/gridstorage05 compressratio 1.03x -
tank-8TB/gridstorage06 compressratio 1.03x -
tank-8TB/gridstorage07 compressratio 1.03x -
tank-8TB/gridstorage08 compressratio 1.03x -
tank-8TB/gridstorage09 compressratio 1.03x -
tank-8TB/gridstorage10 compressratio 1.03x -
tank-8TB/gridstorage11 compressratio 1.03x -

NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank-4TB 127T 9.05T 118T - 3% 7% 1.00x ONLINE -

NAME PROPERTY VALUE SOURCE
tank-4TB compressratio 1.03x -
tank-4TB/gridstorage01 compressratio 1.03x -
tank-4TB/gridstorage02 compressratio 1.03x -
tank-4TB/gridstorage03 compressratio 1.04x -
tank-4TB/gridstorage04 compressratio 1.02x -
tank-4TB/gridstorage05 compressratio 1.03x -
tank-4TB/gridstorage06 compressratio 1.03x -
tank-4TB/gridstorage07 compressratio 1.03x -
tank-4TB/gridstorage08 compressratio 1.03x -
tank-4TB/gridstorage09 compressratio 1.03x -
tank-4TB/gridstorage10 compressratio 1.04x -
tank-4TB/gridstorage11 compressratio 1.03x -
tank-4TB/gridstorage12 compressratio 1.02x -
tank-4TB/gridstorage13 compressratio 1.03x -
tank-4TB/gridstorage14 compressratio 1.03x -

NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank-2TB 63.5T 15.4T 48.1T - 11% 24% 1.00x ONLINE -

NAME PROPERTY VALUE SOURCE
tank-2TB compressratio 1.03x -
tank-2TB/gridstorage01 compressratio 1.03x -
tank-2TB/gridstorage02 compressratio 1.04x -
tank-2TB/gridstorage03 compressratio 1.03x -
tank-2TB/gridstorage04 compressratio 1.03x -
tank-2TB/gridstorage05 compressratio 1.03x -
tank-2TB/gridstorage06 compressratio 1.03x -
tank-2TB/gridstorage07 compressratio 1.03x -
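
The compressratio values listed above can be turned into a "fraction of space saved" figure with a few lines of Python. This is only a sketch: it assumes the ratio is logical size divided by physical size, so the saving is 1 - 1/ratio, and it reads the whitespace-separated text as pasted here rather than calling zfs itself. The function names are made up for illustration, and the sample rows are copied from the tank-4TB listing.

```python
def parse_compressratio(text):
    """Parse pasted `zfs get compressratio` output into {dataset: ratio}."""
    ratios = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: NAME PROPERTY VALUE SOURCE.
        # Header lines, blank lines and other properties are skipped.
        if len(fields) < 3 or fields[1] != "compressratio":
            continue
        ratios[fields[0]] = float(fields[2].rstrip("x"))
    return ratios


def space_saved_fraction(ratio):
    """Fraction of space saved, assuming ratio = uncompressed / compressed."""
    return 1.0 - 1.0 / ratio


# Sample rows copied from the tank-4TB listing above.
sample = """\
NAME PROPERTY VALUE SOURCE
tank-4TB compressratio 1.03x -
tank-4TB/gridstorage03 compressratio 1.04x -
tank-4TB/gridstorage04 compressratio 1.02x -
"""

ratios = parse_compressratio(sample)
for name, ratio in sorted(ratios.items()):
    saved = space_saved_fraction(ratio)
    print(f"{name}: {ratio:.2f}x, about {saved:.1%} saved")
```

With ratios between 1.02x and 1.04x this gives savings of roughly 2% to 4%.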

Although there is not much data stored so far on each of the machines, this means we can still reduce the used disk space by a few percent, 2-4% here depending on the file system and the data on it.
We have a bit more than 1PB of disk storage in total at our site, and the servers with 2TB disks provide about 50TB of usable storage each. If we can get 4% compression for all the data, we would gain nearly the space provided by one of the 2TB-disk servers for free, without the cost of a new machine, power, extra disks, and so on! And that's just with the default compression; the compression level could also be tuned in ZFS.
This saving could be even bigger if we consider that in the future sites will also store more non-LHC data, such as for LSST, which may use a different and possibly uncompressed file format.
Another positive aspect of compression is that it reduces disk I/O, since fewer data blocks need to be read from disk.
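
As a rough check of the site-wide estimate, the arithmetic can be written out as below. This is a back-of-the-envelope sketch, not a measurement: "a bit more than 1PB" is rounded down to 1000TB, the 50TB per server is the figure quoted above, and the function name is hypothetical.

```python
def reclaimed_tb(total_tb, saved_fraction):
    """Space freed if `saved_fraction` of the stored data is compressed away."""
    return total_tb * saved_fraction


total_tb = 1000.0   # assumption: "a bit more than 1PB", rounded to 1000 TB
server_tb = 50.0    # usable space of one 2TB-disk server, as quoted above

for fraction in (0.02, 0.03, 0.04):
    freed = reclaimed_tb(total_tb, fraction)
    print(f"{fraction:.0%} saved: {freed:.0f} TB, "
          f"or {freed / server_tb:.1f} of a 2TB-disk server")
```

At 4% this comes to about 40TB, which is where the "nearly one server" estimate comes from.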

It will be interesting to see what the compression ratio looks like after all our servers have been switched over to ZFS.

21 August 2014

Updated data models from experiments

At the GridPP meeting in Ambleside, ATLAS announced that their files now have lifetimes: not quite like the SRM implementation, where a file could be given a finite lifetime when created, but more like a timer which counts from the most recent access. Unlike with SRM, once a file has not been accessed for the set length of time, it will be deleted automatically. Also notable is that files can now belong to multiple datasets, and they are set with automatic replication policies (well, basically how many replicas at T1s are required). Now with extra AOD visualisation goodness.
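
To make the difference between a lifetime fixed at creation and a timer that counts from the last access concrete, here is a minimal sketch. It is not ATLAS code and says nothing about how their data management implements this; the class name, its methods and the numbers are invented purely for illustration.

```python
import time


class AccessLifetime:
    """Hypothetical record: expires after `lifetime` seconds without access."""

    def __init__(self, lifetime, now=None):
        self.lifetime = lifetime
        self.last_access = time.time() if now is None else now

    def touch(self, now=None):
        # Every access restarts the countdown.
        self.last_access = time.time() if now is None else now

    def expired(self, now=None):
        now = time.time() if now is None else now
        return now - self.last_access > self.lifetime


# Timestamps are passed explicitly so the example is deterministic.
record = AccessLifetime(lifetime=100, now=0)
print(record.expired(now=50))    # still within the lifetime
record.touch(now=90)             # an access at t=90 restarts the timer
print(record.expired(now=150))   # only 60 s since the last access
print(record.expired(now=200))   # 110 s since the last access
```

With a lifetime fixed at creation, the file in this example would already count as expired at t=150; with the access-based timer it survives until it has gone unread for the full period.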

There were also interesting updates from LHCb: they are continuing to use SRM to stage files from tape, but could be looking into FTS3 for this. I also discussed the DIRAC integrity checking with Sam over breakfast. In order to confuse the enemy they are not using their own git but code from various places: both LHCb and DIRAC have their own repositories, and some code is marked as "abandonware," so determining which code is being used in practice requires asking. This correspondent would have naïvely assumed that whatever comes out of git is being used... perhaps that's just for high energy physics...

CMS to speak later.

31 March 2014

Highlights of ISGC 2014

ISGC 2014 is over. Lots of interesting discussions - on the infrastructure end, ASGC developing fanless machine room, interest in (and results on) CEPH and GLUSTER, dCache tutorial, and an hour of code with the DIRAC tutorial.

All countries and regions presented overviews of their work in e-/cyber-Infrastructure.

Interestingly, although this wasn't a HEP conference, practically everyone is doing >0 on LHC, so the LHC really is binding countries and researchers (well, at least physicists and infrastructureists) and e-Infrastructures together (and NRENs). When one day someone sits down to tally up the benefit and impact of the LHC, this ought to be one of the top ones: the ability to work together and to (mostly) be able to move data to each other, and to trust each other's CAs.

Regarding the DIRAC tutorial, I was there and went through as much as I could ("I am not doing that to my private key"). Something to play with a bit more when I have time - an hour (of code) is not much time; there are always compromises between getting stuff done realistically and cheating in tutorials, but that is fine as long as there's something you can take away and play with later. As regards the key shenanigans, DIRAC say they will be working with EGI on SSO, so that's promising. Got the T-shirt, too. "Interware," though?

On the security side, OSG have been interfacing to DigiCert, following the planned termination of the ESNET CA. Once again grids have demands that are not seen in the commercial world, such as the need for bulk certificates (particularly cost effective ones - something a traditional Classic IGTF can do fairly well.) Other security questions (techie acronym alert, until end of paragraph) include how Argus and XACML compare for implementing security policies, and the EMI STS - CERN looking at linking with ADFS. And Malaysia are trialling an online CA based on a FIPS level three token with a Raspberry π.

EGI federated cloud got mentioned quite a few times - KISTI interested in offering IaaS, also Australia interested in joining. Philippines providing resources. EGI have a strategy for engagement. Interesting the extent to which they are driving the adoption of CDMI.

I should mention Shaun gave a talk on "federated" access to data, comparing the protocols - which I missed - the talk, I mean - being in another session, but I understand it was well received and there was a lot of interest.

Software development - interesting experiences from the dCache team and building user communities with (for) DIRAC. How are people taught to develop code? The closing session was by Adam Lyon from Fermilab who talked about the lessons learned - the HEP vision of big data being different from the industry one. And yet HEP needs a culture shift to move away from the not-invented-here.

ISGC really had a great mix of Asian and European countries, as well as the US and Australia. This post was just a quick look through my notes; there'll be much more to pick up and ponder over the coming months. And I haven't even mentioned the actual science stuff ...

24 March 2008

Grid storage not working?

Well, going by what I heard last week at LHCb software week, I think the answer to this question is "No". The majority of the week focussed on all the cool new changes to the core LHCb software and improvements to the HLT, but there was an interesting session on Wednesday afternoon covering CCRC and more general LHCb computing operations. The point was made in 3 (yes, 3!) separate talks that LHCb continue to be plagued with storage problems which prevent their production and reconstruction jobs from successfully completing. The main issue is the instability of using local POSIX-like protocols to remotely open files on the grid SE from jobs running on the site WNs. From my understanding, this issue could broadly be separated into two categories:

1. Many of the servers being used have been configured in such a way that if a job held a file in an open state for longer than (say) 1 day, the connection was being dropped, causing the entire job to fail.

2. Sites have been running POSIX-like access services on the same hosts that are providing the SRM. This isn't wrong, but is definitely not recommended due to the load on the system. Anyway, the real problem comes when the SRM has to be restarted for some reason (most likely an upgrade) and the site(s) appear to have just been restarting all services on the node, which again resulted in any open file connections being dropped and jobs subsequently failing. I thought it was basic knowledge that everyone knew about, but apparently I was wrong.

LHCb seem to be particularly vulnerable as they have long running reconstruction jobs (>33 hours), resulting in low job efficiency when the above problems rear their ugly heads. I would be interested in comments from other experiments on these observations. Anyway, the upshot of this is that LHCb are now considering copying data files locally prior to starting their reconstruction jobs. This won't be possible for user analysis jobs, which will be accessing events from a large number of files. Copying all of these locally isn't all that efficient, nor do you know a priori how much local space the WN has available.
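
The "how much local space does the WN have" question can at least be asked at job start. The sketch below is not how LHCb or DIRAC do it; it only illustrates a pre-copy check using Python's standard shutil.disk_usage. The function names, the 10% safety margin and the file sizes are assumptions made up for the example.

```python
import shutil


def fits_locally(file_sizes, free_bytes, reserve_fraction=0.1):
    """True if all input files fit while keeping a safety margin free."""
    usable = free_bytes * (1.0 - reserve_fraction)
    return sum(file_sizes) <= usable


def scratch_free_bytes(path="."):
    """Free space on the filesystem holding `path` (the job's scratch area)."""
    return shutil.disk_usage(path).free


# Three hypothetical 2 GiB input files for one reconstruction job.
input_sizes = [2 * 1024**3] * 3

if fits_locally(input_sizes, scratch_free_bytes()):
    print("enough scratch space: copy inputs locally before running")
else:
    print("not enough scratch space: fall back to remote access")
```

For an analysis job touching a large number of files, the sum of the input sizes quickly exceeds what a WN offers, which is why the local-copy approach is only being considered for reconstruction.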

xrootd was also proposed as an alternative solution. Certainly dCache, CASTOR and DPM all now provide an implementation of the xrootd protocol in addition to native dcap/rfio, so getting it deployed at sites would be relatively trivial (some places already have it available for ALICE). I don't know enough about xrootd to comment, but I'm sure if properly configured it would be able to deal with case 1 above. Case 2 is a different matter entirely... It should be noted (perhaps celebrated?) that none of the above problems have to do with SRM2.2.

Of course, LHCb only require disk at Tier-1s, so none of this applies to Tier-2 sites. Also, they reported that they saw no problems at RAL: well done guys!

In addition, the computing team have completed a large part of the stripping that the physics planning group have asked for (but this isn't really storage related).
