20 December 2012

Happy Holidays From Dave and Georgina!

As we come to the arbitrary cycle end of the common major period of time coinciding with the middle of the northern hemisphere winter, I thought I should give an update on how things have been going for me.
I (Dave) currently have relatives in 110 rooms (out of the 748 which ATLAS owns).
There are 988 unique children, of whom:
723 are Dirks
138 are Ursulas
35 are Gavins
1 is a Valery
1 is a Calibration
Worrying is how many do not have a clone in another room: if that room gets destroyed, the child is lost. Of the 988 children:
1 has 4 clones
8 have 3 clones
53 have 2 clones
143 have 1 clone
693 are unique to the single room they live in.
These unique children have 48743 files and a size of 7 TB. These are the numbers that are at risk of permanent loss. Thankfully, 17173 files (~9.074 TB) are safely replicated.
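In catalogue terms, the census above is just a histogram of clone counts per child. A minimal sketch of how one might derive it from a name-to-rooms mapping (the names, rooms, and catalogue structure here are purely illustrative, not the actual ATLAS catalogue):

```python
from collections import Counter

# Hypothetical catalogue: child name -> rooms (storage endpoints) holding a copy.
# All entries are made up for illustration.
catalogue = {
    "dirk-001": ["room-A"],                        # unique to one room: at risk
    "dirk-002": ["room-A", "room-B"],              # one clone elsewhere
    "ursula-001": ["room-C", "room-D", "room-E"],  # two clones
}

# Clones = copies beyond the first; histogram over clone counts.
histogram = Counter(len(rooms) - 1 for rooms in catalogue.values())

# Children that would be lost along with their single room.
at_risk = [name for name, rooms in catalogue.items() if len(rooms) == 1]

print(dict(histogram))  # clone count -> number of children
print(at_risk)
```

Run over the real catalogue, the histogram is exactly the "N have M clones" breakdown above, and `at_risk` is the 693 singletons.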
The newest children were born on 15/11/12, so users are still finding me and/or my children interesting.

Large transfer rates to the UK in 2012

2012 was a good year for FTS transfers within the UK. A brief look shows that the Tier-2s and the RAL Tier-1 ingested 27.66 PB of data (20.23 PB being ATLAS files). This represents 41.77 million successful file transfers (38.94M files for ATLAS). These figures are for the production FTS servers involved in the WLCG, and so do not include files moved by test servers or direct user client tools.
As can be seen below, the major amounts of data transfer have been in the last eight months.
Examples of individual sites' FTS rates are shown below:
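As a back-of-envelope sanity check on the totals above, 27.66 PB spread over a calendar year implies a sustained average ingest rate of just under 0.9 GB/s (with peaks, of course, much higher):

```python
# Back-of-envelope: average UK ingest rate implied by the 2012 totals.
PB = 1e15  # decimal petabytes assumed; binary units would give a slightly larger figure

total_bytes = 27.66 * PB
seconds_in_year = 365 * 24 * 3600

avg_rate_gb_s = total_bytes / seconds_in_year / 1e9
print(f"average ingest rate: {avg_rate_gb_s:.2f} GB/s")
```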

19 December 2012

Storage/Data management overview from ATLAS Jamboree

I luckily had the opportunity to go to the ATLAS jamboree at CERN in December, a link to which can be seen here:

Part of my time was spent giving the report on the FTS3 tests we in the UK have been doing for CMS and ATLAS. Other points I picked up which might be of interest (or not!) are that:

  • ATLAS plan to reduce the need for space tokens as much as possible.
  • Plans to rename all files (over 120 PB!) for the new RUCIO system.
  • WebDAV and xrootd are being required for T2s.
  • Storage on cloud still being looked into.
  • Plan for SRM to not be used at disk only sites.
  • How to rename files on tape for RUCIO (and how tape families will work in new system) are still under investigation.
  • Plans for xrootd and WebDAV usage in read-only mode to start with, but the intent is to test read/write for both LAN and WAN access (and both full and sparse reads of the files using xrootd).
  • Known SL6 xrootd problem for DPM storage systems causing "headaches"!
  • ATLAS plan a full dress rehearsal of usage of Federated Access by Xrootd (FAX) for January 21st. It will be interesting to see if we can get any further sites in the UK involved.
  • ATLASHOTDISK as a space token should be able to go away "soon" (if site has CVMFS). 
And there I thought I was going to have a quiet new year! Of course some of these changes are already planning for the long shutdown (LS1), but it appears it is going to be an interesting 2013.

It's the season to go "Golly!"

As with every activity when the end of the year is nigh, it is useful to look back. And forward.

It's been a good year for GridPP's storage and data management group (and indeed for GridPP itself), and in a sense it's easy to be a victim of our own success: researchers just expect the infrastructure to be there. For example, some of the public reporting of the Higgs events seemed to gloss over how it was done, and the fact that finding the 400-Higgs-event needle in a very large haystack was a global effort - no doubt to keep things simple... RAL currently holds about 10 PB of LHC data on tape, and around 6 PB on disk. What we store is not the impressive bit, though - what we move and analyse is much more important. RAL routinely transfers 3 GB/s for a single one of the experiments (usually ATLAS or CMS). QMUL alone reported having processed 24 PB over the past year. So we do "big data."

In addition to providing a large part of WLCG, GridPP is also supporting non-LHC research. The catch, though, is that they usually have to use the same grid middleware. While at first this seems like a hurdle, it is the way to tap into the large computing resources - many research case studies show how it's done.

So a "well done" is sung to the otherwise relatively unsung heroes and heroines who keep the infrastructure running and available. Let's continue to do big data well - one of our challenges for the coming year will be to see how wider research can benefit even more, and maybe how we can get better at telling people about it!

11 December 2012

2nd DPM Community Workshop

The DPM workshop (http://indico.cern.ch/conferenceTimeTable.py?confId=214478#20121203.detailed)
was a very worthwhile meeting and quite well attended in person. It could have done with more people there from the UK, but there were several UK contributions via Vidyo.
On the morning of the first day, Ricardo and Oliver laid out the work so far and the roadmap. It was impressive to see that DMLite has become a reality with a number of plugins since the last workshop (and most of this workshop was indeed devoted to DMLite). It was also good to see things like the overloading of disk servers being addressed in the roadmap. We then saw the priorities of the other big users; it was interesting to see ASGC keen on NFS v4 (which I am not sure we need with xrootd and http) - they are also hosting the next DPM workshop in March, co-located with ISGC.

Sam, in the Admin Toolkit talk, described plans for a rebalancing tool, which should probably use DMLite and liaise with Ricardo's plans in this area.
The Globus Online talk did bring up interesting questions about gridftp-only transfers, as did the ATLAS presentation, which talked more explicitly about ATLAS plans to migrate to "a world without SRM" than I have heard before, and asked for both performant gridftp transfers and a "du" solution for directories. The first is being worked on by the DPM team; the last just needs a decision, and ideally a common solution across storage types. CMS seemed happier with DPM in their presentation than they do in normal operations, and ALICE seemed happier too now that xrootd is performing well on DPM.
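For the uninitiated, the "du" request is simply for cheap recursive directory sizes over the storage namespace. In local-filesystem terms it amounts to something like this sketch (a POSIX-walk illustration only; the actual ask is for the SE to answer this from its namespace database rather than by walking files):

```python
import os

def dir_size(path: str) -> int:
    """Recursively sum the sizes of all regular files under `path`, in bytes.

    Symlinks are skipped to avoid double counting. This walks the tree on
    every call, which is exactly why sites want the storage system to keep
    per-directory totals instead.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total
```

On a multi-petabyte namespace a walk like this is hopeless, hence the appeal for a common, pre-aggregated solution across storage types.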

The afternoon was devoted to DMLite and its many interfaces - e.g. S3 and HDFS. Can't wait to play with those in the new year…. Martin presented some interesting things on remote I/O performance - certainly xrootd and http offer a much more pleasant direct access experience than rfio (but only with TTreeCache on - which is not guaranteed for ATLAS users, so we may still be copying to scratch until we can get that switched on by default). We also saw that the WebDAV and xrootd implementations are in good shape (and starting to be used - see for example my FAX (ATLAS xrootd) talk - but I think they still need to be tested more widely in production before we know all the bugs). Oliver presented that DPM too was looking to a possible post-SRM landscape and ensuring that its protocols were able to work performantly without it.

Tuesday consisted of a lot of demos showing that DMLite was easy to set up, configure and even develop for (if you missed this then some of the same material was covered in the "Webinar" that should be available to view on the DPM webpages). In addition Ricardo showed how they configure nodes with Puppet, and there was a discussion around using that instead of yaim. In the end we decided on "Option 3": yaim would be used to generate a Puppet manifest that would be run with a local version of Puppet on the node, so that admins could carry on with yaim for the time being (but it would allow a transition in the future).

The "Collaboration Discussion" that followed was the first meeting of those partners that have offered to help carry DPM forward post-EMI. It was extremely positive in that there seems to be enough manpower - with core effort at CERN and strong support from ASGC. It seemed like something workable could be formed, with tasks divided effectively, so there is no need for anyone to fear for DPM's future any more.

This will be my last blog of the year (possibly forever, as I dislike blogging), so Happy Holidays Storage Fans!