29 May 2012

CHEP Digested



Apologies for not blogging live from the CHEP / WLCG meetings, but it was a busy week for me with talks and splinter meetings. So please find below my somewhat jet-lagged digest of the week:

WLCG meeting: 

News (to me) from the first day was that there will be a new Tier 0, in Hungary (!). The current plan is to build a beefy network between the two sites and split jobs and storage between them without regard to locality. Given the non-negligible expected latency, that didn't seem like the most obviously sensible plan.

Sunday was somewhat disappointing. Little was planned for the TEG session. The chairs were explicitly told that no talk was expected of them, only to find on the day that one was, so the session ended up rather regurgitating some of the conclusions and reiterating some of the same discussions. Apparently the TEGs are over now, despite their apparent zombie state; I hope we can build something useful on what was discussed outside any formal process, rather than waiting for whatever may or may not be officially formed in their wake.

On a non-storage note, I did ask Romain for one clarification on glexec: the requirement is for sites to provide fine-grained traceability, not necessarily to install glexec, though the group did not know of any other current way to satisfy the requirement. There was also some discussion on whether the requirement amounted to requiring identity switching, though it seemed fairly clear that it need not. If one can think of another way to satisfy the real requirement, then one can use it.

CHEP day 1: 

Rene Brun gave a kind of testimonial speech, which was a highlight of the week (because he is a legend). Later in the day he asked a question in my talk on ATLAS ROOT I/O, along the lines that he had previously seen faster rates when reading ATLAS files with pure ROOT, so why was the ATLAS software so much slower? (The reasons are the Transient->Persistent conversion as well as some reconstruction of objects.) Afterwards he came up to me and said he was "very happy" that we were looking at ROOT I/O (which made my week, really).
Other than my talk (which otherwise went well enough), the "Event Processing" session saw a description from CMS of their plans to make their framework properly parallel. A complete rewrite like this is possibly a better approach than the current ATLAS incremental attempts (as also described in this session by Peter V G), though it's all somewhat pointless unless big, currently sequential (and possibly parallelizable) parts like tracking are addressed.

CHEP day 2:

Sam blogged a bit about the plenaries. The parallel sessions got off to a good start (;-)) with my GridPP talk on analysing I/O bottlenecks: the most useful comment was perhaps that by Dirk on I/O testing at CERN (see the splinter meeting comments below). There was then a talk on load balancing for dCache, which described a fairly complicated algorithm but, if it works, one perhaps worth adopting in DPM. Then a talk on xrootd from (of course) Brian B, describing both ATLAS and CMS work; a sketch of what federated access looks like from the client side follows below. To be honest I found the use cases less compelling than I have previously, but there is still lots of good work on understanding them, and future development is worth supporting (see again the splinter meetings below).
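
For those who haven't met the federation idea: the client contacts a redirector, which forwards it to whichever federated site actually holds a copy of the file. A minimal sketch of what this looks like from the ROOT side (the redirector hostname, file path and tree name here are made up for illustration):

    #include "TFile.h"
    #include "TTree.h"

    void read_via_federation()
    {
       // TFile::Open dispatches on the URL scheme; root:// goes through the
       // xrootd client, so the redirector can send us to any site with a copy.
       TFile *f = TFile::Open("root://redirector.example.org//atlas/user/data.root");
       if (!f || f->IsZombie()) return;
       TTree *t = 0;
       f->GetObject("CollectionTree", t);  // hypothetical tree name
       if (t) t->Print();
       f->Close();
    }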

The poster session was, as Sam indicated, excellent, though with way, way too many posters to mention them all. The work on DPM, both on DM-LITE and on WebDAV, is very promising, but the proof will be in the production pudding that we are testing in the UK (see also my and Sam's CHEP paper, of course).
Back in the parallel sessions, the HammerCloud update showed some interesting new features, including correlating test results with outages so as to reduce the functional testing load. CMS are now using HC properly for their testing.

CHEP day 3:

In terms of the ROOT plenary talk, I would add to Sam's comments that the asynchronous prefetching does need some more work (we have tested it), but at least it is in there (see also the ROOT I/O splinter meeting comments below). I also noted that they now offer different compression schemes, which I haven't yet explored.
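
When I do get around to exploring them, I'd expect it to look something like this minimal sketch, using the algorithm names from ROOT's Compression.h (the file and tree here are purely illustrative):

    #include "TFile.h"
    #include "TTree.h"
    #include "Compression.h"

    void write_lzma()
    {
       TFile f("test_lzma.root", "RECREATE");
       f.SetCompressionAlgorithm(ROOT::kLZMA);  // alternative to the default zlib
       f.SetCompressionLevel(5);                // trade write CPU for smaller files

       Double_t x = 0;
       TTree t("t", "compression test");
       t.Branch("x", &x, "x/D");
       for (Int_t i = 0; i < 100000; ++i) { x = i * 0.5; t.Fill(); }
       t.Write();
       f.Close();
    }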
The data preservation talk was indeed interesting; Sam gave the link to the report. Of the two models for ensuring one can still run on the data, maintaining an old OS environment or validating a new one, I find the latter more interesting, but I really wonder whether experiments will retain the manpower on old experiments needed to keep such a validation up to date.

Andreas Peters's talk in the next session was the plenary most relevant to storage. As Sam suggested, it was indeed excellently wide-ranging and not too biased. Some messages: storage is still hard, and getting harder, with management, tuning and performance issues. LHC storage is large in terms of volume but not in number of objects. Storage interfaces split into the complex/rich, such as POSIX, and the reduced, such as S3. We need to be flexible enough both to profit from standards and community projects and to avoid being tied to any particular technology.

CHEP day 4:

The morning I mostly spent in splinter meetings on Data Federations and ROOT I/O (see below). In the afternoon there was a talk from Jeff Templon on the NIKHEF tests with WebDAV and proxy caches, which is independent of any middleware implementation. Interesting stuff, though somewhat of a prototype, and it should be integrated with other work. There was also some work in Italy on http access, which needs further testing but shows such things are possible with StoRM.
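
Part of the attraction of plain http access is that no special grid client is needed; something like the following sketch (the URL is hypothetical) is all a ROOT user would have to do:

    #include "TFile.h"

    void read_over_http()
    {
       // http:// URLs are handled by TWebFile, which uses byte-range requests,
       // so partial reads work much as they do over xrootd.
       TFile *f = TFile::Open("http://storage.example.org/data/file.root");
       if (f && !f->IsZombie()) {
          f->ls();     // list the file's contents as a quick check
          f->Close();
       }
    }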
After coffee and many, many more posters (!), Paul M showed that dCache is pluggable beyond pluggable (including potentially interesting work with HDFS, and with Hadoop for log processing). He also kept reassuring us that it will be supported in the future.

Some Splinter Meetings / Discussions: 


  • Possibilities for using the DESY grid lab for controlled DPM tests. 
  • Interest in testing dCache using similar infrastructure to that we presented for DPM. 
  • The ATLAS xrootd federation is pushing into Europe, with redirectors installed at CERN and at some sites in Germany and (we volunteered) the UK (including testing the new emerging DPM xrootd server).
  • DPM support. Certainly there will be some drop in CERN support post-EMI. Lots more discussions are to be had, but there was some optimism that there would be a decent level of support from CERN, provided some could also be found from the community of regions/users.
  • Other DPM news: Chris volunteered for DM-LITE on Lustre; Sam and I for both the xrootd and WebDAV stuff. 
  • ROOT I/O: agreement to allow TTreeCache to be set in the environment (a sketch of the current in-code setup follows this list). More discussion on basket optimisation (some requirements from CMS make it more complicated). Interest in having monitoring internal to ROOT, switched on in .rootrc: a first pass at a list of variables to be collected was constructed.
  • I/O benchmarking: Dirk at CERN has a suite that provides both a mechanism for submitting tests and some tests of its own, similar (but not identical) to the ones we are using. We will form a working group to standardise the tests and share tools.
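
For reference, the sketch promised above: roughly what enabling the TTreeCache currently looks like in user code, along with the kind of counters an internal ROOT monitor might report (cache size, branch selection and names here are illustrative assumptions, not the agreed proposal):

    #include "TFile.h"
    #include "TTree.h"

    void read_with_cache(TFile *f, TTree *t)
    {
       t->SetCacheSize(30 * 1024 * 1024);  // 30 MB read-ahead cache
       t->AddBranchToCache("*", kTRUE);    // cache all branches up front

       const Long64_t n = t->GetEntries();
       for (Long64_t i = 0; i < n; ++i) t->GetEntry(i);

       // The sort of counters an internal monitor might collect:
       printf("bytes read: %lld in %d read calls\n",
              f->GetBytesRead(), f->GetReadCalls());
    }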




1 comment:

Alessandra Forti said...

The work done by Nikhef with WebDAV seemed unusable: the constraint on the local machine memory is a really big hurdle. It's all based on http redirection, an idea already developed here

http://www.gridsite.org/wiki/SlashGrid

but funnily enough, back then nobody wanted to hear about either xrootd or http-based solutions.