24 October 2010

A computing banquet

Delayed post from Thursday....
So another day, another 100 talks, followed by another 100 food courses.

Just to provoke a suitable tirade from Sam, I will describe that morning's SSD plenary as an "informative and thorough summary of the unquestionable advantages of SSDs" (not really). More interesting was a talk from David South on long-term data preservation for experiments. A very worthwhile idea, I think, and I hope it can be supported.
In parallel talks, there was an update on HammerCloud developments - now available for LHCb and CMS as well as ATLAS, and apparently other VOs will be able to "plug in" in the future. Also coming in the future are more advanced/configurable statistics.
Andreas Peters outlined the CERN "disk pool project" EOS and, as mentioned by Sam, the obvious questions followed from Brian B and others: "why yet another filesystem/storage manager? Are dCache, HDFS, etc. not worth adapting...?" But to look on the positive side (as they are certainly going to carry on with this anyway), if something good develops then it may be something worth T1s or T2s trying out.
In the same session there was another tool presented that we might use: a flexible benchmark that allows you to trace any application and then "play it back", copying the disk calls. Potentially very useful for testing out new kit - and, though it hasn't yet been packaged for general consumption, we'll definitely be following up with the developer for a preview.
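
To make the idea concrete, here is a minimal, purely illustrative Python mock-up of the trace-and-replay approach (my own sketch, not the tool presented, which hasn't been released yet): run the application under strace to record the size of each read, then re-issue reads of those sizes against a file on the kit under test.

    #!/usr/bin/env python
    # Illustrative sketch only -- not the tool presented at CHEP. It captures
    # an application's read() calls with strace, then re-issues reads of the
    # same sizes against a test file on the hardware you want to benchmark.
    import re
    import subprocess
    import sys
    import time

    def trace_reads(cmd, trace_file="app.strace"):
        """Run 'cmd' under strace and collect the size of every read()."""
        subprocess.check_call(
            ["strace", "-e", "trace=read", "-o", trace_file] + list(cmd))
        # strace lines look like: read(3, "..."..., 65536) = 65536
        pattern = re.compile(r'read\(\d+, .*, (\d+)\)\s+=\s+-?\d+')
        sizes = []
        with open(trace_file) as f:
            for line in f:
                match = pattern.search(line)
                if match:
                    sizes.append(int(match.group(1)))
        return sizes

    def replay_reads(sizes, target):
        """Re-issue reads of the recorded sizes against 'target', timing them."""
        start = time.time()
        with open(target, "rb") as f:
            for size in sizes:
                if not f.read(size):   # wrap around at EOF so the replay finishes
                    f.seek(0)
        return time.time() - start

    if __name__ == "__main__":
        # Hypothetical usage: replay.py root -b -q ana.C /mnt/newdisk/testfile
        sizes = trace_reads(sys.argv[1:-1])
        elapsed = replay_reads(sizes, sys.argv[-1])
        print("%d reads replayed in %.2fs" % (len(sizes), elapsed))

(The real tool presumably does rather more - preserving offsets, timing and so on - but that is the shape of the technique.)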

Friday was the stream leaders' summaries, which you don't need since you have had ours ;-).
Overall I've been very impressed both with the organisation of the conference and with the quality of the talks...

And, as my temple visits should have placated the travel gods enough to guide me home through typhoons, strikes and whatever else is out there, we should be back to provide a more digested summary in next Wednesday's storage meeting.

20 October 2010

CHEP: or, how I learned to stop worrying and love the rain.

Day three of CHEP was a half day, so there's not as much to report as Wahid had yesterday.

The plenaries were not directly interesting from a storage perspective, but I should mention them for other qualities.
First, Kate Keahey told us all why clouds (and public clouds, federated clouds - "sky computing") were awesome. I guess I'm just a cynic, as I still don't see how they're significantly better than Condor pools (plus flocking, plus VM universe). Also, the data flow problem is decidedly unsolved for analysis-class jobs in this context.


Secondly, Lucas Taylor impressed on us how important it was to talk to the media (and, more importantly by far, the public). Apparently, the most significant source of hits on CERN webpages is Twitter! Considering also that the LHC is only 1/3 as popular as Barack Obama on YouTube, it does seem that the right approach can really bring in public interest, and this can only be a good thing.


Finally, Peter Malzcher told us about the FAIR project, which is to be the next big accelerator at GSI. Honestly, it looks awesome, but the 6MW cooling solution for the cluster looks terrifying.



Since I was presenting today, I only have notes from the session I was scheduled for.

The first two talks, both on virtualisation, confirmed that I/O can be an issue for many-VM hosts. The solution of the day appears to be iSCSI.
Then some dangerous radical told everyone to throw their shoes into the machinery, arguing that MLC flash isn't all it's cracked up to be in SSDs.
More upsets followed when Yves Kemp showed that pNFS/NFS4.1 is much better than dCap in almost all possible cases. It is, however, possible that dCap's problem is simply too much readahead.
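
(As an aside - this is just my own toy illustration, nothing from the talk - the readahead effect is easy to play with on a plain local filesystem: POSIX lets you hint to the kernel that access will be random, which switches readahead off, so you can compare sparse-read timings with and without the hint.)

    # Toy illustration of the readahead point above (my sketch, not Yves's):
    # for sparse, ROOT-style access patterns, aggressive readahead can pull in
    # far more data than the job actually uses. Compare the default behaviour
    # against POSIX_FADV_RANDOM, which disables readahead (Linux, Python 3.3+).
    import os
    import random
    import time

    def sparse_read(path, n_reads=2000, chunk=4096, disable_readahead=True):
        fd = os.open(path, os.O_RDONLY)
        if disable_readahead:
            # Hint to the kernel that access will be random -> no readahead.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
        size = os.fstat(fd).st_size
        start = time.time()
        for _ in range(n_reads):
            os.pread(fd, chunk, random.randrange(0, max(size - chunk, 1)))
        os.close(fd)
        return time.time() - start

    # Compare the two modes on a large file (placeholder path), dropping the
    # page cache between runs ("echo 3 > /proc/sys/vm/drop_caches" as root):
    #   sparse_read("/data/bigfile.root", disable_readahead=False)
    #   sparse_read("/data/bigfile.root", disable_readahead=True)
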
Finally, Dirk Duellmann gave us an update from CERN storage. Essentially, they're pretty stable at the front-end, growing storage at 15PB/y. Additionally, they're trialling EOS for disk pool filesystems. EOS, as Jeff Templon got Dirk to admit under cross-examination, is basically Hadoop over xrootd protocol, with a better namespace.
This despite agreeing at Amsterdam that reinventing the wheel in private projects was Bad... (CERN could have chosen to patch Hadoop, or even Ceph, instead).

Tch.

CHEPping part 2

Having got my talk out of the way (more on that later), I am now free to blog my view on activities so far here in Taiwan. I will avoid telling you about the driving rain, puppet shows and million-mini-course dinners, sticking instead to the hard storage facts.
My highlights/snippets from Monday's and Tuesday's activities:
- Lots on many-core - but the valid question was asked: can I/O keep up with this?
- Patrick told us the plans (as they stand now) for data management middleware in EMI. StoRM is in the plans (though its developers were somewhat absent from the session, so no update on their status).
- Oliver told us the roadmap for DPM (immediate news is that DPM 1.8.0 is in certification, including a third-party rfcp to allow it to be used for draining).
- Ricardo gave a nice talk on the DPM work on NFS4.1, which has reached the prototype stage.
For the slides from the above talks, see this session:
http://117.103.105.177/MaKaC/sessionDisplay.py?sessionId=33&slotId=0&confId=3#2010-10-19

- My talk went OK, with many questions, including from those doing (or interested in doing) similar benchmarking work. Hopefully we can get some common ideas towards providing something useful for sites to test and tune.
- Unfortunately I was talking at the same time as Illija's talk on the ATLAS ROOT improvements, which among other things outlined that some of the further improvements in ROOT 5.26 would not be available in the current ATLAS reprocessing due to some other bugs which, thanks to connections made during the talk, may get fixed. Also up at the same time (!) was Philippe Canal's talk with more detail on the ROOT changes, as well as CMS's experiences in implementing them (http://117.103.105.177/MaKaC/contributionDisplay.py?contribId=150&sessionId=46&confId=3).

Other news - we had a very productive meeting with the DPM team, which should see us getting hold of the prerelease NFS4.1 interface for testing soon (within the next month or so), and also (probably before that) we'll be testing the "3rd party rfcp" mentioned above, to tune it for the fastest possible drains (yeah!). We also talked about creating a central repository to collect together any DPM-related Nagios probes that people are using (before consolidating / adding new ones).
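
To give a flavour of what such a repository might hold (this is just a sketch I knocked up, not one of the probes people are actually running), a Nagios probe is basically a script that prints one status line and exits 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN; the example below does a simple TCP check against a head node service, with placeholder host and port.

    #!/usr/bin/env python
    # Sketch of a DPM-flavoured Nagios probe (not an existing plugin): a plain
    # TCP check against a head node service, following the standard Nagios
    # exit-code convention (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).
    # The default host and port below are placeholders.
    import socket
    import sys
    import time

    def check_tcp(host, port, warn_secs=2.0, timeout=10.0):
        start = time.time()
        try:
            socket.create_connection((host, port), timeout=timeout).close()
        except socket.error as err:
            print("CRITICAL: cannot connect to %s:%s (%s)" % (host, port, err))
            return 2
        elapsed = time.time() - start
        if elapsed > warn_secs:
            print("WARNING: %s:%s answered in %.1fs" % (host, port, elapsed))
            return 1
        print("OK: %s:%s answered in %.2fs" % (host, port, elapsed))
        return 0

    if __name__ == "__main__":
        host = sys.argv[1] if len(sys.argv) > 1 else "dpm-head.example.ac.uk"
        port = int(sys.argv[2]) if len(sys.argv) > 2 else 5015  # assumed dpm daemon port
        sys.exit(check_tcp(host, port))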

Packed days (so many sessions at once that even with Sam and me covering different sessions we are still missing half the stuff) - and we are only halfway through! So stand by for more info, if I don't get lost in the electronics markets or washed away by a typhoon.

18 October 2010

CHEP 2010: Episode 4: A New Hope

The story so far:

The evil Empire of CERN has succeeded in paralyzing the world's data networks by distributing vast quantities of 'event data' from their Death Star in Geneva.

However, at this very moment, a band of resistance fighters are congregating on the forest moon island of Taiwan to lead the fight back...




Ahem. So, Wahid and I are currently in Taipei for CHEP2010. Despite the jetlag encouraging me to write paragraphs like the above, we're seeing lots of interesting things.
Tellingly, the inaugural speech was given by the Vice President of Taiwan, and he mentioned how important science was to Taiwanese success. Unlike in France, I suspect the UK would find it hard to get Nick Clegg to turn up in similar circumstances.
Back to physics, where Ian Bird and Roger Jones in turn told us how successful we'd all been over the year, and Craig Lee told us how awesome cloud computing will be when it is public. Just like the Grid (and Condor clusters) before it, eh?
Finally, we had a discussion of many-core scaling for LHC VOs by Sverre Jarp. This is an area of significance for data provision, and the challenge of scaling I/O is something we're still working out how best to address.

Of the parallel talks I attended, the most interesting was the CERNVMFS talk - it's still impressive how well it works.
Other interesting things: talks on EMI release processes (they have QA metrics!), posters on FTS over scp, Amazon EC2 for CMS (too expensive), the L-Grid web portal, and the ATLAS consistency service.

More from Wahid tomorrow.