22 November 2007

gridpp-storage turns 1(00)

Just thought I would let everyone know that we have now reached the 100th posting on this little blog, and it also happens to fall exactly on the one-year anniversary of its creation. Strange how these two milestones coincide like that. Doing the maths, it means that there are about 2 postings per week. What I don't know, however, is whether this implies that there is too much or too little to talk about when it comes to Grid storage. Maybe it means I should be doing more work.
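For anyone who wants to check the "about 2 postings per week" figure, here is a minimal sketch of the arithmetic. It assumes the blog was created exactly one year before this post (22 November 2006); the function name and variables are just for illustration.

```python
from datetime import date

# Figure taken from the post: this is the 100th posting.
POSTS = 100

# Assumption: the blog was created exactly one year before this post.
first_post = date(2006, 11, 22)
hundredth_post = date(2007, 11, 22)


def posts_per_week(posts: int, start: date, end: date) -> float:
    """Average number of postings per week between two dates."""
    weeks = (end - start).days / 7
    return posts / weeks


rate = posts_per_week(POSTS, first_post, hundredth_post)
print(f"{rate:.2f} postings per week")  # prints "1.92 postings per week"
```

So the rate is just under 2 per week, which matches the rough figure quoted above.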

As an aside, when I said "everyone" above, I don't actually know how large "everyone" is. As a guess, I would say at most 7 people. Maybe a few comments below would prove me wrong...

Also, I should say that Lancaster are planning on moving to dCache 1.8.0 in production next week. This is great news as it means they will be all set for the CCRC'08 (the last C means Challenge, that's the important bit) that is due to start in 1Q08. Everyone else should have similar plans in their minds.

SRM2.2 deployment workshop - day 2 summary

OK, time to finish off my summary of the workshop; I'll try not to take as long this time.

After a great conference dinner (and amazing speech!) in the George Hotel, I made sure I was in NeSC early on the Wednesday morning to check everything was in working order after the power cut from the previous night. Unfortunately, there were a few gremlins in the system. First of all, a circuit breaker had gone, meaning that only half of the tutorial machines had power, and of those, only half of them had networking. At that point, the hands-on session was basically dead in the water. Fortunately, a University spark (that means electrician) appeared out of nowhere and flicked the switch. A quick reboot and all of the machines came back up - phew! I then had to spend a bit of time reconfiguring a few nodes which the developers had been set loose on; using virtual machines here would definitely have helped, but when I tried this prior to the workshop, the machines at NeSC were really struggling (I've already recommended that they upgrade their hardware). Anyway, the tutorials started and people were able to log on, configure some SRM2.2 spaces and run some client commands as a check. It helped to have the tutorial hand-out that I had prepared along with the dCache and DPM developers (thanks Timur + Sophie).

So far, so good, but disaster wasn't far away as the NeSC firewall had been reset overnight, meaning that the tutorial machines could no longer speak to the top-level BDII in the physics department. This was discovered after a quick debug session from Grid guru Maarten Litmaath and the firewall rules were fixed. Unfortunately, this broke wireless connectivity for almost everyone in the building! By far, this was the worst thing that could happen at a Grid meeting - everyone was starting to foam at the mouth by lunch. To be honest, this was the best thing to happen as it meant there were no distractions from the tutorial - something to remember for next time I think ;)

The afternoon kicked off with a discussion about the rigorous testing that the different SRM2.2 implementations have gone through to ensure they conform to the spec and are able to inter-operate. Things look good so far - fingers crossed it remains the same in production. We then had a short session about support as this is the latest big thing to talk about with regards to storage. The message that I would give to people is to make sure you stay involved in the community that we have built up; contribute to the mailing list (dpm-user-forum@cern.ch was announced!) and add to the wikis and related material that is on the web. It's really important that we help one another and not constantly pester the developers (although their help will still be needed!). The final session of the workshop talked about the SRM client tools and changes that have been made to them to support SRM2.2. Clearly, sites should have a good working relationship with these tools as they will be one of the main weapons to check that SRM2.2 configuration is working as expected.

So that's it, the end of the SRM2.2 deployment workshop. I think everyone (well, most) enjoyed their time in Edinburgh and learnt a lot from each other. The hands-on proved a success and this should be noted for future events. Real data is (hopefully!) coming in 2008 so we had better make sure that the storage is prepared for it so that the elusive Higgs can be found! Remember, acting as a community is essential, so make sure that you stay involved!

See you next time.

17 November 2007

SRM2.2 deployment workshop - day 1 summary

So that's the workshop over - it really flew by after all of the preparation that was required to set things up! Thanks to everyone at NeSC for their help with the organisation and operations on the day. Thanks also to everyone who gave presentations and took tutorials at the workshop; I think these really allowed people to learn about SRM2.2 and all of the details required to configure DPM, dCache or StoRM storage at their site.

Day 1 started with parallel dCache Tier-1 and StoRM sessions. It was particularly good for the StoRM developers to get together with interested sites as they haven't had much of a chance to do this before. For dCache (once the main developer arrived!), FZK and NDGF were able to pass on information to the developers and other Tier-1s about performing the upgrade to v1.8.0. It was noted that NDGF are currently just using 1.8.0 as a drop-in replacement for 1.7.0 - space management has yet to be turned on. FZK had experienced a few more problems, which showed up mainly as a high load on PNFS. However, the eventual cause was traced to a change in pool names without a subsequent re-registration - this is unrelated to 1.8.0.

I had arranged lunch for the people who turned up in the morning. Of course, I should have known that a few extras would turn up - but I didn't expect quite as many. Luckily we had enough food, just. I'll know better for next time!

The main workshop started in the afternoon where the concepts of SRM2.2 were presented in order to educate sites and give them the language that would be used during the rest of the workshop. Thanks to Flavia Donno and Maarten Litmaath for describing the current status. We then moved on to short presentations from each of the storage developers, outlining the system and how SRM2.2 was implemented. Again, this helped to explain concepts to sites and enabled them to see the differences with the software that is currently deployed.

Stephen Burke gave a detailed presentation about v1.3 of the GLUE schema. This is important as it was introduced to allow for SRM2.2 specific information to be published in the information system. Sites need to be aware of what their SEs should be publishing - there should be a SAM test that will check this out.

The final session of the day was titled "What do experiments want from your storage?". I was hoping that we could get a real idea of how ATLAS, CMS and LHCb would want to use SRM2.2 at the sites, particularly Tier-2s. LHCb appear to have the clearest plan of what they want to do (although they don't want to use Tier-2 disk) as they presented exactly the data types and space tokens that they would like set up. CMS presented a variety of ideas for how they could use SRM2.2 to help with the data management. While these are not finalised, I think they should form the basis for further discussion between all interested parties. For ATLAS, Graeme Stewart gave another good talk about their computing model and data management. Unfortunately, it was clear that ATLAS don't really have a plan for SRM2.2 at Tier-1s or 2s. He talked about ATLAS_TAPE and ATLAS_DISK space tokens, which is a start, but what is the difference between this and just having separate paths (atlas/disk and atlas/tape) in the SRM namespace? What was clear, however, was that ATLAS (and the other experiments) want storage to be available, accessible and reliable. This is essential for both WAN data transfers and local access for compute jobs and is really what the storage community have to focus on prior to the start of data taking - we are here for the physics after all!

So, Day 1 went well, that is until we had a power cut just after the final talk! To be honest though, it turned out to be a good thing as it meant people stopped working on their laptops and got out to the pub. This was another reason for having the workshop - allowing people to mix and match faces to the names that they have seen on the mailing lists. It all helps to foster that community spirit that we need to support each other.

OK, enough for now. I'll talk about Day 2 later.

SRM2.2 deployment workshop - picture gallery







If people have any more photos that they would like to share, then feel free to send them to me.

16 November 2007

Storage workshops - the solution to (most of) our problems


While reviewing the outcomes of the workshop, I have come to the conclusion that we should have storage workshops more often as they clearly improve the reliability and availability of Grid storage. I took this snapshot of SAM test results for UK sites the day after the Edinburgh meeting and I think it's obvious that there is a lot of green about, although one site shall remain nameless...

Unfortunately, normal service appears to have resumed. So who is planning the next meeting?

14 November 2007

Dark storage?

The lights went out on the SRM2.2 deployment workshop yesterday evening, but it was timed to perfection as the final talk had just finished! Clearly my plan to get everyone out of the building worked really well ;)

Dinner in the George Hotel was excellent (as was the beer afterwards). I will post some photos from the restaurant and Day 1 at some point today.

10 November 2007

SRM2.2 deployment workshop - bulletin 5


[Map: main workshop locations in Edinburgh]

The workshop is almost upon us - I can't believe it's come round this fast! It was only a few weeks ago that I was proposing the idea of running this event. Preparations have been continuing apace and I now have the computer lab set up with 20 dCache and DPM test instances. Interested participants will use their laptops to connect to these machines during the tutorials and will configure the SRM2.2 service. There will be contributions from many sources during the workshop: the developers, experiment representatives and members of the WLCG GSSD are all putting together material. Thanks to everyone for all of their efforts!

The map above pinpoints some of the main locations that you will need to know about during the workshop. Hopefully you'll get the chance to see some more of Edinburgh's attractions while you are here.

Check Indico for the latest agenda:

http://indico.cern.ch/conferenceDisplay.py?confId=21405

Note that we have added a discussion session about the future of storage support. This is important given that (fingers crossed) we will be receiving physics data next year; we need to ensure the storage services have high availability and reliability.

See you next week.

02 November 2007

SRM2.2 deployment workshop - bulletin 4

We are now up to 55 registered participants! This is great - clearly a lot of people are interested in learning about SRM, and the workshop presents a good opportunity for everyone to meet the middleware developers, find out about the basics (and the complexities) of SRM2.2 and exchange hints/tips with site admins from all over WLCG.

I would ask everyone to come prepared for the workshop - have a think about your current storage setup and how this will be changing over the coming year. I would imagine everyone will be buying more disk so how do you want this set up? If you support many VOs, do you want to give dedicated disk to some of them while the others share?

Also, I've worked out some more of the technicalities of the hands-on session. There are 20 desktop machines at NeSC which we can use for the tutorial and I have started setting up dCache/DPM images which will be rolled out across the cluster. I'll ask everyone to use their laptops to ssh onto the machines, but everyone will have to work in pairs; unfortunately there aren't enough hosts for one each. We'll be using a temporary CA for host and user certificates.

Finally, just to clarify: the workshop starts at 1pm on Tuesday 13th Nov. Make sure you pick up some lunch beforehand! You should be able to find plenty of eateries near to NeSC. Looking forward to seeing you all soon.

http://indico.cern.ch/conferenceDisplay.py?confId=21405