16 December 2007
Anyone for some dCache monitoring?
The above plots come from some new dCache monitoring that I have set up to study the behaviour of the Edinburgh production storage (srm.epcc.ed.ac.uk). This uses Brian Bockleman's GraphTool and some associated scripts to query the dCache billing database. You can find the full set of plots here (I know, it's a strange hostname for a monitoring page, but it's all that was available):
http://wn3.epcc.ed.ac.uk/billing/xml/
GraphTool is written in Python and uses matplotlib to generate the plots; CherryPy is used for the web interface. The monitoring can't just be installed as an rpm: you need to have PostgreSQL 8.2 available, create a new view in the billing database, set up Apache mod_rewrite and ensure you have the correct compilers installed, but these steps shouldn't be a problem for anyone.
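For anyone curious about what GraphTool is doing under the hood, the plots essentially boil down to queries like the one below against the billing database. This is just a sketch of the idea: the table and column names (billinginfo, datestamp, transfersize) are from memory of the billing schema, and srmdcache is whatever postgres account your dCache uses, so check your own setup first.

  # bytes transferred per day, straight from the billing database
  psql -U srmdcache -d billing -c "
    SELECT date_trunc('day', datestamp) AS day,
           sum(transfersize)/1e9        AS gigabytes
      FROM billinginfo
     GROUP BY day
     ORDER BY day;"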
I think you will agree that the monitoring presents some really useful views of what the dCache is actually doing. It's still a work in progress, but let me know when you want to set it up and I should be able to help.
It should be possible to do something similar for DPM in the coming weeks.
10 December 2007
DPM on SL4
Time to break out the champagne, it looks like DPM will be officially released in production on SL4 next Wednesday.
"Based on the feedback from PPS sites, we think that the following
patches can go to production next Wednesday:
# available on the linked GT-PPS ticket(s)
#1349 glite-LFC_mysql metapackage for SLC4 - 3.1.0 PPS Update 10
#1350 glite-SE_dpm_disk metapackage for SLC4 - 3.1.0 PPS Update 10
#1352 glite-SE_dpm_mysql metapackage for SLC4 - 3.1.0 PPS Update 10
#1541 glite-LFC_oracle metapackage for SLC4 - 3.1.0 PPS Update 10
#1370 R3.1/SLC4/i386 DPM/LFC 1.6.7-1 - 3.1.0 PPS Update 10"
Of course, some sites have been running SL3 DPM on SL4 for over a year and others have been running the development SL4 DPM in production for months. One warning I would give is to make sure that the information publishing is working; I've had a few problems with that in the past (in fact, today I was battling with an incompatible version of perl-LDAP from the DAG repository).
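A quick sanity check after any upgrade is to query the SE's resource BDII directly and make sure something sensible comes back. Something along these lines should do it; the hostname is a placeholder and the port/base DN below are for a gLite 3.1 resource BDII, so adjust for your release:

  # does the SE publish a GlueSE object at all?
  ldapsearch -x -H ldap://se01.example.ac.uk:2170 -b 'mds-vo-name=resource,o=grid' \
      '(objectClass=GlueSE)' GlueSEUniqueID GlueSEName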
06 December 2007
Storage as seen by SAM
We all need more monitoring, don't we? I knocked up these plots showing the storage SAM test results for the ops VO at GridPP sites over the past month. I am only looking at the SE and SRM tests here, where the result for each day is calculated as the number of successes over the total number of tests. The darker green the square, the higher the availability. I think it's clear which sites are having problems.
http://www.gridpp.ac.uk/wiki/GridPP_storage_available_monitoring
We always hear that storage is really unreliable for the experiments, so I was actually quite surprised at the amount of green on the first plot. However, since these results only come from the short-duration ops tests, I don't think they truly reflect the view that the experiments have of storage when they are performing bulk data transfers across the WAN or a large amount of local access from the compute farm.
These plots were generated thanks to some great python scripts/tools from Brian Bockleman (and others, I think) from OSG. Brian's also got some interesting monitoring tools for dCache sites which I'm having a look at. It would be great if we could use something similar in GridPP.
04 December 2007
dCache 1.8.0-X
A new patch to dCache 1.8.0 was released on Friday (1.8.0-6). In addition, there is now a 1.8.0 dcap client. All rpms can be found here:
http://www.dcache.org/downloads/1.8.0/index.shtml
Sites (apart from Lancaster!) should wait for all of the Tier-1s to upgrade first, as there are still some bugs being worked out.
dCache admin scripts
Last week I finally got a chance to have another look at some of the dCache administration scripts that are in the sysadmin wiki [1]. There is a jython interface to the dCache admin interface, but I find it difficult to use. As an alternative, the guys at IN2P3 have written a python module that creates a dCache admin door object which you can then use in your own python scripts to talk to the dCache [2]. One thing that I did was use the rc cleaner script [3] to clean up all of the requests (there were hundreds!) that were stuck in the Suspended state. You can see how the load on the machine running postgres dropped after removing the entries. Highly recommended.
I also wrote a little script to get information from the LoginBroker in order to print out how many doors are active in the dCache. This is essential information for sites that have many doors (e.g. Manchester) but find the dCache 2288 webpage difficult to use. I'll put it in the SVN repository soon.
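It's not that script, but the rough idea can be had with a quick one-liner against the admin interface. This assumes the dCache 1.x setup of SSH1 on port 22223 and that each active door shows up as one semicolon-separated line in the LoginBroker "ls" output; the hostname is a placeholder. The python module in [2] is a much more robust way of doing the same thing.

  # count the doors currently registered with the LoginBroker
  echo -e 'cd LoginBroker\nls\n..\nlogoff' | \
      ssh -1 -c blowfish -p 22223 -l admin dcache-admin.example.ac.uk 2>/dev/null | grep -c ';'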
[1] http://www.sysadmin.hep.ac.uk/wiki/DCache
[2] http://www.sysadmin.hep.ac.uk/wiki/DCache_python_interface
[3] http://www.sysadmin.hep.ac.uk/wiki/DCache_rc_cleaner
22 November 2007
gridpp-storage turns 1(00)
Just thought I would let everyone know that we have now reached the 100th posting on this little blog, and it also happens to fall exactly on the one year anniversary of its creation. Strange how these two milestones coincide like that. Doing the maths, it means that there are about 2 postings per week. What I don't know, however, is whether this implies that there is too much or too little to talk about when it comes to Grid storage. Maybe it means I should be doing more work.
As an aside, when I said "everyone" above, I don't actually know how large "everyone" is. As a guess, I would say at most 7 people. Maybe a few comments below would prove me wrong...
Also, I should say that Lancaster are planning on moving to dCache 1.8.0 in production next week. This is great news as it means they will be all set for the CCRC'08 (the last C means Challenge, that's the important bit) that is due to start in 1Q08. Everyone else should have similar plans in their minds.
SRM2.2 deployment workshop - day 2 summary
OK, time to finish off my summary of the workshop; I'll try not to take as long this time.
After a great conference dinner (and amazing speech!) in the George Hotel, I made sure I was in NeSC early on the Wednesday morning to check everything was in working order after the power cut from the previous night. Unfortunately, there were a few gremlins in the system. First of all, a circuit breaker had gone, meaning that only half of the tutorial machines had power, and of those, only half of them had networking. At that point, the hands-on session was basically dead in the water. Fortunately, a University spark (that means electrician) appeared out of nowhere and flicked the switch. A quick reboot and all of the machines came back up - phew! I then had to spend a bit of time reconfiguring a few nodes which the developers had been set loose on; using virtual machines here would definitely have helped, but when I tried this prior to the workshop, the machines at NeSC were really struggling (I've already recommended that they upgrade their hardware). Anyway, the tutorials started and people were able to log on, configure some SRM2.2 spaces and run some client commands as a check. It helped to have the tutorial hand-out that I had prepared along with the dCache and DPM developers (thanks Timur + Sophie).
So far, so good, but disaster wasn't far away as the NeSC firewall had been reset overnight meaning that the tutorial machines could no longer speak to the top-level BDII in the physics department. This was discovered after a quick debug session from Grid guru Maarten Litmaath and the firewall rules were fixed. Unfortunately, this broke wireless connectivity for almost everyone in the building! By far, this was the worst thing that could happen at a Grid meeting - everyone was starting to foam at the mouth by lunch. To be honest, this was the best thing to happen as it meant there were no distractions from the tutorial - something to remember for next time I think ;)
The afternoon kicked off with a discussion about the rigorous testing that the different SRM2.2 implementations have gone through to ensure that they conform to the spec and are able to interoperate. Things look good so far - fingers crossed it remains the same in production. We then had a short session about support, as this is the latest big thing to talk about with regard to storage. The message that I would give to people is: make sure you stay involved in the community that we have built up; contribute to the mailing list (dpm-user-forum@cern.ch was announced!) and add to the wikis and related material that is on the web. It's really important that we help one another and don't constantly pester the developers (although their help will still be needed!). The final session of the workshop talked about the SRM client tools and the changes that have been made to them to support SRM2.2. Clearly, sites should have a good working relationship with these tools as they will be one of the main weapons for checking that the SRM2.2 configuration is working as expected.
So that's it, the end of the SRM2.2 deployment workshop. I think everyone (well, most) enjoyed their time in Edinburgh and learnt a lot from each other. The hands-on proved a success and this should be noted for future events. Real data is (hopefully!) coming in 2008 so we had better make sure that the storage is prepared for it so that the elusive Higgs can be found! Remember, acting as a community is essential, so make sure that you stay involved!
See you next time.
17 November 2007
SRM2.2 deployment workshop - day 1 summary
So that's the workshop over - it really flew by after all of the preparation that was required to set things up! Thanks to everyone at NeSC for their help with the organisation and operations on the day. Thanks also to everyone who gave presentations and took tutorials at the workshop, I think these really allowed people to learn about SRM2.2 and all of the details required to configure DPM, dCache or StoRM storage at their site.
Day 1 started with parallel dCache Tier-1 and StoRM sessions. It was particularly good for the StoRM developers to get together with interested sites as they haven't had much of a chance to do this before. For dCache (once the main developer arrived!), FZK and NDGF were able to pass on information to the developers and other Tier-1s about performing the upgrade to v1.8.0. It was noted that NDGF are currently just using 1.8.0 as a drop-in replacement for 1.7.0 - space management has yet to be turned on. FZK had experienced a few more problems - primarily due to a high load on PNFS. However, the eventual cause was traced to a change in pool names without a subsequent re-registration - this is unrelated to 1.8.0.
I had arranged for lunch for the people who turned up in the morning. Of course, I should have known that a few extras would have turned up - but I didn't expect quite as many. Luckily we had enough food, just. I'll know better for next time!
The main workshop started in the afternoon, where the concepts of SRM2.2 were presented in order to educate sites and give them the language that would be used during the rest of the workshop. Thanks to Flavia Donno and Maarten Litmaath for describing the current status. We then moved on to short presentations from each of the storage developers, outlining their system and how SRM2.2 was implemented. Again, this helped to explain concepts to sites and enabled them to see the differences with the software that is currently deployed.
Stephen Burke gave a detailed presentation about v1.3 of the GLUE schema. This is important as it was introduced to allow for SRM2.2 specific information to be published in the information system. Sites need to be aware of what their SEs should be publishing - there should be a SAM test that will check this out.
The final session of the day was titled "What do experiments want from your storage?". I was hoping that we could get a real idea of how ATLAS, CMS and LHCb would want to use SRM2.2 at the sites, particularly Tier-2s. LHCb appear to have the clearest plan of what they want to do (although they don't want to use Tier-2 disk) as they presented exactly the data types and space tokens that they would like set up. CMS presented a variety of ideas for how they could use SRM2.2 to help with their data management. While these are not finalised, I think they should form the basis for further discussion between all interested parties. For ATLAS, Graeme Stewart gave another good talk about their computing model and data management. Unfortunately, it was clear that ATLAS don't really have a plan for SRM2.2 at Tier-1s or 2s. He talked about ATLAS_TAPE and ATLAS_DISK space tokens, which is a start, but what is the difference between this and just having separate paths (atlas/disk and atlas/tape) in the SRM namespace? What was clear, however, was that ATLAS (and the other experiments) want storage to be available, accessible and reliable. This is essential for both WAN data transfers and local access for compute jobs, and is really what the storage community have to focus on prior to the start of data taking - we are here for the physics after all!
So, Day 1 went well - that is, until we had a power cut just after the final talk! To be honest though, it turned out to be a good thing as it meant people stopped working on their laptops and got out to the pub. This was another reason for having the workshop - allowing people to put faces to the names that they have seen on the mailing lists. It all helps to foster the community spirit that we need to support each other.
OK, enough for now. I'll talk about Day 2 later.
16 November 2007
Storage workshops - the solution to (most of) our problems
While reviewing the outcomes of the workshop, I have come to the conclusion that we should have storage workshops more often as they clearly improve the reliability and availability of Grid storage. I took this snapshot of SAM test results for UK sites the day after the Edinburgh meeting and I think it's obvious that there is a lot of green about, although one site shall remain nameless...
Unfortunately, normal service appears to have resumed. So who is planning the next meeting?
14 November 2007
Dark storage?
The lights went out on the SRM2.2 deployment workshop yesterday evening, but it was timed to perfection as the final talk had just finished! Clearly my plan to get everyone out of the building worked really well ;)
Dinner in the George Hotel was excellent (as was the beer afterwards), I will post some photos from the restaurant and day-1 at some point today.
10 November 2007
SRM2.2 deployment workshop - bulletin 5
The workshop is almost upon us, I can't believe it's come round this fast! It was only a few weeks ago that I was proposing the idea of running this event. Preparations have been continuing apace and I now have the computer lab set up with 20 dCache and DPM test instances. Interested participants will use their laptops to connect to these machines during the tutorials and will configure the SRM2.2 service. There will be contributions from many sources during the workshop; the developers, experiment representatives and members of the WLCG GSSD are all putting together material. Thanks to everyone for all of their efforts!
The map above pinpoints some of the main locations that you will need to know about during the workshop. Hopefully you'll get the chance to see some more of Edinburgh's attractions while you are here.
Check Indico for the latest agenda:
http://indico.cern.ch/conferenceDisplay.py?confId=21405
Note that we have added a discussion session about the future of storage support. This is important given that (fingers crossed) we will be receiving physics data next year; we need to ensure the storage services have high availability and reliability.
See you next week.
02 November 2007
SRM2.2 deployment workshop - bulletin 4
We are now up to 55 registered participants! This is great - clearly a lot of people are interested in learning about SRM, and the workshop presents a good opportunity for everyone to meet the middleware developers, find out about the basics (and the complexities) of SRM2.2 and exchange hints/tips with site admins from all over WLCG.
I would ask everyone to come prepared for the workshop - have a think about your current storage setup and how this will be changing over the coming year. I would imagine everyone will be buying more disk so how do you want this set up? If you support many VOs, do you want to give dedicated disk to some of them while the others share?
Also, I've worked out some more of the technicalities of the hands-on session. There are 20 desktop machines at NeSC which we can use for the tutorial and I have started setting up dCache/DPM images which will be rolled out across the cluster. I'll ask everyone to use their laptops to ssh onto the machines, but everyone will have to work in pairs; unfortunately there aren't enough hosts for one each. We'll be using a temporary CA for host and user certificates.
Finally, just to clarify that the workshop starts at 1pm on Tuesday 13th Nov. Make sure you pick up some lunch beforehand! You should be able to find plenty of eateries near to NeSC. Looking forward to seeing you all soon.
http://indico.cern.ch/conferenceDisplay.py?confId=21405
25 October 2007
SRM2.2 deployment workshop - bulletin 3
Just thought I would give a quick update as to the organisation of the workshop. First of all, thanks to those who have already registered - we are fast approaching our limit and the registration page will be closed soon (so apply now if you would still like to come).
http://www.nesc.ac.uk/esi/events/827/
Secondly, I have changed the format of the Indico agenda page to use the "conference" format. Hopefully this makes more sense, since we are having a couple of parallel sessions during the workshop, which was difficult to see with the default layout.
Thirdly, I am working with people at NeSC to try and set up a hands-on session where delegates get a chance to "play" with the SRM2.2 configuration on a dCache or DPM. At the moment there are still some technical "challenges" to overcome. I'll keep you posted.
As always, get in touch if there are any questions.
PS Can anyone name the tartan?
16 October 2007
SRM2.2 deployment workshop - bulletin 2
The registration page for the workshop is now available.
http://www.nesc.ac.uk/esi/events/827/
Could everyone who has pre-registered their interest to me please complete the registration form. If you require accommodation you should select the relevant nights and this will be booked for you by NeSC (they will not be paying for it though!). I have a list of names of those who have pre-registered, so I'll be chasing up anyone who does not fill out the form.
The agenda can be found here:
http://indico.cern.ch/conferenceDisplay.py?confId=21405
If you have any SRM or storage issues that you would like raised at the workshop, please let me know and I will try to ensure that it is covered at some stage.
Looking forward to seeing you in Edinburgh!
12 October 2007
OSG storage group
I was in a meeting yesterday with the OSG data storage and management group. I had realised that there was some activity going on in the US regarding this, but didn't know just how many people were involved, or precisely what they were doing. It turns out that there are actually quite a few FTEs looking at dCache 1.8 testing, client tools, deployment scripts, bug reports and support. There is clearly quite an overlap with what we try to do in GridPP, although there are a couple of differences:
1. GridPP has only me (and part of Jens) dedicated to storage. The rest comes from the community of site admins that have signed up to the mailing list. This works well, but it would be good if there were others who could dedicate more time to looking at storage issues. There were previously a couple of positions at RAL for this stuff. Hopefully there will be again in GridPP3.
2. Unlike GridPP, OSG is not dedicated to particle physics, hence they support other user communities. It appears that USATLAS and USCMS take on the lion's share of support for storage at US Tier-2 sites.
You can find more information here:
https://twiki.grid.iu.edu/twiki/bin/view/Storage/
06 October 2007
Interview with ZFS techies - mentions LHC
These guys (Jeff Bonwick and Bill Moore) are techies, so it's not a sales pitch at all. Running time is about 48 minutes, but you can do something else and just listen to it (like in a meeting :-)
http://www.podtech.net/scobleshow/technology/1619/talking-storage-systems-with-suns-zfs-team
Suitable for everyone, covers a lot of software engineering ("software is only as good as its test suite") but also obviously filesystems and "zee eff ess" and storage.
Also mentions CERN's LHC, and the LHC Atlas detector in particular.
And work done by CERN to examine silent data corruption (at around 33 mins into the interview).
Labels: filesystems, software engineering, storage, zfs
01 October 2007
SRM2.2 deployment workshop - bulletin 1
This is the first bulletin for the "SRM2.2 Deployment Workshop" which will take place on Tuesday the 13th and Wednesday the 14th of November 2007. The workshop is being organised by the UK's National eScience Centre (NeSC), GridPP and WLCG. It will take place at NeSC, Edinburgh, Scotland.
The goal of the workshop is to bring WLCG site administrators with experience of operating dCache and DPM storage middleware together with the core developers and other Grid storage experts. This will present a forum for information exchange regarding the deployment and operation of SRM2.2 services on the WLCG. By the end of the workshop, site administrators will be fully aware of the technology that must be deployed in order to provide a service that fully meets the needs of the LHC physics programme.
Particular attention will be paid to the large number of sites who contribute small amounts of computing and storage resource (Tier-2s), as compared with national laboratories (Tier-1s). Configuration of SRM2.2 spaces and storage information publishing will be the main topics of interest during the workshop. In addition, there will be a dedicated session for the Tier-1 sites running dCache.
As SRM2.2 has been proposed as an Open Grid Forum (OGF) standard, it is likely that the workshop will be of wider interest than only WLCG. However, it should be noted that the intention of the meeting is not to continue discussions of the current specification or of any future version. The meeting will focus on the deployment of a production system.
It is intended that the meeting will be by invitation only. Anyone who thinks that they should receive an invitation should contact Greig Cowan to register their interest.
A detailed agenda and general information about the event are currently being prepared. A second announcement will be made once these are in place.
28 September 2007
Filesystems turn readonly
Some of the RAID'ed filesystems at Edinburgh recently decided that they would become readonly. This caused the dCache pool processes that depended on them to fall over (Repository got lost). Some of the affected filesystems were completely full, but not all of them. An unmount/mount cycle seems to have fixed things. Anyone seen this sort of thing before?
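If anyone else hits this, a quick check along the following lines should show which filesystems have flipped; ext3 will typically remount itself read-only after a journal or I/O error if errors=remount-ro is set, so the kernel log is the place to look for the underlying cause.

  # list mount points currently flagged read-only, then hunt for the I/O errors behind it
  awk '$4 ~ /(^|,)ro(,|$)/ {print $2}' /proc/mounts
  dmesg | grep -iE 'read-only|I/O error' | tail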
22 September 2007
SRM2.2 spaces at Tier-2s
One thing that is really worrying me is regarding the current deployment plan for SRM2.2. The idea is to get this stuff rolled out into production at all Tier-2s by the end of January. This is a difficult task when you think of the number of Tier-2s, all with their different configurations and experiments that need supporting. Oh, and it would also be good if we actually knew what the experiments want from SRM2.2 at Tier-2s. There needs to be a good bit more dialogue between them and the GSSD group to find out what spaces should be setup and how the disk should be separated. Or maybe they just don't care and want all the disk in a large block with a single space reservation made against it. One way or the other, it would be good to know.
dCache SRM2.2
There was a dedicated dCache session during CHEP for discussion between site admins and the developers in order to discover the latest developments and get help in the server configuration, which is *difficult*. Link groups and space reservations were flying about all over the place. More documentation is required, but this seems to be difficult when things are changing so fast (new options magically appear, or don't appear, in the dCacheSetup file). A training workshop would also be useful...
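For what it's worth, the bare bones of a space-managed setup look something like the sketch below. Treat it as a reminder of the pieces involved rather than a recipe: the exact option and file names have moved around between 1.8.0 builds, and the group/link names here are made up.

  # dCacheSetup (SRM node)
  srmSpaceManagerEnabled=yes
  srmImplicitSpaceManagerEnabled=yes
  SpaceManagerLinkGroupAuthorizationFileName=/opt/d-cache/etc/LinkGroupAuthorization.conf

  # PoolManager.conf: attach an existing link to a link group
  psu create linkGroup atlas-linkGroup
  psu addto linkGroup atlas-linkGroup atlas-link

  # LinkGroupAuthorization.conf: who may reserve space in that group
  LinkGroup atlas-linkGroup
  /atlas/Role=production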
Shameless self promotion
OK, so obviously I've got nothing to do tonight other than talk about my contributions to CHEP. The presentation was on the possibility of using DPM in the distributed Tier-2 environment that we have within GridPP. We (Graeme Stewart and myself) used a custom RFIO client running on multiple nodes of the Glasgow CPU farm to read data from a DPM that was sitting in Edinburgh. It was surprisingly easy to do actually. You can find the slides in Indico.
Future investigations will use a dedicated, low-latency lightpath rather than the production network.
We also had a poster at CHEP which looked at the scalability of DPM when using RFIO access across the LAN. It used an identical method to the WAN paper, but in this case we were interested in really stressing the system and seeing how DPM scales as you add in more hardware. Summary: it performs very well and can easily support at least 100TB of disk. Check out Indico for details.
DPM browser
Something I picked up from a DPM poster was the imminent release of an http/https browser for the DPM namespace. There weren't many details and the poster isn't online, but I think it claims to be using apache.
Full chain testing
The experiments want to test out the complete data management chain, from experiment pit->Tier-0->Tier-1->Tier-2. This exercise has been given the snappy title of Common Computing Readiness Challenge (CCRC) and is expected to run in a couple of phases in 2008: February, then May. This will be quite a big deal for everyone, and we need to make sure the storage is ready to cope with the demands. SRM2.2 is coming "real soon now" and should be fully deployed by the time of these tests (well it *had* better be...), which will make it the first large scale test of the new interface.
Storage is BIG!
It quickly became clear during CHEP that sites are rapidly scaling up their disk and tape systems in order to be ready for LHC turn-on next year (first data in October...maybe). For instance, CNAF will soon have 3PB of disk and 1.5PB of tape, along with 7.5 MSpecInt of CPU. That is basically an order of magnitude larger than what they currently have. It was the same story elsewhere.
Kors was right, storage is hot. Fingers crossed that the storage middleware scales up to these levels.
dCache update
I was speaking to Tigran from dCache during CHEP and got some new information about dCache and Chimera.
First off, ACLs are coming, but these are not tied to Chimera. They are implementing NFS4 ACLs, which are then mapped to POSIX, which (according to Tigran) makes them more like NT ACLs. Need to look into this further.
Secondly, the dCache guys are really pushing the NFS v4.1 definition as they see it as the answer to their local data access problems. 4.1 clients are being implemented in both Linux and Solaris (no more need for dcap libraries!). According to Tigran, NFS4.1 uses transactional operations. The spec doesn't detail the methods and return codes exactly; rather, it defines a set of operations that can be combined into a larger operation. This sounds quite powerful, but how will the extra complexity affect client-server interoperation?
Finally, one thing which I hadn't realised about Chimera is that it allows you to modify the filesystem without actually mounting it. There is an API which can be used.
17 September 2007
"Storage is HOT!"*
* Kors Bos, CHEP'07 summary talk, slide 22.
Now that CHEP is finished (and I'm back from my holiday!) I thought it would be good to use the blog to reflect on what had been said, particularly when it comes to storage. In fact, between the WLCG workshop and CHEP itself, it is clear that storage is the hot topic on the Grid. This was made quite explicit by Kors during his summary talk. Over the next few days I'll post some tidbits that I picked up (once I've finished my conference papers, of course ;).
14 September 2007
Improved DPM GIP Plugin Now In Production
The improved DPM plugin, where the SQL query works far better for large DPMs and a number of minor bugs have been fixed, is now in production (should make the next gLite release).
See https://savannah.cern.ch/patch/?1254 for details.
Possibly this will be my last hurrah with this fine script, before passing it off to Greig to do the GLUE 1.3 version, now with added space tokens...
23 August 2007
DPM vulnerability
Another security hole in the DPM gridftp server has been found and subsequently patched.
All details of the update can be found here.
The security advisory issued by the GSVG can be found here.
All DPM sites should use YAIM (or your method of choice) to upgrade to the latest version (DPM-gridftp-server-1.6.5-6) ASAP. Depending on how regularly you have been updating, there may also be new rpms available for other components of the DPM (all of these are on 1.6.5-5).
sgm and prod pool accounts
I've been a bit confused of late regarding what the best course of action is regarding how to deal with sgm and prod pool accounts on the SEs, in particular, dCache. As an example, Lancaster have run into the problem where a user with an atlassgm proxy has copied files into the dCache and has correspondingly been mapped to atlassgm:atlas (not atlassgm001 etc, just plain old sgm). Non-sgm users have then tried to remove these files from the dCache and have been denied since they are simple atlas001:atlas users. The default dCache file permissions do not allow group write access. This raises a few issues:
1. Why is atlassgm being used to write files into the dCache in the first place?
2. Why are non-sgm users trying to remove files that were placed into the dCache by a (presumably privileged) sgm user?
3. When will dCache have ACLs on the namespace to allow different groups of users access to a bunch of files?
The answer to the 3rd point is that ACLs will be available some time next year, when we (finally) get the Chimera namespace replacement for PNFS. ACLs come as a plugin to Chimera.
The interim solution appears to be just to map all atlas users to atlas001:atlas, but this obviously doesn't help with the security and traceability aspects that pool accounts are partly trying to address. Since DPM supports namespace ACLs, we should be OK with supporting sgm and prod pool accounts. Of course, this requires that everyone has appropriately configured ACLs, which isn't necessarily the case, as we've experienced before.
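For reference, the sort of thing I mean on the DPM side is giving the whole group (or a production role) write access via the namespace ACLs, roughly along these lines. The path and FQAN are examples only, and it's worth checking what is already there with dpns-getacl before changing anything.

  # inspect the existing ACL, then add group write access plus the mask
  dpns-getacl /dpm/example.ac.uk/home/atlas
  dpns-setacl -m g:atlas/Role=production:rwx,m:rwx /dpm/example.ac.uk/home/atlas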
Comments welcome below.
22 August 2007
Storage accounting - new and improved
We have made some improvements to the storage "accounting" portal (many thanks go to Dave Kant) during the past week or so. The new features are:
1. "Used storage per site" graphs are now generated (see image). This shows the breakdown of resources per site, which is good when looking at the ROC or Tier-2 views.
2. "Available storage per VO" graphs are generated in addition to the "Used" plots that we've always had. This comes with the usual caveats of available storage being shared among multiple VOs.
3. There is a Tier-2 hierarchical tree, so that you can easily pick out the Tier-2s of interest.
4. A few minor tweaks and bug fixes.
Current issues are in savannah.
The page is occasionally slow to load up as the server is also used by the GOC to provide RB monitoring of the production grid. Alternatives to improve speed are being looked at.
15 August 2007
CE-sft-lcg-rm-free released!
A new SAM test is now in production. It does a BDII lookup to check that there is sufficient space on the SE before attempting to run the standard replica management tests. This is good news for sites whose SEs fill up with important experiment data. If the test finds that there is no free space, then the RM tests don't run. Of course, this requires that the information being published into the BDII is correct in the first place. I'll need to check whether this system could be abused by sites who publish 0 free space by default, thereby bypassing the RM tests and any failures that could occur. I suppose that GStat already reports sites as being in a WARNING status when they have no free space.
See the related post here.
14 August 2007
CLOSE_WAIT strikes again
Multiple DPM sites are reporting instabilities in the DPM service. The symptoms are massive resource usage by multiple dpm.ftpd processes on the disk servers (running v1.6.5 of DPM). These have been forked by the main gridftp server process to deal with client requests. Digging a little further, we find that the processes are responsible for many CLOSE_WAIT TCP connections between the DPM and the RAL FTS server. It also happens that all of the dpm.ftpd processes are owned by the atlassgm user, but I think this is only because ATLAS are the main (only?) VO using FTS to transfer data at the moment.
CLOSE_WAIT means that the local end of the connection has received a FIN from the other end, but the OS is waiting for the program at the local end to actually close its connection.
Durham, Cambridge, Brunel and Glasgow have all seen this effect. The problem is so bad at Durham that they have written a cron job that kills off the offending dpm.ftpd processes at regular intervals. Glasgow haven't been hit too badly, but then they do have 8GB of RAM on each of their 9 disk servers!
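If you want to check whether your own disk servers are affected, something as blunt as this (run as root so that netstat can report the owning process) gives the picture:

  # how many CLOSE_WAIT connections are parked on dpm.ftpd processes?
  netstat -tanp | grep CLOSE_WAIT | grep dpm.ftpd | wc -l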
The DPM and FTS developers have been informed. From emails I have seen it appears that the DPM side is at fault, although the root cause is still not understood. This situation is very reminiscent of the CLOSE_WAIT issues that we were seeing with dCache at the end of last year.
Also see here.
DPM and xrootd
Following on from dCache, DPM is also developing an xrootd interface to the namespace. xrootd is the protocol (developed by SLAC) that provides POSIX access to their Scalla storage system, whose other component is the olbd clustering server.
DPM now has a usable xrootd interface. This will sit alongside the rfiod and gridftp servers. Currently, the server has some limitations (provided by A Peters at CERN):
* xrootd server runs as a single 'DPM' identity, all file reads+writes are done on behalf of this identity. However, it can be restricted to read-only mode.
* there is no support of certificate/proxy mapping
* every file open induces a delay of 1s as the interface is implemented as an asynchronous olbd Xmi plugin with polling.
On a short timescale, the certificate support in xrootd will be fixed and VOMS roles added (currently certificate authentication is broken for certain CAs). After that, the DPM interface can be simplified to use certificate/VOMS proxies and run as a simple xrootd OFS plugin without the need for an olbd setup.
So it seems that xrootd is soon going to be available across the Grid. I'm sure that ALICE (and maybe some others...) will be very interested.
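In practice, reading a file back through the new door should just be a matter of something like this; the hostname and path are made up, and I'm assuming the door listens on the standard xrootd port (1094).

  # copy a file out of the DPM via its xrootd door
  xrdcp root://dpm-head.example.ac.uk:1094//dpm/example.ac.uk/home/dteam/testfile /tmp/testfile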
06 August 2007
dcache on SL4
As part of our planned upgrade to SL4 at Manchester, we've been looking at getting dCache running.
The biggest stumbling block is the lack of a glite-SE_dcache* profile; luckily, it seems that all of the needed components apart from dcache-server are in the glite-WN profile. Even the GSIFtp door appears to work.
05 August 2007
SRMv2.2 directory creation
Just discovered that automatic directory creation doesn't happen with SRMv2.2. Directories are created when using SRMv1.
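Presumably the workaround for now is to create the target directory explicitly before the copy, e.g. with the FNAL client tools; the endpoint below is a made-up DPM one (its v2.2 web service sits under /srm/managerv2), so substitute your own.

  # make the directory by hand before running srmcp/lcg-cp
  srmmkdir "srm://se01.example.ac.uk:8446/srm/managerv2?SFN=/dpm/example.ac.uk/home/dteam/newdir"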
02 August 2007
Annoyed
I re-ran YAIM yesterday on the test DPM I've got at Edinburgh as it turned out we were not publishing the correct site name. Annoyingly, this completely broke information publishing as the BDII couldn't find the correct schema files (again). I had to re-create the symbolic link from /opt/glue/schemas/ldap to /opt/glue/schemas/openldap2.0 and then double check that all was well with the /opt/bdii/etc/schemas files. A restart of the BDII then sorted things out.
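For the record, the fix itself was nothing more than this (assuming, as described above, that ldap is the link and openldap2.0 the target; restart the BDII however it is started on your box):

  # re-create the schema symlink and restart the BDII
  ln -s /opt/glue/schemas/openldap2.0 /opt/glue/schemas/ldap
  /etc/init.d/bdii restart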
It's not really fair to blame YAIM here since I'm running the SL3 build of DPM on SL4, which isn't really supported. Well, I'm hoping that's the source of the trouble.
01 August 2007
Non-improvement of SAM tests
For a while I have been pushing for the creation of a SAM test that only probes the SRM and does not depend on any higher level services (like the LFC or BDII). This would be good as it would prevent sites being marked as unavailable when in fact their SRM is up and running.
Unfortunately, the SAM people have decided to postpone the creation of a pure-SRM test. I don't really understand their concerns. I thought using srmcp with a static (but nightly updated) list of SRM endpoints would have been sufficient. I guess they have some reservations about using the FNAL srmcp client, since it isn't lcg-utils/GFAL, which are the official storage access methods.
https://savannah.cern.ch/bugs/?25249
31 July 2007
Improvement to SAM replica management tests
The SAM people have added a new test that checks if the default SE has any free space left. This test will not be critical by default; however, its failure will cause the actual replica management tests not to execute at all. This is good news, as a full SE (in my opinion) is not really a site problem and not a reason that the site should fail a SAM test.
The full bug can be found here:
http://savannah.cern.ch/bugs/?26046
Greig
SAM failures, again
Looks like something changed inside SAM (again) yesterday, causing a large number of sites to fail the CE replica management tests with a "permission denied" error.
Further investigation shows that the failed tests were being run by someone with a DN from Cyfronet.
By default, this is mapped to the ops group in the grid-mapfile, not opssgm like Judit Novak and Piotr Nyczyk. It is clear from the Glasgow DPM logs that this DN does not belong to ops/Role=lcgadmin. This then leads to failures on DPMs because the dpm/domain/home/ops/generated/ directories have ACLs on them which only grant write permission to people in ops/Role=lcgadmin.
Looks like things have been rectified now.
Why do we keep on getting hit by things like this?
27 July 2007
SRM and SRB interoperability - at last!
People have been talking for years about getting SRM and SRB "interoperable", mostly involving building complicated interfaces from SRX to SRY in various ways.
Now it turns out SRB has a GridFTP interface, developed by Argonne. So here's the idea: why don't we pretend the SRB is a Classic SE?
So we can now transfer files with gridftp (i.e. globus-url-copy) from dCache to SRB and vice versa, although the disadvantage is that you have to know the name of a pool node with a GridFTP door. Incidentally, if you try that, don't forget -nodcau or it won't work (for GridFTP 3rd party copying).
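For the record, the sort of invocation that worked for us looks roughly like this (the hostnames and paths are made up; the real ones depend on your dCache doors and your SRB setup):
# third-party copy from a dCache gridftp door to the SRB gridftp interface;
# -nodcau disables data channel authentication, which is needed here
globus-url-copy -nodcau \
  gsiftp://pool01.example.ac.uk:2811/pnfs/example.ac.uk/data/dteam/testfile \
  gsiftp://srb.example.ac.uk:2811/home/srb-user/testfile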
But here's the brilliant thing: it also works with FTS, since FTS still supports Classic SEs. So we have successfully transferred data between dCache, the SRM, and SRB as a, well, GridFTP server, and back again.
Cool, eh?
Next step is to set up a Classic SE-shaped information system for SRB and see if it works with lcg-utils and GFAL (because FTS does not depend on the SE having a GRIS).
This is work with Matt Hodges at Tier 1 who set up FTS, and with Roger Downing and Adil Hasan from STFC for the SRB.
--jens
20 July 2007
Slightly Modified DPM GIP Plugin
The last version of the DPM GIP plugin had a few minor bugs:
- The "--si" flag had been lost somewhere.
- DNS style VOs were not handled properly (e.g., supernemo.vo.eu-egee.org).
Now submitted as a patch in Savannah.
17 July 2007
Optimised DPM GIP Plugin
Lana Abadie noticed that my DPM GIP plugin was rather inefficient (table joins are expensive!), and sent a couple of options for improving the SQL query. I implemented them in the plugin and it speeded up by a factor of 10.
I have produced a new RPM with the optimised query, which is available here: http://www.physics.gla.ac.uk/~graeme/scripts/packages/lcg-info-dynamic-dpm-2.2-1.noarch.rpm.
I am running this already at Glasgow and I would recommend it for anyone with a large DPM.
N.B. It is compatible with DPM 1.6.3 and 1.6.5, but remember to modify the /opt/lcg/var/gip/plugin/lcg-info-dynamic-se wrapper to run /opt/lcg/libexec/lcg-info-dynamic-dpm instead of /opt/lcg/libexec/lcg-info-dynamic-dpm-beta.
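If you prefer not to edit the wrapper by hand, something like this does the trick (take a backup first; the sed pattern assumes the wrapper still references the beta plugin):
cp /opt/lcg/var/gip/plugin/lcg-info-dynamic-se /opt/lcg/var/gip/plugin/lcg-info-dynamic-se.orig
sed -i 's|lcg-info-dynamic-dpm-beta|lcg-info-dynamic-dpm|' /opt/lcg/var/gip/plugin/lcg-info-dynamic-se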
SRM2.2 storage workshop
There was a storage workshop held at CERN on the 2nd and 3rd of July. The focus of discussions was on the SRM2.2 developments and testing of the endpoints. The majority of the endpoints are being published in the PPS, the intention being that the experiments will be able to use them in a ~production environment and allow some real stress tests to be run against them. The experiments see SRM2.2 as being an essential service for them, so hopefully they have sufficient manpower to run the tests...
Getting the software installed on the machines isn't a problem, but getting it configured can be tricky. The main point that I tried to highlight on a number of occasions was the necessity for sites to have really good documentation from both the developers (how the SRM2.2 spaces can be configured) and the experiments (how the SRM2.2 spaces should be configured for their needs). I will make sure that I provide instructions for everyone to ensure that the deployment goes (relatively) smoothly. It shouldn't be too much of a problem for DPM sites, but dCache sites will need to start playing around with link groups ;-)
From mid-October, sites should be thinking of having these SRM2.2 spaces configured. The plan is that by January 2008, everyone will have this functionality available, and SRM2.2 will become the default interface.
DPM gridftp security
Apologies for not posting for a while, it's been a busy few weeks. The first thing that should be mentioned is the gaping security hole that existed in the DPM gridftp server. Users with the uberftp (or some other suitable) client could log into the server and change permissions on anyone's files, move files to different areas of the DPM namespace or even move files outside of the namespace altogether. Thanks to Kostas and Olivier at Imperial for spotting this. Unfortunately, it took a couple of weeks, 3 patch releases and a lot of testing within GridPP before we finally plugged the hole.
Initially only a patched version of the 1.6.5 server was produced. I asked for the fix to be back-ported to 1.5.10 as there were a few sites still running this version that were unable to upgrade to the latest release (because of the upgrade problems and ongoing experiment tests) but wanted to be as secure as they could be. This was done, so thanks to the DPM team.
All sites should upgrade to the latest version of DPM and ensure that they are running patch -4 of the gridftp server.
26 June 2007
DPM 1.6.5-1 in PPS
v1.6.5-1 of DPM is now in pre-production. The relevant savannah page is here:
https://savannah.cern.ch/patch/index.php?1179
This release involves various bug fixes. What is interesting is that it will now be possible to set ACLs on DPM pools, rather than just limiting a pool to either a single VO or all VOs. This should make sites happy. The previous posting on this version of DPM mentioned that gridftpv2 would be used, but the release notes don't mention this, so we will have to wait and see.
Also out in PPS is the use of v1.3 of the GLUE schema. This is really good news since GLUE 1.3 will allow SEs to properly publish information about SRM2.2 storage spaces (i.e. Edinburgh has 3TB of ATLAS_AOD space).
https://savannah.cern.ch/patch/index.php?980
21 June 2007
DPM 1.5.10 -> 1.6.4 upgrade path broken in YAIM
As reported at yesterday's storage meeting, the upgrade path from DPM 1.5.10 to 1.6.4 is broken in YAIM 3.0.1-15. The different versions of DPM require database schema upgrades in order to be able to handle all of the SRM2.2 stuff (space reservation etc). YAIM should contain appropriate scripts to perform these upgrades, but it appears that the appropriate code has been removed, meaning that it is no longer possible to move from schema versions 2.[12].0 in v1.5.10 of DPM to schemas 3.[01].0 in v1.6.4. We stumbled upon this bug when I asked Cambridge to upgrade to the latest DPM in an attempt to resolve the intermittent SAM failures that they were experiencing. A fairly detailed report of what was required to solve the problem can be found in this ticket:
https://gus.fzk.de/pages/ticket_details.php?ticket=23569
It should be noted that for some reason (a bug in a YAIM script?) the Cambridge DPM was missing two tables from the dpm_db database. These were dpm_fs and dpm_getfilereq (I think). This severely hindered the upgrade: the schema upgrade itself was successful, but the DPM then wouldn't start. In the end we restored the database backup, upgraded to DPM 1.6.3 and then on to 1.6.4 (I'm keeping a close eye on the SAM tests...). Sites should be aware that they may need to follow the steps detailed in this link while performing the database upgrade.
https://twiki.cern.ch/twiki/bin/view/LCG/DpmSrmv2Support
After the installation, the srmv2.2 daemon was running and the SRM2.2 information was being published by the BDII. This is all good. If you end up using yaim 3.0.1-16, it should not be necessary to manually install the host certificates for the edguser.
In summary, the 1.5.10 to 1.6.4 upgrade was a lot of work. Thanks to Santanu for giving me access to the machine. This problem raises issues about sites keeping up to date with the latest releases of middleware. Although there were problems with the configuration of 1.6.4, v1.6.3 has been stable in production for a while now. I'm not really sure why some sites hadn't upgraded to that. It would be great if every site could publish the version of the middleware that they are using. In fact, such a feature may be coming very soon. Just watch this space.
08 June 2007
Anyone for a DPM filesystem?
Looks like someone at CERN is developing a mechanism to enable DPM servers to be mounted. This DPMfs could be used as a simple DPM browser, presenting the namespace in a more user-friendly form than the DM command line utilities. The DPM fs is implemented using the FUSE kernel module interface. The file system calls are forwarded to the daemon, which communicates with the DPM servers using the rfio and dpns APIs and sends the answer back to the kernel. It's in development and not officially supported:
https://twiki.cern.ch/twiki/bin/view/LCG/DPMfs
06 June 2007
DPM 1.6.5 Coming...
It's not here yet, but DPM 1.6.5 has been tagged for release as part of gLite 3.1. The goodies in this release are:
- remove expired spaces
- avoid crash in dpm_errmsg/Cns_errmsg when supplied buffer is too small (GGUS ticket 21767)
- correct processing of rfio_access on DPM TURLs (Atlas)
- return DPM version in otherInfo field of srmPing response
- dpm-shutdown: take "server" into account
- add methods ping and getifcevers in LFC/DPM
- fixed bug #25830: add ACLs on disk pools
- dpm-qryconf: add option --group to display groupnames instead of gids
- dpm-qryconf: add option --proto to display supported protocols
- fixed bug #25810: dpm-qryconf: add option --si to display sizes in power of 10
- implement recursive srmLs and srmRmdir
- DPM-DSI plug-in for the GT4 gridftp2 server
The gridftp v2 server looks to be rather an interesting development.
The patch has all the details.
SAM test failures explained
Here's the story: The past couple of weeks have been pretty bad for SAM. There have been at least 3 big problems with the service due to backend database issues, moving to new hardware, etc. In amongst all of this, the certificate of the user who runs the SAM test ran out (I don't know what happened to the CA warning a month before). It was decided to implement a quick fix by using a different user's certificate to submit the test. This was OK for a while, until the ops replica management tests tried to create a new ops/generated/YYYY-MM-DD directory early on Saturday morning. This was fine for dCache sites, but DPM sites suffered due to the DPM not mapping the new certificate DN + VOMs attributes to a virtual gid that would give permission to create these generated directories. This was the source of the "permission denied" errors that were being reported by lcg-cr. Once sites updated the ACLs on the ops/generated directories, the new certificate DN + VOMs attributes had authorisation to write a new directory and the tests started to pass again.
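If you want to see what your own DPM grants on one of these directories, something like the following shows the ACLs (the hostname, domain and date here are just examples):
# run on a UI or the DPM head node with the DPM client tools installed
export DPNS_HOST=se01.example.ac.uk
dpns-getacl /dpm/example.ac.uk/home/ops/generated/2007-06-02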
As an aside, the initial errors pointed to a permissions problem on the LFC, but this was a red herring. This is another example of the poor error messages that are reported by grid middleware.
04 June 2007
DPM sites failing SAM due to change in ops VOMs role
The SAM people changed the VOMs role of the certificate being used to run the ops SAM tests. This led to the majority of DPM sites on the grid failing the replica management tests over the weekend. Why they made this change (with no announcement) on a Friday is unknown. Graeme's got some information here:
http://scotgrid.blogspot.com/2007/06/sam-tests-changed-voms-role-without.html
All UK DPM sites were failing with the exception of RHUL and Brunel (well done Duncan). All of these sites should run the script that was posted to LCG-ROLLOUT as this will alter the ACLs on the generated directories appropriately.
The other annoying thing is that this wouldn't have happened if all sites were running DPM 1.6.4 (which supports secondary groups). The problem is that this release is broken (due to 2 different problems) meaning that no one is running it!
29 May 2007
Update on the classic SE
As reported at last week's WLCG operations meeting, the classic SE is now completely frozen (not that it affects any GridPP sites since we are all SRM-ified). There are a few points to note about this:
Are there things not in the classic SE that people want added? Any such features should already be in DPM so this is an acceptable upgrade.
Are there features in the classic SE that are not available in DPM or dCache? The obvious answer is real POSIX mounting of the file system probably via NFS. Both DPM and dCache are looking at supporting NFSv4, there are comments from both of these that NFSv4 might be available by end of the year or sooner.
Will the classic SE be included in the upcoming gLite release? The answer is yes, it will appear in gLite 3.1. Once there is a DPM with NFSv4 support then this will be re-evaluated.
Storage security service challenge
WLCG are asking each ROC to run a security service challenge against their sites. Someone (probably Alessandra) will submit a job to each site which will attempt to write a file to the local SE, read it back, copy it to a remote SE, delete the file... Once complete, the submitter will issue a GGUS ticket against the site, asking them to provide information on which operations were performed on the file. You can see an example of what is expected here:
https://gus.fzk.de/pages/ticket_details.php?ticket=22012
The aim of this testing is to determine if SEs record sufficient information for tracing user operations and also to check that site admins are able to gather that information. I am currently putting together some scripts that will perform the querying and parsing of DPM/dCache databases and log files in order to gather the information.
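As a flavour of what those scripts will do, the log-file half of the job is basically a grep for the user's DN across the DPM service logs (the DN and the log locations below are illustrative and will vary with your installation and version):
# trace what a given user did, run on the DPM head node; adjust paths for your layout
DN="/C=UK/O=eScience/OU=SomeSite/L=PP/CN=some user"
grep "$DN" /var/log/dpns/log /var/log/dpm/log /var/log/srmv1/log /var/log/srmv2/log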
In addition to going through the SE files, it is likely that sites will have to parse the PBS logs on the CE to determine the UI that was originally used for the job submission.
22 May 2007
DPM 1.6.4 and SL4
I have just come across a problem with running DPM v1.6.4 on SL4. Well, it's not actually a problem with DPM itself, but rather a problem with the BDII that it is now using as an information provider. SL4 comes with openldap v2.2 which appears to have stricter schema checking than openldap v2.0 (which comes with SL3). This causes problems like this:
$ ldapsearch -LLL -x -H ldap://wn4.epcc.ed.ac.uk:2170 -b mds-vo-name=resource,o=grid
Invalid DN syntax (34)
Additional information: invalid DN
Meaning that your SE can't publish anything about itself. This can be resolved by adding this block of code
attributetype ( 1.3.6.1.4.1.3536.2.6.1.4.0.1
NAME 'Mds-Vo-name'
DESC 'Locally unique VO name'
EQUALITY caseIgnoreMatch
ORDERING caseIgnoreOrderingMatch
SUBSTR caseIgnoreSubstringsMatch
SYNTAX 1.3.6.1.4.1.1466.115.121.1.44
SINGLE-VALUE
)
to /opt/glue/schema/ldap/Glue-CORE.schema and then restarting the ldap and bdii processes. This is covered by this bug:
https://savannah.cern.ch/bugs/index.php?15532
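Once the schema file has been patched, a restart and a repeat of the query above should confirm the fix (the service names here are how it looks on my install; yours may differ slightly):
# restart the resource BDII (and its slapd) then re-run the query
service bdii restart
ldapsearch -LLL -x -H ldap://wn4.epcc.ed.ac.uk:2170 -b mds-vo-name=resource,o=grid | head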
17 May 2007
DPM 1.6.4 released (with a few problems)
DPM v1.6.4 was released into production this week. First of all, there are a few points to be aware of:
1. This release requires an update of the v1.6.3 DB schema. **YAIM will take care of this for you**. It is not necessary to run the DB migration script by hand.
2. Two new YAIM variables, DPM_DB and DPNS_DB, are introduced.
3. After the reconfiguration, DPM will use the BDII as an information provider instead of Globus MDS. By default the BDII runs on port 2170 whereas globus-mds was on 2135. You need to change the site-info.def variable to this (so that the site BDII looks in the right place)
BDII_SE_URL="ldap://$DPM_HOST:2170/mds-vo-name=resource,o=grid"
4. YAIM does some tweaking of the /etc/sysctl.conf values. The old values are copied to /etc/sysctl.conf.orig if you want to reinstate them.
However, once the release was announced, a couple of problems soon reared their heads:
a) Sites were recommended not to upgrade due to a problem left over from the build:
http://glite.web.cern.ch/glite/packages/R3.0/updates.asp
For sites who had already upgraded, the fix was this:
mkdir -p /home/glbuild/GLITE_3_0_3_RC1_DATA/stage/etc
ln -s /opt/lcg/etc/lcgdm-mapfile \
/home/glbuild/GLITE_3_0_3_RC1_DATA/stage/etc
b) With the latest update the information provider of the DPM machines has changed from MDS to BDII. However, the YAIM (-15) coming with the update does not configure edguser's certificate. The fix was to perform these steps manually:
mkdir -p ~edguser/.globus
chown edguser:edguser ~edguser/.globus
cp /etc/grid-security/hostcert.pem ~edguser/.globus/usercert.pem
cp /etc/grid-security/hostkey.pem ~edguser/.globus/userkey.pem
chown edguser:edguser /home/edguser/.globus/user*
chmod 400 /home/edguser/.globus/userkey.pem
Obviously the certification testing isn't quite as water-tight as we would hope.
10 May 2007
Manchester Tier2 dcache goes resilient II
Yesterday we completed the scheduled downtime, and now dcache02 is up and resilient. It's still chewing through the list of files and making copies of them; going by past experience it will probably finish somewhere around lunchtime tomorrow. It's so nice to know we're not in the dark ages of dcache-1.6.6 any more. Of course, there are still small niggles to iron out and we've yet to really throw a big load at it, but it's looking a lot, lot better.
09 May 2007
DPM 1.6.4-3 on PPS
DPM v1.6.4-3 is now available on the PPS. I would imagine that it will move into production in the next couple of weeks. This version requires a schema change to the dpm_db (3.0.0 -> 3.1.0). YAIM will take care of this for you, although a DB backup is recommended beforehand.
We have now moved to YAIM 3.0.1-15, so the installation and configuration steps now look like:
$ /opt/glite/bin/yaim -i -s /opt/glite/yaim/etc/site-info.def -n glite-SE_dpm_mysql
$ /opt/glite/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n SE_dpm_mysql
https://savannah.cern.ch/patch/index.php?1121
The information provider plugin is still the *old* one (which does not account for the used space properly). Therefore you will need to install Graeme's new one by hand (again).
http://www.gridpp.ac.uk/wiki/DPM_Information_Publishing#Beta_Release_Plugin
With this version of DPM there is a BDII process on port 2170 that is used to provide the information about the DPM. This replaces globus-mds as the information provider which ran on port 2135.
This version of YAIM includes some /etc/sysctl.conf tweaks in the config_DPM_disk function. This is nice (since it could lead to some optimisations) but I think sites should be warned about this beforehand and be allowed to turn off these changes:
https://gus.fzk.de/pages/ticket_details.php?ticket=21713
Anyway, I upgraded from v1.6.3 to v1.6.4 today (on SL4 32bit). No problems so far, but I will let you know if anything comes up.
ZFS performance on RAID
http://milek.blogspot.com/2007/04/hw-raid-vs-zfs-software-raid-part-iii.html
04 May 2007
Manchester Tier2 dcache goes resilient
We're half way through the combined upgrade from dcache-1.6.6-vanilla to dcache1.7.0-with-replica-manager, so far only one of the two head-nodes has been upgraded, but so far so good, the other is scheduled for upgrade next week, and I appear to have scheduled the queue shutdown at 8am on bank-holiday Monday! Documentation will obviously follow including cfengine snippets for those people that love it.
01 May 2007
Video of DPM and SRM Presentations at HEPiX
Starring Mr Steve Traylen, including Video download! See Steve's Blog for the links.
25 April 2007
Storage talks at HEPiX
There are lots of storage related presentations at HEPiX today:
https://indico.desy.de/conferenceTimeTable.py?confId=257&showDate=25-April-2007&showSession=all&detailLevel=contribution&viewMode=parallel
Of particular relevance to WLCG are the presentations on dCache, DPM and SRM. There are also talks about data corruption and distributed filesystems (GPFS, Lustre,...).
Should make for some interesting reading.
dCache, DPM and SRM2.2
As most of you know, the LCG experiments are requiring that all storage be accessible via the SRM2.2 interface. The current version of dCache, v1.7.0, only provides the SRM1 interface (and an incomplete SRM2.2). Full SRM2.2 support will only be available in the v1.8.0 branch of dCache. Once v1.8.0 has been fully tested and moves into production, all GridPP dCache sites will have to upgrade.
As I understand the situation, no upgrade path between v1.7.0 and v1.8.0 is planned. Sites will first have to upgrade to v1.7.1 and then move onto v1.8.0. The plan is such that v1.7.1 will contain the same code as v1.8.0, minus the SRM2.2 stuff.
Obviously all dCache sites will want to ensure that there is a stable version of the system, particularly as all sites now have 10's of TBs of experiment data on disk. The SRM2.2 bits of v1.8.0 are currently being tested. Once v1.7.1 is released we can test out the upgrade path before giving the nod to sites. There will be some additional complexity when it comes to setting up the space reservation parts of SRM2.2 in your dCache. Documentation will be available when sites have to perform this additional configuration step. In fact, all of this configuration may go into YAIM.
The situation for DPM is slightly simpler. Full SRM2.2 support exists in v1.6.4 which is going through certification at the moment (1.6.3 is the current production version). Again, there will be some additional complexity in configuring the SRM2.2 spaces, but this will be documented.
Even once the SRM2.2 endpoints are available, it is likely that the SRM1 endpoint (running simultaneously on the same host) will continue to be used by the experiments until SRM2.2 becomes widely deployed and the client tools start using it as the default interface for file access.
16 April 2007
PostgreSQL housekeeping
The Edinburgh dCache recently started to show increased CPU usage (a few days after an upgrade to 1.7.0-34) as shown in the top plot. The culprit was a postgres process:
$ ps aux|grep 4419
postgres 4419 24.8 0.6 20632 12564 ? R Mar27 6091:11 postgres: pnfsserver atlas [local] PARSE
After performing a VACUUM ANALYSE on the atlas database (in fact, on all of the databases), the CPU usage dropped back to normal, as can be seen in the bottom plot. I had thought auto-vacuuming was enabled by default in v8.1 of postgres, but I was mistaken. This has now been enabled by modifying the relevant entries in postgresql.conf.
stats_start_collector = on
stats_row_level = on
autovacuum = on
I also changed these parameters after the VACUUM process sent out a notice:
max_fsm_pages = 300000 # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 2000 # min 100, ~70 bytes each
The server requires a restart after modifying the last two parameters.
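For anyone who wants to run the clean-up by hand rather than wait for autovacuum, this is roughly what I mean (run as the postgres user; the database name is just whatever your PNFS databases happen to be called):
# vacuum and analyse a single database...
psql -U postgres -d atlas -c 'VACUUM ANALYSE;'
# ...or all databases in one go
vacuumdb -U postgres --all --analyze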
Repository got lost
Recently the Edinburgh dCache has been failing. The usageInfo page was reporting
[99] Repository got lost
for all of the pools. This has been seen before, but only now do I understand why.
The dCache developers have added a process that runs in the background and periodically tries to touch a file on each of the pools. If this process fails, something is regarded as being wrong and the above message is generated. This could happen if there was a problem with the filesystem or a disk was slow to respond for some reason.
Edinburgh was being hit with this issue due to some of the disk pools being completely full, i.e., df was reporting 0 free space, while dCache still thought there was a small amount of space available. This mismatch seems to arise from the presence of the small control files on each dCache pool (these contain metadata information). Each file may take up an entire block on the disk without actually using up all of the space. I'm still trying to find out if dCache performs a stat() call on these files. It should also be noted that dCache has to read each of these control files at pool startup, so a full pool takes longer to come online than one that is empty.
There also appears to be a bug in this background process since all of the Edinburgh disk pools were reporting the error, even though some of them were empty. In the meantime, I have set the full pools to readonly and this appears to have prevented the problem reoccurring.
11 April 2007
dCache v1.8.0 BETA released
dCache v1.8.0 beta is now available for download:
http://www.dcache.org/downloads/1.8.0/index.shtml
As the page states, this includes the required SRM v2.2 stuff for WLCG but it is *NOT FOR PRODUCTION*. Sites should only upgrade once sufficient testing has been completed. If you would like to try it out as a test then feel free. I'm sure the dCache team would appreciate any feedback.
YAIM does not yet support the configuration of SRM 2.2 spaces and space reservation; this has to be done by hand and requires the use of a new concept of link groups. You already had pools and pool groups, units and ugroups, well now you've got links and link groups. More information is here:
https://srm.fnal.gov/twiki/bin/view/SrmProject/SrmSpaceReservation
Again, no dCache in the UK should upgrade to this version yet.
03 April 2007
dCache-NDGF workshop
Colin Morey and I attended the dCache-NDGF workshop last week in Copenhagen (hence the little mermaid). The dCache developers presented lots of useful information. The presentations can all be found here.
NDGF are committing effort to dCache development as they plan to use dCache to create a distributed Tier-1 with the "head" nodes based in Copenhagen (where the network fibres come in) and gridftp doors and pool nodes spread across multiple sites in Scandinavia. It looks to be quite an ambitious project, but they have already started making changes to the dCache GridFTP code in order to get round the problem of a file transfer first getting routed through a door node before ending up on the destination pool. This might be OK within a site, but it becomes more of a problem when the transfer is over the WAN. Another solution to this problem involves adopting the GridFTP v2 protocol (which has not yet been adopted by Globus). It appears that both approaches will be developed.
Another interesting bit of news regards new releases of dCache. All of the SRM 2.2 stuff that is required by WLCG will come with v1.8 (sometime this month). At the same time, v1.7.1 will be released which will contain all of the same code as v1.8
other than the SRM 2.2 stuff. It would appear that they want to retain a "stable" version of dCache at all times, in addition to a version that is required by sites supporting WLCG VOs. While sensible, it doesn't inspire huge confidence in the SRM 2.2 code. In summary, all of the GridPP sites will have to upgrade to v1.8.
One last thing about 1.8 is that although it will support all of the SRM 2.2 concepts such as space reservation and storage classes (T0D1), it will not come with a suitable generic information provider (GIP) to publish this information to the BDII. GridPP may have to lend some effort in order to get this fixed.
30 March 2007
DPM srmPutDone Errors Understood
As usual the response of the DPM team to the report of srmPutDone errors was excellent.
There is a long term fix in the pipeline and a short term work around has been found: increase the maximum idle timeout in MySQL.
See the full posting over on the scotgrid blog for details.
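In case it saves someone a trip, the workaround amounts to bumping the idle timeout in my.cnf; as I understand it the timeout in question is MySQL's wait_timeout (the recommended value is in the scotgrid post, the one below is just a placeholder):
# /etc/my.cnf on the DPM head node (illustrative value; restart mysqld afterwards)
[mysqld]
wait_timeout = 604800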
29 March 2007
dCache 1.7.0-33 released
Most of you will have seen this already, but v1.7.0-33 of the dCache server is available, which should fix the data corruption problem that was announced on the user-forum list. Sites are recommended to install it as soon as possible. Get in contact if there are any problems.
Storage accounting project page
I've created a project page for the storage accounting system. This will
allow Dave Kant and myself to better keep track of the open issues.
http://savannah.cern.ch/projects/storage-account/
Could all sites continue to check the published numbers and report any
inconsistencies to me.
As usual, the storage accounting page can be found here:
http://goc02.grid-support.ac.uk/storage-accounting/view.php?queryType=storage
Suggestions for improvements are welcome.
27 March 2007
DPM 1.6.4 Tagged
DPM version 1.6.4 has been tagged. The major change here is that support for secondary groups has been added. See the savannah patch for details.
Unfortunately there's another schema change in the offing. Details are in the twiki.
14 March 2007
Re-enabling the new DPM GIP plugin
I spotted a minor issue with the DPM upgrade: rerunning YAIM will put the old DPM plugin, which doesn't do per-VO accounting properly, back in place.
You will have to re-enable the new plugin again by hand, after running YAIM, following the wiki instructions again.
I promise to try and get the new plugin into the next gLite release to avoid this faff...
13 March 2007
Some Extra DPM 1.6.3 Upgrade Notes
Downloading and diffing the YAIM versions 3.0.0-36 (which I ran) and 3.0.0-38 (the fixed version), reveals two things you have to do if you upgraded your DPM with v36:
- First, the srmv2.2 service was not chkconfiged to start again on boot, so run
# chkconfig srmv2.2 on
- Secondly, the srmv2.2 service wasn't advertised in the information system so run
# /opt/glite/yaim/scripts/run_function SITE-INFO.DEF config_gip
# service globus-mds restart
Now you have a brand new shiny and advertised SRM v2.2 endpoint.
Of course, if you upgrade using v38 then you don't need to do anything.
12 March 2007
Glasgow Upgrade to DPM 1.6.3
I have upgraded Glasgow's DPM to 1.6.3 this morning. Executive summary is that everything went well, and we're now happily running the new DPM with a shiny SRM v2.2 daemon.
The key thing about this upgrade is the need to update the database schema for the new SRM v2.2 services. This update requires the DPM and DPNS daemons to be stopped. The easiest way to achieve this is to do the update through YAIM - this will stop the daemons, update the schema and then restart them. Have a look at the config_DPM_upgrade function (in YAIM 3.0.0-36) - the important upgrade is upgradeC.
As a piece of extra insurance at Glasgow I shut down the DPM and took an extra database dump before I ran YAIM, just in case anything went wrong.
- Enter downtime in the GOC - your DPM will be down from 10-30 minutes, probably, depending on how big it is. This downtime only affects your SE though (probably safest to put in an hour, then come out early once things are working).
- Stop your DPM daemons. The right order is:
service dpm-gsiftp stop
service srmv2 stop
service srmv1 stop
service dpm stop
service rfiod stop
service dpnsdaemon stop
- Dump your database (instructions on the wiki; a minimal sketch is also given after this list).
- Update your RPMs. If you use APT then YAIM's install_node works. At Glasgow we just do yum update.
- Rerun YAIM:
/opt/glite/yaim/scripts/configure_node /opt/glite/yaim/etc/site-info.def SE_dpm_mysql | tee /tmp/upgrade
- Now, when the script gets to the database schema upgrade beware that this takes a considerable length of time. If you look at the ganglia plot above you'll see that on Glasgow's DPM, which is a fast machine, it took 12 minutes (we have about 100 000 files in our DPM).
- As advised, we got the harmless duplicate key warning:
Configuring config_DPM_upgrade
INFO Checking for database schema version...
INFO Database version used: 2.2.0
INFO Upgrading database schema from 2.2.0 to 3.0.0 !
INFO: Stopping DPM services.
[...]
failed to query and/or update the DPNS/DPM databases : DBD::mysql::db do failed: Duplicate key name 'G_PFN_IDX' at UpdateDpmDatabase.pm line 264.
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
Mon Mar 12 12:01:03 2007 : Starting to update the DPNS/DPM database.
Please wait...
INFO Schema version upgrade: Now, may you see the "Duplicate key name 'G_PFN_IDX' at UpdateDpmDatabase.pm" error message, it is harmless. The indexes are already created.
INFO Starting DPM services
- However, check this area of the upgrade very carefully for errors: if there is a problem with the update script running from YAIM you'll have to investigate and fix it, possibly running the dpm_support_srmv2.2 script by hand.
- Check all is well, e.g., lcg-cr a file into your DPM and then lcg-cp it back out.
- If you're really nervous (or extra careful) you could login to MySQL and check your schema version is 3.0.0 (select * from schema_version; on dpm_db and cns_db).
- Come out of downtime and pat yourself on the back.
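As promised in the "dump your database" step above, this is the sort of thing I mean (the user, password handling and filenames are up to you):
# back up both DPM databases before touching the schema
mysqldump -u root -p dpm_db > /root/dpm_db-backup.sql
mysqldump -u root -p cns_db > /root/cns_db-backup.sql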
It would be possible to do this update by hand as well. A rough outline would be:
- Downtime as above.
- Stop your DPM as above.
- Run the update script (/opt/lcg/share/DPM/dpm-support-srmv2.2/dpm_support_srmv2.2 --db-vendor MySQL --db localhost --user $DPM_DB_USER --pwd-file $tmpfile --dpns-db cns_db --dpm-db dpm_db --verbose) by hand.
- Check the db schema is updated.
- Restart DPM.
- Test file transfers.
- Come out of downtime.
I haven't tried this so at the very least double check these instructions against how YAIM might do it.
Finally, DPM disk servers and WN/UI clients can be upgraded very simply. Do it through YAIM, or just do a yum update and on the disk servers restart rfiod and dpm-gsiftp.
28 February 2007
dCache.org clarify release procedure
The dCache team have reviewed and updated their release procedure. In summary, any new releases (minor or major) will be announced on the user-forum and announce@dcache.org mailing lists, along with a change-log for each of the rpms in the release. In addition, the rpms in the dCache stable repository
http://cvs.dcache.org/repository/apt/sl3.0.5/i386/RPMS.stable/
(and similarly for other OS and architecture combinations) will be kept in synch with the rpms that are listed on the dCache.org webpage. If this is not the case, a bug should be submitted.
The change-log should also be available in cvs. In the future, an RSS feed of the release information may also be made available.
26 February 2007
New dCache release
There is a new release of dCache:
http://www.dcache.org/downloads/1.7.0/index.shtml
dCache server is now 1.7.0-29.
dCache client is now 1.7.0-28.
There was also a 1.7.1 release, but this was only for testing purposes. Sites are recommended to use YAIM to perform the upgrade. Email the support list if there are any problems.
23 February 2007
SE-posix SAM test (the trial run)
Here is a summary of a set of grid jobs that I ran yesterday to test posix access to site SEs. The job will eventually become the SAM test for this type of storage access. In summary it lcg-cr's a small test file to the SE, then reads it back again using a GFAL client, checks for consistency and then deletes the file using lcg-del.
srm.epcc.ed.ac.uk Passed
svr018.gla.scotgrid.ac.uk Passed
fal-pygrid-20.lancs.ac.uk Failed
lcgse1.shef.ac.uk Passed
epgse1.ph.bham.ac.uk Passed
serv02.hep.phy.cam.ac.uk Passed
t2se01.physics.ox.ac.uk Passed
lcgse01.phy.bris.ac.uk Passed
gfe02.hep.ph.ic.ac.uk Failed (both IC-HEP and LeSC)
se01.esc.qmul.ac.uk Passed
dgc-grid-34.brunel.ac.uk Passed
se1.pp.rhul.ac.uk Passed
gw-3.ccc.ucl.ac.uk Passed
For the sites that are not listed, the jobs that I submitted either still claim to be running or are scheduled (it turns out that both Durham and RAL-PPD are in downtime). I've tested RAL before and it was failing due to problems with CASTOR.
IC-HEP failed due to there being no route to host during the GFAL read. Possibly the relevant ports are not open in the firewall (22125 for dcap and 22128 for gsidcap)?
IC-LeSC failed with a Connection timed out error during the GFAL read.
Lancaster failed as the file could not even be lcg-cr'd to the SE. There was a no such file or directory error.
I'll run the tests next week once people have had a chance to look at some of these issues.
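For the IC failures, one quick thing to try from a WN is simply checking whether the dcap/gsidcap ports answer at all (the hostname below is the door reported above; swap in whatever your BDII publishes):
# a connection refused or timeout here points at the firewall rather than dCache itself
telnet gfe02.hep.ph.ic.ac.uk 22125
telnet gfe02.hep.ph.ic.ac.uk 22128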
22 February 2007
SE-posix SAM test
Everyone should be aware of the fact that a new SAM test will soon be introduced that will test the posix-like access to your site SE from your WNs. The test uses lcg-cr to first copy a file to the SE, then uses GFAL (the Grid File Access Library) to perform the open(), read() and close() of the file. GFAL does all of the translation between the LFN (Logical File Name) and the TURL that will be used to access the file. It supports rfio, (gsi)dcap and gridftp so it does not matter if your site has a dCache, DPM or CASTOR. The file is removed after the test via an lcg-del.
It is essential to test this property of the SEs since it is an expected access pattern for the VOs when they are running analysis jobs at sites. In many cases it is more efficient to use the posix-like access rather than copying the entire file (or files) to the WN before processing starts.
In the first instance this new test will be NON-CRITICAL, so sites should not worry if they are not passing. We will use the information gathered from the test to solve site-specific issues. Once things become more stable it is likely that this test will move into the existing replica management super-test.