26 January 2007

Day 5: Final day of the workshop, and the agenda was made up of experiment talks covering their plans for data management come switch-on of the LHC. It seems to me that there is still a lot of uncertainty about how the experiments will use SRM 2.2 and what their local data access patterns/methods will be. This uncertainty is partly caused by the fact that the SRM 2.2 implementations are not complete yet and that each storage system has its own idea about how best to provide the 2.2 functionality. Consequently, the sites (particularly the Tier-1s) don't know how to optimally configure their storage systems for use by the experiments. CMS are even talking about not requiring the use of space tokens and reservation at all... Hopefully the new storage systems working group will be able to bring everyone together and cook up some site- and VO-specific recipes for storage configuration. It has to be noted that these systems must be flexible enough to grow over a period of a few years as the data comes in from the LHC.

Also, Graeme found some more issues with rfio access to DPM. We will need to work with the developers in order to really understand what is going on. It's a good thing that we started this testing now.

OK, that's it from the workshop. Back to reality and storage phone conferences next week. I'll make sure that I send out the details before Wednesday. As usual, let me know if there are any problems.

25 January 2007

OK, I forgot to add something in yesterday, so it looks like I'll have to add in a couple of days in one post.

Day 3: Not much storage news to report here. The agenda for the day was taken up with ROC reports. Jeremy Coles presented for GridPP and talked about the data transfer tests and site storage resources. It looks like we will need to run some more FTS transfer tests to continue stressing the SRMs and site networks. I've got a feeling that this will become more difficult as the SRMs start filling up with real data (which is happening now). Even just trying to organise and run the tests is hard enough. John Gordon talked about accounting issues and mentioned the storage accounting system that we have been working on. Now that we have stopped using RGMA to publish the data into the database, the system should be far more stable. Have a look just now to check your site:


The collaboration drink at night was decent, although the nibbles weren't that great. Atlas seem interested in the rfio testing of DPM that we've been doing and I think there will be a lot more of this happening over the next few months. We really need to understand how the experiments are going to access the storage from the batch farm.

Day 4: Running reliable services was the topic of the morning. I realise that to many people "reliable middleware" is an oxymoron, but that was the subject of discussion. Single points of failure were mentioned as a real problem, but no real solution was proposed other than throwing hardware at it and running failover/redundancy systems...

A Tier-2 site in San Diego (I think) is running dCache in resilient mode on their WNs. It would be great to get some more information on this so that we can use it to influence our Manchester deployment. Speaking of which, it's great that Colin has set up a test system (4 nodes) at Manchester so that we can test out their rollout of dCache and its integration with cfengine. This is something that I want to find out more about.

All (and I mean all) afternoon was an SRM v2.2 BOF. The storage system implementers presented the current state of their software, and the results of the CERN GD testing suite were shown. A plan for further functionality testing and stress testing was discussed; it seems very aggressive. dCache 1.8 (with all of the required SRM 2.2 methods) won't be out till April, yet they want to start deploying test systems to the Tier-1s at the end of March.... It seems that there are still problems with the SRM implementations; the spec is so flexible that the different implementations have their own views about what to do. Most of the trouble surrounds what to do with tape, so the whole discussion about storage classes (Tape0Disk1 etc.) shouldn't affect the Tier-2s much. Over the next month or so we will find out what the Tier-2s have to do in order to set up their storage to meet the LHC VO needs. Watch this space...

24 January 2007

OK, so now the dCache workshop is finished and we're onto the WLCG Tier-1/2 workshop at CERN. Again, I'm a bit late in commenting on the proceedings, but at least I got there in the end.

Day 1: In the morning there were LHC experiment presentations. Again, SRM v2.2 was stated as being a critical service for the success of WLCG. Alice still plan to do things their own way and require xrootd for data access. Now that dCache and (soon) DPM support xrootd as an access protocol our Tier-2 sites may have to think about enabling these access doors in order to more fully support Alice. At the moment it's not clear what we will do.

There was a data management BOF in the afternoon where details about FTS, lcg_utils etc. were presented. A fair bit of time was spent discussing how work plans and priorities are set for feature requests - not very interesting or relevant to DM.

Day 2: Storage issues played a greater part of the agenda on the second day. There was a "SRM v2.2 deployment and issues" session. Details are still required as to how the sites will configure dCache and DPM to setup the Tape0Disk1 storage classes that are required. It will be the mandate of a newly created WLCG working group to understand the issues and create documentation for sites.

In the afternoon there was an interesting DPM BOF. A number of sites (including GridPP) reported their experiences of DPM and highlighted issues with the software. The problems that every site reported were so similar that I rushed through the GridPP presentation; I didn't want to bombard the audience with information they had already heard from each of the other sites. The main issues can be summarised as follows:

1. Incompatibility of the DPM and CASTOR rfio libraries. This is currently preventing many sites from using rfio to access the data on DPMs from the WNs, and is part of the reason some VOs continue to gridftp the files to the WNs before the analysis starts.

2. Scalability of DPM. The majority of DPM sites are small (~10TB). How will DPM scale to hundreds of TB spread across many more nodes? Also, how will DPM handle an entire batch farm all trying to use rfio to access files at once? Work by GridPP that has started to look at this issue was presented during the session.

3. Performance of DPM. Many sites would like to see DPM implement a more advanced system for load balancing. Currently a round-robin system is used to choose which filesystem in a pool to use. Many people suggested the use of a queuing system. For those interested, dCache already has this stuff built in ;-)

4. Admin and management tools. Everyone would like a tool to check synchronisation of the namespace and the filesystems. The developers plan to release something for this very soon. Everyone also wanted a more fine-grained and flexible pool-draining mechanism, e.g., migrating just the ATLAS files from one pool to another rather than the entire contents of the pool.

5. Storage classes and space tokens. It looks like the developers are recommending that sites set up space tokens that map particular VO roles to particular spaces in the DPM pools. It was suggested that this could be a way to implement quotas in DPM, since the VO would reserve X TB of space. The site would then just have to set up generic pools and let the space tokens control who could write to a pool and how much they could write; VO ACLs would probably not be required to control write access. Of course, DPM is also used by smaller VOs who might not care about storage classes, SRM 2.2...
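To illustrate the load-balancing point in item 3 above, here is a toy sketch of the difference between round-robin and load-aware filesystem selection. This is not DPM's actual code; the filesystem names and load metric are invented:

```python
import itertools

# Toy model of pool filesystem selection (not DPM's real algorithm).
# name -> current load, an invented metric in [0, 1].
filesystems = {"fs1": 0.9, "fs2": 0.2, "fs3": 0.5}

# Round-robin: ignores load entirely, cycling through filesystems in order.
_rr = itertools.cycle(sorted(filesystems))

def choose_round_robin():
    return next(_rr)

# Load-aware: picks the least-loaded filesystem, roughly what sites were asking for.
def choose_least_loaded():
    return min(filesystems, key=filesystems.get)
```

The weakness of round-robin is clear from the model: it will happily keep sending writes to fs1 even while it is the busiest filesystem in the pool.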
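The synchronisation tool requested in item 4 essentially boils down to a set comparison between the namespace and what is actually on disk. A minimal sketch follows; the file lists are invented, and a real tool would query the DPM name server and scan each pool node's filesystems:

```python
# Sketch of a namespace <-> filesystem consistency check.
# Both sets below are invented examples.
namespace = {"/dpm/vo/file1", "/dpm/vo/file2", "/dpm/vo/file3"}
on_disk   = {"/dpm/vo/file1", "/dpm/vo/file3", "/dpm/vo/orphan"}

dark_data  = on_disk - namespace   # files on disk with no catalogue entry
lost_files = namespace - on_disk   # catalogue entries with no file behind them

print("dark data:", sorted(dark_data))
print("lost files:", sorted(lost_files))
```

Both kinds of mismatch matter: dark data wastes space silently, while lost files only show up when a user tries to read them.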
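The quota idea in item 5 can be sketched as simple bookkeeping against a reservation: each VO role gets a space token of a fixed size, and writes are rejected once the reservation is exhausted. The token names and sizes below are invented, and this is only a model of the concept, not the DPM implementation:

```python
# Toy model of space-token based quotas (invented tokens and sizes).
reservations = {"ATLAS_PROD": 10000, "ATLAS_USER": 2000}  # token -> GB reserved
used = {"ATLAS_PROD": 0, "ATLAS_USER": 0}

def write_file(token, size_gb):
    """Accept the write only if the token still has room;
    the reservation itself acts as the quota."""
    if used[token] + size_gb > reservations[token]:
        return False
    used[token] += size_gb
    return True
```

The appeal for sites is that the pools stay generic: all the per-VO policy lives in the token bookkeeping rather than in pool ACLs.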

Again, the developers welcomed input from sites about performance issues. They would like access to a production DPM (since they don't run one themselves) and would always like you to send in log files when there is a problem.

Phew, that was a lot of information. I'm sure there is still some more to come, as well as a report on our rfio tests. I'll leave that till later, or for Graeme to do.

So, we all survived the Hamburg storms and the Sun dinner. Day 2 of the dCache workshop consisted of presentations about installation and configuration, as well as talks about new features that should be present in future versions of dCache.

Owen Synge talked about the dCache build and testing system that he has set up. This makes the building of dCache more transparent and automated, allowing the developers more time to develop the code. I think it is this new procedure that has improved the reliability of dCache releases, as well as increasing their frequency. The testing system should also prove useful outside of DESY in order for sites to test basic functionality after installation. I've already got my hands on a beta version.

Martin Gasthuber presented recommendations for hardware setups. The message seemed to be that services should be split onto multiple nodes in order to increase performance. So far this doesn't appear to be necessary at our Tier-2s, but possibly they have not been getting stressed enough (particularly in terms of WN access to the SRMs) to develop problems.

For anyone wanting to use Quattor to configure your site, Stijn de Witt has developed the relevant templates. This is a lot of work and requires new templates to be made for each release. I think it is only worth looking at if you are really serious about using Quattor across the entire site.

I talked about monitoring dCache with MonAMI. This has proved very useful for highlighting the CLOSE_WAIT issue at Edinburgh. Hopefully we can get to the bottom of the problem by working with the dCache team next week. Some guys from SARA seemed interested in using it since they were experiencing similar issues and were already using Ganglia and Nagios for local monitoring. It would be great if we could integrate MonAMI with some of the new monitoring tools coming with dCache.
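For anyone wanting a quick check for the CLOSE_WAIT problem without MonAMI, counting stuck sockets from netstat output is straightforward. A sketch, using a canned sample for illustration; on a real pool node you would feed in the output of `netstat -tan` (e.g. via subprocess):

```python
def count_close_wait(netstat_output):
    """Count sockets stuck in CLOSE_WAIT from `netstat -tan` style output."""
    return sum(1 for line in netstat_output.splitlines()
               if line.rstrip().endswith("CLOSE_WAIT"))

# Canned sample output for illustration (addresses invented).
sample = """\
tcp  0  0 10.0.0.1:22128  10.0.0.9:51034  ESTABLISHED
tcp  0  0 10.0.0.1:22128  10.0.0.8:44120  CLOSE_WAIT
tcp  0  0 10.0.0.1:22128  10.0.0.7:39001  CLOSE_WAIT
"""
print(count_close_wait(sample))
```

Graphing this count over time (which is essentially what MonAMI does for us via Ganglia) makes a leak obvious long before the node runs out of file descriptors.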

The dCache team then presented their responses to issues that had been raised during the site reports. Very useful information for anyone running dCache.

Chimera was presented. This is the replacement for PNFS, based on relational databases. It will continue to simulate the /pnfs namespace, so nothing should change for the users. However, rather than using NFS v2 they are moving to v3 and probably v4.1 (although the spec for that has still not been finalised). The dCache team appeared quite excited about using 4.1 since it has functionality that maps well onto dCache (separate control and data channels) and would have the advantage that the clients would come with future versions of the OS that dCache runs on (NFS being a standard).

There was also some more information about ACLs and the new Jython interface to the dCache admin shell. This should improve our ability to script operations using the interface. The dCache team intend to provide some admin scripts and examples.
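Until those official scripts appear, even just generating command batches for the admin shell is useful. A sketch that builds a per-pool command sequence; the pool names here are invented, and while `cd`, `rep ls` and `..` are standard admin-shell commands, how you deliver the batch (e.g. piping it over the admin ssh login) is left to you:

```python
def admin_batch(pools, commands):
    """Build an admin-shell script that visits each pool cell,
    runs the given commands, and steps back out with '..'."""
    lines = []
    for pool in pools:
        lines.append("cd %s" % pool)   # enter the pool cell
        lines.extend(commands)         # e.g. "rep ls" to list replicas
        lines.append("..")             # return to the top level
    lines.append("logoff")
    return "\n".join(lines)

script = admin_batch(["pool1_01", "pool2_01"], ["rep ls", "info"])
print(script)
```

Generating the batch programmatically at least means the same query can be run identically across every pool on the system, which is exactly the sort of thing the promised Jython interface should make cleaner.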

Overall the workshop was very interesting, with lots of good stuff being presented. If there are any questions then drop me an email.

22 January 2007

Finally I get round to posting something about last week's dCache workshop at DESY.

The workshop was a ~2 day affair. The first half of day 1 was devoted to Tier-1/2 site reports. Clearly there is a lot of good work getting done within the community and it was useful to see how the Tier-1s are configuring their dCache systems. A couple of things could be noted.

1. The Tier-1s are improving the performance of their systems by splitting the dCache admin node services across multiple machines. Typically the SRM, postgres and PNFS servers were being given dedicated machines. The talk on the second day by Martin Gasthuber gave further recommendations on hardware configuration.

2. Both FNAL and IN2P3 have developed their own sets of tools for monitoring dCache activity. The lack of good dCache monitoring tools proved to be a common theme when talking to people at the workshop. The new SRMWatcher functionality from Timur Perelmutov (the SRM developer) should help in this respect. Unfortunately the recommendation was again that this process be run on a node separate from the SRM (due to the potential load that users could place on the database).

There was also a presentation from Sun about their new storage solution, the
Sun Thumper (X4500) and their new filesystem ZFS. The presentation given by two of the ZFS developers was very interesting, if a little too well rehearsed. ZFS does look powerful though, being able to pool disk resources (in a similar way to dCache/DPM), taking instant snapshots of the filesystem, copy-on-write...

The second half of the day was taken up with presentations about the status of the dCache SRM v2.2 implementation. This was all good stuff and it looks like the final version will be out at the start of April in dCache v1.8.0. There was a short presentation about the dCache xrootd door and how they plan to implement GSI security with it. The gPlazma functionality of dCache was also presented. Present from v1.7.0 onwards, this allows dCache to deal with the concept of VOMS roles, which will eventually replace the gridmap-file (dcache.kpwd), although gPlazma (being pluggable) does still support that mechanism by default.

13 January 2007

The WLCG workshop is in a couple of weeks' time. During the meeting there will be a DPM BOF session where sites will be given the chance to ask questions of the DPM developers. In preparation for this, a survey has been sent round to all sites asking their opinions about what problems they experience with DPM and what features they would like it to have.

If any GridPP site fills out the survey then it would be appreciated if the response could also be sent to Graeme Stewart or myself so that we can present a GridPP perspective during the BOF. Any feedback you have would be appreciated.

As most of you will know, it is the dCache workshop next week. During the meeting GridPP will get a chance to present its experiences with dCache. Therefore all dCache sites are encouraged to send a list of issues (both positive and negative) that they would like raised at the meeting. For example:

1. YAIM configuration
2. (un)ease of managing the system
3. monitoring
4. scalability
5. 64bit issues
6. licensing

This is your chance to give some feedback to the developers so I'd appreciate anything you have to offer.


04 January 2007

Happy New Year everyone!

I've been asked to present a summary of a Tier-2 storage system using
DPM|dCache at a grid deployment board meeting next week. Some topics that
I would like to touch on are:

1. Inadequate administration tools for pool management i.e.,

a) flexible pool quotas.
b) redistributing files among pools (this is getting better).
c) moving files from one SRM to another within a site (keeping the catalogue consistent).
d) managing large clusters of machines (such as at Manchester).
e) namespace - disk pool synchronisation tools.

2. How should sites be optimising storage for WAN and LAN access? Will
this really be an issue?

3. Do Tier-2s know what sort of hardware they should be buying to meet
requirements? (We have had good experiences with high-end dedicated disk
servers.)
4. Step-by-step guidelines (both technical and procedural) that sites should
follow when data is lost.

5. There are plans for DPM such that it can be configured to allow VOs to
write data to a pool which will then be backed up to tape at the Tier-1.
Is this necessary and if so how will it impact the VO usage of Tier-2s?

6. Instabilities with dCache (CLOSE_WAIT issue).

7. SRM v2.2 interoperability testing.

8. Migration to SL4 32/64bit.

If anyone can think of additional problems that they would like brought
up then send me an email. The same goes if you want to add information to
the above list.