25 January 2007

OK, I forgot to post yesterday, so it looks like I'll have to cover a couple of days in one post.

Day 3: Not much storage news to report here. The agenda for the day was taken up with ROC reports. Jeremy Coles presented for GridPP and talked about the data transfer tests and site storage resources. It looks like we will need to run some more FTS transfer tests to keep stressing the SRMs and site networks. I've got a feeling that this will become more difficult as the SRMs start filling up with real data (which is happening now). Even just trying to organise and run the tests is hard enough. John Gordon talked about accounting issues and mentioned the storage accounting system that we have been working on. Now that we have stopped using RGMA to publish the data into the database, the system should be far more stable. Have a look just now to check your site:


The collaboration drink at night was decent, although the nibbles weren't that great. ATLAS seem interested in the rfio testing of DPM that we've been doing, and I think there will be a lot more of this happening over the next few months. We really need to understand how the experiments are going to access the storage from the batch farm.

Day 4: Running reliable services was the topic of this morning. I realise that to many people reliable middleware is an oxymoron, but this was the subject of discussion. Single points of failure were mentioned as a real problem, but no real solution was proposed other than throwing hardware at it and running failover/redundancy systems...

A Tier-2 site in San Diego (I think) is running dCache in resilient mode on their WNs. It would be great to get some more information on this so that we can use it to influence our Manchester deployment. Speaking of which, it's great that Colin has set up the test system (4 nodes) at Manchester so that we can test out their rollout of dCache and integration with cfengine. This is something that I want to find out more about.

All (and I mean all) afternoon was an SRM v2.2 BOF. The storage system implementations presented the current state of their software, and the results of the CERN GD testing suite were presented. A plan for further functionality testing and stress testing was discussed; it seems very aggressive. dCache 1.8 (with all of the required SRM 2.2 methods) won't be out till April and they want to start deploying test systems to the Tier-1s at the end of March.... There are still problems with the SRM implementations. It seems that the spec is so flexible that the different implementations have their own views about what to do. Most of the trouble surrounds what to do with tape, so the whole discussion about storage classes (Tape0Disk1 etc) shouldn't affect the Tier-2s much. Over the next month or so we will find out what the Tier-2s have to do in order to set up their storage to meet the LHC VO needs. Watch this space...
