29 May 2008

Monitoring grid data transfers

It has become increasingly clear during the CCRC exercises that GridView is not a good place to go if you want a good overview of current Grid data transfers. Simply put, it does not have all of the information required to build such a global view. It is fine if you just want to see how much data is being pumped out of CERN, but beyond that I'm really not sure what it is showing me. For a start, no dCache site publishes its gridftp transfer records into GridView. Since there are ~70 dCache sites out there (5 of which are T1s!), that is a lot of data missing. There also seems to be T2 traffic missing, since the GridView plots I posted a few days ago show a tiny T2 contribution which just doesn't match up with what we are seeing in the UK at the moment.

To get a better idea of what is going on, you've really got to head to the experiment portals and dashboards. PhEDEx for CMS, the DIRAC monitoring for LHCb and the ATLAS dashboard (sorry, not sure about ALICE...) all give really good information (rates, successes, failures) about the transfers taking place.

(This being said, I do find the ATLAS dashboard a little confusing - it's never entirely clear where I should be clicking to pull up the information I want.)

You can also go to the FTS servers and get information from their logs. In the UK we have a good set of ganglia-style monitoring plots. These provide an alternative view to the experiment dashboards, since they show transfers from all VOs managed by that FTS instance. Of course, this doesn't give you any information about transfers not managed by that server, or transfers not managed by any FTS at all. As I've mentioned before, I put together some basic visualisation of FTS transfers which I find useful for getting a quick overview of the activity in the UK.
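For what it's worth, the aggregation behind that sort of visualisation is straightforward. Here is a minimal python sketch; it assumes the transfer records have already been pulled out into a simple whitespace-separated format (source site, destination site, bytes, duration, state), which is a hypothetical layout for illustration rather than the real FTS log format.

#!/usr/bin/env python
# Sketch: summarise simplified transfer records into per-channel rates and
# success fractions. Assumes one whitespace-separated record per line:
#   source_site dest_site bytes duration_seconds state
# where state is DONE or FAILED. This is a hypothetical format for
# illustration, not the actual FTS log layout.

import sys

def summarise(filename):
    stats = {}  # (source, dest) -> [bytes, seconds, done, failed]
    for line in open(filename):
        fields = line.split()
        if len(fields) != 5:
            continue
        src, dst, nbytes, secs, state = fields
        entry = stats.setdefault((src, dst), [0, 0.0, 0, 0])
        if state == "DONE":
            entry[0] += long(nbytes)
            entry[1] += float(secs)
            entry[2] += 1
        else:
            entry[3] += 1

    channels = stats.keys()
    channels.sort()
    for (src, dst) in channels:
        nbytes, secs, done, failed = stats[(src, dst)]
        rate = secs and nbytes / secs / 1e6 or 0.0
        total = done + failed
        eff = total and 100.0 * done / total or 0.0
        print "%s -> %s : %7.1f MB/s, %5.1f%% success (%d transfers)" % (
            src, dst, rate, eff, total)

if __name__ == "__main__":
    summarise(sys.argv[1])

Something along these lines, run over each day's records, is usually enough to spot a misbehaving channel before you have to dig into the full logs.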

Summary: I won't be going back to GridView to find out the state of current data transfers on the Grid.

27 May 2008

CCRC'08 data transfers (part 2)



Just thought I would post a couple of plots to show the data transfers over the past couple of weeks. Rates seem to have peaked at ~2GB/s last week and have now settled into a steady state of ~1.2GB/s. These plots show transfers from all sites to all sites, and the vast majority of the data appears to be going to the T1s.

I think that during the ATLAS FDR exercises there will be much more data coming to the T2s from their associated T1. This should help test out all of that new storage that has been deployed.

DPM admin toolkit now available

I have created a small set of DPM administration tools which use the DPM python API. They have already proved useful to some sites in GridPP, so I hope they can help others as well. The tools are packaged up in an rpm (thanks to Ewan MacMahon for writing the .spec) and hosted in a yum repo to ease the installation and upgrade procedure. Information about the current list of available tools, installation instructions and the bug reporting mechanism can be found here.

I'm sure there will be bugs that I have missed, so please report them to me (or via the savannah page). I would encourage people to contribute patches and their own tools if they find something missing from the current toolkit.

12 May 2008

DPM admin tools

I have started to put together a set of scripts for performing common (and possibly not-so-common) administration tasks with DPM. They use the python interface provided by the DPM-interfaces package. It's definitely a work in progress, but you can check out the latest set of scripts from here (assuming you have a CERN account):

isscvs.cern.ch:/local/reps/lhcbedinburgh/User/gcowan/storage/dpm/admin-tools

If not, then I have put them in a tarball here.

They are very much a work in progress, so things will be changing over the next few weeks (i.e., keep a lookout for updates). As always, contributions are welcome. I should probably package this stuff up as an rpm, but I haven't got round to that yet. I should also put them into the sys-admin repository and write some documentation (beyond the -h option). Like I say, contributions are always welcome...
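By way of illustration, here is a minimal sketch of the kind of task these scripts automate: summarising the space in each pool via the python bindings. It assumes that dpm.dpm_getpools() returns a status code plus a list of pool objects carrying the poolname, capacity and free attributes of the C struct dpm_pool; check this against your installed DPM-interfaces version before relying on it.

#!/usr/bin/env python
# Sketch: summarise the space in each DPM pool using the python bindings
# from the DPM-interfaces package. Assumes dpm.dpm_getpools() returns a
# (status, pool-list) pair and that each pool object carries the poolname,
# capacity and free attributes of the C struct dpm_pool -- verify this
# against your installed bindings before relying on it.

import sys
import dpm

def main():
    status, pools = dpm.dpm_getpools()
    if status != 0:
        print "dpm_getpools failed with status %d" % status
        sys.exit(1)
    gb = 1024.0 ** 3
    for pool in pools:
        used = pool.capacity - pool.free
        print "%-15s capacity %8.1f GB  used %8.1f GB  free %8.1f GB" % (
            pool.poolname, pool.capacity / gb, used / gb, pool.free / gb)

if __name__ == "__main__":
    main()

Run it on the head node (or point it at one with the usual DPM_HOST environment variable) and you get a one-line-per-pool summary of how full your storage is.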

CCRC May'08 data transfers



The plots above show the slow ramp-up in data transfers being run by the experiments during the May phase of CCRC. Clearly CMS are dominating proceedings at the moment. It's not clear what has happened to ATLAS after an initial spurt of activity. Hopefully they will get things going again soon, otherwise the Common element of CCRC might not be achieved. What is good is that a wide variety of sites are involved. You can even see the new Edinburgh site in there (although it is coming up as unregistered in GridView).

I'll take this opportunity to remind everyone of the base version of the gLite middleware that sites participating in the CCRC exercise are expected to be running. Have a look at the list here.