I have made a new release (v1.2.0-1) of the GridPP DPM admin toolkit. This adds a new tool written by Michel Jouvin (GRIF, France) called "dpm-listspaces". Michel's aim is for this tool to replace both dpm-qryconf and dpm-getspacemd. It is his intention to get it into the main release of DPM, but as an interim solution I have included it in the toolkit. You can find installation instructions at the above link.
It should be noted that this release has a dependency on DPM-interfaces >= 1.6.11-4, which hasn't been released yet. The reason is that dpm-listspaces requires the very latest version of the DPM python API. If you want to have a look, DPM-interfaces v1.6.11-4 can be found in the ETICS repo here. It is compatible with v1.6.10 of DPM (I've installed it at Edinburgh).
Finally, dpm-listspaces doubles up as the new GIP for DPM. It has a --gip option which prints out the appropriate bit of LDIF conforming to GLUE 1.3. You don't have to worry about this just now; I'll talk more about it at a later date.
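To give a flavour of what that means (this is a mocked-up fragment, not actual dpm-listspaces output; the attribute names come from the GLUE 1.3 GlueSA object and all of the values are invented), the LDIF looks something like:

dn: GlueSALocalID=atlas:ATLASDATADISK,GlueSEUniqueID=se.example.ac.uk,mds-vo-name=resource,o=grid
GlueSALocalID: atlas:ATLASDATADISK
GlueSATotalOnlineSize: 10000
GlueSAUsedOnlineSize: 4200
GlueSAAccessControlBaseRule: VO:atlas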
20 July 2008
02 June 2008
GridppDpmMonitor

People often state that DPM is a bit of a black box. That is, they know it's working, but aren't really sure what it is doing or who is using it. To help address this problem, over the past week or so I've put together some DPM monitoring to help visualise all of the information which is usually locked in the MySQL database. Hopefully, this will give sites an idea of what DPM is doing and make it easier to debug problems with data access.
The system works using GraphTool (I've mentioned this numerous times before) and is heavily influenced by Brian Bockelman's dCache monitoring tool. I have constructed a variety of queries which pull information about transfer requests directly from the DPM MySQL database. The monitoring is basically a set of python scripts and xml files. To ease installation and help resolve the dependencies, I've packaged the monitoring up into an rpm called GridppDpmMonitor (again, thanks Brian) and hosted it in the HEPiX sys-man yum repository. Instructions for configuration are here, along with some example plots.
I'm still working out exactly what plots we want to see and constructing the appropriate SQL queries. It's a beta (i.e. not perfect) release but I would appreciate any comments you have. The system is already running at Edinburgh, Durham and Cambridge (thanks Phil and Santanu!).
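To give a flavour of the kind of query involved, here is a minimal sketch that counts requests per day and request type (the table and column names dpm_req, r_type and ctime are from my memory of the dpm_db schema, and the credentials are placeholders, so check everything against your own database):

#!/usr/bin/env python
# Sketch: count DPM requests per day and per request type from dpm_db.
# Table/column names are assumptions -- verify against your schema.
import MySQLdb

conn = MySQLdb.connect(host='localhost', user='dpm_ro', passwd='secret', db='dpm_db')
cur = conn.cursor()
cur.execute("""SELECT FROM_UNIXTIME(ctime, '%Y-%m-%d') AS day, r_type, COUNT(*)
               FROM dpm_req GROUP BY day, r_type""")
for day, rtype, count in cur.fetchall():
    print day, rtype, count
conn.close()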
Note that producing a transfer rate plot doesn't actually appear to be possible. If anyone can work out a way of doing it from the dpm_db tables then let me know. Also, having this monitoring linked into Nagios would be great, but that is something for the future.
As always, contributions are welcome!
29 May 2008
Monitoring grid data transfers
It has become increasingly clear during the CCRC exercises that GridView is not a good place to go if you want a good overview of current Grid data transfers. Simply put, it does not have all of the information required to build such a global view. It is fine if you just want to see how much data is pumping out of CERN, but beyond that I'm really not sure what it is showing me. For a start, no dCache site publishes its gridftp records into GridView. Since there are ~70 dCache sites out there (5 of which are T1s!), that is a lot of missing data. Data to/from T2s also seems to be missing, since the GridView plots I posted a few days ago show a tiny T2 contribution which just doesn't match up with what we are seeing in the UK at the moment.
To get a better idea of what is going on, you've really got to head to the experiment portals and dashboards. Phedex for CMS, DIRAC monitoring for LHCb and the ATLAS dashboard (sorry, not sure about ALICE...) all give really good information (rates, successes, failures) about all of the transfers taking place.
(This being said, I do find the ATLAS dashboard a little confusing - it's never entirely clear where I should be clicking to pull up the information I want.)
You can also go to the FTS servers and get information from their logs. In the UK we have a good set of ganglia-style monitoring plots. This provides an alternative view of the information from the experiment dashboards since it shows transfers from all VOs managed by that FTS instance. Of course, this doesn't give you any information about transfers not managed by that server, or transfers not managed by any FTS anywhere. As I've mentioned before, I put together some basic visualisation of FTS transfers which I find good to get a quick overview of the activity in the UK.
Summary: I won't be going back to GridView to find out the state of current data transfers on the Grid.
27 May 2008
CCRC'08 data transfers (part 2)


Just thought I would post a couple of plots to show the data transfers over the past couple of weeks. Rates peaked at ~2GB/s last week and have now settled into a steady state of ~1.2GB/s. These plots cover transfers from all sites to all sites, and the vast majority of the data appears to be going to the T1s.
I think during the ATLAS FDR exercises there will be much more data coming to the T2s from their associated T1. This should be good to help test out all that new storage that has been deployed.
DPM admin toolkit now available
I have created a small set of DPM administration tools which use the DPM python API. They have already proved useful to some sites in GridPP, so I hope they can help others as well. The tools are packaged up in an rpm (thanks to Ewan MacMahon for writing the .spec) and hosted in a yum repo to ease the installation and upgrade procedure. Information about the current list of available tools, installation instructions and the bug reporting mechanism can be found here.
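For the yum route, installation should then be roughly as follows (the repo name, baseurl and package name here are all placeholders; use the real values from the instructions page):

# cat > /etc/yum.repos.d/dpm-admin-tools.repo <<EOF
[dpm-admin-tools]
name=GridPP DPM admin toolkit
baseurl=http://example.ac.uk/yum/dpm-admin-tools/
enabled=1
EOF
# yum install dpm-admin-tools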
I'm sure there will be bugs that I have missed, so please report them to me (or via the savannah page). I would encourage people to contribute patches and their own tools if they find something missing from the current toolkit.
12 May 2008
DPM admin tools
I have started to put together a set of scripts for performing common (and possibly not-so-common) administration tasks with DPM. This uses the python interface provided by the DPM-interfaces package. It's definitely a work in progress, but you can check out the latest set of scripts from here (assuming you have a CERN account):
isscvs.cern.ch:/local/reps/lhcbedinburgh/User/gcowan/storage/dpm/admin-tools
If not, then I have put them in a tarball here.
They are very much a work in progress, so things will be changing over the next few weeks (i.e., keep a lookout for changes). As always, contributions are welcome. I should probably package this stuff up as an rpm, but haven't got round to that yet. I should also put them into the sys-admin repository and write some documentation (other than the -h option). Like I say, contributions are always welcome...
CCRC May'08 data transfers


The plots above show the slow ramp up in data transfers being run by the experiments during the May phase of CCRC. Clearly CMS are dominating proceedings at the moment. It's not clear what has happened to ATLAS after an initial spurt of activity. Hopefully they will get things going soon, otherwise the Common element of CCRC might not be achieved. What is good is that a wide variety of sites are involved. You can even see the new Edinburgh site in there (although it is coming up as unregistered in GridView).
I'll take this opportunity to remind everyone of the base version of the gLite middleware that sites participating in the CCRC exercise are expected to be running. Have a look at the list here.
24 March 2008
Grid storage not working?
Well, going by what I heard last week at LHCb software week, I think the answer to the question "Is grid storage working?" is "No". The majority of the week focussed on all the cool new changes to the core LHCb software and improvements to the HLT, but there was an interesting session on Wednesday afternoon covering CCRC and more general LHCb computing operations. The point was made in 3 (yes, 3!) separate talks that LHCb continue to be plagued with storage problems which prevent their production and reconstruction jobs from completing successfully. The main issue is the instability of using local POSIX-like protocols to remotely open files on the grid SE from jobs running on the site WNs. From my understanding, the issue breaks down broadly into two categories:
1. Many of the servers being used have been configured in such a way that if a job held a file in an open state for longer than (say) 1 day, the connection was being dropped, causing the entire job to fail.
2. Sites have been running POSIX-like access services on the same hosts that provide the SRM. This isn't wrong, but it is definitely not recommended due to the load on the system. The real problem comes when the SRM has to be restarted for some reason (most likely an upgrade): the site(s) appear to have just been restarting all services on the node, which again resulted in open file connections being dropped and jobs subsequently failing. I thought this was basic knowledge, but apparently I was wrong.
LHCb seem to be particularly vulnerable as they have long-running reconstruction jobs (>33 hours), resulting in low job efficiency when the above problems rear their ugly heads. I would be interested in comments from other experiments on these observations. Anyway, the upshot is that LHCb are now considering copying data files locally prior to starting their reconstruction jobs. This won't be possible for user analysis jobs, which will be accessing events from a large number of files. Copying all of these locally isn't all that efficient, nor do you know a priori how much local space the WN has available.
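For the WN-local copy, the obvious tool is lcg-cp; a sketch (the SE hostname, SURL and destination path are invented):

# lcg-cp --vo lhcb srm://se.example.ac.uk/dpm/example.ac.uk/home/lhcb/prod/some.dst file:$TMPDIR/some.dst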
xrootd was also proposed as an alternative solution. Certainly dCache, CASTOR and DPM all now provide an implementation of the xrootd protocol in addition to native dcap/rfio, so getting it deployed at sites would be relatively trivial (some places already have it available for ALICE). I don't know enough about xrootd to comment, but I'm sure if properly configured it would be able to deal with case 1 above. Case 2 is a different matter entirely... It should be noted (perhaps celebrated?) that none of the above problems have to do with SRM2.2.
Of course, LHCb only require disk at Tier-1s, so none of this applies to Tier-2 sites. Also, they reported that they saw no problems at RAL: well done guys!
In addition, the computing team have completed a large part of the stripping that the physics planning group have asked for (but this isn't really storage related).
04 March 2008
Visualising FTS transfers

As always, monitoring is a hot topic. Sites, experiments and operations people all want to know what Grid services are doing and how this is impacting on their work. In particular, it is clear from today's GDB that monitoring of the FTS is important. The above graph shows the output of a script which I have put together over the past day which queries the RAL FTS service for information about its channels, their transfer rates and information about the number of jobs that are scheduled to be transferred. In fact, I don't query the FTS directly, but use some CGI that Matt Hodges at RAL set up (thanks Matt!). It's a prototype at the moment, but I think it could be one useful way of looking at the data and getting an overview of the state of transfers.
You can see the latest plots for the WLCG VOs here. They are updated hourly. I still need to play about with the colours to try and improve the visual effect. It would be great if the line thickness could be varied with transfer rate, but I don't think GraphViz/pydot can do that.
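For anyone curious about the mechanics, here is a minimal sketch of the pydot side of things (the channel names and rates below are invented; the real script pulls them from the RAL CGI):

import pydot

# One directed edge per FTS channel, labelled with its current rate.
channels = [('RAL', 'GLASGOW', 45.2), ('RAL', 'EDINBURGH', 12.7)]  # MB/s, made up
graph = pydot.Dot(graph_type='digraph')
for src, dst, rate in channels:
    graph.add_edge(pydot.Edge(src, dst, label='%.1f MB/s' % rate))
graph.write_png('fts_channels.png')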
22 February 2008
dCache configuration, graphviz style

I don't know about anyone else, but I'm fed up having to try and debug different sites' PoolManager.conf files, especially with all this LinkGroup stuff going on. I find it too hard to manually parse a file when it stretches to hundreds of lines, making it virtually impossible to know if there are any mistakes.
In an effort to try and improve the situation, I put together a little python script last night that converts a PoolManager.conf into a .dot file. This can then be processed by GraphViz to produce a structured graph of the dCache configuration. You can see some examples of currently active dCache configurations here. The above plot shows the config at Edinburgh.
I have been creating both directed (dot) and undirected (neato) graphs. At the moment, the most useful one is the dot plot. I'm still exploring what neato can be used for.
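The conversion itself is quite simple; here is a stripped-down sketch that handles just pools and pool groups (the psu syntax is from memory, so check it against a real PoolManager.conf):

#!/usr/bin/env python
# Sketch: emit a dot graph of pools and pool groups from a PoolManager.conf.
import sys

print 'digraph poolmanager {'
for line in open(sys.argv[1]):
    words = line.split()
    if words[:3] == ['psu', 'create', 'pool']:
        # Declare each pool as a box-shaped node.
        print '  "%s" [shape=box];' % words[3]
    elif words[:3] == ['psu', 'addto', 'pgroup']:
        # Draw an edge from the pool group to its member pool.
        print '  "%s" -> "%s";' % (words[3], words[4])
print '}'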
I think the fact that we even have to consider looking at things this way tells you two things:
1. dCache is a complex beast, with a multitude of different ways of setting things up (which has both pros and cons).
2. The basic configuration really has to be improved to save multiple man-hours that are spent across the Grid trying to debug basic problems.
At the moment, this system is only a prototype. It is intended as an aid to understanding dCache configuration and looking for potential bugs. As always, comments are welcome.
PS Thanks to Steve T for inspiring me to work on this following his graphing glue project.
16 February 2008
The CASTOR beaver


It's not often in my work that I get to talk about beavers, but I figured this was appropriate for the storage blog. I discovered yesterday that Castor is the single remaining genus of the family Castoridae, of which the beaver is a member. I think this explains the logo (both new and old).
14 February 2008
Upcoming Software
Not strictly GridPP, but storage related: a recent LWN article has some info about the Coherent Remote File System (CRFS), yet another possible NFS replacement.
05 February 2008
CMS CCRC storage requirements
Since I've already talked about ATLAS tonight, I thought it might be useful to mention CMS. The official line from the CCRC meeting that I was at today is that Tier-2 sites supporting CMS should reserve space (probably wise to give them a couple of TBs). The space token should be assigned the CMS_DEFAULT space token *description*.
DPM sites should follow a similar procedure to that for ATLAS described below. dCache sites have a slightly tougher time of it. (Matt, Mona and Chris, don't worry, I hear your cries now...)
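By analogy with the ATLAS commands below, the DPM reservation would be something like this (the 2T size and the absence of a role restriction are my guesses, so check with CMS):

# dpm-reservespace --gspace 2T --lifetime Inf --group cms --token_desc CMS_DEFAULT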
ATLAS CCRC storage requirements
Those of you reading the ScotGrid blog will have seen this already: SRMv2 writing during CCRC (which started yesterday!) will happen using the atlas/Role=production VOMS role. Sites should therefore restrict access to the ATLASMCDISK and ATLASDATADISK space tokens to this role. To do this, release and then recreate the space reservation:
# dpm-releasespace --token_desc ATLASDATADISK
# dpm-reservespace --gspace 10T --lifetime Inf --group atlas/Role=production --token_desc ATLASDATADISK
Note that at Edinburgh I set up a separate DPM pool that was only writable by atlas, atlas/Role=production and atlas/Role=lcgadmin. When reserving the space, I then used the above command but also specified the option "--poolname EdAtlasPool" to restrict the space reservation to that pool (I also only gave them 2TB ;)
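For reference, the pool-restricted version of the reservation then looks something like this (all of the options are as described above; the size is what I used at Edinburgh):

# dpm-reservespace --gspace 2T --lifetime Inf --group atlas/Role=production --token_desc ATLASDATADISK --poolname EdAtlasPool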
Also, for those interested, here's the python code (make sure you have DPM-interfaces installed and /opt/lcg/lib/python in your PYTHONPATH).
#! /usr/bin/env python
#
# Greig A Cowan, 2008
#
# Print the metadata of all space tokens known to this DPM.
import dpm

def print_mdinfo(mds):
    # Dump the interesting fields of each space token's metadata struct.
    for md in mds:
        print 's_type\t', md.s_type
        print 's_token\t', md.s_token
        print 's_uid\t', md.s_uid
        print 's_gid\t', md.s_gid
        print 'ret_pol\t', md.ret_policy
        print 'ac_lat\t', md.ac_latency
        print 'u_token\t', md.u_token
        print 't_space\t', md.t_space
        print 'g_space\t', md.g_space
        print 'pool\t', md.poolname
        print 'a_life\t', md.a_lifetime
        print 'r_life\t', md.r_lifetime, '\n'

print '################'
print '# Space tokens #'
print '################'

# An empty token description matches all space tokens on the DPM.
tok_res, tokens = dpm.dpm_getspacetoken('')
if tok_res > -1:
    md_res, metadata = dpm.dpm_getspacemd(list(tokens))
    if md_res > -1:
        print_mdinfo(metadata)
01 February 2008
gLite 3.1 DPM logrotate bug
It's in savannah (#30874), but this one may catch out people who have installed the DPM-httpd: there's a missing brace that breaks logrotate on the machine.
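To illustrate the failure mode (this is a mock stanza, not the actual file shipped with DPM-httpd), a missing closing brace like the one below causes logrotate to choke on its config:

/var/log/dpm-httpd/*.log {
    weekly
    rotate 4
    compress
# the closing '}' is missing, so logrotate errors out here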
29 January 2008
CASTOR dynamic information publishing
Folks, we now have dynamic data published for CASTOR. Yes, it's something we were supposed to have done since the EGEE conference in Pisa(!), but only now has it percolated up through all the other tasks to become a high priority - until now the focus has been on CASTOR itself.
For details (and current limitations) of the implementation, please refer to: https://www.gridpp.ac.uk/wiki/RAL_Tier1_CASTOR_Accounting
23 January 2008
CCRC confusion
So, which Tier-2s are involved in the February CCRC exercise? Does anyone know? What about the date that they will get involved? Do they need to have SRM2.2, or not? Some sources suggest they do, others suggest they don't. If anyone knows the answers to these questions, please email me. I think I'll start attending the daily meetings to find out what is going on.
Also, although the 1.8.0-12 release of dCache was made sometime last week, it turns out that this is also the CCRC branch of dCache. Do you spot the difference? Good, then we can continue. This explains the weird naming convention that the dCache developers are using for all CCRC-related releases, namely 1.8.0-12p1 and 1.8.0-12p2 for patch versions 1 and 2 of the 1.8.0-12 branch. Hope that clears things up.
Banned by SAM
Looks like the SAM people weren't very happy with the monitoring that I was running to summarise the status of storage resources on the Grid. In fact, they were so unhappy that they decided to block the IP address that I was using and switch off access to the /sqldb/ query path on their server! As such, the monitoring hasn't been working for the past few days.
I'll admit that I was putting a fair bit of load on their database, since I was using a multithreaded application to request information about 6 different tests for ~200 SEs over the past 30 days, but I think this is a bit harsh. Another part of the problem is that the script seemed to trip up when processing information on a couple of sites (and it was always the same sites), leading to it continually requesting the same information. I'd love to debug why this happened, but that is somewhat difficult when I can't access the DB.
I'm currently jumping through some hoops to try and get access to the new JSP endpoint. Hopefully this will be granted soon and I can update the application to deal with the new XML schema.
16 January 2008
Monitoring SRM2.2 deployment
Using a combination of the WLCG information system, srmPing from the StoRM SRM client, a multithreaded python script, GraphTool and Apache, I've set up some daily monitoring of all of the SRM v2.2 endpoints that are available on the Grid. This will help track deployment of the new software, allowing us to see which versions are running in production.
http://wn3.epcc.ed.ac.uk/srm/xml/
Using the cool GraphTool features, you can drill down into the data by typing dCache, CASTOR or DPM in the SRM_flavour box. You can also look at a particular country's endpoints by putting something like .uk in the endpoint box (for all UK sites). Hopefully poor old wn3.epcc can handle all this monitoring that it's doing!
One thing to note is that there is quite a variety of dCache 1.8.0-X out there. What are the developers playing at?
A slightly annoying feature is that srmPing requires a valid proxy, so I'll need to come up with a good way of creating a long-lived one.
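In the meantime, the crude option is just to ask for a long validity at creation time (the VO below is a placeholder, and note that the VOMS attributes themselves are usually capped to a much shorter lifetime by the server):

# voms-proxy-init -voms dteam -valid 168:00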
Global dCache monitoring

It's been a while since my last post - been busy with some things that I will post about shortly. As you know, I've been running the GridPP SAM storage monitoring for the past few weeks. It looks like the dCache team got wind of this and have asked if I could set up something similar to summarise the SAM test results for all dCache sites in the WLCG information system. This is now done and the results can be seen above and at this new page in the dCache wiki:
http://trac.dcache.org/trac.cgi/wiki/MonitoringDcache
I don't seem to be able to get SAM results for the US Tier-2 sites, so I'll need to investigate this. The above link contains other useful information that I'll talk about shortly.