GridPP storage news

24 October 2008

Monitoring space token ACLs

I have extended my space token monitoring to pick up the ACLs (technically the GlueVOInfoAccessControlBaseRule) for each space token that is published in the information system for each site, SE and VO. This is updated each day and you can see the table of results here. The point of this is to make it easier (i.e. no horrible ldapsearch'ing) to check the deployment status of the tokens and the ability of certain VOMS roles to write files into them. Of course, this assumes that the advertised permissions do in fact match those on the SE, which in turn relies on the GIPs being correct. YMMV.
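
For illustration only, here is a minimal sketch of the kind of parsing involved: it takes a fragment of LDIF text and collects the access control rules for each space token. The sample records and hostnames are invented, and it assumes the space token description is published in GlueVOInfoTag next to GlueVOInfoAccessControlBaseRule. It is not the code behind the table.

```python
from collections import defaultdict

# Hypothetical LDIF fragment, invented for illustration; real records
# would come from an ldapsearch against the information system.
SAMPLE_LDIF = """\
dn: GlueVOInfoLocalID=atlas:ATLASDATADISK,GlueSALocalID=atlas,GlueSEUniqueID=se.example.ac.uk
GlueVOInfoTag: ATLASDATADISK
GlueVOInfoAccessControlBaseRule: VOMS:/atlas/Role=production
GlueVOInfoAccessControlBaseRule: VO:atlas

dn: GlueVOInfoLocalID=lhcb:LHCb_USER,GlueSALocalID=lhcb,GlueSEUniqueID=se.example.ac.uk
GlueVOInfoTag: LHCb_USER
GlueVOInfoAccessControlBaseRule: VO:lhcb
"""


def acls_by_token(ldif_text):
    """Map each space token description to its list of ACL rules."""
    acls = defaultdict(list)
    # Records are separated by blank lines.
    for record in ldif_text.strip().split("\n\n"):
        tag = None
        rules = []
        for line in record.splitlines():
            key, sep, value = line.partition(": ")
            if not sep:
                continue  # not an "attribute: value" line
            if key == "GlueVOInfoTag":
                tag = value.strip()
            elif key == "GlueVOInfoAccessControlBaseRule":
                rules.append(value.strip())
        # Records without a space token description are skipped.
        if tag is not None:
            acls[tag].extend(rules)
    return dict(acls)


if __name__ == "__main__":
    for token, rules in sorted(acls_by_token(SAMPLE_LDIF).items()):
        print(token, rules)
```

Note that the sketch does not handle LDIF line folding (long values continued on a following line that starts with a space), which real ldapsearch output may contain.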

23 October 2008

NGS and GridPP - storage future?

The GridPP/NGS pow-wow yesterday was very useful. For storage, what can we do given that GridPP runs SRM, and the NGS uses SRB, distributed filesystems, Oracle, and OGSA-DAI?

We could of course look at interoperability: previously we achieved interoperation between SRM and SRB using gLite. (In GIN-ese, "interoperation" loosely speaking applies more to a works-for-now hack than "interoperability", which is meant to be a long-term solution.)

However, not many people really need interopera* between SRM and SRB at the moment, and when they do presumably the ASGC thingy will be ready. My observation would be that no one size fits all, different things do different things differently - we should not persuade NGS to run SRM any more than they should persuade us to store our data in SRB.

What I suggest we could more usefully do is to look at the fabric: we all (i.e. sites) buy disks and have filesystems and operating systems that need patching and tuning.

Procurement: sharing experiences, maybe even common tender at sites;
Infrastructure: how to structure storage systems (topology, networks etc);
Distributed filesystems?
Optimisation and tuning: OS, networks, filesystems, drivers, kernels, etc;
Technology watch: sharing experiences with existing technology (hardware) and tracking and testing new technology (e.g. SSD)

All of the above have traditionally been out of scope of the GridPP storage group which has focused on DPM and dCache support. However, by popular demand we have recently added a Hardware Korner, a monthly(ish) hardware discussion. And we keep discussing whether we need to support distributed filesystems (whether we can is another question).

21 October 2008

GridPP DPM toolkit v2 released

I've just packaged up and released v2 of the GridPP DPM toolkit. You can find the latest information and installation instructions on the wiki. There have been a few significant changes with this release, hence the change in the major version number. In summary, the changes are as follows:

* A new naming convention. All tools have been renamed from gridpp_* to dpm-* or dpns-* to bring them into line with the existing DPM client tools (Graeme will be happy...).

* All tools are now installed in /opt/lcg/bin rather than /usr/bin (again to bring them into line with the existing client tools).

* A new tool called dpns-find. This allows a user to specify a path and a FILENAME. The tool will then recursively search the directory tree for all files that match FILENAME and print out the full path to that file to stdout. This tool doesn't attempt to reproduce the functionality of the UNIX find command; I'll see if I can extend the functionality in a future release.

* Michel Jouvin's dpm-listspaces has been updated to the latest release (which will work with the 1.6.11 branch of DPM).
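
The recursive search that dpns-find performs can be pictured with the sketch below. It is not the tool's implementation: the namespace is a made-up in-memory dict rather than anything from the DPM python API, the paths are invented, and "match" is assumed to mean an exact name match.

```python
def find_files(tree, path, filename):
    """Yield full paths of files named `filename` below `path`.

    `tree` is a hypothetical stand-in for the namespace: a dict mapping
    a directory path to a list of (name, is_directory) entries.
    """
    for name, is_dir in tree.get(path, []):
        full_path = path.rstrip("/") + "/" + name
        if is_dir:
            # Descend into the sub-directory before moving on.
            yield from find_files(tree, full_path, filename)
        elif name == filename:
            yield full_path


# Made-up namespace used only to exercise the function.
NAMESPACE = {
    "/dpm/example.ac.uk/home/lhcb": [("data", True), ("run.root", False)],
    "/dpm/example.ac.uk/home/lhcb/data": [("run.root", False), ("log.txt", False)],
}

if __name__ == "__main__":
    for match in find_files(NAMESPACE, "/dpm/example.ac.uk/home/lhcb", "run.root"):
        print(match)
```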

The yum repository repodata should be updated tonight. As always, let me know if there are problems with the tools or if you want to add one of your own.

16 October 2008

GridppDpmMonitor v0.0.4 released

I've made another release of GridppDpmMonitor. This adds support for viewing the space used per user (i.e. DPM virtual uid) and per group (DPM virtual gid). I've removed most of the DN information so that user privacy is retained.

You'll have to wait until tomorrow for the yum repository to be rebuilt. See previous postings on this subject to get the location of the wiki and repo.

The monitoring will be useful to see which users are exceeding their "quota", but it does not enforce a quota on the users. At the moment, there is no quotaing in the Grid SEs. However, what this tool does give site admins is the ability to go and beat local users round the head with a big stick if they go over their allocation. CMS want 1TB for local users; I don't think there is a similar requirement from ATLAS, yet.
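
To make the "monitor, don't enforce" point concrete, here is a minimal sketch of flagging users who are over an allocation. The per-user totals and user labels are invented, and a terabyte is assumed to be 1000**4 bytes; nothing here is taken from GridppDpmMonitor itself.

```python
TB = 1000 ** 4  # bytes, assuming SI terabytes


def over_allocation(usage_bytes, allocation_bytes):
    """Return (user, used) pairs exceeding the allocation, largest first."""
    offenders = [(user, used) for user, used in usage_bytes.items()
                 if used > allocation_bytes]
    return sorted(offenders, key=lambda item: item[1], reverse=True)


# Invented per-user totals, keyed by a placeholder user label.
USAGE = {"user_a": 400 * 1000 ** 3, "user_b": 1500 * 1000 ** 3, "user_c": 2 * TB}

if __name__ == "__main__":
    for user, used in over_allocation(USAGE, 1 * TB):
        print("%s is using %.2f TB" % (user, used / TB))
```

The function only reports; acting on the list is still down to the site admin.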

09 October 2008

New release of GridppDpmMonitor

I've made a new release of the GridppDpmMonitor. This adds a new plot which shows you the breakdown of failures according to the error message that is stored by DPM (annoyingly, this is often blank). I have also modified the existing graphs such that the client DNs are not shown (I got into trouble for this). Instead, I just show the common name (CN) of the DN, which is much more acceptable.
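
As a rough sketch of the idea (not the code used in the monitoring), the CN can be pulled out of a slash-separated DN as below. The example DN is invented, and taking the first CN component is an assumption, made so that a trailing "CN=proxy" from a proxy certificate is ignored.

```python
def common_name(dn):
    """Return the first CN component of a slash-separated DN, or None."""
    for part in dn.split("/"):
        if part.startswith("CN="):
            return part[len("CN="):]
    return None


if __name__ == "__main__":
    # Invented DN for illustration.
    print(common_name("/C=UK/O=eScience/OU=Example/L=Site/CN=jane doe/CN=proxy"))
```

Splitting on "/" is naive: a CN that itself contains a slash (some host certificates do) would be truncated.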

You can go to the usual place for instructions.

06 October 2008

Tier-2 storage at the GDB

I've been asked to talk about how we organise Tier-2 storage within the UK at this week's Grid Deployment Board. I'll also mention a couple of things where we think the middleware is lacking, i.e.,

* user quotas
* access control on SRM spaces
* administration tools for performing common tasks
* monitoring and alarms

If there is anything that comes to mind, then drop me a comment before Wednesday.

15 September 2008

Update on CASTOR infoprovider deployment

It should have been done quite a few weeks ago, when we discussed it in a CASTOR experiments meeting, but deployment was postponed first when the 6.022*(10**23) bug hit (without stagers, the dynamic information provider is not terribly useful), and then I was away for some weeks for GridPP and AHM and suchlike.

So the new infoprovider is now rescheduled for deployment at 23.09.08 at 09:00 BST. Dutyadmin will schedule a short at-risk period for CASTOR.

The new provider will also publish "unreachable" storage spaces, e.g. disk pools with no space token descriptions, according to WLCG GDB accountingforeverythingism. Moreover, it will abu^W use the "reserved space" attribute to publish whether space has been "reserved" for a VO. Not to be confused with SRM reservedSpace.

Note that like the previous versions, this one publishes only online (disk) space. A version that publishes nearline (tape) is in the pipeline but not with a high priority.

Discussions are currently ongoing about whether it is appropriate to publish space as "reserved" (in the sense above) if it has no space token description. I publish it in the version which will go out next week, but there is a development version which doesn't.

This version of the information provider was given to CERN and (INFN or ASGC) back in August, but it is not known whether they have deployed it.

14 September 2008

Storage@GridKa school of computing

The GridKa school of computing was last week. There are a few storage related talks which you can find on the agenda.

* I gave an overview talk about storage on the Grid.

* There was a dCache tutorial session which gave a good summary of all things SRM2.2.

* Sun Microsystems had some stuff to say about Lustre and pNFS (not the dCache PNFS!).

There were also presentations about local user analysis and a session on ROOT/PROOF (which is closely related to local analysis work, although not in the Grid-storage sense). For some reason the material for those talks is not available yet. You can see the full agenda here.

21 August 2008

RAL Castor issues also affecting LHCb

The Oracle problems that have been an issue with the CASTOR ATLAS system at RAL also appear to be affecting LHCb according to an email from Bonny this morning. Problems started at 2008-08-20 10:56.

18 August 2008

dpm-drain

Just a heads up for any DPM users - we might have discovered an issue with draining files from a pool. If for any reason DPM thinks there's no allocatable space (not the same thing as free space - it's assuming space reservations are going to be filled) then it may move the data out of that pool and into another (such as generalPool).

Even worse it may remove those files out of a spacetoken.

I believe the problem you've run into is that dpm-drain is moving the files out of their space and putting the new copy in no particular space. The ATLAS pool would be preferentially chosen for ATLAS files but the free (non space reservation) portion has been exhausted, so it falls back to the general pool.

I shall update when I have more information. (needless to say we stopped draining uki-scotgrid-glasgow pools this morning)

See Bug #40273

GridKa school of computing 2008

I have been asked to present a talk titled "The Importance of Data Storage" at this year's GridKa school of computing. In particular, I have been asked to discuss issues like:

* computing and data storage: differences in the challenges.
* several examples of communities needing large amounts of data on the grid (if possible, communities having different access models).
* usage of different storage technologies (disk, tape, solid state,...).
* different tools for storing huge amounts of data.

If anyone has any pertinent ideas then I'd like to know about them, particularly when it comes to non-HEP users of Grid storage.

Cheers,
Greig

GridPP DPM toolkit 1.3.0 released

I meant to post about this last week, as I have released a new version of my DPM admin toolkit. This release includes a new tool called gridpp_dpm_dpns_du. This reports on the usage of directories in the DPNS namespace. This tool is something that CMS people at Bristol were asking for such that users could manage their own space.

$ gridpp_dpm_dpns_du -h
usage: gridpp_dpm_dpns_du /dpm/path/to/directory

options:
  -h, --help            show this help message and exit
  -s, --si              Use powers of 1000, not 1024.
  -xEXCLUDE, --exclude=EXCLUDE
                        Directory to ignore.
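
The difference the --si option makes can be seen in a small sketch of size formatting. This is an illustration only, not the tool's own code; the function name and the unit suffixes are hypothetical.

```python
def human_size(num_bytes, si=False):
    """Format a byte count using powers of 1024, or of 1000 when si is True."""
    base = 1000 if si else 1024
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < base:
            return "%.1f%s" % (size, unit)
        size /= base
    # Anything larger than the T range is reported in P.
    return "%.1f%s" % (size, "P")


if __name__ == "__main__":
    one_tb = 10 ** 12
    print(human_size(one_tb))           # powers of 1024
    print(human_size(one_tb, si=True))  # powers of 1000
```

The same 10**12 bytes comes out as roughly 931G in powers of 1024 but exactly 1T in powers of 1000, which is worth remembering when comparing numbers against vendor disk sizes.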

One thing that I have noticed is that the tool sometimes claims that a directory contains (say) 914354654K. However, when you dpns-ls -l the directory it does not contain any files or sub-dirs. Deeper investigation using dpns-ls --deleted shows that DPM still has some remnant of an old file replica which was previously in that directory, but has now been marked as "D" for deleted. If there are no problems, these files can be removed with rfrm. Let me know if you find any other bugs.

Note that the rpm depends on DPM-interfaces > 1.6.11-4 due to the inclusion of the dpm-listspaces tool (which needs the latest API). You can find the latest rpm here.

20 July 2008

GridPP DPM toolkit v1.2.0-1 released

I have made a new release (v1.2.0-1) of the GridPP DPM admin toolkit. This adds in a new tool written by Michel Jouvin (GRIF, France) called "dpm-listspaces". Michel's aim with this new tool is that it can be used as a replacement for both dpm-qryconf and dpm-getspacemd. It is Michel's intention to get this into the main release of DPM, but as an interim solution I have included it in the toolkit. You can find installation instructions at the above link.

It should be noted that this release has a dependency on DPM-interfaces >= 1.6.11-4 which hasn't been released yet. The reason for this is that the very latest version of the DPM python API is required for dpm-listspaces to work. If you want to have a look, DPM-interfaces v1.6.11-4 can be found in the ETICS repo here. This is compatible with v1.6.10 of DPM (I've installed it at Edinburgh).

Finally, dpm-listspaces doubles up as the new GIP for DPM. It has a --gip option which prints out the appropriate bit of LDIF conforming to GLUE 1.3. You don't have to worry about this just now, I'll talk more about it at a later date.

02 June 2008

GridppDpmMonitor

People often state that DPM is a bit of a black box. That is, they know it's working, but aren't really sure what it is doing or who is using it. To help address this problem, over the past week or so I've put together some DPM monitoring to help visualise all of the information which is usually locked in the MySQL database. Hopefully, this will give sites an idea of what DPM is doing and make it easier to debug problems with data access.

The system works using GraphTool (I've mentioned this numerous times before) and is heavily influenced by Brian Bockelman's dCache monitoring tool. I have constructed a variety of queries which pull information directly from the DPM MySQL database about transfer requests. The monitoring is basically a set of python scripts and xml files. To ease installation and help resolve the dependencies, I've packaged (again, thanks Brian) the monitoring up into an rpm called GridppDpmMonitor and hosted it in the HEPiX sys-man yum repository. Instructions for configuration are here, along with some example plots.

I'm still working out exactly what plots we want to see and constructing the appropriate SQL queries. It's a beta (i.e. not perfect) release but I would appreciate any comments you have. The system is already running at Edinburgh, Durham and Cambridge (thanks Phil and Santanu!).

You should note that getting a transfer rate plot doesn't actually appear to be possible. If anyone can work out a way of doing it from the dpm_db tables then let me know. Also, having this monitoring linked into Nagios would be great, but that is something for the future.

As always, contributions are welcome!

29 May 2008

Monitoring grid data transfers

It has become increasingly clear during the CCRC exercises that GridView is not a good place to go if you want to get a good overview of current Grid data transfer. Simply put, it does not have all of the information required to build such a global view. It is fine if you just want to see how much data is pumping out of CERN, but beyond that I'm really not sure what it is showing me. For a start, no dCache publishes its gridftp records into GridView. Since there are ~70 dCache sites out there (5 of which are T1s!), there is a lot of data missing. There also seems to be data to/from T2s that is missing since the GridView plots I posted a few days ago show a tiny T2 contribution which just doesn't match up with what we are seeing in the UK at the moment.

To get a better idea of what is going on, you've really got to head to the experiment portals and dashboards. Phedex for CMS, DIRAC monitoring for LHCb and the ATLAS dashboard (sorry, not sure about ALICE...) all give really good information (rates, successes, failures) about all of the transfers taking place.

(This being said, I do find the ATLAS dashboard a little confusing - it's never entirely clear where I should be clicking to pull up the information I want.)

You can also go to the FTS servers and get information from their logs. In the UK we have a good set of ganglia-style monitoring plots. This provides an alternative view of the information from the experiment dashboards since it shows transfers from all VOs managed by that FTS instance. Of course, this doesn't give you any information about transfers not managed by that server, or transfers not managed by any FTS anywhere. As I've mentioned before, I put together some basic visualisation of FTS transfers which I find good to get a quick overview of the activity in the UK.

Summary: I won't be going back to GridView to find out the state of current data transfers on the Grid.

27 May 2008

CCRC'08 data transfers (part 2)

Just thought I would post a couple of plots to show the data transfers over the past couple of weeks. Things seem to have peaked at ~2GB/s last week and have now settled into a steady state of ~1.2GB/s. These plots show transfers from all sites to all sites, and the vast majority of the data appears to be going to the T1s.

I think during the ATLAS FDR exercises there will be much more data coming to the T2s from their associated T1. This should be good to help test out all that new storage that has been deployed.

DPM admin toolkit now available

I have created a small set of DPM administration tools which use the DPM python API. They have already proved useful to some sites in GridPP, so I hope they can help others as well. The tools are packaged up in an rpm (thanks to Ewan MacMahon for writing the .spec) and hosted in a yum repo to ease the installation and upgrade procedure. Information about the current list of available tools, installation instructions and the bug reporting mechanism can be found here.

I'm sure there will be bugs that I have missed, so please report them to me (or via the savannah page). I would encourage people to contribute patches and their own tools if they find something missing from the current toolkit.

12 May 2008

DPM admin tools

I have started to put together a set of scripts for performing common (and possibly not-so-common) administration tasks with DPM. This uses the python interface provided by the DPM-interfaces package. It's definitely a work in progress, but you can check out the latest set of scripts from here (assuming you have a CERN account):

isscvs.cern.ch:/local/reps/lhcbedinburgh/User/gcowan/storage/dpm/admin-tools

If not, then I have put them in a tarball here.

They are very much a work in progress, so things will be changing over the next few weeks (i.e., keep a lookout for changes). As always, contributions are welcome. I should probably package this stuff up as an rpm, but haven't got round to that yet. I should also put them into the sys-admin repository and write some documentation (other than the -h option). Like I say, contributions are always welcome...

CCRC May'08 data transfers

The plots above show the slow ramp up in data transfers being run by the experiments during the May phase of CCRC. Clearly CMS are dominating proceedings at the moment. It's not clear what has happened to ATLAS after an initial spurt of activity. Hopefully they get things going soon, otherwise the Common element of CCRC might not be achieved. What is good is that a wide variety of sites are involved. You can even see the new Edinburgh site in there (although it is coming up as unregistered in GridView).

I'll take this opportunity to remind everyone of the base version of the gLite middleware that sites participating in the CCRC exercise are expected to be running. Have a look at the list here.

24 March 2008

Grid storage not working?

Well, going by what I heard last week at LHCb software week, I think the answer to this question is "No". The majority of the week focussed on all the cool new changes to the core LHCb software and improvements to the HLT, but there was an interesting session on Wednesday afternoon covering CCRC and more general LHCb computing operations. The point was made in 3 (yes, 3!) separate talks that LHCb continue to be plagued with storage problems which prevent their production and reconstruction jobs from successfully completing. The main issue is the instability of using local POSIX-like protocols to remotely open files on the grid SE from jobs running on the site WNs. From my understanding, this issue could broadly be separated into two categories:

1. Many of the servers being used have been configured in such a way that if a job held a file in an open state for longer than (say) 1 day, the connection was being dropped, causing the entire job to fail.

2. Sites have been running POSIX-like access services on the same hosts that are providing the SRM. This isn't wrong, but is definitely not recommended due to the load on the system. Anyway, the real problem comes when the SRM has to be restarted for some reason (most likely an upgrade) and the site(s) appear to have just been restarting all services on the node, which again resulted in any open file connections being dropped and jobs subsequently failing. I thought it was basic knowledge that everyone knew about, but apparently I was wrong.

LHCb seem to be particularly vulnerable as they have long running reconstruction jobs (>33 hours), resulting in low job efficiency when the above problems rear their ugly heads. I would be interested in comments from other experiments on these observations. Anyway, the upshot of this is that LHCb are now considering copying data files locally prior to starting their reconstruction jobs. This won't be possible for user analysis jobs, which will be accessing events from a large number of files. Copying all of these locally isn't all that efficient, nor do you know a priori how much local space the WN has available.

xrootd was also proposed as an alternative solution. Certainly dCache, CASTOR and DPM all now provide an implementation of the xrootd protocol in addition to native dcap/rfio, so getting it deployed at sites would be relatively trivial (some places already have it available for ALICE). I don't know enough about xrootd to comment, but I'm sure if properly configured it would be able to deal with case 1 above. Case 2 is a different matter entirely... It should be noted (perhaps celebrated?) that none of the above problems have to do with SRM2.2.

Of course, LHCb only require disk at Tier-1s, so none of this applies to Tier-2 sites. Also, they reported that they saw no problems at RAL: well done guys!

In addition, the computing team have completed a large part of the stripping that the physics planning group have asked for (but this isn't really storage related).