25 April 2007

Storage talks at HEPiX

There are lots of storage-related presentations at HEPiX today:


Of particular relevance to WLCG are the presentations on dCache, DPM and SRM. There are also talks about data corruption and distributed filesystems (GPFS, Lustre,...).

Should make for some interesting reading.

dCache, DPM and SRM2.2

As most of you know, the LCG experiments are requiring that all storage be accessible via the SRM2.2 interface. The current version of dCache, v1.7.0, only provides the SRM1 interface (and an incomplete SRM2.2). Full SRM2.2 support will only be available in the v1.8.0 branch of dCache. Once v1.8.0 has been fully tested and moves into production, all GridPP dCache sites will have to upgrade.

As I understand the situation, no direct upgrade path from v1.7.0 to v1.8.0 is planned. Sites will first have to upgrade to v1.7.1 and then move on to v1.8.0. The plan is for v1.7.1 to contain the same code as v1.8.0, minus the SRM2.2 stuff.

Obviously all dCache sites will want to ensure that there is a stable version of the system, particularly as all sites now have tens of TBs of experiment data on disk. The SRM2.2 bits of v1.8.0 are currently being tested. Once v1.7.1 is released we can test out the upgrade path before giving the nod to sites. There will be some additional complexity when it comes to setting up the space reservation parts of SRM2.2 in your dCache. Documentation will be available when sites have to perform this additional configuration step. In fact, all of this configuration may go into YAIM.

The situation for DPM is slightly simpler. Full SRM2.2 support exists in v1.6.4 which is going through certification at the moment (1.6.3 is the current production version). Again, there will be some additional complexity in configuring the SRM2.2 spaces, but this will be documented.

Even once the SRM2.2 endpoints are available, it is likely that the SRM1 endpoint (running simultaneously on the same host) will continue to be used by the experiments until SRM2.2 becomes widely deployed and the client tools start using it as the default interface for file access.

16 April 2007

PostgreSQL housekeeping

The Edinburgh dCache recently started to show increased CPU usage (a few days after an upgrade to 1.7.0-34) as shown in the top plot. The culprit was a postgres process:

$ ps aux|grep 4419
postgres 4419 24.8 0.6 20632 12564 ? R Mar27 6091:11 postgres: pnfsserver atlas [local] PARSE

After performing a VACUUM ANALYSE on the atlas database (in fact, on all of the databases), the CPU usage dropped back to normal, as can be seen in the bottom plot. I had thought auto-vacuuming was enabled by default in v8.1 of postgres, but I was mistaken. It has now been enabled by modifying the relevant entries in postgresql.conf:

stats_start_collector = on
stats_row_level = on
autovacuum = on

I also changed these parameters after the VACUUM process sent out a notice:

max_fsm_pages = 300000 # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 2000 # min 100, ~70 bytes each

The server requires a restart after modifying the last two parameters.
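
As a minimal sketch, the settings above could be sanity-checked in Python before restarting the server. The parser is deliberately simple and the function names (parse_conf, check_vacuum_settings) are hypothetical. It assumes plain "name = value # comment" lines like those shown above, and it treats a missing entry as off, which is an assumption of this sketch rather than a statement about PostgreSQL's defaults. The free space map rule it checks is the one given in the config comment (max_fsm_pages must be at least max_fsm_relations*16).

```python
def parse_conf(text):
    """Parse simple 'name = value  # comment' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'")
    return settings


def check_vacuum_settings(settings):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Assumption of this sketch: a missing entry is treated as 'off'.
    for name in ("stats_start_collector", "stats_row_level", "autovacuum"):
        if settings.get(name, "off").lower() != "on":
            problems.append(f"{name} should be 'on'")
    pages = int(settings.get("max_fsm_pages", 0))
    relations = int(settings.get("max_fsm_relations", 0))
    if pages < relations * 16:
        problems.append("max_fsm_pages must be at least max_fsm_relations*16")
    return problems


example = """
stats_start_collector = on
stats_row_level = on
autovacuum = on
max_fsm_pages = 300000 # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 2000 # min 100, ~70 bytes each
"""
print(check_vacuum_settings(parse_conf(example)))
```

With the values used at Edinburgh, 300000 is comfortably above 2000*16 = 32000, so no problems are reported.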

Repository got lost

Recently the Edinburgh dCache has been failing. The usageInfo page was reporting

[99] Repository got lost

for all of the pools. This has been seen before, but only now do I understand why.

The dCache developers have added a process that runs in the background and periodically tries to touch a file on each of the pools. If this process fails, something is regarded as being wrong and the above message is generated. This could happen if there was a problem with the filesystem or a disk was slow to respond for some reason.
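
The idea of such a background check can be sketched in a few lines of Python. This is not the dCache implementation (which I have not seen); the function names and the probe file prefix are hypothetical, and the sketch only illustrates the "try to touch a file on each pool" approach described above.

```python
import os
import tempfile


def pool_is_writable(pool_dir):
    """Try to create, write and remove a small probe file in pool_dir."""
    try:
        fd, path = tempfile.mkstemp(prefix=".probe-", dir=pool_dir)
        try:
            os.write(fd, b"x")  # a full disk may fail here rather than at create time
        finally:
            os.close(fd)
            os.remove(path)
        return True
    except OSError:
        # Covers a missing directory, a read-only filesystem, no space left, etc.
        return False


def check_pools(pool_dirs):
    """Map each pool directory to True (probe succeeded) or False."""
    return {pool: pool_is_writable(pool) for pool in pool_dirs}
```

A check like this cannot tell a full disk from a broken one, which would be consistent with a single generic error message being reported for different underlying causes.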

Edinburgh was being hit with this issue because some of the disk pools were completely full, i.e., df was reporting 0 free space, while dCache still thought there was a small amount of space available. This mismatch seems to arise from the presence of the small control files on each dCache pool (these contain metadata). Each file may take up an entire block on the disk without actually using all of the space in it. I'm still trying to find out if dCache performs a stat() call on these files. It should also be noted that dCache has to read each of these control files at pool startup, so a full pool takes longer to come online than one that is empty.

There also appears to be a bug in this background process, since all of the Edinburgh disk pools were reporting the error even though some of them were empty. In the meantime, I have set the full pools to read-only and this appears to have prevented the problem recurring.

11 April 2007

dCache v1.8.0 BETA released

dCache v1.8.0 beta is now available for download:


As the page states, this includes the required SRM v2.2 stuff for WLCG but it is *NOT FOR PRODUCTION*. Sites should only upgrade once sufficient testing has been completed. If you would like to try it out as a test then feel free. I'm sure the dCache team would appreciate any feedback.

YAIM does not yet support the configuration of SRM 2.2 spaces and space reservation. This has to be done by hand and requires the use of a new concept: link groups. You already had pools and pool groups, units and ugroups; well, now you've got links and link groups. More information is here:


Again, no dCache in the UK should upgrade to this version yet.

03 April 2007

dCache-NGDF workshop

Colin Morey and I attended the dCache-NGDF workshop last week in Copenhagen (hence the little mermaid). The dCache developers presented lots of useful information. The presentations can all be found here.

NGDF are committing effort to dCache development as they plan to use dCache to create a distributed Tier-1 with the "head" nodes based in Copenhagen (where the network fibres come in) and gridftp doors and pool nodes spread across multiple sites in Scandinavia. It looks to be quite an ambitious project, but they have already started making changes to the dCache GridFTP code in order to get round the problem of a file transfer first being routed through a door node before ending up on the destination pool. This might be OK within a site, but it becomes more of a problem when the transfer is over the WAN. Another solution to this problem involves adopting the GridFTP v2 protocol (which has not yet been adopted by Globus). It appears that both approaches will be developed.

Another interesting bit of news regards new releases of dCache. All of the SRM 2.2 stuff that is required by WLCG will come with v1.8 (sometime this month). At the same time, v1.7.1 will be released which will contain all of the same code as v1.8 other than the SRM 2.2 stuff. It would appear that they want to retain a "stable" version of dCache at all times, in addition to a version that is required by sites supporting WLCG VOs. While sensible, it doesn't inspire huge confidence in the SRM 2.2 code. In summary, all of the GridPP sites will have to upgrade to v1.8.

One last thing about 1.8 is that although it will support all of the SRM 2.2 concepts such as space reservation and storage classes (T0D1), it will not come with a suitable generic information provider (GIP) to publish this information to the BDII. GridPP may have to lend some effort in order to get this fixed.