31 July 2007

Improvement to SAM replica management tests

The SAM people have added a new test that checks whether the default SE has any free space left. This test will not be critical by default, but its failure will stop the actual replica management tests from executing at all. This is good news, as a full SE is (in my opinion) not really a site problem and not a reason for the site to fail a SAM test.

The full bug can be found here:

http://savannah.cern.ch/bugs/?26046

Greig

SAM failures, again

Looks like something changed inside SAM (again) yesterday, causing a large number of sites to fail the CE replica management tests with a "permission denied" error.

Further investigation shows that the failed tests were being run by someone with a DN from Cyfronet. By default, this DN is mapped to the ops group in the grid map file, not to opssgm like the DNs of Judit Novak and Piotr Nyczyk. It is clear from the Glasgow DPM logs that this DN does not belong to ops/Role=lcgadmin. This leads to failures on DPMs because the dpm/domain/home/ops/generated/ directories have ACLs that grant write permission only to members of ops/Role=lcgadmin.
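
To make the failure mode concrete, here is a toy model of the check in Python. It is only a sketch of the logic described above, not DPM code: the group names come from this post, while the DNs, the mapping table and the function name are made up for illustration.

```python
# Toy model of the permission check; NOT actual DPM or SAM code.
# Group names are from the post; DNs and the mapping are hypothetical.

# Hypothetical mapping from a certificate DN to the group/role it ends up with.
DN_TO_GROUP = {
    "/DC=example/CN=usual sam submitter": "ops/Role=lcgadmin",
    "/DC=example/CN=new sam submitter": "ops",
}

# The ACL on the generated/ directories: only this group may write.
GENERATED_DIR_WRITERS = {"ops/Role=lcgadmin"}


def can_write_generated(dn: str) -> bool:
    """Return True if the DN's group is granted write access by the ACL."""
    group = DN_TO_GROUP.get(dn)  # None for an unknown DN
    return group in GENERATED_DIR_WRITERS


for dn in DN_TO_GROUP:
    outcome = "ok" if can_write_generated(dn) else "permission denied"
    print(dn, "->", outcome)
```

A DN that lands in plain ops is a perfectly valid member of the VO, but the ACL only knows about the lcgadmin role, so the write is refused.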

Looks like things have been rectified now.

Why do we keep on getting hit by things like this?

27 July 2007

SRM and SRB interoperability - at last!

People have been talking for years about getting SRM and SRB "interoperable", mostly involving building complicated interfaces from SRX to SRY in various ways.

Now it turns out SRB has a GridFTP interface, developed by Argonne. So here's the idea: why don't we pretend the SRB is a Classic SE?

So we can now transfer files with gridftp (i.e. globus-url-copy) from dCache to SRB and vice versa, although the disadvantage is that you have to know the name of a pool node with a GridFTP door. Incidentally, if you try that, don't forget -nodcau or it won't work (for GridFTP 3rd party copying).
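
As a rough sketch of what such a transfer looks like, the Python snippet below only assembles the globus-url-copy command line and prints it. The host names and file paths are placeholders I made up, not our real endpoints; the one real detail taken from above is the -nodcau option.

```python
import shlex


def third_party_copy_cmd(src_url: str, dst_url: str) -> list[str]:
    """Build a globus-url-copy command with data channel authentication off."""
    return ["globus-url-copy", "-nodcau", src_url, dst_url]


# Placeholder URLs: a dCache pool node with a GridFTP door, and an SRB GridFTP server.
cmd = third_party_copy_cmd(
    "gsiftp://pool01.example.org/pnfs/example.org/data/file1",
    "gsiftp://srb.example.org/home/someuser/file1",
)

# Print the command rather than run it; running needs a valid grid proxy.
print(shlex.join(cmd))
```

To actually run it, pass cmd to subprocess.run on a machine with the Globus client and a valid proxy.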

But here's the brilliant thing: it also works with FTS, since FTS still supports Classic SEs. So we have successfully transferred data between dCache, the SRM, and SRB as a, well, GridFTP server, and back again.

Cool, eh?

Next step is to set up a Classic SE-shaped information system for SRB and see if it works with lcg-utils and GFAL. We have not needed one so far, because FTS does not depend on the SE having a GRIS.

This is work with Matt Hodges at Tier 1 who set up FTS, and with Roger Downing and Adil Hasan from STFC for the SRB.

--jens

20 July 2007

Slightly Modified DPM GIP Plugin

The last version of the DPM GIP plugin had a few minor bugs:
  1. The "--si" flag had been lost somewhere.
  2. DNS style VOs were not handled properly (e.g., supernemo.vo.eu-egee.org).
There's a new version which corrects these little problems: http://www.physics.gla.ac.uk/~graeme/scripts/packages/lcg-info-dynamic-dpm-2.2-2.noarch.rpm.

Now submitted as a patch in Savannah.

17 July 2007

Optimised DPM GIP Plugin

Lana Abadie noticed that my DPM GIP plugin was rather inefficient (table joins are expensive!), and sent a couple of options for improving the SQL query. I implemented them in the plugin, which sped it up by a factor of 10.

I have produced a new RPM with the optimised query, which is available here: http://www.physics.gla.ac.uk/~graeme/scripts/packages/lcg-info-dynamic-dpm-2.2-1.noarch.rpm.

I am running this already at Glasgow and I would recommend it for anyone with a large DPM.

N.B. It is compatible with DPM 1.6.3 and 1.6.5, but remember to modify the /opt/lcg/var/gip/plugin/lcg-info-dynamic-se wrapper to run /opt/lcg/libexec/lcg-info-dynamic-dpm instead of /opt/lcg/libexec/lcg-info-dynamic-dpm-beta.
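
If you would rather script the wrapper change than edit it by hand, something like the Python sketch below should do. The two plugin paths are the ones given above; the example wrapper contents are an assumption, as the real file at your site may look different.

```python
# Plugin paths as given in the post.
OLD = "/opt/lcg/libexec/lcg-info-dynamic-dpm-beta"
NEW = "/opt/lcg/libexec/lcg-info-dynamic-dpm"


def point_wrapper_at_release(wrapper_text: str) -> str:
    """Return the wrapper text with the beta plugin path replaced."""
    return wrapper_text.replace(OLD, NEW)


# Hypothetical wrapper contents, for illustration only.
example = "#!/bin/sh\nexec /opt/lcg/libexec/lcg-info-dynamic-dpm-beta\n"
print(point_wrapper_at_release(example))
```

Running the function on an already-updated wrapper leaves it unchanged. Keep a backup of the original wrapper before writing the result back.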

SRM2.2 storage workshop

There was a storage workshop held at CERN on the 2nd and 3rd of July. The focus of discussions was on the SRM2.2 developments and testing of the endpoints. The majority of the endpoints are being published in the PPS, so that the experiments can use them in a ~production environment and run some real stress tests against them. The experiments see SRM2.2 as being an essential service for them, so hopefully they have sufficient manpower to run the tests...

Getting the software installed on the machines isn't a problem, but getting it configured can be tricky. The main point that I tried to highlight on a number of occasions was the necessity for sites to have really good documentation from both the developers (how the SRM2.2 spaces can be configured) and the experiments (how the SRM2.2 spaces should be configured for their needs). I will make sure that I provide instructions for everyone to ensure that the deployment goes (relatively) smoothly. It shouldn't be too much of a problem for DPM sites, but dCache sites will need to start playing around with link groups ;-)

From mid-October, sites should be thinking of having these SRM2.2 spaces configured. The plan is that by January 2008, everyone will have this functionality available, and SRM2.2 will become the default interface.

DPM gridftp security

Apologies for not posting for a while; it's been a busy few weeks. The first thing that should be mentioned is the gaping security hole that existed in the DPM gridftp server. Users with the uberftp (or some other suitable) client could log into the server and change permissions on anyone's files, move files to different areas of the DPM namespace or even move files outside of the namespace altogether. Thanks to Kostas and Olivier at Imperial for spotting this. Unfortunately, it took a couple of weeks, 3 patch releases and a lot of testing within GridPP before we finally plugged the hole.

Initially, only a patched version of the 1.6.5 server was produced. I asked for the fix to be back-ported to 1.5.10, as a few sites were still running this version. They were unable to upgrade to the latest release, because of the upgrade problems and ongoing experiment tests, but still wanted to be as secure as they could be. This was done, so thanks to the DPM team.

All sites should upgrade to the latest version of DPM and ensure that they are running patch -4 of the gridftp server.