22 February 2008

dCache configuration, graphviz style


I don't know about anyone else, but I'm fed up with trying to debug different sites' PoolManager.conf files, especially with all this LinkGroup stuff going on. I find it far too hard to parse a file by eye when it stretches to hundreds of lines, which makes it virtually impossible to know whether there are any mistakes.

In an effort to improve the situation, I put together a little Python script last night that converts a PoolManager.conf into a .dot file. This can then be processed by GraphViz to produce a structured graph of the dCache configuration. You can see some examples of currently active dCache configurations here. The above plot shows the config at Edinburgh.
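To give a feel for how such a conversion can work, here is a minimal sketch. It is not the script described above. It only graphs a handful of psu commands (psu create link, psu add link, psu addto ugroup/pgroup/linkGroup), the command forms are as I understand dCache's psu syntax (check them against your dCache version), and the edge directions are a choice made for this sketch: units point to unit groups, unit groups to links, links to pool groups, pool groups to pools, and link groups to their links.

```python
def poolmanager_to_dot(conf_text):
    """Turn the psu commands of a PoolManager.conf into a GraphViz digraph.

    Edge directions chosen for this sketch:
      unit -> ugroup, ugroup -> link, link -> pgroup, pgroup -> pool,
      linkGroup -> link
    """
    edges = []
    for raw in conf_text.splitlines():
        # Drop options such as -store or -readpref=10; keep only positional words
        words = [w for w in raw.split() if not w.startswith("-")]
        if len(words) < 5 or words[0] != "psu":
            continue  # blank lines, comments and commands we do not graph
        verb, kind, name = words[1], words[2], words[3]
        if verb == "create" and kind == "link":
            # psu create link <link> <ugroup> [<ugroup> ...]
            edges.extend((ugroup, name) for ugroup in words[4:])
        elif verb == "add" and kind == "link":
            # psu add link <link> <pgroup>
            edges.append((name, words[4]))
        elif verb == "addto" and kind == "ugroup":
            # psu addto ugroup <ugroup> <unit>
            edges.append((words[4], name))
        elif verb == "addto" and kind in ("pgroup", "linkGroup"):
            # psu addto pgroup <pgroup> <pool> / psu addto linkGroup <lgroup> <link>
            edges.append((name, words[4]))
    lines = ["digraph PoolManager {", "  rankdir=LR;"]
    lines += ['  "%s" -> "%s";' % edge for edge in edges]
    lines.append("}")
    return "\n".join(lines)
```

Writing the result to, say, a hypothetical edinburgh.dot and running `dot -Tpng edinburgh.dot -o edinburgh.png` produces the picture. Names are quoted because store units such as `*@*` are not valid bare GraphViz identifiers.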

I have been creating both directed (dot) and undirected (neato) graphs. At the moment, the dot plot is the more useful of the two; I'm still exploring what neato can be used for.
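The difference between the two outputs comes down to the DOT keyword and edge operator: `digraph` with `->` for directed graphs, `graph` with `--` for undirected ones. The helper below is a hypothetical sketch that writes the same edge list either way and builds the matching GraphViz command line. neato will also lay out a digraph, so the graph/digraph choice here only decides whether arrowheads are drawn.

```python
def to_graphviz(edges, directed=True, name="dcache"):
    """Render an edge list as a directed graph (for dot) or undirected (for neato)."""
    keyword, arrow = ("digraph", "->") if directed else ("graph", "--")
    body = ['  "%s" %s "%s";' % (a, arrow, b) for a, b in edges]
    return "\n".join(["%s %s {" % (keyword, name)] + body + ["}"])


def render_command(dot_file, layout="dot", fmt="png"):
    """Build a GraphViz command line; dot and neato take the same -T/-o options."""
    out_file = dot_file.rsplit(".", 1)[0] + "." + fmt
    return [layout, "-T" + fmt, dot_file, "-o", out_file]
```

For example, `render_command("ed.dot", layout="neato")` gives `["neato", "-Tpng", "ed.dot", "-o", "ed.png"]`, which can be passed straight to `subprocess.run`.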

I think the fact that we even have to consider looking at things this way tells you two things:

1. dCache is a complex beast, with a multitude of different ways of setting things up (which has both pros and cons).
2. The basic configuration really has to be improved to save the many man-hours spent across the Grid debugging basic problems.

At the moment, this system is only a prototype. It is intended as an aid to understanding dCache configuration and spotting potential bugs. As always, comments are welcome.

PS Thanks to Steve T for inspiring me to work on this following his graphing glue project.

16 February 2008

The CASTOR beaver



It's not often in my work that I get to talk about beavers, but I figured this was appropriate for the storage blog. I discovered yesterday that Castor is the single remaining genus of the family Castoridae, of which the beaver is a member. I think this explains the logo (both new and old).

14 February 2008

Upcoming Software

Not strictly GridPP, but storage related: a recent LWN article has some information on the Coherent Remote File System (CRFS), yet another possible NFS replacement.

05 February 2008

CMS CCRC storage requirements

Since I've already talked about ATLAS tonight, I thought it might be useful to mention CMS. The official line from the CCRC meeting that I was at today is that Tier-2 sites supporting CMS should reserve space (probably wise to give them a couple of TBs). The space token should be assigned the CMS_DEFAULT space token *description*.

DPM sites should follow a similar procedure to that for ATLAS described below. dCache sites have a slightly tougher time of it. (Matt, Mona and Chris, don't worry, I hear your cries now...)

ATLAS CCRC storage requirements

Those of you reading the ScotGrid blog will have seen this already: SRMv2 writing during CCRC (which started yesterday!) will happen using the atlas/Role=production VOMS role. Sites should therefore restrict access to the ATLASMCDISK and ATLASDATADISK space tokens to this role. To do this, release and then recreate the space reservation (shown here for ATLASDATADISK):

# dpm-releasespace --token_desc ATLASDATADISK
# dpm-reservespace --gspace 10T --lifetime Inf --group atlas/Role=production --token_desc ATLASDATADISK

Note that at Edinburgh I set up a separate DPM pool that was only writable by atlas, atlas/Role=production and atlas/Role=lcgadmin. When reserving the space, I used the above command but also specified the option "--poolname EdAtlasPool" to restrict the space reservation to that pool. I also only gave them 2TB ;)
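Since the release/recreate step has to be done for both ATLAS tokens, a small helper can build the command lines so they are consistent. This is a sketch: it uses only the options shown above (--token_desc, --gspace, --lifetime, --group, --poolname), the "2T" size is a placeholder rather than a recommendation, and `space_commands` is a hypothetical name. It prints the commands for review instead of running them, because releasing a reservation is not something to do by accident.

```python
import shlex


def space_commands(token_desc, size, group, poolname=None):
    """Return [release, reserve] argument lists for one DPM space token."""
    release = ["dpm-releasespace", "--token_desc", token_desc]
    reserve = ["dpm-reservespace", "--gspace", size, "--lifetime", "Inf",
               "--group", group, "--token_desc", token_desc]
    if poolname:
        # Optionally pin the reservation to a single pool, as done at Edinburgh
        reserve += ["--poolname", poolname]
    return [release, reserve]


SIZE = "2T"  # placeholder size: choose your own
for desc in ("ATLASDATADISK", "ATLASMCDISK"):
    for cmd in space_commands(desc, SIZE, "atlas/Role=production",
                              poolname="EdAtlasPool"):
        print(" ".join(shlex.quote(word) for word in cmd))
```

Once the printed commands look right, each list can be handed to `subprocess.run(cmd, check=True)` on the DPM head node.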

Also, for those interested, here's some Python code that lists the space tokens and their metadata (make sure you have DPM-interfaces installed and /opt/lcg/lib/python in your PYTHONPATH).

#! /usr/bin/env python
#
# Greig A Cowan, 2008
#
# List every DPM space token and its metadata.
# Needs DPM-interfaces installed and /opt/lcg/lib/python in PYTHONPATH.

import dpm

def print_mdinfo(mds):
    # Print the metadata fields of each space token, one per line
    for md in mds:
        print 's_type\t', md.s_type
        print 's_token\t', md.s_token
        print 's_uid\t', md.s_uid
        print 's_gid\t', md.s_gid
        print 'ret_pol\t', md.ret_policy
        print 'ac_lat\t', md.ac_latency
        print 'u_token\t', md.u_token
        print 't_space\t', md.t_space
        print 'g_space\t', md.g_space
        print 'pool\t', md.poolname
        print 'a_life\t', md.a_lifetime
        print 'r_life\t', md.r_lifetime, '\n'

print '################'
print '# Space tokens #'
print '################'

# Pass an empty token description to list every space token
tok_res, tokens = dpm.dpm_getspacetoken('')

if tok_res > -1 and tokens:
    # Look up the metadata for all of the tokens in one call
    md_res, metadata = dpm.dpm_getspacemd(list(tokens))
    if md_res > -1:
        print_mdinfo(metadata)
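After recreating the ATLAS reservations it is handy to see, at a glance, which group each token is tied to. The sketch below condenses the same metadata into one line per token. It is written for Python 3 and runs without DPM by using hypothetical stand-in objects: only the attribute names come from the script above, the values are made up, and it assumes that s_gid is the group the reservation is restricted to and that g_space is in bytes.

```python
from types import SimpleNamespace


def summarise_tokens(metadata):
    """Return sorted (description, token, gid, guaranteed size) tuples."""
    # Tokens without a description get "" so that sorting never compares None
    return sorted((md.u_token or "", md.s_token, md.s_gid, md.g_space)
                  for md in metadata)


# Hypothetical stand-ins for the objects returned by dpm.dpm_getspacemd()
fake_metadata = [
    SimpleNamespace(u_token="ATLASMCDISK", s_token="tok-2", s_gid=1307, g_space=2 * 1024**4),
    SimpleNamespace(u_token="ATLASDATADISK", s_token="tok-1", s_gid=1307, g_space=2 * 1024**4),
]

for desc, token, gid, gspace in summarise_tokens(fake_metadata):
    # Assumes g_space is in bytes
    print("%-15s gid=%-6d %5.1f TiB  %s" % (desc, gid, gspace / 1024.0**4, token))
```

On a real head node, `fake_metadata` would be replaced by the `metadata` list from `dpm.dpm_getspacemd()`, and both ATLAS tokens should then show the gid of the production group.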

01 February 2008

gLite 3.1 DPM logrotate bug

It's in Savannah (#30874), but this one may catch out people who have installed the DPM-httpd: there's a missing brace in its logrotate configuration that breaks logrotate on the machine.
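Running `logrotate -d` against a configuration does a dry run that should show the parse error, but a quick brace check can also point at the offending line. This is a hypothetical sketch, not a fix taken from the Savannah ticket: it skips whole-line comments and the shell code between postrotate (and similar) and endscript, and it reports any '{' or '}' that does not pair up. Point it at whichever file the package installed (typically somewhere under /etc/logrotate.d/).

```python
# Lines that start a shell-script section, which runs until "endscript"
SCRIPT_START = {"prerotate", "postrotate", "firstaction", "lastaction", "preremove"}


def check_braces(text):
    """Return (line_number, problem) pairs for unmatched braces in a logrotate config."""
    problems = []
    open_line = None    # line number of the '{' that is currently open, if any
    in_script = False   # inside a postrotate ... endscript section
    for lineno, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        if in_script:
            # Skip shell code, whose braces are not logrotate syntax
            in_script = stripped != "endscript"
            continue
        if stripped in SCRIPT_START:
            in_script = True
            continue
        if stripped.startswith("#"):
            continue  # whole-line comment
        for ch in stripped:
            if ch == "{":
                if open_line is not None:
                    problems.append((lineno, "'{' opened before the block from line %d was closed" % open_line))
                open_line = lineno
            elif ch == "}":
                if open_line is None:
                    problems.append((lineno, "'}' without a matching '{'"))
                open_line = None
    if open_line is not None:
        problems.append((open_line, "'{' is never closed"))
    return problems
```

Usage: `check_braces(open(path).read())` returns an empty list for a well-formed file and otherwise names the line to look at.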