31 August 2015

Community knowledge: Upgrading an (SL5) DPM disk node to xrootd4.x from xrootd3.x in place.

While developers try to avoid breaking backwards compatibility between versions, sometimes it is necessary. One such situation occurred for the xrootd protocol, as the long-awaited Xrootd 4 series was released earlier this year, bringing (amongst other changes) true IPv6 support.

Unfortunately, because of the changes involved, DPM support for xrootd4 was not immediately available (and the various VO-specific tools that translate paths for xrootd similarly needed porting). As a result, most sites stayed on xrootd3 at the time.

Because the release process was complicated (packages built against xrootd4 became available on different dates, from multiple repositories), the DPM devs published a blog entry in February with special instructions for managing the Xrootd 4 transition.

Much of the complexity in that blog entry is no longer relevant, however, as all of the dependent packages are now available; but many sites, including Glasgow, still have systems running Xrootd 3 services.

So, I took a look at the process of moving a single disk server from Xrootd 3 to 4. (Xrootd 3 and 4 based DPM disk servers can co-exist with each other, and with a head node running either release, so there's no need to move them all at once.) We predominantly support ATLAS at Glasgow, so the instructions here are focussed on making sure that ATLAS support works.
[NOTE: this is not sufficient to upgrade a head node to xrootd4, which would require a few additional changes, and I have not tested this yet.]

yum update emi-dpm_disk dpm-xrootd xrootd-server-atlas-n2n-plugin dmlite-plugins-adapter

A few notes on the packages in that command:

Updating the emi-dpm_disk metapackage on its own only pulls in the core DPM/dmlite functionality, as xrootd is an optional protocol.
dpm-xrootd pulls in updates to xrootd and to the dmlite interface to it (which is what we want).
xrootd-server-atlas-n2n-plugin is needed to translate ATLAS VO SURL paths into xrootd paths.
dmlite-plugins-adapter updates the dmlite adapter library, which (for example) lets xrootd obtain authorisation/authentication from DPM. For some reason none of the packages above update it automatically, but without a new enough release, dpm-xrootd will not be able to talk properly to the rest of DPM.

Specifically, you'll need to ensure that:

dmlite-plugins-adapter >= 0.7.0
xrootd-server-atlas-n2n-plugin >= 0.2

You should also check that the dpm-xrootd package pulls in vomsxrd (>= 0.3) as a dependency; if it doesn't, make sure that the WLCG repository is properly enabled.
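To confirm the minimum versions above on a node before restarting anything, the check can be scripted. This is a hedged sketch in bash, not part of any DPM tooling: version_ge and check_min_version are hypothetical helper names, the comparison only handles purely numeric dotted versions (it compares the RPM VERSION field and ignores RELEASE), and it assumes only one version of each package is installed. It deliberately avoids relying on sort -V, which older coreutils releases such as those on SL5 may lack.

```shell
# version_ge A B: succeed if dotted version A >= B.
# Assumes purely numeric components (e.g. 0.7.0); missing components count as 0.
version_ge() {
    local a="$1" b="$2" x y
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}              # leading component of each version
        case "$a" in *.*) a=${a#*.} ;; *) a="" ;; esac
        case "$b" in *.*) b=${b#*.} ;; *) b="" ;; esac
        x=${x:-0}; y=${y:-0}                # treat a missing component as 0
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
    done
    return 0                                # all components equal
}

# check_min_version PKG MIN: report whether PKG is installed at VERSION >= MIN.
# Hypothetical helper; uses rpm's --qf query format to read the VERSION field.
check_min_version() {
    local pkg="$1" min="$2" ver
    ver=$(rpm -q --qf '%{VERSION}\n' "$pkg" 2>/dev/null) || {
        echo "MISSING: $pkg"; return 1; }
    if version_ge "$ver" "$min"; then
        echo "OK: $pkg $ver"
    else
        echo "TOO OLD: $pkg $ver (need >= $min)"; return 1
    fi
}

# Usage on the disk node, with the minimums given in this post:
#   check_min_version dmlite-plugins-adapter 0.7.0
#   check_min_version xrootd-server-atlas-n2n-plugin 0.2
#   check_min_version vomsxrd 0.3
```

Comparing component by component (rather than as strings) means that, for instance, 0.10 correctly counts as newer than 0.2.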

You'll also need to edit /etc/sysconfig/xrootd and add -k fifo to the value of every variable whose name ends in _OPTIONS.
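If you have several disk servers to convert, the -k fifo edit to /etc/sysconfig/xrootd can be scripted. Below is a minimal sketch using GNU sed (for its -i in-place option); it is not part of the official DPM instructions, the function name add_fifo_flag is purely illustrative, and it assumes every *_OPTIONS variable is a single-line, double-quoted assignment such as XROOTD_DISK_OPTIONS="...". It keeps a .bak copy of the file and skips any line that already contains -k fifo, so running it twice is harmless.

```shell
# add_fifo_flag [FILE]: insert "-k fifo" at the start of the value of every
# *_OPTIONS="..." assignment in FILE (default /etc/sysconfig/xrootd).
# Assumes single-line, double-quoted assignments; lines that already
# mention "-k fifo" are left untouched.
add_fifo_flag() {
    local conf="${1:-/etc/sysconfig/xrootd}"
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    cp -p "$conf" "$conf.bak"               # keep a backup of the original
    sed -i -e '/-k fifo/!s/^\([A-Za-z0-9_]*_OPTIONS\)="/\1="-k fifo /' "$conf"
}

# Usage (as root), before restarting the xrootd services:
#   add_fifo_flag /etc/sysconfig/xrootd
```

The sketch prepends the flag rather than appending it, which keeps it among the leading options whatever else is on the line; check the resulting file by eye before restarting.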

Once this is all done, you can happily restart the xrootd services on the node, and all should be well. (Running tail -f /var/log/xrootd/disk/xrootd.log can help you spot any issues if they do appear.)

24 August 2015

Castor Rebalancer a success at RAL... sort of...

We recently started using the re-balancing feature of Castor storage at RAL-LCG2.

This looked like good news, as it would let us keep the number of files and the free space balanced across disk servers within a pool. However, a couple of days after we turned the feature on, our production team noticed a large increase in the number of bad, incomplete replica files being produced. (The good news is that the original files still exist, so no data has been lost.) We therefore decided it would be a good idea to effectively turn off re-balancing by tweaking the settings on our stager DB / transfer management system within Castor. (I have since learned a lot more about the usage and output of our "printdbconfig" and "modifydbconfig" commands!) We have been changing various settings, but the main ones of current interest for this and other issues are:

CLASS         KEY                      VALUE
--------------------------------------------------
D2dCopy       MaxNbRetries             0
Draining      MaxNbFilesScheduled      200
Draining      MaxNbSchedD2dPerDrain    200
Migration     MaxNbMounts              7
Rebalancing   MaxNbFilesScheduled      5
Rebalancing   Sensitivity              100

These settings seem to have stopped the creation of new problematic files; we now "just" need to work out exactly why they fixed it, and whether we can re-enable re-balancing.