27 January 2016

Xrootd for all

Xrootd provides, amongst other things, a convenient way to access files at a site from anywhere in the world using your grid credentials.

Specialist storage systems such as DPM and dCache now include an xrootd server in their deployment.
If you use a standard POSIX file system (e.g. Lustre, GPFS, NFS) it's possible to set up a standalone xrootd server to export all or part of the file system to external clients.

The LHC experiments have gone a step further and set up federated storage services that combine storage from several separate sites into one namespace, allowing seamless client access without having to worry about where the data is stored. They have provided instructions for setting up such a service, but only for their own VO, e.g.



Extending this to allow access to other VOs' data via xrootd, but without the federated storage service, is simple.

The xrootd server runs as the user xrootd. In order to read files it must have the correct permissions on them. This can be arranged by making the xrootd user a member of the appropriate groups across the site (e.g. via NIS):

ypcat -k group 
...
dteam dteam:x:12345:user1,user2, …, xrootd
atlas atlas:x:13345:user1,user2, …, xrootd
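
If the site doesn't use NIS, the same effect can be achieved with local groups; a minimal sketch, assuming the atlas and dteam groups already exist on the xrootd host:

# add the xrootd user to the local atlas and dteam groups
usermod -a -G atlas,dteam xrootd
# check the result
id xrootd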

For simplicity I'll make symlinks on the xrootd server to the file systems I want to export, e.g.

ln -sf /mnt/lustre_2/storm_3/atlas/ /atlas
ln -sf /mnt/lustre_2/storm_3/dteam/ /dteam

The xrootd server configuration file is /etc/xrootd/xrootd-clustered.cfg. Within this file we need to define the file systems to export, and do so read-only for security:

all.export /dteam r/o
all.export /atlas r/o

We also need to add the VOs to the X509 configuration.

...
sec.protparm gsi -vomsfun:/usr/lib64/libXrdSecgsiVOMS.so -vomsfunparms:certfmt=raw|vos=atlas,dteam|grps=/atlas,/dteam
acc.authdb /etc/xrootd/auth_file
...

The file /etc/xrootd/auth_file specifies the group/user access rights. The following gives read and list rights to members of the atlas group for files under /atlas, and to members of the dteam group for files under /dteam (g marks a group entry, r grants read and l grants lookup/list):

g /atlas /atlas rl
g /dteam /dteam rl


The final configuration files look like this:

cat xrootd-clustered.cfg
...
frm.xfr.copycmd /bin/cp /dev/null $PFN

# atlas redirection
all.manager atlas-xrd-uk.cern.ch+:1098
xrootd.redirect atlas-xrd-uk.cern.ch:1094 ? /atlas
all.sitename SITENAME

all.export /dteam r/o
all.export /atlas r/o

all.role server
all.adminpath /var/run/xrootd
all.pidpath /var/run/xrootd
xrootd.async off

# atlas Monitoring
if exec xrootd
xrd.report atl-prod05.slac.stanford.edu:9931 every 60s all -buff -poll sync
fi
# if your site is in the EU uncomment the next line
xrootd.monitor all flush 30s window 5s fstat 60 lfn ops xfr 5 dest redir files info user atlas-fax-eu-collector.cern.ch:9330

# N2N configuration. Please change for your site
oss.namelib /usr/lib64/XrdOucName2NameLFC.so

# X509 configuration, change nothing
xrootd.seclib /usr/lib64/libXrdSec.so
sec.protparm gsi -vomsfun:/usr/lib64/libXrdSecgsiVOMS.so -vomsfunparms:certfmt=raw|vos=atlas,dteam|grps=/atlas,/dteam
sec.protocol /usr/lib64 gsi -ca:1 -crl:3 -gridmap:/dev/null
acc.authdb /etc/xrootd/auth_file
acc.authrefresh 60
ofs.authorize

[root@xrootd02 xrootd]# cat auth_file
g /atlas /atlas rl
g /dteam /dteam rl
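
With the configuration in place, restart the xrootd service and check access from a remote client holding a valid VOMS proxy. A minimal check, assuming a hypothetical server hostname xrootd02.example.ac.uk and a test file already present under /dteam:

voms-proxy-init --voms dteam
xrdfs xrootd02.example.ac.uk ls /dteam
xrdcp root://xrootd02.example.ac.uk//dteam/testfile /tmp/testfile
# writes should fail, since the exports are r/o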


Note:
Additional file systems can be added in the same fashion for more VOs, e.g. snoplus, t2k ….. as sketched below.
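
For example, adding a snoplus export follows exactly the same pattern (the Lustre path here is illustrative):

ln -sf /mnt/lustre_2/storm_3/snoplus/ /snoplus

# xrootd-clustered.cfg
all.export /snoplus r/o
sec.protparm gsi -vomsfun:/usr/lib64/libXrdSecgsiVOMS.so -vomsfunparms:certfmt=raw|vos=atlas,dteam,snoplus|grps=/atlas,/dteam,/snoplus

# auth_file
g /snoplus /snoplus rl

Remember that the xrootd user also needs to be a member of the snoplus group.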

It's possible to use an Argus server for the authentication: http://londongrid.blogspot.co.uk/2014/10/xrootd-and-argus-authentication.html

The bandwidth to the file system will be limited by the performance of the xrootd server. For local file access it's still better to use native POSIX access, especially with parallel file systems like Lustre.



04 January 2016

Update on vo.dirac.ac.uk data movement and filesize distribution.

So....... I should have known that the information I posted in the blog post in November of last year would soon be out of date, but I didn't think it would be this soon! DiRAC have successfully developed their system to tar and split their data samples before transferring them into the RAL Tier1. This system has dramatically increased the data transfer rates.
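
I haven't seen the DiRAC tooling itself, but the tar-and-split idea with standard GNU tools would look something like this minimal sketch (the dataset path and the 250 GB chunk size are illustrative):

# pack a dataset into fixed-size chunks ready for transfer
tar -cf - /dirac/dataset_001 | split -b 250G - dataset_001.tar.part_

# on the destination the chunks can later be reassembled and unpacked
cat dataset_001.tar.part_* | tar -xf -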

What has also changed is the number of files per tape, due to the change in the average file size. The number of files per tape varied from a starting value of 2-3 thousand, swelled to 2-3 million, and finally settled at 20-40 per tape (with a file size of ~250-300 GB per file).
Moving such large files requires good transfer rates, which we have been able to achieve, as can be seen in this log snippet (an average of ~286,000 KB/s, or roughly 2.3 Gbit/s):

Tue Dec 29 08:00:28 2015 INFO     bytes: 293193121792, avg KB/sec:286321, inst KB/sec:308224, elapsed:1001
Tue Dec 29 08:00:33 2015 INFO     bytes: 294824181760, avg KB/sec:286481, inst KB/sec:318566, elapsed:1006
Tue Dec 29 08:00:38 2015 INFO     bytes: 296458387456, avg KB/sec:286643, inst KB/sec:319180, elapsed:1011
Tue Dec 29 08:00:43 2015 INFO     bytes: 298053795840, avg KB/sec:286766, inst KB/sec:311603, elapsed:1016
Tue Dec 29 08:00:45 2015 INFO     bytes: 298822410240, avg KB/sec:286715, inst KB/sec:268071, elapsed:1018


Incidentally, the large file size also helps reduce the overall rate loss due to the per-transfer setup and completion overhead (~15 seconds of overhead for this file, which then took 1018 seconds to transfer, i.e. a loss of only about 1.5%). This has allowed us to transfer ~125 TB of data over the new year period:

And a completion rate of ~90%

However, the low number of transfers does not give the FTS optimizer enough information to change its settings and so improve the throughput rate:


Let's hope we can continue this rate. My next step is to look at the rate at which we can create the tarballs on the source host in preparation for transfer, and whether this technique can be applied at other source sites within vo.dirac.ac.uk.