22 December 2009

T2 Storage Ready for ATLAS Data Taking... Or Are We?

It has been a busy couple of months, what with helping the Tier2 sites prepare their storage for data taking. The good news is that the sites have done really well.
Of the three largest LHC VOs, most work has been done with ATLAS, since they have the largest appetite for space and the most complex Tier2 space administration requirements for sites.

All sites now have the space tokens that ATLAS requires.

The ATLAS people have also been able to see what space is available to them and adjust their usage accordingly.

Almost all sites had their SEs/SRMs upgraded, or in the process of upgrade/decommissioning, ready for data taking in '09, and all should be ready for '10.
Sites were very good at making the changes needed as ATLAS's space token distribution requirements evolved.
Sites have also been really good in working with ATLAS, via the ATLAS "HammerCloud" tests, to improve their storage.
Some issues still remain (draining on DPM, limiting gridFTP connections, lost disk server processes, data management by the VOs, etc.) but these challenges/opportunities will make our lives "interesting" over the coming months...

So that covers some of the known knowns.

The known unknowns (how user analysis of real data affects T2 storage) are also going to come about over the next few months, but I feel the GRIDPP-Storage team, the ATLAS-UK support team and the site admins are all ready to face whatever the LHC community throws at us.

Unknown unknowns: we will deal with them when they come at us....

09 December 2009

When it's a pain to drain

Some experiences rejigging filesystems at ECDF today. I'm not sure I recommend this approach, but some of it may be useful as a dpm-drain alternative in certain circumstances.

The problem was that some data had been copied in with a limited lifetime but was in fact not OK to delete. Using dpm-drain would delete those files, so instead I marked the filesystem RDONLY and then did:

dpm-list-disk --server=pool1.glite.ecdf.ed.ac.uk --fs=/gridstorage010 > Stor10Files

I edited this file to replace Replica: with dpm-replicate (and deleted the number at the end of each line). (Warning: if these files are in a space token you should also specify the space token in this command.)
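That edit can be scripted rather than done by hand. Here is a hedged sed sketch, assuming each entry in Stor10Files looks like "Replica: <sfn> <number>" (the sample sfn below is made up; check your own dpm-list-disk output format first):

```shell
# Sample line in the assumed dpm-list-disk format (path, then a trailing
# replica number to strip) -- your output may differ, so inspect it first.
printf 'Replica: pool1.glite.ecdf.ed.ac.uk:/gridstorage010/atlas/file1 101\n' |
  sed -n 's/^Replica: \(.*\) [0-9][0-9]*$/dpm-replicate \1/p'
```

Running the same sed over the whole Stor10Files file and redirecting the output to a script gives you something you can source to do the replication.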

Unfortunately I had to abort this part way through, which left me in a bit of a pickle, not knowing which files had been duplicated and could be deleted.
While you could probably figure out a way of doing this using dpm-disk-to-dpns and dpm-dpns-to-disk, I instead opted for a database query:

select GROUP_CONCAT(cns_db.Cns_file_replica.sfn), cns_db.Cns_file_replica.setname, count(*) from cns_db.Cns_file_replica where cns_db.Cns_file_replica.sfn LIKE '%gridstorage%' group by cns_db.Cns_file_replica.fileid INTO OUTFILE '/tmp/Stor10Query2.txt';

This gave me a list of physical file names, the number of copies and the space token, which I could grep for a list of those with more than one copy:
grep "," /tmp/Stor10Query2.txt | cut -d ',' -f 1 > filestodelete
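As a worked example of that pipeline (the sfns and token name below are made up; INTO OUTFILE tab-separates the columns, and GROUP_CONCAT joins multiple sfns for the same file with commas):

```shell
# Two made-up rows: the first file has two replicas (comma-joined sfns),
# the second has only one and so is not yet safe to delete.
printf '%s\t%s\t%s\n' \
  'pool1:/gridstorage010/f1,pool2:/gridstorage011/f1' 'ATLASMCDISK' '2' \
  'pool1:/gridstorage010/f2' 'ATLASMCDISK' '1' > /tmp/Stor10Sample.txt
# Lines containing a comma have more than one copy; the first field is
# the original replica on the filesystem being emptied.
grep "," /tmp/Stor10Sample.txt | cut -d ',' -f 1
```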

I could then edit filestodelete to prepend dpm-delreplica to each line and source it to delete the files. I also made a new list of files to replicate in the same way as above. Finally, I repeated the query to check that all the files had two replicas before deleting all the originals.
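Prepending the command can likewise be done with sed instead of hand-editing; a small sketch (the sfn here is a made-up example, and in practice you would feed in the filestodelete list):

```shell
# Turn each surviving-original sfn into a delete command.
printf 'pool1.glite.ecdf.ed.ac.uk:/gridstorage010/atlas/file1\n' |
  sed 's/^/dpm-delreplica /'
```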

Obviously this is a bit of a palaver and not the ideal approach for many reasons: there is no check that the replicas are identical, and the replicas made are still volatile, so I'll probably just encounter the same problem again down the line. But if you really can't use dpm-drain for some reason, there is at least an alternative.