24 September 2015

Storage system consistency checking required a new method to dump nameserver entries from the Castor database (whoever said blog post titles should be short and pithy...)

To partly fulfil a renewed request to consistency check our SEs for various WLCG VOs, and to help with data management for smaller VOs, we realised that the RAL Tier1 needed an improved method for acquiring a per-VO list of all the files we have in our Castor storage system.
We first tried naively using the Castor "nsfind" command (similar to your normal find command) on our storage system. However, we soon realised that this caused problems with our production system.
So... we decided to set up a client host and a backup offline database to query. (This also means that we are now acquiring and storing monthly dumps of the storage nameservers for longitudinal analysis, but that is the story of a future post.)
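
For readers unfamiliar with what a consistency check does with such a file list, here is a minimal sketch in Python. It is not our actual tooling: the function name, the example paths and the idea of holding both lists in memory as plain sets are all assumptions made for illustration.

```python
def compare_file_lists(storage_paths, catalogue_paths):
    """Compare a storage dump against a VO's catalogue listing.

    Both arguments are iterables of file paths (hypothetical format:
    one full path per entry).
    """
    storage = set(storage_paths)
    catalogue = set(catalogue_paths)
    return {
        # On storage but unknown to the VO (often called "dark data").
        "dark": sorted(storage - catalogue),
        # Known to the VO but missing from storage.
        "lost": sorted(catalogue - storage),
    }


# Made-up paths, purely for illustration.
storage_dump = ["/vo/data/a.root", "/vo/data/b.root", "/vo/data/c.root"]
vo_catalogue = ["/vo/data/b.root", "/vo/data/c.root", "/vo/data/d.root"]

result = compare_file_lists(storage_dump, vo_catalogue)
print(result["dark"])
print(result["lost"])
```

For a namespace with many millions of entries, an in-memory set comparison may not be practical, and sorting both lists and comparing them as streams would be a more realistic choice.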

From this we tried creating a dump for one particular VO, and it took 22 days to complete. (This led to issues, as we had wished to dump the database weekly.) We improved matters by two methods.
1- Deleted ~5 million old empty directories (wish the VO would do this themselves).
2- Gave the problem to our Oracle DBAs to look at, to either improve nsfind or come up with another method. (The DBAs gave us a new script which, when run on a smaller part of the namespace, reduced the completion time from 8.5 hours to 100 minutes.)

The additional benefit from deleting the empty directories reduced the overall dump time from 22 days to 3.5 hours. Hopefully this should mean that we can provide regular (time-delayed) dumps of the file list of our storage.
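
To put those timings in perspective, the speed-up factors can be worked out with a few lines of Python. This is just arithmetic on the figures quoted above; the helper name is made up for the example.

```python
def speedup(before_minutes, after_minutes):
    """Return how many times faster the new run is than the old one."""
    return before_minutes / after_minutes


# Namespace subset with the DBAs' script: 8.5 hours down to 100 minutes.
subset_speedup = speedup(8.5 * 60, 100)

# Full dump for the VO: 22 days down to 3.5 hours.
full_speedup = speedup(22 * 24 * 60, 3.5 * 60)

print(f"subset: {subset_speedup:.1f}x")  # prints "subset: 5.1x"
print(f"full:   {full_speedup:.0f}x")    # prints "full:   151x"
```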

