24 August 2015

Castor Rebalancer a success at RAL... sort of...

We recently started using the re-balance feature of Castor Storage at RAL-LCG2.

This looked like good news, as it would let us keep the number of files and the free space balanced across disk servers within a pool. However, a couple of days after we turned the feature on, our production team noticed a vast increase in the number of bad, incomplete replica files being produced. (The good news is that the original files still exist, so there is no loss of data.) We therefore thought it would be a good idea to effectively turn off re-balancing by tweaking the settings of our stagerDB/transfer management system within Castor. (I have since learned a lot more about the usage and output of our "printdbconfig" and "modifydbconfig" commands!) We have been changing various settings, but the main ones of current interest for this and other issues are:

CLASS          KEY                       VALUE
--------------------------------------------------
D2dCopy        MaxNbRetries              0
Draining       MaxNbFilesScheduled       200
Draining       MaxNbSchedD2dPerDrain     200
Migration      MaxNbMounts               7
Rebalancing    MaxNbFilesScheduled       5
Rebalancing    Sensitivity               100
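When juggling several of these settings it helps to check that what is in the database is what you think you set. Below is a minimal sketch in Python that turns a CLASS/KEY/VALUE listing like the one above into a dictionary and reports any keys that differ from a wanted set. It assumes the listing has been saved as text with whitespace-separated columns and no spaces inside a field. The helper names parse_dbconfig and diff_settings are hypothetical and are not part of Castor. The sketch only reads text and never calls printdbconfig or modifydbconfig itself.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (CLASS, KEY), e.g. ("Rebalancing", "Sensitivity")


def parse_dbconfig(text: str) -> Dict[Key, str]:
    """Hypothetical helper: parse a CLASS/KEY/VALUE listing into a dict.

    Assumes three whitespace-separated columns per data row, as in the
    table above. The header row and the dashed separator are skipped.
    """
    settings: Dict[Key, str] = {}
    for line in text.splitlines():
        fields = line.split()
        # Blank lines and the dashed separator do not have three fields.
        if len(fields) != 3 or fields == ["CLASS", "KEY", "VALUE"]:
            continue
        cls, key, value = fields
        settings[(cls, key)] = value
    return settings


def diff_settings(
    current: Dict[Key, str], wanted: Dict[Key, str]
) -> Dict[Key, Tuple[Optional[str], str]]:
    """Return {(class, key): (current_value, wanted_value)} for mismatches.

    A key missing from the current settings is reported with current_value None.
    """
    return {k: (current.get(k), v) for k, v in wanted.items() if current.get(k) != v}


if __name__ == "__main__":
    listing = """CLASS          KEY                       VALUE
--------------------------------------------------
D2dCopy        MaxNbRetries              0
Rebalancing    MaxNbFilesScheduled       5
Rebalancing    Sensitivity               100
"""
    current = parse_dbconfig(listing)
    # The wanted value "1" for MaxNbRetries is purely illustrative.
    wanted = {("Rebalancing", "Sensitivity"): "100", ("D2dCopy", "MaxNbRetries"): "1"}
    print(diff_settings(current, wanted))
    # prints {('D2dCopy', 'MaxNbRetries'): ('0', '1')}
```

Keying on (CLASS, KEY) rather than KEY alone matters here, because the same key name (for example MaxNbFilesScheduled) appears under both Draining and Rebalancing.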

These settings seem to have stopped the creation of new problematic files. Now we "just" need to work out exactly why they seem to have fixed it, and whether we can re-enable re-balancing.
