We recently started using the re-balance feature of Castor Storage at RAL-LCG2. This looked like good news, as it would allow us to keep the number of files and the free space balanced across diskservers within a pool. However, a couple of days after we turned this feature on, our production team noticed a vast increase in the number of bad, incomplete replica files being produced. (The good news is that the original files still exist, so there is no loss of data.)
We therefore thought it might be a good idea to effectively turn off re-balancing with a tweak to the settings on our stagerDB/transfer management system within Castor. (I have since learned a lot more about the usage and output of our "printdbconfig" and "modifydbconfig" commands!) We have been making changes to various settings, but the main settings of current interest for this and other issues have been:
CLASS        KEY                     VALUE
--------------------------------------------------------------------
D2dCopy      MaxNbRetries            0
Draining     MaxNbFilesScheduled     200
Draining     MaxNbSchedD2dPerDrain   200
Migration    MaxNbMounts             7
Rebalancing  MaxNbFilesScheduled     5
Rebalancing  Sensitivity             100
These current settings seem to have stopped the creation of new problematic files. Now we "just" need to work out exactly why they seem to have fixed it, and see whether we can re-enable re-balancing.
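Since we are changing several settings at once, it may help to record the configuration before and after each tweak so that we can later tell which change made the difference. Below is a minimal sketch of how that could be done. It assumes the saved output is whitespace-separated CLASS / KEY / VALUE rows like the listing above; `parse_dbconfig` and `diff_settings` are hypothetical helper names of my own, not part of Castor, and the script does not call any Castor command itself.

```python
# Sketch only: assumes three whitespace-separated columns per setting row,
# as in the listing above. The helper names are hypothetical, not Castor APIs.

SAVED_OUTPUT = """\
CLASS KEY VALUE
--------------------------------------------------------------------
D2dCopy MaxNbRetries 0
Draining MaxNbFilesScheduled 200
Draining MaxNbSchedD2dPerDrain 200
Migration MaxNbMounts 7
Rebalancing MaxNbFilesScheduled 5
Rebalancing Sensitivity 100
"""


def parse_dbconfig(text):
    """Parse CLASS/KEY/VALUE rows into a {(class, key): value} dict."""
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the dashed separator and the header row.
        if len(parts) != 3 or parts[0] == "CLASS":
            continue
        cls, key, value = parts
        settings[(cls, key)] = value
    return settings


def diff_settings(before, after):
    """Return {(class, key): (old, new)} for every setting that differs."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed


if __name__ == "__main__":
    current = parse_dbconfig(SAVED_OUTPUT)
    # Show only the settings that control re-balancing.
    for (cls, key), value in current.items():
        if cls == "Rebalancing":
            print(f"{cls}.{key} = {value}")
```

Feeding two saved outputs through `parse_dbconfig` and passing the results to `diff_settings` would list exactly which keys changed between them, which should make it easier to re-enable re-balancing one setting at a time.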