11 November 2011

RAL T1 copes with ATLAS spike of transfers.

Following recent issues at the RAL T1, we were worried not just about the overall load on our SRM caused by ATLAS using the RAL FTS, but also about the rate at which they put load on the system.
At ~10pm on the 10th November 2011 (UTC), ATLAS went from running almost empty to almost full on the FTS channels involving RAL that are controlled by the RAL FTS server. This can be seen in the plot of the number of active transfers:

This was caused by ATLAS suddenly submitting many transfers to the ATLAS FTS, which can be seen in the "Ready" queue:

This led to a high transfer rate, as shown here:
It is also seen in our own internal network monitoring:

The FTS rate covers only transfers going through the RAL FTS (i.e. it does not include puts by the CERN FTS, gets from other T1s, or the chaotic background of dq2-gets, dq2-puts and lcg-cps not covered in these plots). Hopefully this means our current FTS settings can cope with the start of these ATLAS data transfer spikes. We have seen from previous backlogs that these large spikes lead to a temporary backlog (for a typical size of spike) which clears well within a day.

