09 June 2014

How much of a small file problem do we have... An update

As an update to my previous post "How much of a small file problem do we have...", I decided to look at a single part of the namespace within the storage element at the Tier 1, rather than at a single disk server. (The WLCG VOs know this as a scope, family, etc.)
When analysing this for ATLAS (if you remember, this was the VO I was personally most worried about because of its large number of small files), I obtained the following numbers:

Total number of files        3670322
Total number of log files    109025
Volume of log files          4.254TB
Volume of all files          590.731TB
The log files represent ~29.7% of the files within the scope, so perhaps the disk server I picked was enriched with log files compared to the average.
What is worrying is that this 30% of files is responsible for only 0.7% of the disk space used (4.254TB out of a total of 590.731TB).
The mean size of the log files is 3.9MB and the median is 2.3MB. Log file sizes range from 6kB to 10GB, so some processes within the VO do seem to be able to create large log files. If one were to remove the log files from the space, the mean file size of what remains would increase from 161MB to 227MB, and the median would increase from 22.87MB to 45.63MB.
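
The ratios and means quoted above can be re-derived from the totals with a few lines of Python. This is only a sketch of the arithmetic, not the tool used for the original analysis. It assumes decimal units (1TB = 1,000,000MB). It also assumes a log-file count derived from the quoted ~29.7% share: the count listed in the table (109025) would give roughly 3% instead, whereas a count near 1.09 million reproduces the 3.9MB and 227MB figures. The names `summarise`, `mean_mb` and `ASSUMED_LOG_FILES` are purely illustrative.

```python
# Figures quoted in the post (decimal units assumed: 1 TB = 1,000,000 MB).
TOTAL_FILES = 3_670_322
TOTAL_VOLUME_TB = 590.731
LOG_VOLUME_TB = 4.254
MB_PER_TB = 1_000_000

# ASSUMPTION: the log-file count is derived from the quoted ~29.7% share,
# not taken from the count listed in the table.
ASSUMED_LOG_FILES = round(0.297 * TOTAL_FILES)


def mean_mb(volume_tb, n_files):
    """Mean file size in MB for a volume in TB spread over n_files files."""
    return volume_tb * MB_PER_TB / n_files


def summarise(log_files=ASSUMED_LOG_FILES):
    """Return the headline shares (in %) and mean file sizes (in MB)."""
    return {
        "log_file_share_pct": 100 * log_files / TOTAL_FILES,
        "log_volume_share_pct": 100 * LOG_VOLUME_TB / TOTAL_VOLUME_TB,
        "mean_all_mb": mean_mb(TOTAL_VOLUME_TB, TOTAL_FILES),
        "mean_log_mb": mean_mb(LOG_VOLUME_TB, log_files),
        "mean_without_logs_mb": mean_mb(
            TOTAL_VOLUME_TB - LOG_VOLUME_TB, TOTAL_FILES - log_files
        ),
    }


if __name__ == "__main__":
    for name, value in summarise().items():
        print(f"{name:>22}: {value:.2f}")
```

Under those assumptions the means round to 161MB (all files), 3.9MB (log files) and 227MB (without log files), and the volume share rounds to 0.7%. The medians cannot be checked this way, since they depend on the full distribution of file sizes rather than on the totals.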

