06 March 2015

Storage accounting revisited?

One of the basic features of containers - a thing which can contain something - is that you can see how full it is. If your container happens to be a grid storage element, monitoring information is available in gstat and in our status dashboard. The BDII information system publishes data, and so does the SRM (the storage element control interface), and the larger experiments at least track how much they write.

So what happens if all these measures don't agree?  We had a ticket against RAL querying why the BDII published different values from what the experiment thought they had written. It turned out to be partly because someone was attempting to count used space by space token, which leads to quite the wrong results:
Leaving aside whether these should be the correct mappings for ATLAS, the space tokens on the left do not map one-to-one to the actual storage areas (SAs) in the middle (and in general there are SAs without space tokens pointing to them). Note also that the SAs split the accounting data of the disk pools (online storage) so that the sum of the values is the same, which avoids double counting.
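To make the counting problem concrete, here is a minimal sketch. The token names, SA names and sizes are all invented for illustration and are not RAL's or ATLAS's actual configuration. It assumes two tokens point at the same SA and one SA has no token at all, so summing per token counts one SA twice and misses another.

```python
# Hypothetical mapping: space token -> storage area (SA).
# Names and sizes are invented, not the real ATLAS/RAL setup.
token_to_sa = {
    "TOKEN_A": "sa_disk_1",
    "TOKEN_B": "sa_disk_1",  # two tokens pointing at the same SA
    "TOKEN_C": "sa_disk_2",
}

# Used space per SA in GB (10^9 bytes); sa_disk_3 has no token pointing to it.
sa_used_gb = {"sa_disk_1": 400, "sa_disk_2": 250, "sa_disk_3": 100}


def used_by_token(token_to_sa, sa_used_gb):
    """Naive total: add the used space of the SA behind every token."""
    return sum(sa_used_gb[sa] for sa in token_to_sa.values())


def used_by_sa(sa_used_gb):
    """Total over the SAs themselves, each counted exactly once."""
    return sum(sa_used_gb.values())


naive_total = used_by_token(token_to_sa, sa_used_gb)  # 400 + 400 + 250
sa_total = used_by_sa(sa_used_gb)                     # 400 + 250 + 100
print(naive_total, sa_total)
```

With these made-up numbers the per-token total overshoots, even though it never sees sa_disk_3.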

The other reason for the discrepancy was the treatment of read-only servers: these are published as used space by the SRM, but not by the BDII. This is because the BDII is required to be compliant with the installed capacity agreement from a working group from 2008. The document says on p.33,
TotalOnlineSize (in GB=10^9) is the total online [..] size available at a given moment (it SHOULD not [sic] include broken disk servers, draining pools, etc.)
RAL uses read-only disk pools essentially like draining disk pools (unlike tapes, where a read-only tape is perfectly readable), so read-only disk pools do not count in the total. They do, however, count as "reserved" as specified in the same document (the GLUE schema probably intended reserved to be more like SRM's reserved, but the WLCG document interprets the field as "allocated somewhere").
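One possible reading of this discrepancy, as a rough sketch: the pool names and sizes are invented, and the two functions are assumptions about how each view treats read-only pools, not the actual SRM or BDII publishing code. The "reserved" field is left out on purpose.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    size_gb: int     # capacity in GB (10^9 bytes)
    used_gb: int     # space occupied by files
    read_only: bool  # treated here like a draining pool


# Invented pools, purely for illustration.
pools = [
    Pool("pool_a", 1000, 600, read_only=False),
    Pool("pool_b", 1000, 900, read_only=True),
]


def used_including_read_only(pools):
    """Assumed SRM-style view: files on read-only pools still count as used."""
    return sum(p.used_gb for p in pools)


def bdii_style(pools):
    """Assumed BDII-style view: read-only pools are left out of total and used."""
    writable = [p for p in pools if not p.read_only]
    return {
        "TotalOnlineSize": sum(p.size_gb for p in writable),
        "UsedOnlineSize": sum(p.used_gb for p in writable),
    }


srm_used = used_including_read_only(pools)
bdii = bdii_style(pools)
print(srm_used, bdii)
```

Under these assumptions the two views report different used space for the same hardware, which is the kind of mismatch the ticket was asking about.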

Interestingly, RAL does not comply with the installed capacity document in publishing UsedOnlineSize for tape areas. The document specifies
UsedOnlineSize (in GB=10^9 bytes) is the space occupied by available and accessible files that are not candidates for garbage collection.
It then kind of contradicts itself in the same paragraph, saying
For CASTOR, since all files in T1D0 are candidates for garbage collection, it has been agreed that in this case UsedOnlineSize is equal to [..] TotalOnlineSize.
If we published like this, the used online size would always equal the total size, and the free size would always be zero (because the document also requires that used and free sum to total -- which doesn't always make sense either, but that is a different story.)
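The arithmetic can be sketched as follows. The function name, its parameters and the sizes are hypothetical. The only rules taken from the document are that used and free must sum to total, and that UsedOnlineSize equals TotalOnlineSize when every file is a candidate for garbage collection.

```python
def online_sizes(total_gb, occupied_gb, all_files_collectable):
    """Return (used, free) in GB under the rules quoted from the document.

    all_files_collectable=True models the CASTOR T1D0 case, where the
    document sets UsedOnlineSize equal to TotalOnlineSize.
    """
    used = total_gb if all_files_collectable else occupied_gb
    free = total_gb - used  # used + free must equal total
    return used, free


# Invented sizes: a 500 GB disk cache in front of tape, 120 GB occupied.
t1d0 = online_sizes(500, 120, all_files_collectable=True)
disk_only = online_sizes(500, 120, all_files_collectable=False)
print(t1d0, disk_only)
```

Whatever the occupancy, the T1D0 case always comes out as fully used with nothing free, which is why publishing it that way tells nobody anything.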

OK, so what might we have learnt today about storage accounting?

  1. Storage accounting is always tricky: there are all sorts of funny boundary cases, like candidates for deletion, temporary replicas, scratch space, etc.
  2. Aggregating accounting data across sites only makes sense if they all publish in the same way: they use the same attributes for the same types of values, etc. However, the supported storage elements all vary somewhat in how they treat storage internally.
  3. Before making use of the numbers, it is useful to have some sort of understanding of how they are generated (what do space tokens do? if numbers are the same for two SAs, is it because they are counting the same area twice, or because they split it 50/50? Implementers should document this and keep the documentation up to date!)
  4. There should probably be a time to review these agreements - what is the use of publishing information if it does not tell people what they want to know?
  5. Storage accounting is non-trivial... getting it right vs useful vs achievable is a bit of a balancing act.
