01 March 2017

Token Effort

A common issue with new VOs is determining their storage requirements and ensuring, where possible, that we can accommodate them. Many VOs start with little idea of their absolute storage requirements, or even that storage is something they need to plan for.

Like most sites, Liverpool has a small shared pool that any supported VO can write to. This demonstrates the tragedy of the commons: the pool is regularly filled up by a new or newly active VO, and GGUS tickets follow when the space runs out. This recently happened with Snoplus, who had just started production use of the grid and were staging out data files to the local SE.

While we could keep putting more space into the shared area, that doesn't solve the problem; it only puts it off for a while. Eventually the space fills up again, impacting all supported VOs without dedicated pools.

Within VOs there are often multiple roles, eg standard users, production users, particular physics groups etc. Pools dedicated to a specific VO can still have issues with particular users consuming storage at the expense of others, eg a normal user filling up a pool and blocking important production work.

In DPM (at least), pools are usually an aggregate of filesystems from multiple storage servers. These filesystems are often large, typically 10s of TBs, and this becomes the minimum granularity for allocating space in pools. Furthermore, to obtain good performance, files should be spread across many storage servers, so pools should contain at least one filesystem from each of a number of servers. This increases the minimum amount of space that we would prefer to allocate to an individual VO.

For large VOs like ATLAS, whose pools are of the order of 100s of TBs or more, this is not a problem. For a VO that only needs a few TBs, though, allocating whole filesystems across several servers would be extremely wasteful of space, while allocating just one filesystem would sacrifice performance.
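
To put rough numbers on this, here is a minimal sketch of the arithmetic. The filesystem size, server count and VO requirement are made-up illustrative values, not our actual configuration, and the function names are hypothetical.

```python
def min_pool_allocation_tb(filesystem_tb, servers):
    """Smallest dedicated pool built from one whole filesystem per server."""
    return filesystem_tb * servers


def unused_fraction(need_tb, filesystem_tb, servers):
    """Fraction of that minimum pool left idle if the VO only needs need_tb."""
    allocated = min_pool_allocation_tb(filesystem_tb, servers)
    return max(0.0, 1.0 - need_tb / allocated)


# Assumed values: 20 TB filesystems, spread over 4 servers, VO needs 5 TB.
print(min_pool_allocation_tb(filesystem_tb=20, servers=4))          # 80
print(unused_fraction(need_tb=5, filesystem_tb=20, servers=4))      # 0.9375
```

With these assumed figures a small VO would tie up 80 TB to store 5 TB, leaving over 90% of its dedicated pool idle.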

Luckily we have a tool which allows us to allocate and reserve arbitrary amounts of space for VOs or even specific roles within a VO. This tool is the Space Token.

  • A Space Token reserves space so that we can guarantee it is available to the VO and role specified.
  • Space Tokens for different VOs can co-exist on a single pool, allowing an efficient large pool to be carved up into small areas.
  • Space Token reservations and permissions can be modified arbitrarily at any time.
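
As a rough illustration of the idea (a toy model, not DPM's actual interface), the sketch below treats a pool as a fixed capacity and each Space Token as a reservation keyed by VO and role. The class and method names, the sizes and the "production" role are all hypothetical.

```python
class Pool:
    """Toy model of one large pool carved up by Space Token reservations."""

    def __init__(self, capacity_tb):
        self.capacity_tb = capacity_tb
        self.tokens = {}  # (vo, role) -> reserved TB

    def unreserved_tb(self):
        """Space not yet promised to any token."""
        return self.capacity_tb - sum(self.tokens.values())

    def reserve(self, vo, role, size_tb):
        """Create or resize a reservation; refuse if it cannot be guaranteed."""
        key = (vo, role)
        extra = size_tb - self.tokens.get(key, 0)
        if extra > self.unreserved_tb():
            raise ValueError("not enough unreserved space in the pool")
        self.tokens[key] = size_tb


pool = Pool(capacity_tb=100)                 # assumed pool size
pool.reserve("t2k", "production", 10)        # role-specific token
pool.reserve("snoplus", None, 5)             # VO-wide token
print(pool.unreserved_tb())                  # 85
pool.reserve("t2k", "production", 20)        # resize an existing reservation
print(pool.unreserved_tb())                  # 75
```

The point of the model is that every reservation is checked against the space still unpromised, so a token that has been granted can always be honoured, however small it is relative to the pool.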

This is good for the admin, allowing easy allocation of storage from a global pool, and is good for the VO, having a guaranteed amount of storage for their exclusive use. There are some downsides, however.

  • In DPM, at least, re-assigning existing files to a Space Token isn't supported by the standard utilities and requires manual editing of the database. Not a task to be undertaken lightly.
  • Some SEs allocate the files to a Space Token automatically depending on the path, others require the user to specify it in the file copy commands. Not all utilities support specifying a Space Token (eg xrdcp).
  • Space Tokens rely on SRM support. SRM is a Grid-specific protocol that may be deprecated in the future.
  • Querying the state or existence of a Space Token isn't an obvious task. Even when a user knows a Space Token exists, the tools don't expose every detail, eg whether the token supports a particular role.
  • It's easier for the VO to manage if all sites have the same token names and permissions, which requires some coordination between the sites and VO.

Some SEs support technologies that make it easier to allocate space, eg default Space Tokens or quotas on a path, but this support varies at present. The only technology currently supported by all SEs is the Space Token.

At Liverpool our policy is now to not allocate any more storage to a VO without a Space Token. Given the lack of tools for migrating files, we also recommend that VOs requiring storage at sites specify their requirements (capacity, permissions and Space Token names) before they start writing any data.

So far, alongside the typical large LHC VOs like ATLAS and LHCb, we have done this successfully for T2K, Biomed and Snoplus, plus local Tier3 groups.

