18 May 2015

Mind The Gap

One of the features of modern data science - whether the data comes from big instruments, lots of data sources, or somewhere else - is that researchers generally need to collaborate to be able to manage the data. No single institute is able to cope with everything. Thus, many researchers use e-Infrastructures (or cyberinfrastructures, to our North American friends) to connect resources and institutes together, and also to enable further collaborations with other researchers.
The next problem then arises when you have two different infrastructures which were not built to talk to each other. Here's where interoperation and standards come in.

One of the things we have talked about for a while but never got round to doing is bridging (the) two national infrastructures for physics, GridPP and DiRAC (not to be confused with DIRAC nor with DIRAC). Now we will be moving a few petabytes from the latter to the former, initially to back up the data. This is tricky when there are no common identities, no common data transfer protocols, no common data (replica) catalogues, accounting information, metadata catalogues, etc.

So we're going to bridge the gap, hopefully without too much effort on either side, initially by making DiRAC sites look like a Tier2-(very-)lite, with essentially only a GridFTP endpoint and a VO for DiRAC. We will then start to move data across with FTS and see what happens. (Using the analogy above, we are bringing the ends closer to each other rather than increasing the voltage :-))
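
To make the plan a little more concrete, here is a minimal sketch of the bookkeeping involved: turning a list of files on a DiRAC site into source/destination GridFTP URL pairs and grouping them into batches that could each be handed to FTS as one job. Everything named here is an assumption for illustration only: the hostnames, the `/dirac-backup` prefix, and the helper functions are hypothetical, and nothing is actually submitted to FTS.

```python
from itertools import islice
from urllib.parse import quote

# Hypothetical endpoints: placeholders, not real GridPP or DiRAC hosts.
SOURCE_ENDPOINT = "gsiftp://gridftp.dirac-site.example"
DEST_ENDPOINT = "gsiftp://se.gridpp-site.example"


def transfer_pairs(paths, source=SOURCE_ENDPOINT, dest=DEST_ENDPOINT,
                   dest_prefix="/dirac-backup"):
    """Yield (source_url, destination_url) for each absolute file path."""
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got {path!r}")
        encoded = quote(path)  # keeps "/" as-is, escapes spaces and similar
        yield (f"{source}{encoded}", f"{dest}{dest_prefix}{encoded}")


def batches(pairs, size):
    """Group pairs into lists of at most `size`, one list per transfer job."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    iterator = iter(pairs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    files = ["/data/run1/a.dat", "/data/run1/b.dat", "/data/run 2/c.dat"]
    for number, job in enumerate(batches(transfer_pairs(files), 2), start=1):
        print(f"job {number}: {len(job)} transfer(s)")
        for src, dst in job:
            print(f"  {src} -> {dst}")
```

Keeping the original path under a destination prefix means the backup copy can be mapped back to its source without a shared replica catalogue, which is one of the things the two infrastructures do not have in common.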

