21 December 2016

Comparative Datamanagementology

GridPP was well represented at the cloud workshop at Crick. (The slides and official writeup still have not appeared as of this blog post)

The general theme was hybrids, so it is natural to wonder whether it is useful to move data between infrastructures and what is the best way to do it. In the past we have connected, for example, NGS and GridPP (through SRB, using GridFTP), but there was not a strong need for it in the user community. However, with today's "UKT0" activities and more multidisciplinary e-infrastructures, perhaps the need for moving data across infrastructures will grow stronger.

xroot may be a bit too specialised, as it is almost exclusively used by HEP? but GridFTP is widely use by users of Globus, and is the workhorse behind WAN transfers (as an aside, we hear at the AARC meeting that Globus are pondering moving away from certificates, towards a more OIDC approach - which would be new as GridFTP has always required client certificate authentication.)

The big question is whether moving data between infrastructures is useful at all - will users make use of it? It is tempting to just upload the data to some remote storage service and share links to it with collaborators. Providing "approved" infrastructure for data sharing helps users avoid the pitfalls of inappropriate data management, but they still need tools to move the data efficiently, and to manage permissions.  For example, EUDAT's B2ACCESS was specifically designed to move data into and out of EUDAT (as EUDAT does not offer compute).

So far we have focused on whether it is possible at all to move data between the infrastructures, the idea being to offer users the ability to do so. The next step is efficiency and performance, as we saw with DiRAC where we had to tar the files up in order to make the transfer of small files more efficient, and to preserve ownerships, permissions, and timestamps. 

No comments: