Showing posts with label OGF. Show all posts

28 September 2016

Co-evolving data nodes

Bing! a mail comes in from our friends in the States saying look! here's someone in New Zealand who has set up an iRODS node to GridFTP data to and from their site. Their writeup is very detailed, yet it looks a lot like the DiRAC/GridPP data node document: they have solved many of the same problems we have, independently.

The basic idea is to have a node outside your institute/organisation which can be used to transfer data to/from your datastore/cluster. With a GridFTP endpoint, data can be moved with FTS (as we do with DiRAC), users can use Globus (as STFC's facilities do, for example), and data can flow to or from other e-infrastructures such as EUDAT's B2STAGE or EGI. Regardless of the underlying storage, there are common concerns like security, monitoring, performance, how to (or not to) firewall it, how to make it discoverable, etc. It could be the data node in a Science DMZ.
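To make the idea concrete, here is a minimal sketch of driving a transfer to such a data node with the Globus Toolkit's globus-url-copy client. It assumes the client is installed and a valid grid proxy exists; the hostname, port, and paths are entirely made up for illustration. The script only builds and prints the command rather than executing it.

```python
# Sketch: compose a globus-url-copy command for a GridFTP data node.
# Hostname and paths are hypothetical; a real transfer would also need
# a valid grid proxy (e.g. from voms-proxy-init).
import shlex

def gridftp_copy_command(src, dest, parallel=4):
    """Build a globus-url-copy command line (-vb: verbose with
    bandwidth, -p: number of parallel streams). Not executed here."""
    return ["globus-url-copy", "-vb", "-p", str(parallel), src, dest]

cmd = gridftp_copy_command(
    "file:///data/run042.tar",
    "gsiftp://datanode.example.ac.uk:2811/store/run042.tar",
)
print(shlex.join(cmd))
```

In practice the same endpoint would be reachable by FTS or Globus with no script at all; this just shows the shape of the URLs involved.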

The suggestion is that we (= GridPP, DiRAC, and in fact anyone else who is willing and able) contribute to a detailed writeup which can be published as an OGF document (open-access publishing for free! and fitting, since GridFTP is an OGF protocol), either as a community practice or an experiences document, and then distil a less detailed paper which could be submitted to a conference or published in a journal.

28 March 2015

EUDAT and GridPP

EUDAT2020 (the H2020 follow-up project to EUDAT) just finished its kick-off meeting at CSC. Might be useful to jot down a few thoughts on similarities and differences and such before it is too late.

Both EUDAT and GridPP are - as far as this blog is concerned - data e- (or cyber-) infrastructures. The infrastructure is distributed across sites, sites provide storage capacity to users, there is a common authentication and authorisation scheme, there are data discovery mechanisms, and both use GOCDB for service availability.

  • EUDAT will be using CDMI as its storage interface - just like EGI does - and CDMI is in many ways fairly SRM-like. We have previously done work comparing the two.
  • EUDAT will also be doing HTTP "federations", i.e. automatic failover to another copy when a replica is missing (some people confusingly call this "federation").
  • Interoperation with EGI is useful/possible/thought-about (delete as applicable). EUDAT's B2STAGE will be interfacing to EGI - there is already a mailing list for discussions.
  • GridPP's (or WLCG's) metadata management is probably a bit too confusing at the moment, since there is no single file catalogue.
  • B2ACCESS is the authentication and authorisation infrastructure in EUDAT; it could interoperate with GridPP via SARoNGS (ask us at OGF44, where we will also look at AARC's relation to GridPP and EUDAT). Jos tells us that KIT also has a SARoNGS-type service.
  • Referencing a file is done with a persistent identifier, rather like the LFN (Logical File Name) that GridPP used to have.
  • "Easy" access via WebDAV is an option for both projects. GlobusOnline is an option (sometimes) for both projects. In fact, B2STAGE is currently using GO, but will also be using FTS.
Using FTS is particularly interesting because it should then be possible to transfer files between EUDAT and GridPP. The differences between the projects are mainly that
  • GridPP is more mature - has had 14-15 years now to build its infrastructure; EUDAT is of course a much younger project (but then again, EUDAT is not exactly starting from scratch)
  • EUDAT is doing more "dynamic data", where the data might change later; it is also looking at more support for the data lifecycle.
  • EUDAT and GridPP have distinct user communities, to a first approximation at least.
  • The middleware is different; GridPP does of course offer compute, whereas EUDAT will offer simpler server-side workflows. GridPP services are more integrated, whereas in EUDAT the B2 services are more separated (but will be unified by the discovery/lookup service and by B2ACCESS).
  • Authorisation mechanisms will be very different (but might hopefully interface to each other; there are plans for this in B2ACCESS).
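The FTS route mentioned above is worth sketching. FTS3 accepts job descriptions through its REST interface; below is a minimal sketch of such a job description, moving one file between a hypothetical EUDAT endpoint and a hypothetical GridPP storage element. The URLs and parameter values are illustrative only, not taken from either project, and a real submission would POST this JSON to an FTS3 server with appropriate credentials.

```python
# Sketch: an FTS3-style transfer job description (JSON) for moving a
# file from an EUDAT node to a GridPP storage element.
# All endpoints here are hypothetical.
import json

job = {
    "files": [
        {
            "sources": ["gsiftp://eudat-node.example.eu/store/file.dat"],
            "destinations": ["gsiftp://gridpp-se.example.ac.uk/dpm/file.dat"],
        }
    ],
    # Retry failed transfers a few times; do not overwrite existing files.
    "params": {"retry": 3, "overwrite": False},
}
print(json.dumps(job, indent=2))
```

The interesting point is less the syntax than the fact that, once both projects expose GridFTP endpoints, the same transfer machinery can serve either.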
There is some overlap between data sites in WLCG and those in EUDAT. This could lead to some interesting collaborations and cross-pollinations. Come to OGF44 and the EGI conference and talk to us about it.

18 July 2011

Storage accounting in OSG and OGF

Groups like UR (OGF's Usage Record working group) are getting around to discussing storage records. OSG already create storage records: they have XML-formatted records for both the transfer and the file history. (With thanks to Steve Timm from FNAL.)
<StorageElementRecord xmlns:urwg="http://www.gridforum.org/2003/ur-wg">
  <RecordIdentity urwg:createTime="2011-07-17T21:18:07Z" urwg:recordId="head01.aglt2.org:544527.26"/>
  <UniqueID>AGLT2_SE:Pool:umfs18_3</UniqueID>
  <MeasurementType>raw</MeasurementType>
  <StorageType>disk</StorageType>
  <TotalSpace>25993562993750</TotalSpace>
  <FreeSpace>6130300894785</FreeSpace>
  <UsedSpace>19863262098965</UsedSpace>
  <Timestamp>2011-07-17T21:18:02Z</Timestamp>
  <ProbeName>dcache-storage:head01.aglt2.org</ProbeName>
  <SiteName>AGLT2_SE</SiteName>
  <Grid>OSG</Grid>
</StorageElementRecord>
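Such a record is easy to consume with standard tools. As a quick sanity check, here is a short Python sketch (the record is trimmed to the fields it uses) which parses the space figures, in bytes, and verifies that used plus free equals the published total:

```python
# Parse an OSG StorageElementRecord and check that the published
# used and free space sum to the total (all sizes in bytes).
import xml.etree.ElementTree as ET

record = """<StorageElementRecord xmlns:urwg="http://www.gridforum.org/2003/ur-wg">
  <UniqueID>AGLT2_SE:Pool:umfs18_3</UniqueID>
  <TotalSpace>25993562993750</TotalSpace>
  <FreeSpace>6130300894785</FreeSpace>
  <UsedSpace>19863262098965</UsedSpace>
</StorageElementRecord>"""

root = ET.fromstring(record)
total = int(root.findtext("TotalSpace"))
free = int(root.findtext("FreeSpace"))
used = int(root.findtext("UsedSpace"))

# The record is internally consistent: used + free == total.
assert used + free == total
print(f"{root.findtext('UniqueID')}: {used / total:.1%} used")
```

For this pool the figures do add up exactly, which is reassuring given the accuracy worries discussed below.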
Over in GLUE-land, the GLUE group insist that using the GLUE schema to publish accounting data - and indeed using GLUE data for anything other than resource selection - "cannot be done." Unfortunately the chairs didn't make it to OGF, but next steps will include work on the XML rendering of GLUE 2.0, along with the implementations.
Meanwhile, back home in GridPP-land, we use GLUE 1.3 for dynamic data. The question is still mainly about the accuracy (and freshness) of the information published: for example, how temporary copies on disk, files being "deleted" from tape, and the like should affect the published dynamic data. As we now have "accurate" tape accounting, the information provider should be updated soon.

07 February 2011

Get yer clouds here

At the risk of, er, promoting one of my own presentations, can I remind you to not forget to remember to join the NGS surgery this coming Wednesday, 9th, at the usual time just after the storage meeting, 10:30ish-11:30. The subject will be cloud storage in an NGI context, looking at the hows and whys and whats and whatnots, with room for discussion, too (possibly). You can EVO in as usual if you don't have AG.

03 November 2010

Confederated conference confusion

Summaries of the data management sessions at last week's OGF, as well as more CHEP discussion, have appeared in the minutes of this week's storage meeting. Meanwhile, there is a HEPiX meeting this week, which will (probably) be discussed at the storage meeting next week - particularly if we can find someone who went to it. Share and Enjoy.

21 June 2010

Not uncontroversial

Very lively session for the Grid Storage Management community group.

We covered the new charter, which was agreed with the proviso that we replace "EGEE" with something appropriate. We had a quick introduction to the protocol, an introduction which caused a lot more discussion than such introductions normally do.

Much of the time was spent discussing the WLCG data management jamboree, which in a sense is outside the scope of the group: the jamboree focused on data analysis, whereas SRM was designed for transfers, pre-staging, and suchlike - completely different use cases.

Normally we have presentations from users, particularly those outside HEP, but since we had run out of time, those discussions had to be relegated to lunch or coffee breaks.

Slightly tricky with both experts and newbies in the room, giving introductions to SRM while also discussing technical issues. But this is how OGF works, and it is a Good Thing™: it ensures that the discussions are open, exposes our work to others, and lets others provide input.

20 June 2010

Too good to be true?

A grid filesystem with transparent replication and partial replication, striping, a POSIX interface and semantics, and checksumming. Open source - GPL - and, unlike some grid "open source projects" we can mention, you can actually download the source. As fast as ext4 for a Linux kernel build. NFSv4 and/or WebDAV interfaces are planned.

This is the promise of XtreemFS, the filesystem part (an independent part) of XtreemOS, an EU-funded project. More on this later in our weekly meetings.

02 June 2009

SRM protocol status

The Grid Storage Management working group (GSM-WG) in OGF exists to standardise the SRM protocol. Why standardise? To ensure the process stays open and can benefit communities other than WLCG.
The SRM document is GFD.129, and we now have an experiences document available for public comment. You are invited to read the document and submit your comments - thanks!
You can even do so anonymously!