09 March 2011

How to build a Data Grid

Went yesterday to a collaboration meeting for OOI; their oceanographic data grid is a collaboration with IC and QMUL. What is interesting is how completely their approach to building a "data grid" differs from the way we've done it.

Building data grids was the subject of our (GridPP storage and data management) presentation at AHM last year: how to build an infrastructure that'll cope with LHC data. An extended version will be presented at ISGC in a few weeks, with more focus on infrastructure and less on LHC.

While ours was essentially about communication, policies, and trust, theirs is a very computer-sciencey approach: a message-based infrastructure which promotes data to a "first class citizen" and uses formal methods (via Scribble) to implement distributed systems with "guaranteed correct behaviour." Interestingly, their sensor networks produce about the same data rates as we handle at our T1: even compressed, their data stream will come to 700 MB/s.
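For a sense of scale, here is a back-of-the-envelope sketch of what 700 MB/s means in daily and yearly volume. It assumes (my assumption, not a figure from the meeting) that the compressed rate is sustained around the clock, and it uses decimal units; the function names are just illustrative.

```python
# Back-of-the-envelope volume estimate. Assumes the quoted 700 MB/s
# compressed rate is sustained continuously, and uses decimal units
# (1 TB = 10**6 MB, 1 PB = 10**9 MB).
RATE_MB_PER_S = 700
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Return the data volume per day in decimal terabytes."""
    return rate_mb_per_s * SECONDS_PER_DAY / 10**6


def yearly_volume_pb(rate_mb_per_s: float, days: int = 365) -> float:
    """Return the data volume per year in decimal petabytes."""
    return rate_mb_per_s * SECONDS_PER_DAY * days / 10**9


if __name__ == "__main__":
    print(f"{daily_volume_tb(RATE_MB_PER_S):.2f} TB/day")    # 60.48 TB/day
    print(f"{yearly_volume_pb(RATE_MB_PER_S):.1f} PB/year")  # 22.1 PB/year
```

That is roughly 60 TB a day, or some 22 PB a year, before any replication, which is why the comparison with a T1 is apt.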

We have thought about using formal methods before (mostly in proposals that weren't funded :-)), so it will be interesting to compare their approach to existing data grids like ESG or WLCG. Furthermore, some of their tools, like Scribble, may well find uses in some of our other projects.

(BTW, I am calling these grids "data grids," but not so much in the SRB/iRODS sense. As someone pointed out in the session, the emphasis in (any) grid processing sensor or instrument data is on the data: computation can in principle be redone, but sensor data can never be recaptured.)
