25 February 2014

Big data picture

Not as in ((big data) picture) but (big (data picture)), if that makes sense.

I find myself in Edinburgh - it's been far too long since I was last here, I am embarrassed to say.

We are looking at data movement for EUDAT and PRACE and, by natural extension (being at EPCC), GridPP and DiRAC. The main common data mover is GridFTP: useful because we can (more or less) all move data with it, it gets great performance, we know how to tune and monitor it, and it supports third-party copying. We also need to see how to bridge in GlobusOnline, with the new delegated credentials. In fact both Contrail and NCSA developed OAuth-delegated certificates (and while the original CILogon work was OAuth1, the new stuff is OAuth2).

One use case is data sharing (the link is to a nice little video Adam Carter from EPCC showed in the introduction). You might argue that users are not jumping up and down screaming for interdisciplinary collaborations, yet if they were possible they might happen! When data policies require data be made available, as a researcher producing data you really have no choice: your data must be shareable with other communities.

21 February 2014

TF-Storage meeting

A quick summary of the Terena TF-Storage meeting earlier this month. Having been on the mailing list for ages, it was good to attend in person - and to catch up with friends from SWITCH and Cybera.

Now there was a lot of talk about cloudy storage, particularly OpenStack's Swift and Cinder as, respectively, object and block stores. At some point when I have a spare moment (haha) I will see if I can get them running in the cloud. I asked about CDMI support for Swift but it has not been touched in a while - it'd be a good thing to have, though (so we can use it with other stuff). Software defined networking (SDN) also got attention; it has been talked about for a while but seems to be maturing. OpenStack's networking component used to be called Quantum and is now called Neutron (thanks to Joe from Cybera for the link). There was talk about OpenStack and Ceph, with work by Maciej Brzeźniak from PSNC being presented.

There's an interesting difference between people doing very clever things with their erasure codes and secrecy schemes, and the rest of us who tend to just replicate, replicate, replicate. If you look at the WLCG stuff, we tend to not do clever things - the middleware stack is already complicated enough - but just create enough replicas, and control the process fairly rigidly.
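To put rough numbers on the trade-off between "replicate, replicate, replicate" and erasure codes, here is a minimal sketch of the storage overhead arithmetic. The parameters (3 replicas, a 10+4 code) are illustrative assumptions, not figures from the meeting or from WLCG.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data when keeping `copies` full replicas."""
    return Fraction(copies)


def replication_tolerance(copies: int) -> int:
    """Number of replicas that can be lost while one copy still survives."""
    return copies - 1


def erasure_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per byte of user data for a k+m erasure code
    (k data fragments plus m parity fragments)."""
    return Fraction(k + m, k)


def erasure_tolerance(k: int, m: int) -> int:
    """An ideal (MDS) k+m code survives the loss of any m fragments."""
    return m


# Illustrative parameters only (assumptions, not from the post).
print("3 replicas:", float(replication_overhead(3)), "x raw storage,",
      replication_tolerance(3), "losses tolerated")
print("10+4 code: ", float(erasure_overhead(10, 4)), "x raw storage,",
      erasure_tolerance(10, 4), "losses tolerated")
```

The erasure code is cheaper on disk, but every read and repair has to gather fragments from several places, which is exactly the kind of extra complexity the paragraph above says we tend to avoid.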

There was a discussion about identity management, of course, which mostly reiterated stuff we did for the grid about ten years ago - which led to VOMS and suchlike.

The report triggered a discussion as to whether the grid is a distributed object store.  It kind of is. 

03 February 2014

Setting up an IPv6 only DPM

As part of the general IPv6 testing work, we've just installed a small, single (virtual) node DPM at Oxford that's exclusively available over IPv6. While many client tools will prefer IPv6 to IPv4 given the choice, some things will prefer IPv4, even if they could work over IPv6, and others might not be able to work over IPv6 at all. Having a dedicated IPv6 only testing target such as this simplifies tests - if something works at all, you know it's definitely doing it over IPv6.
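As a quick way to confirm that a test target really is IPv6 only from a client's point of view, something like the following sketch could be used. It simply inspects the address families returned by the standard resolver call; the function names are made up for illustration, and `check_host` needs working DNS to run.

```python
import socket


def families(addrinfo):
    """Return the set of address families present in a getaddrinfo() result."""
    return {entry[0] for entry in addrinfo}


def is_ipv6_only(addrinfo):
    """True if the host resolved, and resolved only to IPv6 addresses."""
    return families(addrinfo) == {socket.AF_INET6}


def check_host(host):
    """Resolve `host` with the system resolver and report whether it is IPv6 only.

    Raises socket.gaierror if the name does not resolve at all.
    """
    return is_ipv6_only(socket.getaddrinfo(host, None))


# Example (requires network access and an IPv6 capable resolver):
#   check_host("t2dpm1-v6.physics.ox.ac.uk")
```

A dual stack host returns both AF_INET and AF_INET6 entries, so the check fails for it; that is the case where you could not be sure which protocol a successful transfer actually used.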

The process was fairly straightforward, with a few minor catches:
  • In the YAIM config, DPM_DB_HOST is set to localhost rather than the FQDN - MySQL is IPv4 only, and if you have it try to use the machine's full name, it will try to look up an IPv4 address, and fail when there isn't one.
  • The setting 'BDII_IPV6_SUPPORT=yes' is enabled to make the DPM's node BDII listen on IPv6. This is also required on the site BDII if you want it to do the same, and seems to be completely harmless when set on v4 only nodes. In any case the site BDII will need some degree of IPv6 capability so that it can connect to the DPM server.
  • YAIM requires the 'hostname -f' command to return the machine's fully qualified domain name, which it will only do if the name is properly resolvable. Unfortunately, the default behaviour only attempts to look up an IPv4 address record, and so fails. It's possible to fix this cleanly by adding 'options inet6' as a line in /etc/resolv.conf, e.g.:
    search physics.ox.ac.uk
    nameserver 2001:630:441:905::fa
    options inet6
    
  • Socket binding. For reasons that are better explained here, /etc/gai.conf needs to be set to something like:
    label ::/0 0
    label 0.0.0.0/0 1
    precedence ::/0 40
    precedence 0.0.0.0/0 10
    
    so that services which don't explicitly bind to IPv6 addresses as well as IPv4 get both by default.
And then YAIM it as per normal.
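
Before running YAIM it can be worth checking that the resolver option from the list above has actually made it into the resolver configuration. This is a minimal sketch, not part of YAIM or DPM: it parses text in resolv.conf format and collects the keywords on any 'options' line. The helper name is hypothetical, and the sample text is the example from this post.

```python
def resolver_options(resolv_conf_text):
    """Collect every keyword that appears on an 'options' line."""
    options = set()
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (which start with '#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "options":
            options.update(fields[1:])
    return options


SAMPLE = """\
search physics.ox.ac.uk
nameserver 2001:630:441:905::fa
options inet6
"""

print("inet6" in resolver_options(SAMPLE))  # prints True
```

On a real node you would read the text from the resolver configuration file instead of using SAMPLE.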

In addition to getting the DPM itself running, there are some sundry support services that are needed or helpful for any IPv6 only system (since it won't be able to use services that are only accessible via IPv4). In the Oxford case, I've installed:
  • A dual stack DNS resolver to proxy DNS requests to the University's DNS servers,
  • A squid proxy to enable access to IPv4-only web services (like the EMI software repositories),
  • A dual stack site BDII. Advertising the DPM server requires the site BDII to be able to connect to it to pick up its information. That means an IPv6 capable site BDII.
The final product is named 't2dpm1-v6.physics.ox.ac.uk', and it (currently) offers support for the ops, dteam and atlas VOs, and should be accessible from any IPv6 capable grid client system.

It's been a while, but my family dynamics are changing....

Due to housing and room capacity, ATLAS decided to reduce the number of centrally controlled clones. So here is an update on where Georgina, Eve and I are now living after ATLAS's change of policy:

New Values

DataSet Name                                  Dave      G'gina    Eve
"DNA" Number
Number of "Houses"                            49        70        54
Type of Rooms: DATADISK                       17        37        33
Type of Rooms: LGD                            32        58        24
Type of Rooms: PERF+PHYS                      29        56        24
Type of Rooms: TAPE                           7         12        9
Type of Rooms: USERDISK                       0         1         5
Type of Rooms: CERN                           8         10        10
Type of Rooms: SCRATCH                        3         6         1
Type of Rooms: CALIB                          0         4         7
Total number of people (including clones)     1090      1392      471
Number of unique people                       876       1019      293
Number of "people" of type ^user              136       368       97
Number of unique "people" of type ^user       131       340       83
Number of "people" of type ^data              919       950       352
Number of unique "people" of type ^data       719       616       189
Number of "people" of type ^group             34        74        22
Number of unique "people" of type ^group      25        63        21
Number of "people" of type ^valid             1         0         0
Number of unique "people" of type ^valid      1         0         0
Datasets that have 1 copy                     696       763       184
Datasets that have 2 copies                   146       184       62
Datasets that have 3 copies                   34        44        33
Datasets that have 4 copies                   0         16        9
Datasets that have 5 copies                   0         7         4
Datasets that have 6 copies                   0         5         0
Datasets that have 7 copies                   0         0         0
Datasets that have 8 copies                   0         0         1
Datasets that have 12 copies                  0         0         0
Datasets that have 13 copies                  0         0         0
Number of files that have 1 copy              56029     134672    14260
Number of files that have 2 copies            8602      35191     6582
Number of files that have 3 copies            1502      1751      5924
Number of files that have 4 copies            0         1879      75
Number of files that have 5 copies            0         607       868
Number of files that have 6 copies            0         306       0
Number of files that have 7 copies            0         0         0
Number of files that have 8 copies            0         0         1
Number of files that have 12 copies           0         0         0
Number of files that have 13 copies           0         0         0
Total number of files on the grid             77739     222694    49844
Total number of unique files                  66133     174406    27710
Data Volume (TB) that has 1 copy              9.175     30.389    4.127
Data Volume (TB) that has 2 copies            6.361     21.62     3.294
Data Volume (TB) that has 3 copies            0.223     2.452     7.812
Data Volume (TB) that has 4 copies            0         1.263     0.14
Data Volume (TB) that has 5 copies            0         1.408     0.17
Data Volume (TB) that has 6 copies            0         0.379     0
Data Volume (TB) that has 7 copies            0         0         0
Data Volume (TB) that has 8 copies            0         0         0.001
Data Volume (TB) that has 12 copies           0         0         0
Data Volume (TB) that has 13 copies           0         0         0
Total Volume of data on the grid (TB)         22.57     95.351    35.57
Total Volume of unique data (TB)              15.76     57.511    15.54



The differences in values from my last update are:


Difference

DataSet Name                                  D'        G'        E'
"DNA" Number
Number of "Houses"                            -10       -9        -9
Type of Rooms: DATADISK                       -11       -12       -17
Type of Rooms: LGD                            -5        -3        10
Type of Rooms: PERF+PHYS                      -3        -2        2
Type of Rooms: TAPE                           1         0         0
Type of Rooms: USERDISK                       -1        -8        0
Type of Rooms: CERN                           5         4         5
Type of Rooms: SCRATCH                        3         -6        -13
Type of Rooms: CALIB                          0         -1        0
Total number of people (including clones)     -76       -202      -171
Number of unique people                       -18       -101      -6
Number of "people" of type ^user              -1        -102      33
Number of unique "people" of type ^user       -1        -89       28
Number of "people" of type ^data              194       -98       -186
Number of unique "people" of type ^data       187       -15       -23
Number of "people" of type ^group             3         4         -18
Number of unique "people" of type ^group      -1        9         -11
Number of "people" of type ^valid             0         0         0
Number of unique "people" of type ^valid      0         0         0
Datasets that have 1 copy                     5         -48       6
Datasets that have 2 copies                   3         -13       -1
Datasets that have 3 copies                   -18       -24       8
Datasets that have 4 copies                   -71       -7        2
Datasets that have 5 copies                   -1        0         -17
Datasets that have 6 copies                   0         0         -2
Datasets that have 7 copies                   0         -2        -1
Datasets that have 8 copies                   0         -1        0
Datasets that have 12 copies                  0         0         -6
Datasets that have 13 copies                  0         0         -3
Number of files that have 1 copy              2385      -6991     4751
Number of files that have 2 copies            -4302     -1672     -1123
Number of files that have 3 copies            -358      -1957     1404
Number of files that have 4 copies            -7        -213      -35
Number of files that have 5 copies            -7        35        -223
Number of files that have 6 copies            0         -181      -20
Number of files that have 7 copies            0         -142      -5
Number of files that have 8 copies            0         -73       0
Number of files that have 12 copies           0         0         -6
Number of files that have 13 copies           0         0         -5
Total number of files on the grid             -7356     -19547    26871
Total number of unique files                  -2289     -11194    -16964
Data Volume (TB) that has 1 copy              0.375     2.689     2.427
Data Volume (TB) that has 2 copies            -0.439    0.32      -1.406
Data Volume (TB) that has 3 copies            -0.077    -3.148    0.612
Data Volume (TB) that has 4 copies            -0.011    -0.237    0.02
Data Volume (TB) that has 5 copies            -0.028    0.008     -0.3
Data Volume (TB) that has 6 copies            0         0.019     -0.039
Data Volume (TB) that has 7 copies            0         -0.13     -0.001
Data Volume (TB) that has 8 copies            0         -0.12     0
Data Volume (TB) that has 12 copies           0         0         -0.001
Data Volume (TB) that has 13 copies           0         0         -0.001
Total Volume of data on the grid (TB)         -1.134    -8.649    -0.231
Total Volume of unique data (TB)              -0.241    -0.489    1.344


The children in all three families that exist as only a single copy (44TB/205k files) are completely unique in the world and are at risk from a single disk failure in a room.
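
For anyone wanting to check the arithmetic, the at-risk figure is just the sum of the single-copy rows across the three families. A minimal sketch, with the numbers copied from the "1 copy" rows of the New Values table (the variable and function names are mine):

```python
# Single-copy figures for Dave, G'gina and Eve, from the "New Values" table.
single_copy_files = {"Dave": 56029, "G'gina": 134672, "Eve": 14260}
single_copy_tb = {"Dave": 9.175, "G'gina": 30.389, "Eve": 4.127}


def at_risk(files, volumes):
    """Sum the files and data volume (TB) that exist as one copy only."""
    return sum(files.values()), sum(volumes.values())


total_files, total_tb = at_risk(single_copy_files, single_copy_tb)
print(f"{total_files} files, {total_tb:.3f} TB exist as a single copy")
# prints: 204961 files, 43.691 TB exist as a single copy
```

Rounded, that is the 205k files and 44TB quoted above.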