08 August 2014

ARGUS user suspension with DPM

Many grid services that need to authenticate their users do so with LCAS/LCMAPS plugins, making integration with a site central authentication server such as ARGUS relatively straightforward. With the ARGUS client LCAS/LCMAPS plugins configured, all authentication decisions are referred to the central service at the time they're made. When the site ARGUS is configured to use the EGI/NGI emergency user suspension policies, any centrally suspended user DN will be automatically blocked from accessing the site's services.

However, DPM does its own authentication and maintains its own list of banned DNs, so rather than referring each decision to the site ARGUS, we need a specific tool to update DPM's view from the site ARGUS server. Just to complicate matters further, DPM's packages live in the Fedora EPEL repository, which means they cannot depend on the ARGUS client libraries, which are not in EPEL.

The solution is the very small 'dpm-argus' package which is available from the EMI3 repositories for both SL5 and SL6; a package dependency bug has prevented its installation in the past, but this has been fixed as of EMI3 Update 19. It should be installed on the DPM head node (if installing manually rather than with yum, you'll also need the argus-pep-api-c package from EMI) and contains two files, the 'dpns-arguspoll' binary, and its manual page.

Running the tool is simple - it needs a 'resource string' to identify itself to the ARGUS server (for normal purposes it doesn't actually matter what it is) and the URL for the site ARGUS:
dpns-arguspoll my_resource_id https://argus.example.org:8154/authz
When run, it will iterate over the DNs known to the DPM, check each one against the ARGUS server, and update the DPM banning state accordingly. All that remains is to run it periodically. At Oxford we have an '/etc/cron.hourly/dpm-argus' script that simply looks like this:
#!/bin/sh
# Sync DPM's internal user banning states from argus

export DPNS_HOST=t2se01.physics.ox.ac.uk
dpns-arguspoll dpm_argleflargle https://t2argus04.physics.ox.ac.uk:8154/authz 2>/dev/null
And that's it. If you want to be able to see the current list of DNs that your DPM server considers to be banned, then you can query the head node database directly:
echo "SELECT username from Cns_userinfo WHERE banned = 1;" | mysql -u dpminfo -p cns_db
At the moment that should show you my test DN, and probably nothing else.

23 July 2014

IPv6 and XrootD 4

Xrootd version 4 has recently been released. As QMUL is involved in IPv6 testing, and as this new release now supports IPv6, I thought I ought to test it.  So,  what does this involve?

  1. Set up a dual stack virtual machine - our deployment system now makes this relatively easy. 
  2. Install xrootd. QMUL is a StoRM/Lustre site, and has an existing xrootd server that is part of ATLAS's FAX (Federated ATLAS storage systems using XRootD), so it's just a matter of configuring a new machine to export our POSIX storage in much the same way. In fact, I've done it slightly differently as I'm also testing ARGUS authentication, but that's something for another blog post. 
  3. Test it - the difficult bit...
I decided to test it using CERN's dual stack lxplus machine: lxplus-ipv6.cern.ch.

First, I tested that I'd got FAX set up correctly:

setupATLAS
localSetupFAX
voms-proxy-init --voms atlas
testFAX


All 3 tests were successful, so I've got FAX working. Next, configure it to use my test machine:

export STORAGEPREFIX=root://xrootd02.esc.qmul.ac.uk:1094/
testFAX


That also gave 3 successful tests out of 3. Finally, to prove that downloading files works, and that it isn't just the redirection that works, I tested copying a file that should only be at QMUL:

xrdcp -d 1 root://xrootd02.esc.qmul.ac.uk:1094//atlas/rucio/user/ivukotic:user.ivukotic.xrootd.uki-lt2-qmul-1M -> /dev/null 

All of these reported that they were successful. Were they using IPv6 though? Well looking at Xrootd's logs, it certainly thinks so - at least for some connections, though some still seem to be using IPv4:

140723 16:03:47 18291 XrootdXeq: cwalker.19073:26@lxplus0063.cern.ch pub IPv6 login as atlas027
140723 16:04:01 18271 XrootdXeq: cwalker.20147:27@lxplus0063.cern.ch pub IPv4 login as atlas027
140723 16:04:29 23892 XrootdXeq: cwalker.20189:26@lxplus0063.cern.ch pub IPv6 login as atlas027
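To quantify the split, a quick count over the log does the job; a rough sketch (the log path is a guess for your install):
# count IPv4 vs IPv6 logins in the xrootd log
awk '/ IPv6 login /{v6++} / IPv4 login /{v4++} END{print "IPv6 logins:", v6+0, "  IPv4 logins:", v4+0}' /var/log/xrootd/xrootd.log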


Progress!!!

30 June 2014

Thank you for making a simple compliance test very happy

Rob and I had a look at the gstat tests for RAL's CASTOR. For a good while now we have had a number of errors/warnings raised. They did not affect production: so what are they?

Each error message has a bit of text associated with it, typically saying "something is incompatible with something else" - for example, that an "access control base rule" (ACBR) is incorrect, or that the tape published is not consistent with the type of Storage Element (SE). The ACBR error arises from legacy attributes being published alongside the modern ones, and the latter complains about CASTOR presenting itself as a tape store (via a particular SE).

So what is going on?  Well, the (only) way to find out is to locate the test script and find out what exactly it is querying. In this case, it is a python script running LDAP queries, and luckily it can be found in CERN's source code repositories. (How did we find it in this repository? Why, by using a search engine, of course.)

Ah, splendid, so by checking the Documentation™ (also known as "source code" to some), we discover that it needs all ACBRs to be "correct" (not just one for each area), that the legacy ones need an extra slash on the VO value, and that an SE with no tape pools should call itself "disk" even if it sits on a tape store.
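If you want to see what your own site is actually publishing, an LDAP query along these lines against the site BDII (the hostname here is a placeholder; 2170 is the usual BDII port) lists the ACBRs for each storage area:
ldapsearch -x -LLL -h site-bdii.example.ac.uk -p 2170 -b o=grid \
    '(objectClass=GlueSA)' GlueSAAccessControlBaseRule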

So it's essentially test-driven development: to make the final warnings go away, we need to read the code that is doing the validating and then engineer the LDIF to satisfy it.

09 June 2014

How much of a small file problem do we have...An update

As an update to my previous post, "How much of a small file problem do we have...", I decided to look at a single part of the namespace within the storage element at the Tier1 rather than a single disk server. (The WLCG VOs know this as a scope, family, etc.)
When analysing the ATLAS scope (if you remember, this was the VO I was most worried about due to its large number of small files), I obtained the following numbers:


Total number of files      3670322
Total number of log files  1090250
Volume of log files        4.254 TB
Volume of all files        590.731 TB
The log files represent ~29.7% of the files within the scope, so perhaps the disk server I picked was enriched with log files compared to the average.
What is worrying is that this 30% of files is only responsible for 0.7% of the disk space used (4.254TB out of a total 590.731TB).
The mean filesize of the log files is 3.9MB and the median filesize is 2.3MB. The log file sizes also vary from 6kB to 10GB, so some processes within the VO do seem able to create large log files. If one were to remove the log files from the space, the mean filesize would increase from 161MB to 227MB, and the median filesize would increase from 22.87MB to 45.63MB.
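For what it's worth, summary numbers like these can be pulled out of a plain listing of file sizes (one size in bytes per line, however you extract it from the namespace; 'sizes.txt' is a placeholder) with a quick sketch like this:
sort -n sizes.txt | awk '{s[NR]=$1; tot+=$1}
    END {printf "files: %d  volume: %.3f TB  mean: %.1f MB  median: %.1f MB\n",
         NR, tot/1e12, tot/NR/1e6, s[int((NR+1)/2)]/1e6}'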


07 May 2014

Public research, open data

RAL hosted a meeting for research councils, other public bodies, and industry participants, on open data, organised with the Big Innovation Centre (we will have a link once the presentations have been uploaded).

As you know, research councils in the UK have data policies which say

  • Publicly funded data must be made public
  • Data can be embargoed - even if publicly funded, it will be protected for a period of time, to enable you to get your results, write your papers, achieve world domination. You know, usual stuff.
  • Data should be usable.
  • The people who produced the data should be credited for the work - in other words, the data should be cited, as you would cite a publication with results that you use or refer to.
All of these are quite challenging (of this more anon), but interestingly some of the other data publishers even had to train (external) people to use their data. Would you say data is open not just when it is usable, but also when it is actually being used? That certainly makes the policies even more challenging. The next step beyond that would be for the data to have a measurable economic impact.

You might ask: so what use is the high energy physics (HEP) data, older data, or LHC data such as that held by GridPP, to the general public?  But that is the wrong question, because you don't know what use it is till someone's got it and looked at it. If we can't see an application of the data today - someone else might see it, or we might see one tomorrow.  And the applications of HEP tend to come after some time: when neutrons were discovered, no one knew what they were good for; today they are used in almost all areas of science. Accelerators used in the early days of physics have led to the ones we use today in physics, but also to the ones used in healthcare. What good will come of the LHC data?  Who knows. HEP has the potential to have a huge impact - if you're patient...

24 April 2014

How much of a small file problem do we have...

Here at the Tier1 at RAL-LCG2, we have been draining disk servers with a fury (achieving over 800MB/s on a machine with a 10G NIC). Well, we get that rate on some servers with large files; machines with small files achieve a lower rate. But how many small files do we have, and is there a VO dependency? So I decided to look at our three largest LCG VOs.
In tabular form, here is the analysis so far:


VO                            LHCb     CMS      ATLAS     ATLAS           ATLAS
Sub section                   All      All      All       non-Log files   Log files
# Files                       16305    14717    396887    181799          215088
Size (TB)                     37.565   39.599   37.564    35.501          2.062
# Files > 10GB                1        24       75        75              0
# Files > 1GB                 8526     11902    9683      9657            26
# Files < 100MB               4434     2330     3E+06     134137          3E+06
# Files < 10MB                2200     569      265464    68792           196672
# Files < 1MB                 1429     294      85190     20587           64603
# Files < 100kB               243      91       6693      2124            4569
# Files < 10kB                6        13       635       156             479
Ave Filesize (GB)             2.30     2.69     0.0946    0.195           0.00959
% space used by files > 1GB   96.71    79.73    64.56





Now what I find interesting is how similar the LHCb and CMS values are to each other, even though they are vastly different VOs. What worries me is that over 50% of ATLAS files are less than 10MB. Now just to find a Tier2 to do a similar analysis, to see if it is just a T1 issue.....
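For reference, per-threshold counts of this kind can be gathered on a disk server with something along these lines (the mount point is hypothetical):
find /gridstore -type f -printf '%s\n' | awk '
    {n++; if ($1 > 10*2^30) a++; if ($1 > 2^30) b++; if ($1 < 100*2^20) c++;
     if ($1 < 10*2^20) d++; if ($1 < 2^20) e++; if ($1 < 100*2^10) f++; if ($1 < 10*2^10) g++}
    END {print "files:", n+0, " >10GB:", a+0, " >1GB:", b+0, " <100MB:", c+0,
         " <10MB:", d+0, " <1MB:", e+0, " <100kB:", f+0, " <10kB:", g+0}'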

01 April 2014

Dell OpenManage for disk servers

As we've been telling everyone who'll listen, we at Oxford are big fans of the Dell 12-bay disk servers for grid storage (previously R510 units, now R720xd ones). A few people have now bought them and asked about monitoring them.

Dell's tools all go by the general 'OpenManage' branding, which covers a great range of things, including various general purpose GUI tools. However, for the disk servers, we generally go for a minimal command-line install.

Dell have the necessary bits available in a YUM-able repository, as described on the Dell Linux wiki. Our setup simply involves:
  • Installing the repository file,
  • yum install srvadmin-storageservices srvadmin-omcommon,
  • service dataeng start
  • and finally logging out and back in again, or otherwise picking up the PATH variable change from the newly installed srvadmin-path.sh script in /etc/profile.d
At that point, you should be able to query the state of your array with the 'omreport' tool, for example:
# omreport storage vdisk controller=0
List of Virtual Disks on Controller PERC H710P Mini (Embedded)

Controller PERC H710P Mini (Embedded)
ID                            : 0
Status                        : Ok
Name                          : VDos
State                         : Ready
Hot Spare Policy violated     : Not Assigned
Encrypted                     : No
Layout                        : RAID-6
Size                          : 100.00 GB (107374182400 bytes)
Associated Fluid Cache State  : Not Applicable
Device Name                   : /dev/sda
Bus Protocol                  : SATA
Media                         : HDD
Read Policy                   : Adaptive Read Ahead
Write Policy                  : Write Back
Cache Policy                  : Not Applicable
Stripe Element Size           : 64 KB
Disk Cache Policy             : Enabled
We also have a rough and ready Nagios plugin which simply checks that each physical disk reports as 'OK' and 'Online' and complains if anything else is reported.
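The plugin isn't anything clever; a minimal sketch of the idea (assuming controller 0, and that only 'Ok'/'Online' are acceptable values) looks something like this:
#!/bin/sh
# Nagios-style check: every physical disk on controller 0 should report Ok / Online
BAD=$(omreport storage pdisk controller=0 \
      | grep -E '^(Status|State)' \
      | grep -vE ': (Ok|Online)$' | wc -l)
if [ "$BAD" -gt 0 ]; then
    echo "CRITICAL: $BAD disk status/state values are not Ok/Online"
    exit 2
else
    echo "OK: all physical disks report Ok/Online"
    exit 0
fi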

31 March 2014

Highlights of ISGC 2014

ISGC 2014 is over. Lots of interesting discussions - on the infrastructure end, ASGC developing fanless machine room, interest in (and results on) CEPH and GLUSTER, dCache tutorial, and an hour of code with the DIRAC tutorial.

All countries and regions presented overviews of their work in e-/cyber-Infrastructure.

Interestingly, although this wasn't a HEP conference, practically everyone is doing something on the LHC, so the LHC really is binding countries, researchers (well, at least physicists and infrastructureists), e-Infrastructures and NRENs together. When, one day, someone sits down to tally up the benefits and impact of the LHC, this ought to be one of the top ones: the ability to work together, to (mostly) be able to move data to each other, and to trust each other's CAs.

Regarding the DIRAC tutorial, I was there and went through as much as I could ("I am not doing that to my private key"). Something to play with a bit more when I have time - an hour (of code) is not much time, and there are always compromises between getting stuff done realistically and cheating in tutorials, but that's fine as long as there's something you can take away and play with later. As regards the key shenanigans, DIRAC say they will be working with EGI on SSO, so that's promising. Got the T-shirt, too. "Interware," though?

On the security side, OSG have been interfacing to DigiCert, following the planned termination of the ESnet CA. Once again grids have demands that are not seen in the commercial world, such as the need for bulk certificates (particularly cost-effective ones - something a traditional Classic IGTF CA can do fairly well). Other security questions (techie acronym alert, until end of paragraph) include how Argus and XACML compare for implementing security policies, and the EMI STS, with CERN looking at linking it to ADFS. And Malaysia are trialling an online CA based on a FIPS level 3 token with a Raspberry π.

EGI federated cloud got mentioned quite a few times - KISTI is interested in offering IaaS, Australia is interested in joining, and the Philippines are providing resources. EGI have a strategy for engagement. It is interesting to see the extent to which they are driving the adoption of CDMI.

I should mention Shaun gave a talk on "federated" access to data, comparing the protocols - which I missed - the talk, I mean - being in another session, but I understand it was well received and there was a lot of interest.

Software development - interesting experiences from the dCache team and building user communities with (for) DIRAC. How are people taught to develop code? The closing session was by Adam Lyon from Fermilab who talked about the lessons learned - the HEP vision of big data being different from the industry one. And yet HEP needs a culture shift to move away from the not-invented-here.

ISGC really had a great mix of Asian and European countries, as well as the US and Australia. This post was just a quick look through my notes; there'll be much more to pick up and ponder over the coming months. And I haven't even mentioned the actual science stuff ...

Storage thoughts from GRIDPP32

Last week saw me successfully talk about the planned CEPH installation at the RAL Tier1. Here is a list of other thoughts which came up from GRIDPP32:

ATLAS and CMS plans for Run2 of the LHC seem to involve an increased churn rate of data at their Tier2s, which will lead to a higher deletion rate being needed. We will also need to look at making sure dark data is discovered and deleted in a more timely manner.

A method for discovering and deleting empty directories which are no longer needed needs to be created. As an example, at the Tier1 there are currently 1071 ATLAS users, each of whom can create up to 131072 sub-directories, which can end up as dark directories under ATLAS's new RUCIO namespace convention.
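On a POSIX-backed SE the discovery half, at least, could start out as simple as the sketch below (the path is hypothetical, and anything it finds would want checking against the experiment's catalogue before deletion):
# list directories that have been empty and untouched for more than 30 days
find /dpm/example.ac.uk/home/atlas/atlasdatadisk -mindepth 1 -type d -empty -mtime +30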

To help with deletion, some of the bulk tools the site admins can use are impressive (but also possibly hazardous). One small typo when deleting may lead to huge unintentional data loss!!!

Data rates of over 30Gbps WAN traffic shown by Imperial College are impressive (and make me want to compare all UK sites to see what rates have been recorded via the WLCG monitoring pages).

Wahid Bhimji's storage talk also got me thinking again that, with the rise of the WLCG VOs' FAX/AAA systems and their relative increase in usage, perhaps it is time to re-investigate WAN tuning not only of WNs at sites but also of the XRootD proxy servers used by the VOs. In addition, I am still worried about monitoring and controlling the number of xrootd connections per disk server in each of the types of SE which we have deployed on the WLCG.

I was also interested to see his work using DAV and its possible usefulness for smaller VOs.
 

27 March 2014

dCache workshop at (with) ISGC 2014

Shaun and I took part in the dCache workshop. Starting with a VM with a dCache RPM, the challenge was to set it up with two pools, NFS4, and WebDAV. A second VM got to access the data, mainly via NFS or HTTP(S) - security ranged from IP address to X.509 certificates. The overall impression was that it was pretty easy to get set up and configure the interfaces and get it to do something useful: dCache is not "an SRM" or "an NFS server" but rather storage middleware which provides a wide range of interfaces to storage. One of the things the dCache team is looking into is the cloud interface, via CDMI. This particular interface is not ready (as of March 2014) for production, but it's something we may want to look into and test with the EGI FC's version, Stoxy.

05 March 2014

Some thoughts on "normal" HTTP clients and Grid authorisation

In thinking-out-loud mode. Grid clients use certificates: generally this enhances security as you get mutual authentication. So to present authorisation attributes, these either have to be carried with the credential, or queried otherwise via a callout (or cached locally). Access control is generally performed at the resource.

For authorisation attributes we tend to use VOMS, using attribute certificates. These are embedded inside the Globus proxy certificate, which is a temporary client certificate created by (and signed by) the user certificate, and "decorated" with the authorisation attributes - this makes sense: it separates authentication from authorisation. Globus proxies, however, tend not to work with "standard" HTTP clients like browsers (which is not HTTP's fault, but a feature of the secure sockets layer).
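(For contrast, a "standard" client such as curl is perfectly happy presenting a plain user certificate - it's the proxy and the VOMS decoration it knows nothing about. A hypothetical example against an HTTPS-speaking storage endpoint:)
curl --cert ~/.globus/usercert.pem --key ~/.globus/userkey.pem \
     --capath /etc/grid-security/certificates \
     https://storage.example.org/dpm/example.org/home/atlas/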

VOMS is nice because you get group membership and can choose to optionally assert rôles. The user selection is often missing in many authorisation schemes which either present all your attributes or none (or give you no choice at all.)

So how would we get grid authorisation working with "standard" HTTP clients? One way is to do what SARoNGS does: get a temporary certificate and decorate that instead. The client doesn't manage the certificate directly, but grants access to it, like GlobusOnline does: you give GO access to your MyProxy server, either by giving it your MyProxy username/password (!) or by using OAuth.

If, instead, you want to have your own certificate in the browser (or other user agent), then authorisation could be done in one of two ways: you can have the resource call out to an external authorisation server, saying "I am resource W and I've got user X trying to do action Y on item Z" and the authorisation server must then look up the actual authorisation attributes of user X and take a decision.  XACML could work here, with (in XACMLese) the resource being the PEP, the authorisation server the PDP, and the authorisation database (here, a VOMS database) being the PIP. VOMS also supports a SAML format, allegedly, but if it does, it's rarely seen in the wild.

Or, you could use OAuth directly. If you do an HTTP GET on a protected URL, presenting a client certificate, the user agent would be directed to the authorisation server, to which it would authenticate using the user certificate. The authorisation server would then need hooks to (a) find the relevant authorisation attributes in VOMS, and (b) take the decision based on the requested URL. The catch is that the OAuth client (the user agent) would need to present a client id to the authorisation server - a shared secret. Also, the resource would need a means of validating the Access Token, which is generally opaque. Hm. It's easy to see that something much like OAuth could work, but it would obviously be better to use an existing protocol.

There are other things one could try, taking a more pure SAML approach, using some of the SAML and web services federation stuff.

Somebody may of course already have done this, but it would be interesting to do some experiments and maybe summarise the state of the art.

25 February 2014

Big data picture

Not as in ((big data) picture) but (big (data picture)), if that makes sense.

I find myself in Edinburgh - it's been far too long since I last was here, I am embarrassed to say.

We are looking at data movement for EUDAT and PRACE, and by a natural extension (being at EPCC), GridPP and DiRAC. The main common data mover is GridFTP: useful because we can (more or less) all move data with GridFTP, it gets great performance, we know how to tune and monitor it, and it supports third party copying. We also need to see how to bridge in GlobusOnline, with the new delegated credentials. In fact both Contrail and NCSA developed OAuth-delegated certificates (and while the original CILogon work was OAuth1, the new stuff is OAuth2.)

One use case is data sharing (the link is to a nice little video Adam Carter from EPCC showed in the introduction). You might argue that users are not jumping up and down screaming for interdisciplinary collaborations, yet if they were possible they might happen! When data policies require data be made available, as a researcher producing data you really have no choice: your data must be shareable with other communities.

21 February 2014

TF-Storage meeting

A quick summary of the Terena TF-Storage meeting earlier this month. Having been on the mailing list for ages, it was good to attend in person - and to catch up with friends from SWITCH and Cybera.

Now there was a lot of talk about cloudy storage, particularly OpenStack's SWIFT and CINDER, as, respectively, object and block stores. At some point when I have a spare moment (haha) I will see if I can get them running in the cloud. I asked about CDMI support for SWIFT but it has not been touched in a while - it'd be a good thing to have, though (so we can use it with other stuff). Software-defined networking (SDN) also got attention; it has been talked about for a while but seems to be maturing. OpenStack's SDN component used to be called Quantum and is now called Neutron (thanks to Joe from Cybera for the link). There was talk about OpenStack and CEPH, with work by Maciej Brzeźniak from PSNC being presented.

There's an interesting difference between people doing very clever things with their erasure codes and secrecy schemes, and the rest of us who tend to just replicate, replicate, replicate. If you look at the WLCG stuff, we tend to not do clever things - the middleware stack is already complicated enough - but just create enough replicas, and control the process fairly rigidly.

There was a discussion about identity management, of course, which mostly reiterated stuff we did for the grid about ten years ago - which led to VOMS and suchlike.

The report triggered a discussion as to whether the grid is a distributed object store.  It kind of is. 

03 February 2014

Setting up an IPv6 only DPM

As part of the general IPv6 testing work, we've just installed a small, single (virtual) node DPM at Oxford that's exclusively available over IPv6. While many client tools will prefer IPv6 to IPv4 given the choice, some things will prefer IPv4, even if they could work over IPv6, and others might not be able to work over IPv6 at all. Having a dedicated IPv6-only testing target such as this simplifies tests - if something works at all, you know it's definitely doing it over IPv6.

The process was fairly straightforward, with a few minor catches:
  • In the YAIM config, DPM_DB_HOST is set to localhost rather than the FQDN - MySQL is IPv4 only, and if you have it try to use the machine's full name, it will try to look up an IPv4 address, and fail when there isn't one.
  • The setting 'BDII_IPV6_SUPPORT=yes' is enabled to make the DPM's node BDII listen on IPv6. This is also required on the site BDII if you want it to do the same, and seems to be completely harmless when set on v4 only nodes. In any case the site BDII will need some degree of IPv6 capability so that it can connect to the DPM server.
  • YAIM requires the 'hostname -f' command to return the machine's fully qualified domain name, which it will only do if the name is properly resolvable. Unfortunately, the default behaviour only attempts to look up an IPv4 address record, and so fails. It's possible to fix this cleanly by adding the line 'options inet6' to /etc/resolv.conf, e.g.:
    search physics.ox.ac.uk
    nameserver 2001:630:441:905::fa
    options inet6
    
  • Socket binding. For reasons that are better explained here, /etc/gai.conf needs to be set to something like:
    label ::/0 0
    label 0.0.0.0/0 1
    precedence ::/0 40
    precedence 0.0.0.0/0 10
    
    so that services which don't explicitly bind to IPv6 addresses as well as IPv4 get both by default.
And then YAIM it as per normal.

In addition to getting the DPM itself running, there are some sundry support services that are needed or helpful for any IPv6-only system (since it won't be able to use services that are only accessible via IPv4). In the Oxford case, I've installed:
  • A dual stack DNS resolver to proxy DNS requests to the University's DNS servers,
  • A squid proxy to enable access to IPv4-only web services (like the EMI software repositories),
  • A dual stack site BDII. Advertising the DPM server requires the site BDII to be able to connect to it to pick up its information. That means an IPv6 capable site BDII.
The final product is named 't2dpm1-v6.physics.ox.ac.uk', and it (currently) offers support for the ops, dteam and atlas VOs, and should be accessible from any IPv6 capable grid client system.
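A quick sanity check from any client that the machine really is IPv6-only, before pointing grid tools at it:
host -t AAAA t2dpm1-v6.physics.ox.ac.uk    # should return an IPv6 address
host -t A    t2dpm1-v6.physics.ox.ac.uk    # should return no A record
getent ahosts t2dpm1-v6.physics.ox.ac.uk   # what the local resolver actually hands to applications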

It's been a while, but my family dynamics are changing....

Due to housing and room capacity, ATLAS decided to reduce the number of centrally controlled clones. So here is an update on where Georgina, Eve and I are now living after ATLAS's change of policy:

New Values
DataSet Name                                 Dave     G'gina    Eve
"DNA" Number
Number of "Houses"                             49       70       54
Type of Rooms: DATADISK                        17       37       33
Type of Rooms: LGD                             32       58       24
Type of Rooms: PERF+PHYS                       29       56       24
Type of Rooms: TAPE                             7       12        9
Type of Rooms: USERDISK                         0        1        5
Type of Rooms: CERN                             8       10       10
Type of Rooms: SCRATCH                          3        6        1
Type of Rooms: CALIB                            0        4        7
Total number of people (including clones)    1090     1392      471
Number of unique people                       876     1019      293
Number of "people" of type: ^user             136      368       97
Number of unique "people" of type: ^user      131      340       83
Number of "people" of type: ^data             919      950      352
Number of unique "people" of type: ^data      719      616      189
Number of "people" of type: ^group             34       74       22
Number of unique "people" of type: ^group      25       63       21
Number of "people" of type: ^valid              1        0        0
Number of unique "people" of type: ^valid       1        0        0
Datasets that have 1 copy                     696      763      184
Datasets that have 2 copies                   146      184       62
Datasets that have 3 copies                    34       44       33
Datasets that have 4 copies                     0       16        9
Datasets that have 5 copies                     0        7        4
Datasets that have 6 copies                     0        5        0
Datasets that have 7 copies                     0        0        0
Datasets that have 8 copies                     0        0        1
Datasets that have 12 copies                    0        0        0
Datasets that have 13 copies                    0        0        0
Number of files that have 1 copy            56029   134672    14260
Number of files that have 2 copies           8602    35191     6582
Number of files that have 3 copies           1502     1751     5924
Number of files that have 4 copies              0     1879       75
Number of files that have 5 copies              0      607      868
Number of files that have 6 copies              0      306        0
Number of files that have 7 copies              0        0        0
Number of files that have 8 copies              0        0        1
Number of files that have 12 copies             0        0        0
Number of files that have 13 copies             0        0        0
Total number of files on the grid:          77739   222694    49844
Total number of unique files:               66133   174406    27710
Data Volume (TB) that has 1 copy            9.175   30.389    4.127
Data Volume (TB) that has 2 copies          6.361   21.62     3.294
Data Volume (TB) that has 3 copies          0.223    2.452    7.812
Data Volume (TB) that has 4 copies              0    1.263    0.14
Data Volume (TB) that has 5 copies              0    1.408    0.17
Data Volume (TB) that has 6 copies              0    0.379    0
Data Volume (TB) that has 7 copies              0    0        0
Data Volume (TB) that has 8 copies              0    0        0.001
Data Volume (TB) that has 12 copies             0    0        0
Data Volume (TB) that has 13 copies             0    0        0
Total Volume of data on the grid (TB):      22.57   95.351   35.57
Total Volume of unique data (TB):           15.76   57.511   15.54



The differences in values from my last update are:


Difference
DataSet Name                                   D'       G'       E'
"DNA" Number
Number of "Houses"                            -10       -9       -9
Type of Rooms: DATADISK                       -11      -12      -17
Type of Rooms: LGD                             -5       -3       10
Type of Rooms: PERF+PHYS                       -3       -2        2
Type of Rooms: TAPE                             1        0        0
Type of Rooms: USERDISK                        -1       -8        0
Type of Rooms: CERN                             5        4        5
Type of Rooms: SCRATCH                          3       -6      -13
Type of Rooms: CALIB                            0       -1        0
Total number of people (including clones)     -76     -202     -171
Number of unique people                       -18     -101       -6
Number of "people" of type: ^user              -1     -102       33
Number of unique "people" of type: ^user       -1      -89       28
Number of "people" of type: ^data             194      -98     -186
Number of unique "people" of type: ^data      187      -15      -23
Number of "people" of type: ^group              3        4      -18
Number of unique "people" of type: ^group      -1        9      -11
Number of "people" of type: ^valid              0        0        0
Number of unique "people" of type: ^valid       0        0        0
Datasets that have 1 copy                       5      -48        6
Datasets that have 2 copies                     3      -13       -1
Datasets that have 3 copies                   -18      -24        8
Datasets that have 4 copies                   -71       -7        2
Datasets that have 5 copies                    -1        0      -17
Datasets that have 6 copies                     0        0       -2
Datasets that have 7 copies                     0       -2       -1
Datasets that have 8 copies                     0       -1        0
Datasets that have 12 copies                    0        0       -6
Datasets that have 13 copies                    0        0       -3
Number of files that have 1 copy             2385    -6991     4751
Number of files that have 2 copies          -4302    -1672    -1123
Number of files that have 3 copies           -358    -1957     1404
Number of files that have 4 copies             -7     -213      -35
Number of files that have 5 copies             -7       35     -223
Number of files that have 6 copies              0     -181      -20
Number of files that have 7 copies              0     -142       -5
Number of files that have 8 copies              0      -73        0
Number of files that have 12 copies             0        0       -6
Number of files that have 13 copies             0        0       -5
Total number of files on the grid:          -7356   -19547    26871
Total number of unique files:               -2289   -11194   -16964
Data Volume (TB) that has 1 copy            0.375    2.689    2.427
Data Volume (TB) that has 2 copies         -0.439    0.32    -1.406
Data Volume (TB) that has 3 copies         -0.077   -3.148    0.612
Data Volume (TB) that has 4 copies         -0.011   -0.237    0.02
Data Volume (TB) that has 5 copies         -0.028    0.008   -0.3
Data Volume (TB) that has 6 copies              0    0.019   -0.039
Data Volume (TB) that has 7 copies              0   -0.13    -0.001
Data Volume (TB) that has 8 copies              0   -0.12     0
Data Volume (TB) that has 12 copies             0    0       -0.001
Data Volume (TB) that has 13 copies             0    0       -0.001
Total Volume of data on the grid (TB):     -1.134   -8.649   -0.231
Total Volume of unique data (TB):          -0.241   -0.489    1.344


The children with only a single copy across all three families (44TB / 205k files) are completely unique in the world and are at risk from a single disk failure in a room.

23 December 2013

Good talks at DPM workshop

It's nice to see that plans and issues for DPM sites are similar around the world. Here are my musings:

The fact that the Australian and Taiwanese sites see approximately the same percentage of dark files after the ATLAS RUCIO renaming as I have seen in the UK (8-9%) is encouraging - the UK is not especially bad. (Now we just need to work with ATLAS on how to efficiently clean up these files, delete empty directories, and reduce dark data creation in the future.) It will be interesting to see whether other VOs have a lower or higher percentage of dark data.

Also good to hear that a Puppet deployment of DPM (rather than YAIM) is almost ready for use. Now that I have a better understanding of the development cycle of the individual components, I am less worried about the move away from a single product release.



11 December 2013

Off to Edinburgh for DPM Workshop 2013

Friday has me going to Edinburgh for the latest DPM workshop. It'll be good to meet and discuss all the new(ish) features DPM has to offer (and make my own suggestions...)

The meeting agenda should be available at the following Indico meeting page:

http://indico.cern.ch/conferenceDisplay.py?confId=273864