07 March 2018

LHCOPN/LHCONE meeting 2018 at Abingdon UK --- or how the NRENs want to take over the storage world

OK, so my title may be full of poetic license, but I was fortunate enough to attend the WLCG LHCOPN/LHCONE meeting at Abingdon, UK this week and wanted a way to get your attention. You might ask why a storage blog would be interested in a networking conference; but if we don't take into account how the data are transferred, and make sure the network is efficient for the data movement process, we will be in trouble. Remember, with no "N", all we do are WA/LA transfers. (What's WA/LA? Exactly!!)

The agenda for the meeting can be found here:
The meeting was attended by ~30 experts, both from WLCG and, more importantly, from NREN network providers such as JISC, GEANT, ESnet and SURFnet (but not limited to these organisations).

My highlights may be subjective, and people should feel free to delve into the slides if they wish (if they have access). Here, however, are my highlights and musings:

From the site updates: RAL-LCG2 will be joining LHCONE; BNL and FNAL are connected, or almost connected, at 300G, with a 400G transatlantic link to Europe. At the moment the majority of T1-T1 traffic still uses the OPN rather than LHCONE. However, for RAL a back-of-the-envelope calculation shows that switching our connections to the US T1s would reduce our latency by 12% and 35% to BNL and FNAL respectively, so it could be a benefit.
Data volume increases are slowing: there was only a 65% increase in rate on LHCONE in the last year, compared with the 100% seen in each of the previous two years.

Following the site updates was an interesting IPv6 talk in which my work comparing perfSONAR rates between IPv4 and IPv6 was referenced (see previous blog post). It was also stated again that the next version of perfSONAR (4.1) will not be available on SL/RHEL 6, only on 7.

There was an interesting talk on the new DUNE neutrino project and its possible usage of either LHCOPN or LHCONE.

The day ended for me with a productive discussion agreeing that jumbo frames should be highly encouraged/recommended (but make sure PMTUD is on!).

Day 2 was slightly more on the network-techy side, and some parts, I have to admit, were lost on me. However, there were interesting talks regarding DTNs and Open Storage Networks, plus topics about the demonstrators shown at SuperComputing. A rate of 378Gbps memory-to-memory is not to be sniffed at! How far NRENs want to become persistent storage providers is a question I would ask myself. However, I can see how OSNs and SDNs could do for data I/O workflows what the creation of collimators did to allow radio telescope interferometry to flourish.

23 February 2018

Lisa, the new sister to "Dave the dataset", makes her appearance.

Hello, I'm Lisa, similar to "Dave the dataset" but born in 2017 in the ATLAS experiment. My DNA number is 2.16.2251. The initial size of my 23 subsections is 60.8TB in 33651 files. My main physics subsection is 8.73TB (4726 files). I was born 9 months ago; in that time I have produced 1281 unique children, corresponding to 129.4TB of data in 60904 files. It is not surprising that I have a large number of children, as I am still relatively new and my children have not yet been culled.

It is interesting to see, for a relatively new dataset, how many copies of myself and my children there are.
There are 46273 files / 60.248TB with 1 copy, 35807 files / 62.06TB with 2 copies, 2959 files / 4.94TB with 3 copies, 9110 files / 2.16TB with 4 copies, 51 files / 0.017GB with 5 copies and 80 files / 0.44GB with 6 copies. Only four real scientists have data which doesn't have a second copy.
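As a quick sanity check on those numbers, the per-copy-count volumes do add up to roughly the 129.4TB total quoted above; a throwaway awk pipeline (figures copied straight from this post, GB values converted to TB) confirms it:

```shell
# Columns: copies, files, TB (the 5- and 6-copy GB figures converted to TB)
printf '%s\n' \
  '1 46273 60.248' \
  '2 35807 62.06' \
  '3 2959 4.94' \
  '4 9110 2.16' \
  '5 51 0.000017' \
  '6 80 0.00044' |
awk '{ files += $2; tb += $3; if ($1 >= 2) rep_tb += $3 }
     END { printf "total: %d files, %.2f TB (%.2f TB with >=2 copies)\n", files, tb, rep_tb }'
# prints: total: 94280 files, 129.41 TB (69.16 TB with >=2 copies)
```

So a little over half of the data volume already has a second copy somewhere.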

Analyzing how this data is distributed around the world shows that it sits in 100 rooms in total across 67 houses.

Of course, more datasets are just about to be created with the imminent restart of the LHC, so we will see how my distribution and those of the new datasets develop.

21 February 2018

Dave's Locations in preparation for 2018 data taking.

The powers that be are just about to create more brethren of mine in their big circular tunnel, so I thought I would give an update on my locations.

There are currently 479 rooms across 145 houses used by ATLAS. My data, 8 years on, is still in 46 rooms in 24 houses. There are 269 individuals, of which 212 are unique, 56 have a twin in another room, and one is a triplet. In total this means 13GB of data has double redundancy, 5.48TB has single redundancy, and 2.45TB has no redundancy. Of note is that 5.28TB of the 7.93TB of data with a twin is from the original produced data.

My main concern is not with the "Dirks" or "Gavins" who are sole children, as they can easily be reproduced in the children's "production" factories. Of concern are the 53 "Ursulas" with no redundancy. This equates to 159GB of data / 6671 files, whose loss would affect 17 real scientists.

06 February 2018

ZFS 0.7.6 release

ZFS on Linux 0.7.6 has now landed.


For everyone running the 0.7.0-0.7.5 builds, I would encourage you to look into updating, as there are a few performance fixes associated with this build.
Large storage servers tend to have ample hardware; however, if you're running this on a system with a small amount of RAM, the fixes may bring a dramatic performance improvement.
Anecdotally, I've also seen some improvements on a system which hosts a large number of smaller files, which could be due to some fixes around the ZFS cache.

What if an update goes wrong?

I'm linking a draft of a flowchart I'm still working on to help debug what to do if a ZFS filesystem has disappeared after rebooting a machine:
https://drive.google.com/file/d/1hqY_qTfdpo-g_qApcP9nSknIm8X3wMwo/view?usp=sharing (Download and view offline for best results; there are a few things to check for!)

24 January 2018

Can we see improvement in IPv6 perfSONAR traffic for the RAL-LCG2 Tier1 site?

In three words (or as my TL;DR response would put it): Yes and No.
You may remember I made an analysis of the perfSONAR rates for both IPv4 and IPv6 traffic from the WLCG Tier1 at Rutherford Appleton Laboratory to other WLCG sites. As a quick update with new measurements, here is a plot showing current perfSONAR rates for the sites measured 8 months ago, their new results, and values for new sites which have been IPv6-enabled and included in official WLCG mesh testing.
Figure: IPv4 vs IPv6 perfSONAR throughput rates between RAL and other WLCG sites.
What I find interesting is that we still have some sites with vastly better IPv4 rates than IPv6. NB: 16 sites still have data, 5 sites have no current results, and 10 new sites have been added since the last tranche of measurements.

18 January 2018

Dual-stacking Lancaster's SL6 DPM

I'll start with the caveat that this isn't an interesting tale, but then all the happy sysadmin stories are of the form “I just did this, and it worked!”.

Before we tried to dual-stack our DPM, we had all the necessary IPv6 infrastructure provided and set up for us by the Lancaster Information System Services team. Our DNS was v6-ready, DHCPv6 had been set up, and we had an IPv6 allocation for our subnet. We tested that these services were working on our perfSONAR boxes, so there were no surprises there. When the time came to dual-stack, all we needed to do was request IPv6 addresses for our headnode and pool nodes. It's worth noting that you can run partially dual-stacked without error – we ran with a handful of poolnodes dual-stacked. However, I would advise that when the time comes to dual-stack your headnode, you do all of your disk pools at the same time.

Once the IPv6 addresses came through and the DNS was updated (with dig returning AAAA records for all our DPM machines) the dual-stacking process was as simple as adding these lines to the network script for our external interfaces (for example ifcfg-eth0):

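For an SL6-style network script, the additions typically take this form (the address and gateway below are documentation-prefix placeholders, not our real allocation):

```shell
# Appended to /etc/sysconfig/network-scripts/ifcfg-eth0 (addresses are placeholders)
IPV6INIT=yes
IPV6ADDR=2001:db8:100::20/64
IPV6_DEFAULTGW=2001:db8:100::1
```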

And then restarting the network interface and the DPM services on that node (although we probably only needed to restart dpm-gsiftp). We also of course needed a v6 firewall, so we created an ip6tables firewall that just had all the DPM transfer ports (gsiftp, xrootd, https) open. Luckily the ip6tables syntax is the same as that for iptables, so there wasn't anything new to learn there.
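As a sketch, the minimal rules would look something like the following; the port numbers are the usual DPM defaults (2811 for the gsiftp control channel plus a data port range, 8446 for SRM, 1094 for xrootd, 443 for https) and should be adjusted to match your own site:

```shell
# Open the DPM transfer ports over IPv6 (default ports; adjust for your site)
ip6tables -A INPUT -p tcp --dport 2811 -j ACCEPT         # gsiftp control channel
ip6tables -A INPUT -p tcp --dport 20000:25000 -j ACCEPT  # gsiftp data port range
ip6tables -A INPUT -p tcp --dport 8446 -j ACCEPT         # srmv2.2
ip6tables -A INPUT -p tcp --dport 1094 -j ACCEPT         # xrootd
ip6tables -A INPUT -p tcp --dport 443 -j ACCEPT          # https
service ip6tables save                                   # persist across reboots on SL6
```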

Despite successfully running tests by hand, we found that all FTS transfers were failing with errors like:

CGSI-gSOAP running on fts-test01.gridpp.rl.ac.uk reports could not open connection to fal-pygrid-30.lancs.ac.uk:8446

Initial flailing had me add this line that was missing from /etc/hosts:

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

But the fix came after reading a similar thread on the DPM users forum pointing to problems with /etc/gai.conf – a config file I had never heard of before, and one that typically doesn't exist or is empty in the average Linux installation. In order for Globus to work with IPv6, it had to be filled with what is, to all intents and purposes, an arcane incantation:

# cat /etc/gai.conf
label ::1/128 0
label ::/0 1
label 2002::/16 2
label ::/96 3
label ::ffff:0:0/96 4
label fec0::/10 5
label fc00::/7 6
label 2001:0::/32 7
label ::ffff:7f00:0001/128 8

It's important to note that this is a problem that only affects SL6; RHEL7 installs of DPM should not need it. Filling /etc/gai.conf with the above and then restarting dpm-gsiftp on all our nodes (headnode and disk pools) fixed the issue, and FTS transfers started passing again, although we still had transfer failures occurring at quite a high rate.

The final piece of the puzzle was the v6 firewall – remember how I said we opened up all the transfer protocol ports? It appears DPM likes to talk to itself over IPv6, so we had to open up our ip6tables firewall a lot more on our head node to bring it more in line with our v4 iptables. Once this was done and the firewall restarted, our DPM started running like a dual-stacked dream, and we haven't had any problems since. A happy ending just in time for Christmas!

07 November 2017

IPv6 slowly encroaching into data storage and transfers within the GridPP UK community for WLCG experiments.

It's the elephant in the room. We know IPv6 is around the corner. (Some would say it's three corners back and we have just been ignoring it.) However, within the UK, GridPP is making progress.
More and more sites are offering fully dual stacked storage systems. Indeed some sites are already fully dual-stacked.

Other sites are slowly working IPv6 components into their current systems.

As a group we are looking at new gateway technologies that give IPv4 back ends an IPv4/6 front end. Here I am thinking of the work the RALPP site has been doing in collaboration with CMS on xrootd proxy caching.

Even the FTS middleware deployed at the UK Tier1 is just about to be deployed dual-stacked. It is an interesting time for IPv6 within the UK, and not in the Chinese proverbial sense.

These are just some of the storage-related highlights and current activities for IPv6 integration. I leave it as an exercise for the reader regarding other grid components.