18 January 2018

Dual-stacking Lancaster's SL6 DPM


I'll start with the caveat that this isn't an interesting tale, but then all the happy sysadmin stories are of the form “I just did this, and it worked!”.

Before we tried to dual-stack our DPM we had all the necessary IPv6 infrastructure provided and set up for us by the Lancaster Information System Services team. Our DNS was v6 ready, DHCPv6 had been set up and we had an IPv6 allocation for our subnet. We tested that these services were working on our Perfsonar boxes, so there were no surprises there. When the time came to dual-stack, all we needed to do was request IPv6 addresses for our headnode and pool nodes. It's worth noting that you can run partially dual-stacked without error – we ran with a handful of pool nodes dual-stacked. However, I would advise that when the time comes to dual-stack your headnode you do all of your disk pools at the same time.
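
If you want a quick sanity check that the records are in place, a dig against one of your nodes does the job (using one of ours as the example here):

dig +short A fal-pygrid-30.lancs.ac.uk
dig +short AAAA fal-pygrid-30.lancs.ac.uk

An empty answer to the second query means that machine isn't ready to dual-stack yet.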

Once the IPv6 addresses came through and the DNS was updated (with dig returning AAAA records for all our DPM machines) the dual-stacking process was as simple as adding these lines to the network script for our external interfaces (for example ifcfg-eth0):

IPV6INIT=yes
DHCPV6C=yes

And then restarting the network interface, and the DPM services on that node (although we probably only needed to restart dpm-gsiftp). We also of course needed a v6 firewall, so we created an ip6tables firewall that just had all the DPM transfer ports (gsiftp, xrootd, https) open. Luckily the ip6tables syntax is the same as that for iptables, so there wasn't anything new to learn there.
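
For the curious, the pool node rules were roughly along these lines (inserted ahead of any final REJECT rule) – the port numbers are the stock DPM defaults, and the gsiftp data port range in particular is site dependent, so treat this as a sketch and check it against your own v4 rules:

ip6tables -I INPUT -p tcp --dport 2811 -j ACCEPT         # gsiftp control channel
ip6tables -I INPUT -p tcp --dport 20000:25000 -j ACCEPT  # gsiftp data port range (site dependent)
ip6tables -I INPUT -p tcp --dport 1094 -j ACCEPT         # xrootd
ip6tables -I INPUT -p tcp --dport 443 -j ACCEPT          # https
service ip6tables save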

Despite successfully running tests by hand, we found that all FTS transfers were failing with errors like:

CGSI-gSOAP running on fts-test01.gridpp.rl.ac.uk reports could not open connection to fal-pygrid-30.lancs.ac.uk:8446
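
By tests by hand I mean something like a plain gfal-copy against the endpoint (the dteam path below is purely illustrative, and you'll need a valid grid proxy):

gfal-copy file:///tmp/testfile srm://fal-pygrid-30.lancs.ac.uk:8446/dpm/lancs.ac.uk/home/dteam/test-ipv6

These went through fine, which made the FTS failures all the more puzzling.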

Initial flailing had me add this line that was missing from /etc/hosts:

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

But the fix came after reading a similar thread in the DPM users forum pointing to problems with /etc/gai.conf – a config file that I had never heard of before, and one that typically doesn't exist or is empty in the average Linux installation. In order for Globus to work with IPv6 it had to be filled with what is, for all intents and purposes, an arcane incantation:

# cat /etc/gai.conf
label ::1/128 0
label ::/0 1
label 2002::/16 2
label ::/96 3
label ::ffff:0:0/96 4
label fec0::/10 5
label fc00::/7 6
label 2001:0::/32 7
label ::ffff:7f00:0001/128 8

It's important to note that this is a problem that only affects SL6; RHEL7 installs of DPM should not need it. Filling /etc/gai.conf with the above and then restarting dpm-gsiftp on all our nodes (headnode and disk pools) fixed the issue and FTS transfers started passing again, although we still had transfer failures occurring at quite a high rate.
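
Pushing the change out was nothing fancy – a loop over the nodes does it, assuming you have a node list and root ssh access (both assumptions on my part, adapt to your own config management):

# copy the gai.conf to every DPM node and bounce dpm-gsiftp
for node in $(cat dpm-nodes.txt); do
  scp /etc/gai.conf root@${node}:/etc/gai.conf
  ssh root@${node} 'service dpm-gsiftp restart'
done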

The final piece of the puzzle was the v6 firewall – remember how I said we only opened up the transfer protocol ports? It appears DPM likes to talk to itself over IPv6, so we had to open up a lot more ports in the ip6tables firewall on our head node to bring it more in line with our v4 iptables.
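
In practice that meant adding the standard DPM service ports on top of the transfer ones – roughly the sketch below, though as before you should check the numbers against your own v4 iptables rather than trusting mine:

ip6tables -I INPUT -i lo -j ACCEPT                # let the DPM daemons talk to themselves
ip6tables -I INPUT -p tcp --dport 5010 -j ACCEPT  # dpnsdaemon
ip6tables -I INPUT -p tcp --dport 5015 -j ACCEPT  # dpm daemon
ip6tables -I INPUT -p tcp --dport 8446 -j ACCEPT  # srmv2.2
ip6tables -I INPUT -p tcp --dport 5001 -j ACCEPT  # rfiod
service ip6tables save
service ip6tables restart

Once this was done and the firewall restarted our DPM started running like a dual-stacked dream, and we haven't had any problems since. A happy ending just in time for Christmas!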
