I'll start with the
caveat that this isn't an interesting tale, but then all the happy
sysadmin stories are of the form “I just did this, and it worked!”.
Before we tried to
dual-stack our DPM we had all the necessary IPv6 infrastructure
provided and set up for us by the Lancaster Information System
Services team. Our DNS was v6 ready, DHCPv6 had been set up and we
had an IPv6 allocation for our subnet. We tested that these services
were working on our Perfsonar boxes, so there were no surprises
there. When the time came to dual-stack, all we needed to do was
request IPv6 addresses for our headnode and pool nodes. It's worth
noting that you can run partially dual-stacked without error – we
ran with a handful of pool nodes dual-stacked. However, I would advise
that when the time comes to dual-stack your headnode you do all
of your disk pools at the same time.
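As an aside, checking that the DNS side is ready is just a matter of asking dig for AAAA records, for example (using one of our own node names):

# dig AAAA fal-pygrid-30.lancs.ac.uk +short

If that returns the machine's IPv6 address you're in business.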
Once the IPv6 addresses came through and the DNS was updated (with dig
returning AAAA records for all our DPM machines), the dual-stacking
process was as simple as adding these lines to the network script for
our external interfaces (for example ifcfg-eth0):
IPV6INIT=yes
DHCPV6C=yes
And then restarting the network interface and the DPM services on that
node (although we probably only needed to restart dpm-gsiftp).
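For the record, on SL6 with the stock init scripts that restart amounted to something along these lines on each node (dpm-gsiftp being the GridFTP service name on a standard DPM install):

# service network restart
# service dpm-gsiftp restart

(ifdown eth0 && ifup eth0 works just as well if you only want to bounce the one interface.)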
We also of course needed a v6 firewall, so we created an ip6tables
firewall that just had all the DPM transfer ports (gsiftp, xrootd,
https) open. Luckily the ip6tables syntax is the same as that for
iptables, so there wasn't anything new to learn there.
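For reference, a minimal sketch of the sort of rules we mean (the port numbers assume a stock DPM setup with 20000-25000 as the GridFTP data port range, so check them against your own configuration):

# ip6tables -A INPUT -p tcp --dport 2811 -j ACCEPT            # gsiftp control
# ip6tables -A INPUT -p tcp --dport 1094 -j ACCEPT            # xrootd
# ip6tables -A INPUT -p tcp --dport 443 -j ACCEPT             # https
# ip6tables -A INPUT -p tcp --dport 20000:25000 -j ACCEPT     # gsiftp data channels
# service ip6tables save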
Despite successfully running tests by hand, we found that all FTS
transfers were failing with errors like:
CGSI-gSOAP running on fts-test01.gridpp.rl.ac.uk reports could not open connection to fal-pygrid-30.lancs.ac.uk:8446
Initial flailing had me add this line that was missing from /etc/hosts:

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
But the fix came after reading a similar thread on the DPM users forum
pointing to problems with /etc/gai.conf – the config file that controls
how getaddrinfo() selects and orders addresses. I had never heard of it
before, and it typically doesn't exist or is empty on the average Linux
installation. In order for Globus to work with IPv6 it had to be filled
with what is for all intents and purposes an arcane incantation:
# cat /etc/gai.conf
label ::1/128       0
label ::/0          1
label 2002::/16     2
label ::/96         3
label ::ffff:0:0/96 4
label fec0::/10     5
label fc00::/7      6
label 2001:0::/32   7
label ::ffff:7f00:0001/128 8
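If you want to check that the incantation is actually being picked up, getent resolves names through getaddrinfo() and so should honour /etc/gai.conf; the ordering of the addresses it prints for a dual-stacked host ought to change once the file is in place:

# getent ahosts fal-pygrid-30.lancs.ac.uk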
It's important to note that this is a problem that only affects SL6;
RHEL7 installs of DPM should not need it. Filling /etc/gai.conf with the
above and then restarting dpm-gsiftp on all our nodes (headnode and
disk pools) fixed the issue, and FTS transfers started passing again,
although we still had transfer failures occurring at quite a high
rate.
The final piece of the puzzle was the v6 firewall – remember how I said
we opened up all the transfer protocol ports? It appears DPM likes to
talk to itself over IPv6, so we had to open up our ip6tables firewall
a lot more on our headnode to bring it more in line with our v4
iptables (a sketch of the sort of extra rules is below). Once this was
done and the firewall restarted, our DPM started running like a
dual-stacked dream, and we haven't had any problems since. A happy
ending just in time for Christmas!
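The sketch of those extra headnode rules, for completeness (the port numbers are the defaults a stock DPM uses for its internal services, so check them against your own v4 iptables before copying anything):

# ip6tables -A INPUT -p tcp --dport 5010 -j ACCEPT    # dpns
# ip6tables -A INPUT -p tcp --dport 5015 -j ACCEPT    # dpm daemon
# ip6tables -A INPUT -p tcp --dport 8446 -j ACCEPT    # srmv2.2
# ip6tables -A INPUT -p tcp --dport 5001 -j ACCEPT    # rfio
# service ip6tables save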