28 February 2019

Fixing an iptables problem with DPM at Edinburgh


After spending some time examining the iptables output on our DPM servers in Edinburgh, I came across a small problem in the way our iptables rules interact with SRM.

For brevity, only the part of our iptables ruleset which caused the problem is shown:

 *filter  
 :INPUT ACCEPT [0:0]  
 -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT  
 -A INPUT -i lo -j ACCEPT  
 -A INPUT -p icmp --icmp-type any -j ACCEPT  
 ...  
 -A INPUT -p tcp -m multiport --dports 8446 -m comment --comment "allow srmv2.2" -m state --state NEW -j ACCEPT  
 ...  

The problem this causes is that packets like the following end up in the logs as dropped:

 IN=eth0 OUT= MAC=aa:bb:cc:dd:ee:ff:gg:hh:ii:jj:kk:ll:mm:nn SRC=192.168.12.34 DST=192.168.123.45 LEN=52 TOS=0x00 PREC=0x00 TTL=47 ID=19246 DF PROTO=TCP SPT=55012 DPT=8446 WINDOW=27313 RES=0x00 ACK FIN URGP=0

Here, ACK FIN shows that the dropped packet is associated with the closing of a connection which iptables has already seen as closed.
(This is the case at least with the DPM 1.10 srmv2.2 builds on both the latest security SL6 and CentOS7 kernels.)
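(For reference, log lines like the one above come from a LOG rule just before the final reject at the end of the INPUT chain. Ours differs in detail, but a minimal sketch of that kind of chain tail would be:)

 ...  
 -A INPUT -m limit --limit 5/min -m comment --comment "log anything about to be rejected" -j LOG --log-level 4  
 -A INPUT -j REJECT --reject-with icmp-host-prohibited  
 ...  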

In Edinburgh we have historically had problems with many connections which do not appear to close correctly, in particular with the SRM protocol. Left uncorrected, the service would run for several hours and then appear to hang, not accepting any further connections.

We now suspect that this dropping of packets was potentially causing the issues we were seeing.

In order to fix this, the above rule should either be changed to:
 ...  
 -A INPUT -p tcp -m multiport --dports 8446 -m comment --comment "allow srmv2.2" -m state --state NEW,INVALID -j ACCEPT  
 ...  

or the state module should not be used at all to restrict srmv2.2 traffic to NEW packets.
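(As a sketch, and modulo the details of the rest of our ruleset, the second option is simply to drop the state match:)

 ...  
 -A INPUT -p tcp -m multiport --dports 8446 -m comment --comment "allow srmv2.2" -j ACCEPT  
 ...  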

With this in mind we've now removed the firewall requirement that packets be NEW to be accepted by our srmv2.2 service. The service has since been up for several days without hanging or refusing further connections.

An advantage of this is that most of the packets rejected by the firewall on our DPM head node were actually associated with this rule. Now that the number of packets being rejected by our firewall has dropped significantly, examining the connections which are still rejected for further patterns or problems becomes much easier.
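(For anyone wanting to check which rules account for the bulk of the drops on their own nodes, the per-rule packet counters can be listed with, for example:)

 iptables -L INPUT -v -n --line-numbers  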

18 February 2019

Understanding Globus connect/online... is it doing a lot??

I have made further progress in understanding the Globus transfer tool (one thing I still struggle with is what to call it...). What I know I still need to understand is its authentication and authorization mechanisms. Of interest (to me at least) was to look at the usage of our Globus endpoints at RAL: 20 TB in the last 3 months. Now to work out whether that is a lot or not compared to other similar Globus endpoints and/or other communities...

07 February 2019

PerfSonar, Network Latencies and Grid Storage

One of the future developments mooted under the DOMA umbrella of working groups has been distribution of storage. Doing this well requires understanding how "close" our various sites are, in network terms, so we can distribute intelligently. (For example: modern erasure-coding schemes provide "tiered" successions of parity blocks, allowing some parity to be located "close" (in network terms) to subsets of the stripe, and other parity to sit in a "global" pool with no locality assumptions.)

Of course, for us, the relevant latency and bandwidth are the pure storage-to-storage measurements, but these are bounded by the site's own network latency (a lower bound) and bandwidth (an upper bound). We already have a system which measures these bounds, in the PerfSonar network which all WLCG sites have had installed for some time.

Whilst PerfSonar sites do record the (one-way) packet latency to all of their peers, displaying this does not seem to be a "standard" visualisation from the central PerfSonar repository. So, I spent a few hours pulling data from the UK's sites with public PerfSonar services and making a spreadsheet. (Doing it this way also means that I can make my own "averages" - using a 10% truncated mean to remove outliers, as sketched just after the table below - rather than the "standard" mean used by PerfSonar itself.)
The result, in raw form, looks like this (where, for comparison, I also added the ballpark latencies for access to spinning media, solid state storage via SATA, and solid state storage via PCI-E).
All of these results are for IPv4 packets: for some sites, it looks like switching to IPv6 has very significant effects on the numbers!

10% truncated mean one-way packet latencies between the UK sites with public PerfSonar data, in ms ("x" marks the site itself, "-" means no measurement was available):

       GLA   BHAM  SUSX  ECDF  RALPP LIV   UCL   RAL   DUR   RHUL  QMUL  LANCS CAM   OX    MAN
GLA    x     2.5   5.5   3.9   -     1.4   3.6   4.8   2.6   4.1   -     0.7   3.5   4.9   -
BHAM   4.4   x     3.6   3.7   -     1.2   1.5   2.5   3.0   2.7   -     4.7   2.2   2.9   -
SUSX   8.1   4.0   x     7.2   -     5.2   1.8   3.1   7.0   3.0   -     8.9   6.2   3.2   -
ECDF   6.7   5.0   7.6   x     -     4.0   6.0   7.0   3.6   5.3   -     7.8   4.3   7.5   -
RALPP  7.0   -     -     -     x     3.4   -     0.1   6.0   1.4   -     7.0   -     1.6   -
LIV    4.4   2.0   5.3   3.6   -     x     3.6   4.7   -     5.0   -     4.8   4.2   4.5   -
UCL    6.5   2.5   2.5   3.8   -     3.9   x     1.6   3.8   1.0   -     6.8   3.5   1.6   -
RAL    7.0   2.5   2.5   5.5   -     3.4   0.7   x     5.0   1.8   -     7.7   4.5   1.6   -
DUR    5.0   3.3   6.5   2.7   -     2.7   5.3   6.0   x     4.5   -     5.5   3.8   6.1   -
RHUL   7.0   2.7   2.2   6.3   -     3.6   0.5   1.6   3.0   x     -     6.8   3.5   1.6   -
QMUL   6.0   2.3   2.0   3.9   -     3.6   0.4   1.2   4.0   0.5   x     6.6   3.3   1.2   -
LANCS  3.2   5.3   8.5   6.8   -     4.5   7.1   7.8   5.7   7.1   -     x     6.4   7.9   -
CAM    5.3   2.0   5.3   3.2   -     3.1   3.5   4.2   3.1   2.9   -     5.9   x     4.5   -
OX     7.2   2.9   2.5   6.7   -     4.2   0.9   2.0   6.0   1.8   -     7.6   5.0   x     -
MAN    -     -     -     -     -     -     -     -     -     4.1   -     -     -     -     x

For comparison, ballpark access latencies for storage media:
 Spinning media: 2 to 20 ms  
 SSD (SATA): 0.2 ms  
 NVMe (PCI-E): 0.06 ms  
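As noted above, each of the per-pair numbers is a 10% truncated mean of the raw PerfSonar latency samples. A minimal sketch of that calculation, assuming the samples for one site pair have already been pulled into a Python list (the sample values below are purely illustrative, and trimming 10% from each tail is my own reading of "10% truncated"):

 import numpy as np

 def truncated_mean(samples, fraction=0.10):
     """Discard the lowest and highest `fraction` of samples, then average."""
     s = np.sort(np.asarray(samples, dtype=float))
     k = int(len(s) * fraction)
     trimmed = s[k:len(s) - k] if len(s) > 2 * k else s
     return float(trimmed.mean())

 # purely illustrative one-way latency samples for one site pair, in ms
 samples = [2.4, 2.5, 2.5, 2.6, 2.5, 9.8, 2.4, 2.6, 2.5, 0.1]
 print(round(truncated_mean(samples), 1))  # the outliers 9.8 and 0.1 are trimmed away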

The first take-away is that these latencies are all really very good - the largest value is still under 10 ms, which is exceptional. There is still measurable, consistent variation in the latencies, though, so we can construct an adjacency graph from the data and use NetworkX with a force-directed layout (the Kamada-Kawai algorithm) to visualise it:

Kamada-Kawai (force-directed) layout of the UK sites with public latency measurements (minus MAN), with very close clusters annotated.


As you can see, this reproduces the JANET network structure - rather than geographical distance - with, for example, Lancaster sitting further from Liverpool than Glasgow does, because Lancaster's packets actually pass through Glasgow before routing back down south.
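For anyone wanting to reproduce this sort of plot, a minimal sketch is below. The latency matrix is hard-coded as a small illustrative subset of the table above, and averaging the two one-way measurements into a single edge weight is my own choice of symmetrisation rather than necessarily the one used for the figure:

 import networkx as nx
 import matplotlib.pyplot as plt

 # truncated-mean one-way latencies in ms (illustrative subset of the table above)
 latency_ms = {
     "GLA":   {"BHAM": 2.5, "ECDF": 3.9, "LANCS": 0.7},
     "BHAM":  {"GLA": 4.4, "ECDF": 3.7, "LANCS": 4.7},
     "ECDF":  {"GLA": 6.7, "BHAM": 5.0, "LANCS": 7.8},
     "LANCS": {"GLA": 3.2, "BHAM": 5.3, "ECDF": 6.8},
 }

 G = nx.Graph()
 for src, row in latency_ms.items():
     for dst, ms in row.items():
         back = latency_ms.get(dst, {}).get(src)
         # symmetrise by averaging the two one-way measurements where both exist
         weight = (ms + back) / 2.0 if back is not None else ms
         G.add_edge(src, dst, weight=weight)

 # the Kamada-Kawai layout treats edge weights as target distances,
 # so low-latency pairs of sites end up drawn close together
 pos = nx.kamada_kawai_layout(G, weight="weight")
 nx.draw_networkx(G, pos, node_color="lightsteelblue")
 plt.axis("off")
 plt.savefig("uk_latency_graph.png")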

The next thing to do is to test the actual point-to-point SE-to-SE latency for some test cases.