After spending some time examining the output from iptables on our dpm servers in Edinburgh I came across a small problem combining our iptables rules with SRM.
For brevity the iptables rule which caused the problems is:
*filter
:INPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p icmp --icmp-type any -j ACCEPT
...
-A INPUT -p tcp -m multiport --dports 8446 -m comment --comment "allow srmv2.2" -m state --state NEW -j ACCEPT
...
The problem caused by this is that packets look similar to the following in logs:
IN=eth0 OUT= MAC=aa:bb:cc:dd:ee:ff:gg:hh:ii:jj:kk:ll:mm:nn SRC=192.168.12.34 DST=192.168.123.45 LEN=52 TOS=0x00 PREC=0x00 TTL=47 ID=19246 DF PROTO=TCP SPT=55012 DPT=8446 WINDOW=27313 RES=0x00 ACK FIN URGP=0
Here,
ACK FIN
shows how the dropped packet appears to be associated with closing a connection which iptables has already seen as closed.(This is the case at least when with the DPM 1.10 srmv2.2 builds on both the latest security SL6 and CentOS7 kernels)
In Edinburgh we historically had problems with many connections which don't appear to close correctly, in particular with the SRM protocol. The service would if uncorrected run for several hours and then appear to hang not accepting any further connections.
We now suspect that this dropping of packets was potentially causing the issues we were seeing.
In order to fix this the above rule should either be changed to:
...
-A INPUT -p tcp -m multiport --dports 8446 -m comment --comment "allow srmv2.2" -m state --state NEW,INVALID -j ACCEPT
...
or, the state module shouldn't be used to filter only NEW packets associated with the srmv2.2 protocol.
With this in mind we've now removed the firewall requirement that packets be NEW to be accepted by our srmv2.2 service. This has enjoyed an active uptime of several days without hanging and refusing further connections.
An advatage of this is most of the rejected packets by the firewall of our DPM head node were actually associated with this rule. Now that the number of packets being rejected by our firewall has dropped significantly examining connections which are rejected for further patterns/problems becomes much easier.