Of course, for us, the relevant latency and bandwidth are the pure storage-to-storage measurements, but these are bounded (from below for latency, from above for bandwidth) by the site's own network latency and bandwidth. We already have a system which measures this bounding limit: the PerfSonar network, which all WLCG sites have had installed for some time.
Whilst PerfSonar sites do record the (one-way) packet latency to all of their peers, the display of this doesn't seem to be a "standard" visualisation from the central repository for PerfSonar. So, I spent a few hours pulling data from the UK's sites with public PerfSonar services, and making a spreadsheet. (Doing it this way also means that I can make my own "averages" - using a 10% truncated mean to remove outliers - rather than the "standard" mean used by PerfSonar itself.)
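For anyone wanting to reproduce the averaging step, here is a minimal sketch of a truncated mean in plain Python. It assumes "10% truncated" means dropping the lowest 10% and the highest 10% of samples before averaging (the other common reading is 10% in total); the function name and the sample values are made up for illustration and are not real PerfSonar data.

```python
def truncated_mean(samples, proportion=0.10):
    """Mean of the samples after dropping `proportion` of them from each tail."""
    ordered = sorted(samples)
    cut = int(len(ordered) * proportion)  # number of samples dropped per tail
    kept = ordered[cut:len(ordered) - cut]
    if not kept:
        raise ValueError("no samples left after truncation")
    return sum(kept) / len(kept)


# Hypothetical one-way latencies (ms) for one site pair, with a single spike.
example_ms = [2.4, 2.5, 2.5, 2.6, 2.5, 2.4, 2.6, 2.5, 2.5, 40.0]

plain_mean = sum(example_ms) / len(example_ms)
robust_mean = truncated_mean(example_ms)
print(f"plain mean: {plain_mean:.2f} ms, truncated mean: {robust_mean:.2f} ms")
```

The point of the truncation is that a single slow packet can drag a plain mean well away from the typical value, while the truncated mean stays close to where most of the samples sit.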
The result, in raw form, looks like this (where, for comparison, I also added the ballpark latencies for access to spinning media, solid state storage via SATA, and solid state storage via PCI-E).
All of these results are for IPv4 packets: for some sites, it looks like switching to IPv6 has very significant effects on the numbers!
| (ms) | GLA | BHAM | SUSX | ECDF | RALPP | LIV | UCL | RAL | DUR | RHUL | QMUL | LANCS | CAM | OX | MAN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GLA | x | 2.5 | 5.5 | 3.9 | | 1.4 | 3.6 | 4.8 | 2.6 | 4.1 | - | 0.7 | 3.5 | 4.9 | |
| BHAM | 4.4 | x | 3.6 | 3.7 | | 1.2 | 1.5 | 2.5 | 3.0 | 2.7 | - | 4.7 | 2.2 | 2.9 | |
| SUSX | 8.1 | 4.0 | x | 7.2 | | 5.2 | 1.8 | 3.1 | 7.0 | 3.0 | - | 8.9 | 6.2 | 3.2 | |
| ECDF | 6.7 | 5.0 | 7.6 | x | | 4.0 | 6.0 | 7.0 | 3.6 | 5.3 | - | 7.8 | 4.3 | 7.5 | |
| RALPP | 7.0 | - | - | - | x | 3.4 | - | 0.1 | 6.0 | 1.4 | - | 7.0 | - | 1.6 | |
| LIV | 4.4 | 2.0 | 5.3 | 3.6 | | x | 3.6 | 4.7 | - | 5.0 | - | 4.8 | 4.2 | 4.5 | |
| UCL | 6.5 | 2.5 | 2.5 | 3.8 | | 3.9 | x | 1.6 | 3.8 | 1.0 | - | 6.8 | 3.5 | 1.6 | |
| RAL | 7.0 | 2.5 | 2.5 | 5.5 | | 3.4 | 0.7 | x | 5.0 | 1.8 | - | 7.7 | 4.5 | 1.6 | |
| DUR | 5.0 | 3.3 | 6.5 | 2.7 | | 2.7 | 5.3 | 6.0 | x | 4.5 | - | 5.5 | 3.8 | 6.1 | |
| RHUL | 7.0 | 2.7 | 2.2 | 6.3 | | 3.6 | 0.5 | 1.6 | 3.0 | x | - | 6.8 | 3.5 | 1.6 | |
| QMUL | 6.0 | 2.3 | 2.0 | 3.9 | | 3.6 | 0.4 | 1.2 | 4.0 | 0.5 | x | 6.6 | 3.3 | 1.2 | |
| LANCS | 3.2 | 5.3 | 8.5 | 6.8 | | 4.5 | 7.1 | 7.8 | 5.7 | 7.1 | - | x | 6.4 | 7.9 | |
| CAM | 5.3 | 2.0 | 5.3 | 3.2 | | 3.1 | 3.5 | 4.2 | 3.1 | 2.9 | - | 5.9 | x | 4.5 | |
| OX | 7.2 | 2.9 | 2.5 | 6.7 | | 4.2 | 0.9 | 2.0 | 6.0 | 1.8 | - | 7.6 | 5.0 | x | |
| MAN | 4.1 | | | | | | | | | | | | | | |

| Storage medium | Ballpark access latency |
|---|---|
| Spinning media | 2 to 20 ms |
| SSD (SATA) | 0.2 ms |
| NVMe (PCI-E) | 0.06 ms |
The first take-away is that these latencies are all really very good: the largest value is still less than 10ms, which is exceptional. There's still measurable, consistent variation in the latencies, though, so we can construct an adjacency graph from the data, using NetworkX and a force-directed layout (the Kamada-Kawai algorithm) to visualise it:
Figure: Kamada-Kawai (force-directed) layout of the UK sites with public latency measurements (minus MAN), with very close clusters annotated.
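A rough sketch of how a layout like this can be produced is below. It uses only four of the sites, with values as I read them from the table, and it makes two assumptions that the post doesn't spell out: that the two one-way latencies for a pair are averaged into a single symmetric distance, and that this distance is used directly as the edge weight. The helper names are hypothetical. `networkx.kamada_kawai_layout` is the real NetworkX call (it needs SciPy installed).

```python
from itertools import combinations

# One-way latencies in ms (row site -> column site) for four of the sites.
LATENCY_MS = {
    "GLA": {"BHAM": 2.5, "SUSX": 5.5, "ECDF": 3.9},
    "BHAM": {"GLA": 4.4, "SUSX": 3.6, "ECDF": 3.7},
    "SUSX": {"GLA": 8.1, "BHAM": 4.0, "ECDF": 7.2},
    "ECDF": {"GLA": 6.7, "BHAM": 5.0, "SUSX": 7.6},
}


def symmetric_distances(latency):
    """Average the available one-way values for each pair (an assumption)."""
    distances = {}
    for a, b in combinations(sorted(latency), 2):
        values = [v for v in (latency[a].get(b), latency[b].get(a)) if v is not None]
        if values:  # skip pairs with no measurement in either direction
            distances[(a, b)] = sum(values) / len(values)
    return distances


def latency_layout(distances):
    """Return {site: (x, y)} positions from a Kamada-Kawai layout."""
    # Imported here so the helper above still works without NetworkX installed.
    import networkx as nx

    graph = nx.Graph()
    for (a, b), d in distances.items():
        graph.add_edge(a, b, weight=d)
    # Edge weights are treated as target distances between the nodes.
    return nx.kamada_kawai_layout(graph, weight="weight")


pair_distances = symmetric_distances(LATENCY_MS)
```

Calling `latency_layout(pair_distances)` gives positions that can be passed to `networkx.draw` (as its `pos` argument) to plot the graph. Sites with low mutual latency end up close together, which is what makes the clusters visible.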
As you can see, this reproduces the JANET networking structure - rather than the geographical distance - with, for example, Lancaster further away from Liverpool than Glasgow is, because Lancaster's packets actually pass through Glasgow, before routing back down south.
The next thing to do is to measure the point-to-point SE latency for some test examples.
 
 
 
 