26 April 2018

Impact of Firmware Updates for Spectre and Meltdown Mitigation.

In order to address the security issues associated with the Spectre / Meltdown hardware bugs found in many modern CPUs, both operating system patches and CPU microcode updates are required. The microcode updates address the Spectre variant 2 attack. Spectre variant 2 works by persuading a processor's branch predictor to make a specific bad prediction about which code will be executed, from which information about the victim process can be obtained.

Much has been said about the performance impact of the Spectre / Meltdown kernel patches. Less is known about the impact of the accompanying firmware updates on system performance. Most of the concern is about processes that frequently cross between user space and the kernel via system calls; these are typically applications that perform disk or network operations.

After one abortive attempt, Intel has released a new set of CPU microcode updates that promise to provide stability (https://newsroom.intel.com/wp-content/uploads/sites/11/2018/03/microcode-update-guidance.pdf). We have run some IO-intensive benchmark tests on our servers, testing different firmware versions on our Intel Haswell CPUs (E5-2600 v3).

Our test setup is made up of 3 HPE DL60 servers, each with one OS disk and three data disks (1 TB SATA hard drives). One node is used for control while the other two are involved in the actual benchmark process. The servers have Intel E5-2650 v3 CPUs and 128 GB of RAM. Each server is connected at 10 Gb/s SFP+ to a non-blocking switch. All systems are running Scientific Linux 6.9 (aka CentOS 6.9) with all the latest updates installed.

The manufacturer, HPE, has provided a BIOS update which deploys this new microcode version, and we will investigate the impact of updating the microcode to 0x3C (BIOS 2.52) from the previous version 0x3A (BIOS 2.56) while keeping everything else constant. One nice feature of the HPE servers is the ability to swap to a backup BIOS, so updates can be reverted.
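A quick way to confirm which microcode revision is actually loaded is to read /proc/cpuinfo (on a live system: grep -m1 microcode /proc/cpuinfo). The snippet below sketches this check by parsing a captured /proc/cpuinfo line; the sample line is illustrative, showing the 0x3C revision discussed above.

```shell
# Parse the microcode revision from a /proc/cpuinfo-style line.
# On a real system, replace the sample with:
#   sample=$(grep -m1 microcode /proc/cpuinfo)
sample='microcode       : 0x3c'
rev=$(echo "$sample" | awk '{print $3}')
echo "running microcode revision: $rev"
```

Running this before and after the BIOS flash (and after a swap to the backup BIOS) confirms which revision each benchmark was actually run against.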

Our first test uses the HDFS benchmark TestDFSIO with a small Hadoop setup (1 name node, 2 data nodes with 3 data disks each). The test writes 1 TB of data across the 6 data disks and then reads it back. The commands run are:

yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.3-tests.jar TestDFSIO -D test.build.data=mytestdfsio -write -nrFiles 1000 -fileSize 1000
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.3-tests.jar TestDFSIO -D test.build.data=mytestdfsio -read -nrFiles 1000 -fileSize 1000

The results, in minutes taken, clearly show a major performance impact, of order 20%, from using the new microcode update!
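The slowdown quoted is simply the fractional increase in runtime between the two microcode versions. A minimal sketch of the calculation, using made-up times chosen only to illustrate the ~20% scale reported above (the actual measured minutes are not reproduced here):

```shell
# Illustrative slowdown calculation; t_old and t_new are hypothetical
# runtimes in minutes for old (0x3A) and new (0x3C) microcode.
t_old=50
t_new=60
awk -v a="$t_old" -v b="$t_new" 'BEGIN { printf "slowdown: %.0f%%\n", 100*(b-a)/a }'
```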

As a cross-check we did a similar test using IOzone. Here we used the distributed mode of IOzone to run tests on the six data disks of the two data nodes. The command run was

iozone -+m clustre.cfg -r 4096k -s 85g -i 0 -i 1 -t 12

i.e. 12 threads of 85 GB each (roughly 1 TB in total), where clustre.cfg defines the nodes and disks used.
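For reference, IOzone's -+m file lists one client stream per line: the hostname, the working directory on that client, and the path to the iozone binary there. A sketch of what clustre.cfg might contain (hostnames and directory paths are illustrative, not our actual machines):

```
# hostname   test directory   path to iozone binary
node1  /data1/iozone  /usr/bin/iozone
node1  /data2/iozone  /usr/bin/iozone
node1  /data3/iozone  /usr/bin/iozone
node2  /data1/iozone  /usr/bin/iozone
node2  /data2/iozone  /usr/bin/iozone
node2  /data3/iozone  /usr/bin/iozone
```

With -t 12, IOzone cycles through the list, so the 12 threads land two per disk across the six data disks.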

The results, in kB/s throughput, again show a measurable performance impact from using the new firmware, although on a smaller scale (about 5%).

Instead of using local disk (direct-attached storage), we also ran the tests over the network, using our Lustre file system in place of the local disks. In this case we saw no performance impact in either test; however, the 10 Gb/s link was a bottleneck and may have masked any effect. We will investigate further as time allows.

13 April 2018

Data rates in the UK for the last 12 months: wow, a lot of data goes between the WNs and SEs...

So, with the move to new data and work-flow models as a result of the idea to create further storageless sites and caching sites, I decided to take a look at how data flows within the UK. Caveat emptor: I took this data from the ATLAS dashboard and make the assumption that there is little WAN traffic from WNs to SEs. I am aware this is not strictly correct, but it is at the moment a small factor for ATLAS. (Hence why I reviewed ATLAS rather than CMS, who I know use AAA a lot.)

In green are the data volumes in UK disk storage, in red are the rates out of the storage. (In blue is the rate for WAN transfers between UK SEs.) In purple are the rates to and from the RAL tape system. Of note is that during the month there was 49.1 PB of data deleted from UK storage, out of a disk cache of ~33 PB. What I note from these rates is that the 139 PB of data ingested from storage into worker nodes, and the 11.4 PB written out by completed jobs, is data that would have had to go over the WAN if WNs were not co-located with SEs.

3 Meetings, 2 talk topic areas, 1 blogpost: Storage travels this month in a nutshell.

It has been a busy month for meetings, with the WLCG/HSF joint meeting in Naples, GRIDPP40 at Pitlochry and the GDB at CERN. I summarized the WLCG/HSF meeting at the GridPP storage meeting, and expanded on it with my talk at GRIDPP40 on the eudatalake project (goo.gl/uvVtdm).
But overall, if you need a summary, these two talks from the GDB are the best way to cover most areas. Short links are: