26 April 2018

Impact of Firmware Updates for Spectre and Meltdown Mitigation


To address the security issues associated with the Spectre / Meltdown hardware bugs found in many modern CPUs, both operating system patches and CPU microcode updates are required. The microcode updates address the Spectre variant 2 attack. Spectre variant 2 attacks work by persuading a processor's branch predictor to make a specific bad prediction about which code will be executed, from which information about the process can then be obtained.

Much has been said about the performance impact of the Spectre / Meltdown mitigation caused by the kernel patches. Less is known about the impact of the firmware updates on system performance. Most of the concern is about the performance impact on processes that switch frequently between user space and the kernel through system calls. These are typically applications that perform disk or network operations.

After one abortive attempt, Intel has released a new set of CPU microcode updates that promise to provide stability (https://newsroom.intel.com/wp-content/uploads/sites/11/2018/03/microcode-update-guidance.pdf). We have run some IO-intensive benchmark tests on our servers, comparing different firmware versions on our Intel Haswell CPUs (E5-2600 v3).

Our test setup is made up of 3 HPE DL60 servers, each with one OS disk and three data disks (1 TB SATA hard drives). One node is used for control while the other two are involved in the actual benchmark process. The servers have Intel E5-2650 v3 CPUs and 128GB of RAM. Each server is connected at 10Gb/s SFP+ to a non-blocking switch. All systems are running Scientific Linux 6.9 (aka CentOS 6.9) with all the latest updates installed.

The manufacturer, HPE, has provided a BIOS update which deploys this new microcode version. We investigate the impact of updating the microcode to 0x3C (BIOS 2.52) from the previous version 0x3A (2.56) while keeping everything else constant. One nice feature of the HPE servers is the ability to swap to a backup BIOS so updates can be reverted.
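Before comparing runs it is worth confirming which microcode revision each node is actually running. The following is a minimal sketch, assuming the kernel exposes a "microcode" field in /proc/cpuinfo (this depends on the kernel version); the function name microcode_revisions is ours, not part of any tool.

```shell
# Print the distinct microcode revisions found in a cpuinfo-style file.
# Defaults to /proc/cpuinfo. Assumption: the kernel reports a "microcode"
# field there; older kernels may not.
microcode_revisions() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Lines look like "microcode<TAB>: 0x3c"; split on the colon and
    # print the value, then collapse identical values across cores.
    awk -F': *' '/^microcode/ { print $2 }' "$cpuinfo" | sort -u
}
```

Calling microcode_revisions with no argument on each node should print a single value if all cores carry the same revision.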


Our first test uses an HDFS benchmark called DFSIO with a Hadoop setup (1 name node, 2 data nodes with 3 data disks each). The test writes 1TB of data across the 6 disks and then reads it back. The commands run were

yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.3-tests.jar TestDFSIO -D test.build.data=mytestdfsio -write -nrFiles 1000 -fileSize 1000
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.3-tests.jar TestDFSIO -D test.build.data=mytestdfsio -read -nrFiles 1000 -fileSize 1000


The results, in minutes taken, clearly show a major performance impact, of the order of 20%, from using the new microcode update!
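For reference, the relative slowdown quoted above is just the change in run time divided by the baseline run time. A minimal sketch follows; the helper name pct_change is hypothetical, and the timings in the usage comment are placeholders chosen for illustration, not our measured results.

```shell
# Percentage change of a new run time relative to a baseline run time.
# Positive values mean the new run took longer.
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Placeholder example (not measured data): baseline 50 min, new 60 min
# pct_change 50 60
```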

As a cross-check we did a similar test using IOzone. Here we used the distributed mode of IOzone to run tests on the six disks of the two data nodes. The command run was

iozone -+m clustre.cfg -r 4096k -s 85g -i 0 -i 1 -t 12

This writes and then reads about 1TB in total using 12 threads (85GB per thread), where clustre.cfg defines the nodes and disks used.
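In IOzone's distributed mode, the file passed to -+m has one line per client thread with three space-separated fields: the client host name, the working directory on that client, and the path to the iozone executable on that client. The sketch below generates such a file; the host names, mount points, iozone path and the choice of two threads per disk are all assumptions for illustration, not our actual configuration, and make_iozone_clients is a hypothetical helper.

```shell
# Emit an IOzone -+m client file on stdout.
# Usage: make_iozone_clients IOZONE_PATH THREADS_PER_DISK host:dir [host:dir ...]
make_iozone_clients() {
    local iozone_bin="$1" threads_per_disk="$2"
    shift 2
    local target i
    for target in "$@"; do
        i=0
        # One line per thread: "<host> <working directory> <iozone path>"
        while [ "$i" -lt "$threads_per_disk" ]; do
            printf '%s %s %s\n' "${target%%:*}" "${target#*:}" "$iozone_bin"
            i=$((i + 1))
        done
    done
}

# Placeholder example: 2 nodes x 3 disks x 2 threads = 12 lines
# make_iozone_clients /usr/bin/iozone 2 \
#     node1:/data1 node1:/data2 node1:/data3 \
#     node2:/data1 node2:/data2 node2:/data3 > clients.cfg
```

The number of lines should be at least the thread count given to -t.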


The results, in kB/s throughput, again show a measurable performance impact from using the new firmware, although on a smaller scale (5%).

Instead of using local disks (direct-attached storage), we also ran both tests over the network using our Lustre file system. We saw no performance impact in either test; however, in this case the 10Gb/s link was a bottleneck and may have influenced the results. We will investigate further as time allows.
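A rough lower bound on the transfer time shows how the link caps throughput: moving a given volume over a link cannot take less than volume × 8 / link speed. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 Gb/s = 10^9 bits/s) and ignores all protocol overhead; min_transfer_seconds is a hypothetical helper.

```shell
# Minimum time in seconds to move SIZE_GB gigabytes over a LINK_GBPS link.
# Assumptions: decimal units, no protocol overhead, link fully utilised.
min_transfer_seconds() {
    awk -v gb="$1" -v gbps="$2" 'BEGIN { printf "%.0f\n", gb * 8 / gbps }'
}

# 1 TB over a single 10Gb/s link:      min_transfer_seconds 1000 10
# 1 TB shared over two 10Gb/s links:   min_transfer_seconds 1000 20
```

If the network-bound run time is already close to this bound, any extra CPU cost from the microcode update can be hidden behind the time spent waiting on the link.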
