Since ZFS is the most advanced system in that respect, ZFS on Linux was tested for this purpose and proved to be a good choice here too.
This post describes the general read/write and failure tests; a later post will cover additional tests such as rebuilding the raid after a disk failure, different failure scenarios, and setup and format times.
Please use the comment section if you would like other tests to be done as well.
Hardware test configuration:
- DELL PowerEdge R510
- 12x2TB SAS (6Gbps) internal storage on a PERC H700 controller
- 2 external MD1200 devices with 12x2TB SAS (6Gbps) on a PERC H800 controller
- 24GB RAM
- 2 x Intel Xeon E5620 (2.4GHz)
- default settings were used on both raid controllers for all tests, except for the cache, which was set to "write through"
ZFS test system configuration:
- SL6 OS
- the latest ZFS version available in the repository
- no ZFS compression used
- 1xraidz2 + hotspare for all the disks on H700 (zpool tank)
- 1xraidz2 + hotspare for all the disks on H800 (zpool tank800)
- on both raid controllers each disk is defined as a single-disk raid0, since the controllers unfortunately don't support JBOD
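The exact pool creation commands are not listed here; a minimal sketch of how such pools could be set up, assuming the single-disk raid0 volumes appear as /dev/sdb..sdm (H700) and /dev/sdn..sdy (H800) - the device names and counts are illustrative only:

# sketch only - device names and disk counts are assumptions
# one raidz2 vdev over the H700-attached disks plus a hot spare
zpool create tank raidz2 /dev/sd{b..l}
zpool add tank spare /dev/sdm
# the same layout for the H800-attached disks
zpool create tank800 raidz2 /dev/sd{n..x}
zpool add tank800 spare /dev/sdy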
Hardware raid test system configuration:
- the same machine, disks, controllers, and OS as for the ZFS test configuration
- 1xraid6 + hotspare for all the disks on H700
- 1xraid6 + hotspare for all the disks on H800
- space was divided into 8TB partitions and formatted with ext4
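For the ext4 side the exact commands are likewise not shown; the setup amounts to something like the following (the device name matches the listings further down, the partition boundaries are assumptions):

# sketch only - the partition layout is an assumption
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 0% 50%
parted -s /dev/sdb mkpart primary 50% 100%
mkfs.ext4 /dev/sdb1
mkfs.ext4 /dev/sdb2
mkdir -p /mnt/gridstorage02
mount /dev/sdb2 /mnt/gridstorage02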
Read/Write speed test
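As a rough sketch of how a sequential test like this can be run (the exact commands, file size, and paths used for the numbers below may differ):

# sketch only - file size and paths are assumptions
# sequential write of a 10GB file, flushed to disk before dd exits
dd if=/dev/zero of=/tank/testfile bs=1M count=10240 conv=fdatasync
# drop the page cache, then read the file back
echo 3 > /proc/sys/vm/drop_caches
dd if=/tank/testfile of=/dev/null bs=1M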
H700 results
ZFS based:
write: 236MB/s, 1min:02 (165MB/s)
read: 399MB/s, 0min:27 (379MB/s)
Hardware raid based:
write: 233MB/s, 1min:10 (146MB/s)
read: 1.2GB/s, 0min:18 (1138MB/s)
H800 results
ZFS based:
write: 619MB/s, 0min:23 (445MB/s)
read: 2.0GB/s, 0min:05 (2048MB/s)
Hardware raid based:
write: 223MB/s, 1min:13 (140MB/s)
read: 150MB/s, 1min:12 (142MB/s)
H700 and H800 mixed
- 6 disks from each controller were used together in a combined raid configuration
- this kind of configuration is not possible for a hardware based raid
write: 723MB/s, 0min:37 (277MB/s)
read: 577MB/s, 0min:18 (568MB/s)
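For reference, pooling disks from both controllers in this way is a single command in ZFS; again with assumed device and pool names:

# sketch only - device names and the pool name are assumptions
# one raidz2 vdev built from 6 H700-attached and 6 H800-attached disks
zpool create tankmix raidz2 /dev/sd{b..g} /dev/sd{n..s}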
Conclusion
- for the H800, the ZFS-based raid rates much better than the hardware-raid-based system
- the large difference between the ZFS-based and hardware-raid-based reads needs more investigation
- however, repeating the same tests two more times gave results of the same order
- the H800 performs much better than the H700 when using ZFS, but not in the hardware raid configuration
Failure Test
This test checked what happens when a 100GB file (test.tar) is copied (with cp and rsync) from the H800-based raid to the H700-based raid and the system fails during the copy, simulated by a cold reboot through the remote console.
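The copy itself is nothing special; roughly (paths as in the listings below):

cp /tank800/test.tar /tank/
# or
rsync --progress /tank800/test.tar /tank/

The failure was triggered via the remote console; an equivalent in-band way to force an immediate, unclean reset (not used here, just for reference) would be the kernel's sysrq trigger:

# reboots immediately without syncing or unmounting file systems
echo b > /proc/sysrq-trigger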
ZFS result:
[root@pool6 ~]# ls -lah /tank
total 46G
drwxr-xr-x. 2 root root 5 Mar 19 20:11 .
dr-xr-xr-x. 26 root root 4.0K Mar 19 20:17 ..
-rw-r--r--. 1 root root 16G Mar 19 19:07 test10G
-rw-r--r--. 1 root root 13G Mar 19 20:12 test.tar
-rw-------. 1 root root 18G Mar 19 20:06 .test.tar.EM379W
[root@pool6 ~]# df -h /tank
Filesystem Size Used Avail Use% Mounted on
tank 16T 46G 16T 1% /tank
[root@pool6 ~]# du -sch /tank
46G /tank
46G total
[root@pool6 ~]# rm /tank/*test.tar*
rm: remove regular file `/tank/test.tar'? y
rm: remove regular file `/tank/.test.tar.EM379W'? y
[root@pool6 ~]# du -sch /tank
17G /tank
17G total
[root@pool6 ~]# ls -la /tank
total 16778239
drwxr-xr-x. 2 root root 3 Mar 19 20:21 .
dr-xr-xr-x. 26 root root 4096 Mar 19 20:17 ..
-rw-r--r--. 1 root root 17179869184 Mar 19 19:07 test10G
- everything consistent
- no file check needed at reboot
- no problems at all occurred
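Not shown above, but the pool state after such an unclean reboot can be double-checked with the standard ZFS tools:

# prints "all pools are healthy" if nothing is wrong
zpool status -x
# optionally walk all data and verify the checksums
zpool scrub tank
zpool status tank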
Hardware raid based result:
[root@pool7 gridstorage02]# ls -lhrt
total 1.9G
drwx------ 2 root root 16K Jun 26 2012 lost+found
drwxrwx--- 91 dpmmgr dpmmgr 4.0K Feb 4 2013 ildg
-rw-r--r-- 1 root root 0 Mar 6 2013 thisisgridstor2
drwxrwx--- 98 dpmmgr dpmmgr 4.0K Aug 8 2013 lhcb
drwxrwx--- 609 dpmmgr dpmmgr 20K Aug 27 2014 cms
drwxrwx--- 6 dpmmgr dpmmgr 4.0K Nov 23 2014 ops
drwxrwx--- 6 dpmmgr dpmmgr 4.0K Mar 13 12:18 ilc
drwxrwx--- 9 dpmmgr dpmmgr 4.0K Mar 13 23:04 lsst
drwxrwx--- 138 dpmmgr dpmmgr 4.0K Mar 14 10:23 dteam
drwxrwx--- 1288 dpmmgr dpmmgr 36K Mar 15 00:00 atlas
-rw-r--r-- 1 root root 1.9G Mar 18 17:11 test.tar
[root@pool7 gridstorage02]# df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/sdb2 8.1T 214M 8.1T 1% /mnt/gridstorage02
[root@pool7 gridstorage02]# du . -sch
1.9G .
1.9G total
[root@pool7 gridstorage02]# rm test.tar
rm: remove regular file `test.tar'? y
[root@pool7 gridstorage02]# du . -sch
41M .
41M total
[root@pool7 gridstorage02]# df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/sdb2 8.1T -1.7G 8.1T 0% /mnt/gridstorage02
- the hardware-raid-based tests were done first, on a machine that was previously used as a DPM client; the directory structure from that use was therefore still there, but empty
- during the reboot a file system check was done
- "df" reports a different number for the used space than "du" and "ls"
- after removing the file, the used space reported by "df" is negative
- the file system is no longer consistent
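Getting the ext4 file system back into a consistent state at this point needs an offline check, e.g.:

umount /mnt/gridstorage02
e2fsck -f /dev/sdb2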
Conclusion here:
- for the planned extension (17x 2TB disks exchanged for 8TB disks), the new disks should be placed in the MD devices and managed by the H800 using ZFS
- a second zpool can be used for all the remaining 2TB disks (on the H700 and H800 together)
- ZFS seems to handle system failures better
To be continued...
1 comment:
You really should run tests with files much bigger than the RAM size, otherwise caching will get in the way and make the results irrelevant. You have 24GB of RAM, so you should run your tests with 48GB files.
You may also run "echo 3 > /proc/sys/vm/drop_caches" to empty the file cache between runs for more consistent results.
Another point to take into account is the IO scheduler. Most distributions use cfq (Completely Fair Queuing) as the default; unfortunately it is most of the time a poor choice for a server, particularly when using hardware RAID. Use the "noop" scheduler for perfectly fair tests: run "echo noop > /sys/block//queue/scheduler" for all drives.
Last, you may need to adjust the IO queue length and read-ahead. The default values are quite correct for old ATA drives with small caches, but very suboptimal for RAID arrays. Most RAID controllers need much longer queues than the default 128 (512 or 1024): echo 1024 > /sys/block//queue/nr_requests
And most hardware RAID controllers have large caches that give better results for sequential IO with big read-ahead values: echo 8192 > /sys/block//queue/read_ahead_kb