GridPP storage news: April 2014

24 April 2014

How much of a small file problem do we have...

Here at the Tier1 at RAL-LCG2, we have been draining disk servers with a fury, achieving over 800MB/s on a machine with a 10G NIC. We get that rate on some servers holding large files, but machines holding small files achieve a lower rate. So how many small files do we have, and is there a VO dependency? I decided to look at our three largest LCG VOs.
In tabular form, here is the analysis so far:

VO	LHCb	CMS	ATLAS	ATLAS	ATLAS
sub section	All	All	All	non-Log files	Log files
# Files	16305	14717	396887	181799	215088
Size (TB)	37.565	39.599	37.564	35.501	2.062
# Files > 10 GB	1	24	75	75	0
# Files > 1GB	8526	11902	9683	9657	26
# Files < 100MB	4434	2330	3E+06	134137	3E+06
# Files < 10MB	2200	569	265464	68792	196672
# Files < 1MB	1429	294	85190	20587	64603
# Files < 100kB	243	91	6693	2124	4569
# Files < 10kB	6	13	635	156	479
Ave Filesize (GB)	2.30	2.69	0.0946	0.195	0.00959
% space used by files > 1GB	96.71	79.73	64.56

What I find interesting is how similar the LHCb and CMS values are to each other, even though they are vastly different VOs. What worries me is that well over 50% of ATLAS files (265464 of 396887) are smaller than 10MB. Now I just need to find a Tier2 to do a similar analysis, to see whether it is just a T1 issue.
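
For anyone wanting to repeat this at their own site, the tally in the table can be sketched in a few lines of Python. This is only an illustration, not the script used at RAL: it assumes you already have a list of file sizes in bytes for one VO (for example from a namespace dump), and it assumes decimal units (1 kB = 1000 bytes). The function name `summarise` and the dictionary keys are made up for the example.

```python
# Sketch of a per-VO file size tally. Assumption: decimal units (1 kB = 1000 bytes).
KB, MB, GB = 10**3, 10**6, 10**9

# Thresholds matching the rows of the table above.
BELOW = {"< 10kB": 10 * KB, "< 100kB": 100 * KB, "< 1MB": MB,
         "< 10MB": 10 * MB, "< 100MB": 100 * MB}
ABOVE = {"> 1GB": GB, "> 10GB": 10 * GB}


def summarise(sizes):
    """Return counts and totals for a list of file sizes given in bytes."""
    total = sum(sizes)
    summary = {
        "files": len(sizes),
        "size_tb": total / 1e12,
        "avg_gb": (total / len(sizes) / 1e9) if sizes else 0.0,
    }
    # Cumulative counts: a 5 kB file is counted in every "<" row.
    for label, limit in BELOW.items():
        summary[label] = sum(1 for s in sizes if s < limit)
    for label, limit in ABOVE.items():
        summary[label] = sum(1 for s in sizes if s > limit)
    # Share of the total space held by files larger than 1 GB.
    big = sum(s for s in sizes if s > GB)
    summary["pct_space_gt_1gb"] = (100.0 * big / total) if total else 0.0
    return summary
```

Running `summarise` once per VO (and once each for log and non-log files) would give one column of the table per call.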

01 April 2014

Dell OpenManage for disk servers

As we've been telling everyone who'll listen, we at Oxford are big fans of the Dell 12-bay disk servers for grid storage (previously R510 units, now R720xd ones). A few people have now bought them and asked about monitoring them.

Dell's tools all go by the general 'OpenManage' branding, which covers a great range of things, including various general purpose GUI tools. However, for the disk servers, we generally go for a minimal command-line install.

Dell have the necessary bits available in a YUM-able repository as described on the Dell Linux wiki. Our setup simply involves:

Installing the repository file,
yum install srvadmin-storageservices srvadmin-omcommon,
service dataeng start
and finally logging out and back in again, or otherwise picking up the PATH variable change from the newly installed srvadmin-path.sh script in /etc/profile.d

At that point, you should be able to query the state of your array with the 'omreport' tool, for example:

# omreport storage vdisk controller=0
List of Virtual Disks on Controller PERC H710P Mini (Embedded)

Controller PERC H710P Mini (Embedded)
ID                            : 0
Status                        : Ok
Name                          : VDos
State                         : Ready
Hot Spare Policy violated     : Not Assigned
Encrypted                     : No
Layout                        : RAID-6
Size                          : 100.00 GB (107374182400 bytes)
Associated Fluid Cache State  : Not Applicable
Device Name                   : /dev/sda
Bus Protocol                  : SATA
Media                         : HDD
Read Policy                   : Adaptive Read Ahead
Write Policy                  : Write Back
Cache Policy                  : Not Applicable
Stripe Element Size           : 64 KB
Disk Cache Policy             : Enabled

We also have a rough and ready Nagios plugin which simply checks that each physical disk reports as 'OK' and 'Online' and complains if anything else is reported.
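
A check along those lines might look roughly like the sketch below. It is not our actual plugin, just a minimal illustration. It assumes that `omreport storage pdisk controller=0` prints one block of `Key : Value` lines per physical disk, each starting with an `ID` line and containing `Status` and `State` fields, in the same style as the vdisk output above. The function names are made up for the example; the return codes follow the usual Nagios convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_records(text):
    """Split omreport 'Key : Value' output into one dict per disk.

    Assumption: every disk block starts with an 'ID' line.
    """
    records, current = [], None
    for line in text.splitlines():
        # Split on the first colon only, since IDs look like '0:1:0'.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # headers and blank lines carry no 'Key : Value' pair
        key, value = key.strip(), value.strip()
        if key == "ID":
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def check_disks(records):
    """Return (exit_code, message) for a list of parsed disk records."""
    if not records:
        return UNKNOWN, "UNKNOWN: no physical disks found in omreport output"
    bad = [r for r in records
           if r.get("Status", "").lower() != "ok"
           or r.get("State", "").lower() != "online"]
    if bad:
        detail = ", ".join(
            f"{r.get('ID', '?')} ({r.get('Status', '?')}/{r.get('State', '?')})"
            for r in bad)
        return CRITICAL, f"CRITICAL: {len(bad)} of {len(records)} disks unhealthy: {detail}"
    return OK, f"OK: all {len(records)} physical disks Ok and Online"


def main(controller=0):
    """Run omreport for one controller and print a Nagios status line."""
    cmd = ["omreport", "storage", "pdisk", f"controller={controller}"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"UNKNOWN: could not run omreport: {exc}")
        return UNKNOWN
    status, message = check_disks(parse_records(out))
    print(message)
    return status
```

To use it as a plugin, the script would finish with `sys.exit(main())` (after `import sys`) so that Nagios sees the return code. Keeping the parsing separate from the `omreport` call means the logic can be tried out on saved output without a Dell server to hand.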
