19 September 2016

Upgrading and Expanding Lustre Storage (Part 1)

At the Queen Mary Grid site we are now running a Lustre file system of over 3PB using the most recent release (2.8). Lustre is an open source, POSIX compatible, clustered file system presented to the Grid using the StoRM Storage Resource Manager. Over the next few posts I would like to describe the recent major upgrade of the Lustre file system. I will: 
  • Introduce Lustre and our history of use at the Queen Mary Grid site and then discuss the motivation and benefits of upgrading; 
  • Describe our hardware setup and the most important software configuration options; 
  • Go into the testing and performance tuning of the file system as seen on the file server and the Lustre client; 
  • Finally, outline the data migration procedure and the real-world performance we have seen.


The Queen Mary WLCG Tier-2 site has successfully operated a reliable, high-performance, efficient, budget-oriented storage solution, utilising Lustre [1], StoRM [2] and xrootd [3], since 2010 [4,5]. 
Lustre is an open-source (GPL), POSIX-compliant, parallel file system used in over half of the world's Top 500 supercomputers. Lustre is made up of three components: one or more Metadata Servers (MDS) connected to one or more Metadata Targets (MDT), which store the namespace metadata such as filenames, directories and access permissions; one or more Object Storage Servers (OSS) connected to one or more Object Storage Targets (OST), which store the actual file data; and clients that access the data over the network using POSIX file system mounts. The network is typically either Ethernet or InfiniBand.
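To make the split between metadata and object storage concrete, here is a minimal toy model in Python. It is only a sketch and not how Lustre is implemented: the class names, the tiny stripe size and the round-robin placement are all assumptions made for illustration. It shows only that the MDT holds the namespace and layout while the file contents live on the OSTs.

```python
from dataclasses import dataclass, field

# Illustrative only: real Lustre stripe sizes are far larger than this.
STRIPE_SIZE = 4  # bytes per chunk in this toy model


@dataclass
class ToyOST:
    """Hypothetical stand-in for an Object Storage Target: holds file data only."""
    chunks: dict = field(default_factory=dict)  # (path, chunk number) -> bytes


@dataclass
class ToyMDT:
    """Hypothetical stand-in for a Metadata Target: holds namespace and layout only."""
    layouts: dict = field(default_factory=dict)  # path -> (size, list of OST indices)


class ToyFileSystem:
    def __init__(self, n_osts):
        self.mdt = ToyMDT()
        self.osts = [ToyOST() for _ in range(n_osts)]

    def write(self, path, data, stripe_count=1):
        if not 1 <= stripe_count <= len(self.osts):
            raise ValueError("stripe_count must be between 1 and the number of OSTs")
        ost_ids = list(range(stripe_count))
        # The metadata side records the size and which OSTs hold the file.
        self.mdt.layouts[path] = (len(data), ost_ids)
        # The data side stores the chunks, placed round-robin over the chosen OSTs.
        for offset in range(0, len(data), STRIPE_SIZE):
            chunk_no = offset // STRIPE_SIZE
            ost = self.osts[ost_ids[chunk_no % stripe_count]]
            ost.chunks[(path, chunk_no)] = data[offset:offset + STRIPE_SIZE]

    def read(self, path):
        # A client first asks the metadata side for the layout...
        size, ost_ids = self.mdt.layouts[path]
        n_chunks = -(-size // STRIPE_SIZE)  # ceiling division
        # ...then fetches the chunks directly from the OSTs.
        parts = []
        for chunk_no in range(n_chunks):
            ost = self.osts[ost_ids[chunk_no % len(ost_ids)]]
            parts.append(ost.chunks[(path, chunk_no)])
        return b"".join(parts)


fs = ToyFileSystem(n_osts=3)
fs.write("/data/run1.root", b"hello world!", stripe_count=2)
print(fs.read("/data/run1.root"))
```

Because the layout lookup is separate from the data transfer, adding more OSS/OST pairs adds capacity and bandwidth without changing how clients see the file system.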
StoRM (STOrage Resource Manager) is a scalable, file system independent Storage Resource Manager (SRM) service. It supports standard access and transfer protocols such as HTTP(S), WebDAV and GridFTP. It is designed to work on top of any POSIX file system with Access Control List (ACL) support, such as Lustre.
Previously, the Lustre file system at Queen Mary has undergone an expansion from 300TB to 1.5PB, an upgrade of Lustre from version 1.6 to 1.8.X, a network upgrade from multiple 1Gb to 10Gb Ethernet, and a migration of the MDS and MDT to new hardware. The current upgrade involves new hardware, a complete reinstallation of the OS and Lustre software on every storage server (MDS/OSS), and a migration of data from the old Lustre file system to the new one.

Motivation for Upgrade:

Last year it was decided that a major software and hardware upgrade was required. This was driven by several needs: to upgrade the operating system (OS) from Scientific Linux (SL) 5 to a supported OS such as SL6 or CentOS7; to use a supported Lustre version compatible with SL6 or CentOS7; to take advantage of new software developments providing improved performance and reliability; to migrate to a new MDS/MDT with hardware in warranty; and to double the storage capacity to over 3PB while allowing for a further doubling before 2020.
Consideration was given to the use of other open source file systems such as Ceph and GlusterFS. However, it was decided early on that Lustre was the obvious choice, given our local knowledge and experience with it; its maturity, reliability and performance; clear long-term development and support from Intel and others; and its POSIX support.
It is possible to buy a commercially supported solution, but this was beyond the available budget. Therefore the specification, installation, configuration and operation of the hardware and software had to be done by the site team.

Next post: Hardware Choices and Software Setup

Some Useful References:

[1] Lustre:

[2] StoRM:

[3] XrootD:

[4] CHEP2012:
Scalable Petascale Storage for HEP using Lustre: C J Walker, D P Traynor and A J Martin. Journal of Physics: Conference Series 396 (2012) 042063 

[5] CHEP2014:
Optimising network transfers to and from Queen Mary University of London, a large WLCG tier-2 grid site: C J Walker, D P Traynor, D T Rand, T S Froy and S L Lloyd. Journal of Physics: Conference Series 513 (2014) 062048 
