03 April 2017

Current storage scope within GridPP

With a new project year upon us, I decided to review which sites' storage is used by the WLCG VOs, which SRM implementations they run, and which file systems sit underneath. On that last point, we no longer have just filesystems but also object stores, with the use of CEPH. Other filesystems in use are XFS, ZFS, HDFS, Spectrum Scale (the artist formerly known as GPFS), and Lustre.

In terms of storage elements/systems we have DPM, dCache, Castor, classic SE, stand-alone xrootd, and stand-alone gsiftp services. When it comes to the regional T2s and which VOs use them, the following helps.


I thought about encoding the SE system in the font used for each site, but decided that would be overloading the figure with information.

31 March 2014

Highlights of ISGC 2014

ISGC 2014 is over. Lots of interesting discussions - on the infrastructure end, ASGC developing a fanless machine room, interest in (and results on) CEPH and GLUSTER, a dCache tutorial, and an hour of code with the DIRAC tutorial.

All countries and regions presented overviews of their work in e-/cyber-Infrastructure.

Interestingly, although this wasn't a HEP conference, practically everyone is doing >0 on the LHC, so the LHC really is binding countries and researchers (well, at least physicists and infrastructureists) and e-Infrastructures (and NRENs) together. When, one day, someone sits down to tally up the benefits and impact of the LHC, this ought to be one of the top ones: the ability to work together, to (mostly) be able to move data to each other, and to trust each other's CAs.

Regarding the DIRAC tutorial, I was there and went through as much as I could ("I am not doing that to my private key"). Something to play with a bit more when I have time - an hour (of code) is not much time; there are always compromises between getting stuff done realistically and cheating in tutorials, but that's fine as long as there's something you can take away and play with later. As regards the key shenanigans, DIRAC say they will be working with EGI on SSO, so that's promising. Got the T-shirt, too. "Interware", though?

On the security side, OSG have been interfacing to DigiCert, following the planned termination of the ESnet CA. Once again grids have demands that are not seen in the commercial world, such as the need for bulk certificates (particularly cost-effective ones - something a traditional Classic IGTF CA can do fairly well). Other security questions (techie acronym alert, until end of paragraph) include how Argus and XACML compare for implementing security policies, and the EMI STS, with CERN looking at linking it with ADFS. And Malaysia are trialling an online CA based on a FIPS level 3 token with a Raspberry π.

The EGI federated cloud got mentioned quite a few times - KISTI is interested in offering IaaS, Australia is interested in joining, and the Philippines are providing resources. EGI have a strategy for engagement. It is interesting to see the extent to which they are driving the adoption of CDMI.

I should mention that Shaun gave a talk on "federated" access to data, comparing the protocols. I missed it (I was in another session), but I understand it was well received and there was a lot of interest.

Software development - interesting experiences from the dCache team, and on building user communities with (and for) DIRAC. How are people taught to develop code? The closing session was by Adam Lyon from Fermilab, who talked about lessons learned: the HEP vision of big data is different from the industry one, and yet HEP needs a culture shift to move away from not-invented-here thinking.

ISGC really had a great mix of Asian and European countries, as well as the US and Australia. This post was just a quick look through my notes; there'll be much more to pick up and ponder over the coming months. And I haven't even mentioned the actual science stuff ...

27 March 2014

dCache workshop at (with) ISGC 2014

Shaun and I took part in the dCache workshop. Starting with a VM with a dCache RPM, the challenge was to set it up with two pools, NFS4, and WebDAV. A second VM got to access the data, mainly via NFS or HTTP(S) - security ranged from IP address to X.509 certificates. The overall impression was that it was pretty easy to get set up and configure the interfaces and get it to do something useful: dCache is not "an SRM" or "an NFS server" but rather storage middleware which provides a wide range of interfaces to storage. One of the things the dCache team is looking into is the cloud interface, via CDMI. This particular interface is not ready (as of March 2014) for production, but it's something we may want to look into and test with the EGI FC's version, Stoxy.
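
To give a flavour of what that involves: on current dCache releases the services are declared in a layout file. The sketch below is from memory rather than the tutorial material - the domain and pool names are invented and exact property names vary between dCache versions - but a single-node setup with two pools, NFS4 and WebDAV looks roughly like this:

[dCacheDomain]
[dCacheDomain/poolmanager]
[dCacheDomain/pnfsmanager]
[dCacheDomain/nfs]
nfs.version = 4.1
[dCacheDomain/webdav]
[dCacheDomain/pool]
name=pool1
path=/srv/dcache/pool1
[dCacheDomain/pool]
name=pool2
path=/srv/dcache/pool2

In practice the pool entries are normally written for you by the dcache pool create command rather than typed in by hand.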

10 November 2008

Edinburgh dCache is dead. Long live DPM!

Today is a sad day in the world of Edinburgh storage. The long-serving and (semi-)reliable dCache storage element has now been retired. It's had a hard life, with many ups and downs, particularly in its youth. However, as it matured it grew into a dependable workhorse for our local physics analyses. Unfortunately, the hardware is now old and creaking and the effects of the credit crunch have even managed to propagate all the way to the top of our ivory tower (i.e., electricity has gone up). These effects combined to force us to put dCache to sleep in a peaceful and humane manner at the end of last week.

However, never fear, all is not lost. We have been successfully using DPM for data access for many months now. We still have the occasional Griddy problem with it, but it is growing up to fill the void left behind by its half-brother. Long live DPM!

22 February 2008

dCache configuration, graphviz style


I don't know about anyone else, but I'm fed up having to try and debug different sites' PoolManager.conf files, especially with all this LinkGroup stuff going on. I find it too hard to manually parse a file when it stretches to hundreds of lines, making it virtually impossible to know if there are any mistakes.

In an effort to try and improve the situation, I put together a little python script last night that converts a PoolManager.conf into a .dot file. This can then be processed by GraphViz to produce a structured graph of the dCache configuration. You can see some examples of currently active dCache configurations here. The above plot shows the config at Edinburgh.
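
For anyone curious, the idea is simple enough that a stripped-down version fits in a few lines. The sketch below is not the actual script - it only understands the "psu addto" and "psu add link" statements, which is enough to draw the basic unit/pool-group/link structure - but it shows the approach:

#!/usr/bin/env python
# Minimal sketch of a PoolManager.conf -> GraphViz (.dot) converter.
# Only "psu addto <group-type> <group> <member>" and
# "psu add link <link> <group>" are handled; the real script does more.

import sys

def conf_to_dot(lines):
    edges = []
    for line in lines:
        words = line.split()
        if len(words) < 5 or words[0] != "psu":
            continue
        if words[1] == "addto" and words[2] in ("ugroup", "pgroup", "linkGroup"):
            # e.g. "psu addto pgroup atlas-pools pool1": member -> group
            edges.append((words[4], words[3]))
        elif words[1] == "add" and words[2] == "link":
            # e.g. "psu add link atlas-link atlas-pools": group -> link
            edges.append((words[4], words[3]))
    out = ["digraph poolmanager {"]
    for src, dst in edges:
        out.append('    "%s" -> "%s";' % (src, dst))
    out.append("}")
    return "\n".join(out)

if __name__ == "__main__":
    print(conf_to_dot(open(sys.argv[1])))

Running it as "python pm2dot.py PoolManager.conf > pm.dot" and then "dot -Tpng pm.dot -o pm.png" gives the sort of picture shown above.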

I have been creating both directed (dot) and undirected (neato) graphs. At the moment, the most useful one is the dot plot. I'm still exploring what neato can be used for.

I think the fact that we even have to consider looking at things this way tells you two things:

1. dCache is a complex beast, with a multitude of different ways of setting things up (which has both pros and cons).
2. The basic configuration really has to be improved to save multiple man-hours that are spent across the Grid trying to debug basic problems.

At the moment, this system is only a prototype. It is intended as an aide to understanding dCache configuration and looking for potential bugs. As always, comments are welcome.

PS Thanks to Steve T for inspiring me to work on this following his graphing glue project.

23 January 2008

CCRC confusion

So, which Tier-2s are involved in the February CCRC exercise? Does anyone know? What about the date that they will get involved? Do they need to have SRM2.2, or not? Some sources suggest they do, others suggest they don't. If anyone knows the answers to these questions, please email me. I think I'll start attending the daily meetings to find out what is going on.

Also, although the 1.8.0-12 release of dCache was made sometime last week, it turns out that this is also the CCRC branch of dCache. Do you spot the difference? Good, then we can continue. This explains the weird naming convention that the dCache developers are using for all CCRC-related releases, namely 1.8.0-12p1, 1.8.0-12p2 for patch versions 1 and 2 of the 1.8.0-12 branch. Hope that clears things up.

16 January 2008

Global dCache monitoring


It's been a while since my last post - been busy with some things that I will post about shortly. As you know, I've been running the GridPP SAM storage monitoring for the past few weeks. It looks like the dCache team got wind of this and have asked if I could set up something similar to summarise the SAM test results for all dCache sites in the WLCG information system. This is now done and the results can be seen above and at this new page in the dCache wiki:

http://trac.dcache.org/trac.cgi/wiki/MonitoringDcache

I don't seem to be able to get SAM results for the US Tier-2 sites, so I'll need to investigate this. The above link contains other useful information that I'll talk about shortly.

03 January 2008

dCache PostgreSQL monitoring



Happy New Year everyone!

Following a recommendation from the dCache developers, I set up some PostgreSQL monitoring using pgFouine. It is very simple to install and configure. The plots are generated automatically by the pgfouine tool, which also gives a full breakdown of which queries took the longest time. This should help you understand what your database (and hence dCache) is doing. You can also use it to analyse the output of the VACUUM (FULL) ANALYSE command, which should give you an idea of how large the free space map (FSM) should be set (an important configuration parameter).
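
For anyone wanting to reproduce this, the recipe is roughly as follows - treat the pgFouine option names as approximate (they are from memory, so check the pgFouine documentation), but the PostgreSQL side is just standard logging configuration:

# postgresql.conf: log every statement with its duration so that
# pgFouine has something to analyse (it parses the syslog output)
log_min_duration_statement = 0
log_destination = 'syslog'

# then, on the collected log, something like:
php pgfouine.php -file /var/log/pgsql.log -top 20 > pgfouine-report.html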

PS Fouine is French for stone marten, which is something like a weasel.

16 December 2007

Anyone for some dCache monitoring?


The above plots come from some new dCache monitoring that I have set up to study the behaviour of the Edinburgh production storage (srm.epcc.ed.ac.uk). This uses Brian Bockelman's GraphTool and some associated scripts to query the dCache billing database. You can find the full set of plots here (I know, it's a strange hostname for a monitoring box, but it's all that was available):

http://wn3.epcc.ed.ac.uk/billing/xml/

GraphTool is written in Python and uses matplotlib to generate the plots. CherryPy is used for the web interface. The monitoring can't just be installed as an RPM: you need to have PostgreSQL 8.2 available, create a new view in the billing database, set up Apache mod_rewrite, and ensure you have the correct compilers installed, but these steps shouldn't be a problem for anyone.

I think you will agree that the monitoring presents some really useful views of what the dCache is actually doing. It's still a work in progress, but let me know when you want to set it up and I should be able to help.

It should be possible to do something similar for DPM in the coming weeks.

04 December 2007

dCache 1.8.0-X

A new patch to dCache 1.8.0 was released on Friday (1.8.0-6). In addition, there is now a 1.8.0 dcap client. All rpms can be found here:

http://www.dcache.org/downloads/1.8.0/index.shtml

Sites (apart from Lancaster!) should wait for all of the Tier-1s to upgrade first, as there are still some bugs being worked out.

dCache admin scripts

Last week I finally got a chance to have another look at some of the dCache administration scripts that are in the sysadmin wiki [1]. There is a Jython interface to the dCache admin interface, but I find it difficult to use. As an alternative, the guys at IN2P3 have written a Python module that creates a dCache admin door object which you can then use in your own Python scripts to talk to the dCache [2]. One thing that I did was use the rc cleaner script [3] to clean up all of the requests (there were hundreds!) that were stuck in the Suspended state. You can see how the load on the machine running postgres dropped after removing the entries. Highly recommended.

I also wrote a little script to get information from the LoginBroker in order to print out how many doors are active in the dCache. This is essential information for sites that have many doors (i.e. Manchester) but find the dCache 2288 webpage difficult to use. I'll put it in the SVN repository soon.
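
The script itself isn't anything clever. As a rough sketch of the idea (not the actual script - the prompt matching and admin ssh options may need adjusting for your dCache version, and the host name and password here are obviously made up):

#!/usr/bin/env python
# Count the doors registered with the LoginBroker by logging into the
# dCache admin interface (the classic ssh interface, normally port 22223).

import pexpect

HOST = "dcache-admin.example.ac.uk"   # hypothetical admin node
PASSWORD = "secret"                   # better read from a protected file

child = pexpect.spawn("ssh -1 -c blowfish -p 22223 -l admin %s" % HOST)
child.expect("password:")
child.sendline(PASSWORD)
child.expect(">")                     # the admin shell prompt
child.sendline("cd LoginBroker")
child.expect(">")
child.sendline("ls")
child.expect(">")
# everything printed between the "ls" and the next prompt is one door per line
doors = [l for l in child.before.splitlines()
         if l.strip() and not l.strip().startswith("ls")]
print("%d doors registered with the LoginBroker" % len(doors))
child.sendline("logoff")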

[1] http://www.sysadmin.hep.ac.uk/wiki/DCache
[2] http://www.sysadmin.hep.ac.uk/wiki/DCache_python_interface
[3] http://www.sysadmin.hep.ac.uk/wiki/DCache_rc_cleaner

01 October 2007

SRM2.2 deployment workshop - bulletin 1

This is the first bulletin for the "SRM2.2 Deployment Workshop" which will take place on Tuesday the 13th and Wednesday the 14th of November 2007. The workshop is being organised by the UK's National eScience Centre (NeSC), GridPP and WLCG. It will take place at NeSC, Edinburgh, Scotland.

The goal of the workshop is to bring WLCG site administrators with experience of operating dCache and DPM storage middleware together with the core developers and other Grid storage experts. This will present a forum for information exchange regarding the deployment and operation of SRM2.2 services on the WLCG. By the end of the workshop, site administrators will be fully aware of the technology that must be deployed in order to provide a service that fully meets the needs of the LHC physics programme.

Particular attention will be paid to the large number of sites who contribute small amounts of computing and storage resource (Tier-2s), as compared with national laboratories (Tier-1s). Configuration of SRM2.2 spaces and storage information publishing will be the main topics of interest during the workshop. In addition, there will be a dedicated session for the Tier-1 sites running dCache.

As SRM2.2 has been proposed as an Open Grid Forum (OGF) standard, it is likely that the workshop will be of wider interest than only WLCG. However, it should be noted that the intention of the meeting is not to continue discussions of the current specification or of any future version; the meeting will focus on the deployment of a production system.

It is intended that the meeting will be by invitation only. Anyone who thinks that they should receive an invitation should contact Greig Cowan to register their interest.

A detailed agenda and general information about the event are currently being prepared. A second announcement will be made once these are in place.

28 September 2007

Filesystems turn readonly

Some of the RAID'ed filesystems at Edinburgh recently decided that they would become readonly. This caused the dCache pool processes that depended on them to fall over (Repository got lost). Some of the affected filesystems were completely full, but not all of them. An unmount/mount cycle seems to have fixed things. Anyone seen this sort of thing before?

22 September 2007

dCache SRM2.2

There was a dedicated dCache session during CHEP for discussion between site admins and the developers, to hear about the latest developments and get help with the server configuration, which is *difficult*. Link groups and space reservations were flying about all over the place. More documentation is required, but this seems to be difficult when things are changing so fast (new options magically appear, or don't appear, in the dCacheSetup file). A training workshop would also be useful...
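
For anyone who wasn't there: "link groups" are the PoolManager.conf construct that SRM2.2 space reservations hang off. Purely as an illustration (the names are invented and the exact set of attributes depends on the dCache version), a link group declaration looks something like:

psu create linkGroup atlas-linkGroup
psu addto linkGroup atlas-linkGroup atlas-link
psu set linkGroup replicaAllowed atlas-linkGroup true
psu set linkGroup custodialAllowed atlas-linkGroup false
psu set linkGroup onlineAllowed atlas-linkGroup true

The SRM SpaceManager then makes space reservations against link groups whose attributes match the requested retention policy and access latency.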

dCache update

I was speaking to Tigran from dCache during CHEP and got some new information about dCache and Chimera.

First off, ACLs are coming, but these are not tied to Chimera. They are implementing NFS4 ACLs, which are then mapped to POSIX, which (according to Tigran) makes them more like NT ACLs. Need to look into this further.

Secondly, the dCache guys are really pushing the NFS v4.1 definition, as they see it as the answer to their local data access problems. 4.1 clients are being implemented in both Linux and Solaris (no more need for dcap libraries!). According to Tigran, NFS4.1 uses transactional operations. The spec doesn't detail the methods and return codes exactly; rather, it defines a set of operations that can be combined into a larger operation. This sounds quite powerful, but how will the extra complexity affect client-server interoperability?

Finally, one thing which I hadn't realised about Chimera is that it allows you to modify the filesystem without actually mounting it; there is an API which can be used instead.

23 August 2007

sgm and prod pool accounts

I've been a bit confused of late about the best course of action for dealing with sgm and prod pool accounts on the SEs, in particular dCache. As an example, Lancaster have run into the problem where a user with an atlassgm proxy has copied files into the dCache and has correspondingly been mapped to atlassgm:atlas (not atlassgm001 etc., just plain old sgm). Non-sgm users have then tried to remove these files from the dCache and have been denied, since they are simple atlas001:atlas users. The default dCache file permissions do not allow group write access. This raises a few issues:

1. Why is atlassgm being used to write files into the dCache in the first place?

2. Why are non-sgm users trying to remove files that were placed into the dCache by a (presumably privileged) sgm user?

3. When will dCache have ACLs on the namespace to allow different groups of users access to a bunch of files?

The answer to the third point is that ACLs will be available some time next year, when we (finally) get Chimera, the replacement for the PNFS namespace. ACLs come as a plugin to Chimera.

The interim solution appears to be just to map all atlas users to atlas001:atlas, but this obviously doesn't help the security and traceability aspect that pool accounts are partially trying to solve. Since DPM supports namespace ACLs, we should be OK with supporting sgm and prod pool accounts. Of course, this requires that everyone has the appropriately configured ACLs, which isn't necessarily the case, as we've experienced before.
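
For reference, in dCache that interim mapping lives in the gPlazma configuration - roughly along the lines below, although the UID/GID, paths and exact file syntax here are illustrative (check the gPlazma documentation for your version before copying anything):

# /etc/grid-security/grid-vorolemap: map any plain atlas member to atlas001
"*" "/atlas" atlas001

# /etc/grid-security/storage-authzdb: give atlas001 a uid/gid and root path
authorize atlas001 read-write 30001 3000 / /pnfs/epcc.ed.ac.uk/data/atlas /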

Comments welcome below.

17 July 2007

SRM2.2 storage workshop

There was a storage workshop held at CERN on the 2nd and 3rd of July. The focus of discussions was on the SRM2.2 developments and testing of the endpoints. The majority of the endpoints are being published in the PPS, the intention being that the experiments will be able to use them in a ~production environment and allow some real stress tests to be run against them. The experiments see SRM2.2 as being an essential service for them, so hopefully they have sufficient manpower to run the tests...

Getting the software installed on the machines isn't a problem, but getting it configured can be tricky. The main point that I tried to highlight on a number of occasions was the necessity for sites to have really good documentation from both the developers (how the SRM2.2 spaces can be configured) and the experiments (how the SRM2.2 spaces should be configured for their needs). I will make sure that I provide instructions for everyone to ensure that the deployment goes (relatively) smoothly. It shouldn't be too much of a problem for DPM sites; dCache sites will need to start playing around with link groups ;-)

From mid-October, sites should be thinking of having these SRM2.2 spaces configured. The plan is that by January 2008, everyone will have this functionality available, and SRM2.2 will become the default interface.

25 April 2007

dCache, DPM and SRM2.2

As most of you know, the LCG experiments are requiring that all storage be accessible via the SRM2.2 interface. The current version of dCache, v1.7.0, only provides the SRM1 interface (and an incomplete SRM2.2). Full SRM2.2 support will only be available in the v1.8.0 branch of dCache. Once v1.8.0 has been fully tested and moves into production, all GridPP dCache sites will have to upgrade.

As I understand the situation, no direct upgrade path from v1.7.0 to v1.8.0 is planned. Sites will first have to upgrade to v1.7.1 and then move onto v1.8.0. The plan is such that v1.7.1 will contain the same code as v1.8.0, minus the SRM2.2 stuff.

Obviously all dCache sites will want to ensure that there is a stable version of the system, particularly as all sites now have tens of TB of experiment data on disk. The SRM2.2 bits of v1.8.0 are currently being tested. Once v1.7.1 is released we can test out the upgrade path before giving the nod to sites. There will be some additional complexity when it comes to setting up the space reservation parts of SRM2.2 in your dCache. Documentation will be available when sites have to perform this additional configuration step. In fact, all of this configuration may go into YAIM.

The situation for DPM is slightly simpler. Full SRM2.2 support exists in v1.6.4 which is going through certification at the moment (1.6.3 is the current production version). Again, there will be some additional complexity in configuring the SRM2.2 spaces, but this will be documented.

Even once the SRM2.2 endpoints are available, it is likely that the SRM1 endpoint (running simultaneously on the same host) will continue to be used by the experiments until SRM2.2 becomes widely deployed and the client tools start using it as the default interface for file access.

16 April 2007

PostgreSQL housekeeping


The Edinburgh dCache recently started to show increased CPU usage (a few days after an upgrade to 1.7.0-34) as shown in the top plot. The culprit was a postgres process:

$ ps aux|grep 4419
postgres 4419 24.8 0.6 20632 12564 ? R Mar27 6091:11 postgres: pnfsserver atlas [local] PARSE

After performing a VACUUM ANALYSE on the atlas database (in fact, on all of the databases), the CPU usage dropped back to normal, as can be seen in the bottom plot. I had thought auto-vacuuming was enabled by default in v8.1 of postgres, but I was mistaken. It has now been enabled by modifying the relevant entries in postgresql.conf:

stats_start_collector = on
stats_row_level = on
autovacuum = on

I also changed these parameters after the VACUUM process sent out a notice:

max_fsm_pages = 300000 # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 2000 # min 100, ~70 bytes each

The server requires a restart after modifying the last two parameters.
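
For the record, running the VACUUM ANALYSE across all of the databases by hand amounts to something like the sketch below (psycopg2-based, connection details illustrative; VACUUM cannot run inside a transaction block, hence the autocommit setting). The same thing can be done from the shell with vacuumdb -a -z.

#!/usr/bin/env python
# Run VACUUM ANALYZE on every non-template database on the server.

import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

def vacuum_all(host="localhost", user="postgres"):
    # find the databases first
    conn = psycopg2.connect(host=host, user=user, database="template1")
    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute("SELECT datname FROM pg_database WHERE NOT datistemplate")
    databases = [row[0] for row in cur.fetchall()]
    conn.close()

    # VACUUM ANALYZE each one in its own autocommit connection
    for db in databases:
        dbconn = psycopg2.connect(host=host, user=user, database=db)
        dbconn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
        dbconn.cursor().execute("VACUUM ANALYZE")
        dbconn.close()
        print("vacuumed %s" % db)

if __name__ == "__main__":
    vacuum_all()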

Repository got lost

Recently the Edinburgh dCache has been failing. The usageInfo page was reporting

[99] Repository got lost

for all of the pools. This has been seen before, but only now do I understand why.

The dCache developers have added a process that runs in the background and periodically tries to touch a file on each of the pools. If this process fails, something is regarded as being wrong and the above message is generated. This could happen if there was a problem with the filesystem or a disk was slow to respond for some reason.

Edinburgh was being hit with this issue due to some of the disk pools being completely full, i.e., df was reporting 0 free space, while dCache still thought there was a small amount of space available. This mismatch seems to arise from the presence of the small control files on each dCache pool (these contain metadata information). Each file may take up an entire block on the disk without actually using up all of the space. I'm still trying to find out if dCache performs a stat() call on these files. It should also be noted that dCache has to read each of these control files at pool startup, so a full pool takes longer to come online than one that is empty.
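
The block-overhead point is easy to convince yourself of with a couple of lines of Python (nothing dCache-specific - just the difference between a file's apparent size and the space actually allocated for it):

#!/usr/bin/env python
# A 1-byte file still occupies a whole filesystem block:
# st_size is the apparent size, st_blocks counts allocated 512-byte units.

import os

path = "/tmp/tiny-control-file"        # stand-in for a dCache control file
with open(path, "w") as f:
    f.write("x")                       # one byte of content

st = os.stat(path)
print("apparent size : %d bytes" % st.st_size)
print("space on disk : %d bytes" % (st.st_blocks * 512))

On a filesystem with 4 kB blocks this reports 1 byte against 4096 bytes, which is exactly the sort of mismatch described above.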

There also appears to be a bug in this background process, since all of the Edinburgh disk pools were reporting the error, even though some of them were empty. In the meantime, I have set the full pools to readonly and this appears to have prevented the problem from recurring.