If you have been following the weeklies, you will have noticed that we're discussing holding another storage workshop. The previous one was considered extremely useful, and we want to create a forum for storage admins to come together and share their experiences with Real Data(tm).
Interestingly, we now have (or are close to getting!) experience with technology we haven't used before. For example, does having your DPM database on SSD improve performance? Is Hadoop a good option for making use of the storage space on WNs?
We already have a rough agenda. There should be lots of sysadmin-friendly coffee-aided pow-wows. Maybe also some projectplanny stuff, like the implications for us of the end of EGEE, the NGI, GridPP4, and suchlike.
Tentatively, think Edinburgh in February.
24 November 2009
23 November 2009
100% uptime for DPM (and anything else with a MySQL backend)
This weekend, with the ramp-up of jobs through the Grid as a result of some minor events happening in Geneva, we were informed of a narrow period during which jobs failed to access Glasgow's DPM.
There were no problems with the DPM itself, and it was working according to spec. However, the period correlated with the 15 minutes or so that the MySQL backend takes every night to dump a copy of itself as a backup.
So, in the interests of improving uptime for DPMs to >99%, we enabled binary logging on the MySQL backend (and advise that other DPM sites do so as well, disk space permitting).
Binary logging, which is enabled by adding "log-bin" on its own line to /etc/my.cnf and restarting the service, allows (amongst other things, including "proper" up-to-the-second backups) a MySQL-hosted InnoDB database to be dumped without interrupting service at all, thus removing that short nightly window of dropped communication.
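For reference, a minimal sketch of the change and of a dump invocation that takes advantage of it; the [mysqld] placement and the backup path are assumptions about a fairly standard setup, so adjust for your own my.cnf and backup script:

# /etc/my.cnf -- add under the [mysqld] section, then restart the mysqld service
log-bin

# Nightly dump: --single-transaction takes a consistent InnoDB snapshot without
# locking tables, and --master-data=2 records the binary log position, so the dump
# plus the subsequent binary logs give up-to-the-second, point-in-time recovery.
mysqldump --single-transaction --master-data=2 --all-databases > /var/backups/dpm-$(date +%F).sql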
(Now any downtime is purely your fault, not MySQL's.)
12 November 2009
Nearly there
The new CASTOR information provider is nearly back: the host is finally up again, but given that it's somewhat late in the day, we'd better not switch the information system back until tomorrow. (We are currently running CIP 1.X, without nearline accounting.)
Meanwhile we will, of course, work on a more resilient infrastructure. We were working on that before, too; it's just that the machine died before we could complete the resilientification.
We do apologise for the inconvenience caused by this spectacularly exploding information provider host. I don't know exactly what happened to it, but given that it took a skilled admin nearly three days to bring it back, it must have toasted itself fairly thoroughly.
While we're on the subject, a new release is under way for the other CASTOR sites - the current one has a few RAL-isms in it, a consequence of getting it out before the deadline.
When this is done, work can start on GLUE 2.0. Hey ho.
10 November 2009
Kaboom
Well, it seems we lost the new CASTOR information provider (CIP) this morning and the BDII was reset to the old one - the physical host the new CIP lived on decided to kick the bucket. One of the consequences is that nearline accounting is lost: all nearline numbers are now zero (obviously not 44444 or 99999, that would be silly... :-)).
Before you ask, the new CIP doesn't run on the old host because it was compiled for 64-bit SLC5, and the old host is 32-bit SL4.
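For anyone checking whether a spare machine could host the 64-bit build, a trivial sketch using standard commands (nothing CIP-specific):

uname -m                  # x86_64 means a 64-bit kernel; i386/i686 means 32-bit
cat /etc/redhat-release   # tells you whether the host is SL4 or SLC5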
We're still working on getting it back, but are currently short of machines that can run it, even virtual ones. If you have any particular problems, do get in touch with the helpdesk and we'll see what we can do.
30 September 2009
CIP update update
We are OK: the deployment problems that had not been caught in testing appear to be due to different versions of lcg-utils (used for all the tests) behaving subtly differently. So I could run tests as dteam prior to release and they would work, but the very same tests would fail on the NGS CE after release, even though they also ran as dteam. Those problems were finally fixed this morning.
29 September 2009
CIP deployment
As some of you may have noticed, the new CASTOR information provider (version 2.0.3) went live as of 13.00 or thereabouts today.
This one is smarter than the previous one: it automatically picks up certain relevant changes to CASTOR. It has nearline (tape) accounting, as requested by CMS. It is more resilient against internal errors. It is easier to configure. It also has an experimental fix for the ILC bug (it works for me on dteam). And it has improved compliance with WLCG Installed Capacity (up to a point; it is still not fully compliant).
Apart from a few initial wobbles and adjustments which were fixed fairly quickly (but still needed to filter through the system), real VOs should be working.
ops was trickier, because it has access to everything in hairy ways, so we were coming up red on the SAM tests for a while. This appears to be sorted out for the SE tests, but the CE tests still fail - which is odd, because the failing CE tests consist of jobs that run the same data tests as the SE tests, and those work. I talked to Stephen Burke, who suggested a workaround which is now filtering through the information system.
We're leaving it at-risk until tomorrow, although the services are working. On the whole, apart from the ops tests with lcg-utils, I think it went rather well: the CIP sits between two extremely complex software infrastructures, CASTOR on one side and grid data management on the other, and it has a complex task of its own managing all this information.
Any Qs, let me know.
28 September 2009
Replicating like HOT cakes
As mentioned on the storage list, the newest versions of the GridPP DPM Tools (documented at http://www.gridpp.ac.uk/wiki/DPM-admin-tools) contain a tool to replicate files within a spacetoken (such as the ATLASHOTDISK).
At Edinburgh this is run from cron:
# System crontab entry (e.g. in /etc/cron.d); the hostnames are Edinburgh's, so substitute your own DPM head node
DPNS_HOST=srm.glite.ecdf.ed.ac.uk
DPM_HOST=srm.glite.ecdf.ed.ac.uk
PYTHONPATH=/opt/lcg/lib64/python
# Replicate everything in the ATLASHOTDISK space token nightly at 01:00, running as root
0 1 * * * root /opt/lcg/bin/dpm-sql-spacetoken-replicate-hotfiles --st ATLASHOTDISK >> /var/log/dpmrephotfiles.log 2>&1
Some issues observed so far:
* The first run takes quite a long time. Because of all the dpm-replicate calls on the ~1000 files that ATLAS had already put in there, it took around 4 hours just to make one extra copy of each (see the sketch after this list). Since then only the odd file has come in, so it doesn't have much to do.
* The replicas always end up on different filesystems, but not always on different disk servers. This obviously depends on how many servers you have in that pool (compared to the number of replicas you want), as well as how many filesystems there are on each server. The replica creation could be made more directed, but perhaps the built-in command should by default use a different server when it can.
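Because that first pass can take hours, it may be worth running the tool once by hand before enabling the cron entry; a sketch using the same environment as the crontab above:

export DPNS_HOST=srm.glite.ecdf.ed.ac.uk
export DPM_HOST=srm.glite.ecdf.ed.ac.uk
export PYTHONPATH=/opt/lcg/lib64/python
/opt/lcg/bin/dpm-sql-spacetoken-replicate-hotfiles --st ATLASHOTDISK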
Intended future enhancements of this tool include:
* List in a clear way the physical duplicates in the ST.
* Remove excess duplicates.
* Automatic replication of a list of "hotfiles".
Other suggestions welcome.
20 August 2009
GridPP DPM toolkit v2.5.2 released
Hello all,
Another month, another toolkit release.
This one, relative to the last announced release (2.5.0), has slightly improved functionality for dpm-sql-list-hotfiles and adds a -o (or --ordered) option to dpm-list-disk.
The -o option returns the files in the selected space as a list sorted by decreasing file size. As this uses the DPM API, the process currently needs to pull the entire file list before sorting it, so, unlike the normal mode, you get all the files output in one go (after a pause of some minutes while all the records are acquired and sorted).
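As a rough illustration (not the tool's documentation): combined with head, the new option gives a quick view of the largest files; pass whatever selection arguments you normally give dpm-list-disk.

# --ordered is new in 2.5.2; add your usual dpm-list-disk selection options as needed
dpm-list-disk --ordered | head -n 20    # the 20 largest files, biggest first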
There's also a new release of the Gridpp-DPM-monitor package, which includes some bug fixes and the new user-level accounting plot functionality. This should work fine, but if anyone has any problems, contact me as normal.
All rpms at the usual place:
http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/
31 July 2009
GridPP DPM toolkit v2.5.0 released
Hello everyone,
I've just released version 2.5.0 of the GridPP DPM toolkit.
The main feature of this release is the addition of the tool
dpm-sql-list-hotfiles
which should be called as
dpm-sql-list-hotfiles --days N --num M
to return the top M "most popular" files over the past N days.
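For instance (illustrative values only), to see the 20 most accessed files over the last week:
dpm-sql-list-hotfiles --days 7 --num 20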
Caveat: the query involved in calculating the file temperature is a little more intensive than the average queries implemented in the tool kit. You may see a small load spike in your DPM when this executes, so don't run it a lot in a short space of time if your DPM is doing something important.
As always, downloads are possible from:
http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/
15 July 2009
Storage workshop and STEP report
The storage workshop writeup is available! Apart from notes from the workshop, it also contains Brian's storage STEP report. Follow the link and read all about it. Comments are welcome.
03 July 2009
GridPP Storage workshop
The GridPP storage workshop was a success. Techies from the Tier 2s got together to show and tell, to discuss issues, and to hear about new stuff. We also had speakers from Tier 1 talking about IPMI and verifying disk arrays, and from the National Grid Service talking about procurement and SANs.
We talked about STEP (storage) experiences, and had a dense programme with a mix of content for both newbies and oldbies, and we gazed into the crystal ball to see what's coming next.
All presentations will be available on the web on the hepsysman website shortly (if they aren't already). There will also be a writeup of the workshop.
24 June 2009
GridPP DPM toolkit v2.3.9 released
Ladies and gentlemen,
I am proud to announce the release of the gridpp-dpm-tools package, version 2.3.9.
It is available at the usual place:
http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/gridpp-dpm-tools-2.3.9-6.noarch.rpm
This version is an extra-special "GridPP Milestone" release, in that it includes a mechanism for writing user-level storage consumption data to a database, so that you can make nice graphs of it later. (The graph-making functionality exists as scripts now, and will be released as a modification to the DPM monitoring rpm as soon as I make the necessary changes and package it.)
You can enable user-level accounting on your DPM by following the instructions newly added to the GridPP wiki for the toolkit: http://www.gridpp.ac.uk/wiki/DPM-admin-tools#Installation
Comments, bug reports, &c all (happily?) accepted. In particular, if the user-level accounting doesn't work for you, I'd like to know about it, since it's been happy here at Glasgow for the last couple of days.
23 June 2009
Workshop agenda now available
As the title indicates, the agenda for the GridPP storage workshop is now uploaded to the web site, along with the agenda for hepsysman.
15 June 2009
Storage workshop planning
With now less than 0.05 years to go before the storage workshop, the agenda and other planning are continuing apace.
We would like to ask sites to give site reports (currently 20 mins each, incl. questions) about their own storage infrastructure: we'd like to hear about their storage setup (as opposed to computing and other irrelevant stuff :-P) as well as their STEP experiences. This is partly so we can discuss the implications, but also for the benefit of folks who will be attending the storage workshop only. We will also get feedback from ATLAS on STEP, i.e. the users' perspective.
Brian will present our experiences with BeStMan and Hadoop; there will be introductions to the storage schema, the DPM toolkit, SRM testing, and user-level accounting, plus a talk from the Tier 1 on disk array scheduling, and hopefully room for discussion. So lots of things to look forward to!
08 June 2009
GridPP DPM toolkit v2.3.6 released
This is a bug-fix release for 2.3.5, which had some annoying whitespace inconsistencies introduced into the dpm-sql* functions. Thanks to Stephen Childs for noticing them.
(The direct link for download is:
http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/gridpp-dpm-tools-2.3.6-5.noarch.rpm
and the documentation is still at
https://www.gridpp.ac.uk/wiki/DPM-admin-tools
and has been slightly updated for this release.)
02 June 2009
SRM protocol status
The Grid Storage Management working group (GSM-WG) in OGF exists to standardise the SRM protocol. Why standardise? We need this to ensure the process stays open and can benefit communities other than WLCG.
The SRM document is GFD.129, and we now have an experiences document available for public comments. You are invited to read this document and submit your comments - thanks!
You can even do so anonymously!
01 June 2009
Summary of DESY workshop
Getting the SRM implementers back together was very useful, and long overdue. We agreed of course to not change anything :-)
This is the summary; for the full report, attend the next GridPP storage meeting.
- We need to review how WLCG clients make use of the protocol; there are cases where they do not use it efficiently, causing high load on the server. For example, is the estimated wait time used properly?
- Differences between implementations may need documenting, e.g. whether an implementation supports "hard" pinning.
- We reviewed each implementation's support for areas of the protocol - fully supported, partially supported, or not at all - with the aim of finding a "core" that MUST be universally supported, and asked whether the implementers thought each feature desirable, given their specialist knowledge of the underlying storage systems.
- Security and the use of proxies were discussed.
28 April 2009
GridPP DPM toolkit v2.3.5 released
The inaugural release of the DPM toolkit under my aegis has just happened. This release contains some bug fixes (I've attempted to improve the intelligence of the SQL-based tools when trying to acquire the right username/password from configuration tools), and is deliberately missing dpm-listspaces as this is now provided by DPM itself.
This is also my first try at building RPMs on OS X, so can people tell me if this is horribly broken for them? :)
(The direct link for download is:
http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/gridpp-dpm-tools-2.3.5-5.noarch.rpm
and the documentation is still at
https://www.gridpp.ac.uk/wiki/DPM-admin-tools
and has been slightly updated for this release.)
27 April 2009
Storage calendar
It's that time of year and I'm writing reports again. It shows that Greig has left: the number of blog entries has dropped dramatically since the end of March... yes, I am still trying to persuade the rest of the group to blog about the storage work they're doing. Just because it's quiet doesn't mean they're not beavering away.
At the last storage meeting we had a look at the coming storage meetings - not our own, but the ones outside GridPP. There were storage talks at ISGC and CHEP, and we looked at some of those. The next pre-GDB or GDB is supposed to be about storage, although the agenda was a bit bare last I looked. There will be a workshop at DESY focusing on WLCG's usage of SRM, with the developers from both sides, so to speak. Preparations are ongoing for the next OGF - mainly documents that need writing; we still need an "experiences" document describing interoperation issues at the API level. There's a HEPiX coming up (agenda) in Sweden - usually we have an interest in the filesystem part as well as the site management. Then there is a storage meeting 2-3 July at RAL, following hepsysman on 0-1 July.
26 March 2009
More on CHEP
Right, I meant to write more about the stuff that's going on here, but the network is somewhat unreliable (authentication times out and re-authentication is not always possible). Anyway, I am taking copious notes and will give a full report at the next storage meeting - Wednesday 8 April.
If I were to summarise the workshop from a high-level data storage/management perspective, I'd say it's about stability, scaling/performance, data access (specifically xrootd and http), long-term support, catalogue synchronisation, interoperation, information systems, authorisation and ACLs, testing, configuration, and complexity versus capabilities.
More details in the next meeting(s).