30 March 2007

DPM srmPutDone Errors Understood

As usual the response of the DPM team to the report of srmPutDone errors was excellent.

There is a long-term fix in the pipeline and a short-term workaround has been found: increase the maximum idle timeout in MySQL.

See the full posting over on the scotgrid blog for details.
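
For orientation only, here is a rough sketch of how such a timeout could be inspected and raised on a running MySQL server. It assumes the idle timeout in question is MySQL's `wait_timeout` server variable. The 86400 second default, the `root` account and the `DRY_RUN` switch are illustrative assumptions, not values taken from the DPM team's advice, so check the ScotGrid post before applying anything.

```shell
#!/bin/sh
# Sketch only: assumes the "idle timeout" is MySQL's wait_timeout variable.
# The 86400 s default is an illustrative value, not a recommendation.
# DRY_RUN is a hypothetical switch: 1 (the default) prints the commands,
# 0 runs them (the mysql client then prompts for the password).
raise_idle_timeout() {
    new_timeout="${1:-86400}"
    for sql in "SHOW VARIABLES LIKE 'wait_timeout';" \
               "SET GLOBAL wait_timeout = $new_timeout;"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "mysql -u root -p -e \"$sql\""
        else
            mysql -u root -p -e "$sql"
        fi
    done
}
```

Note that `SET GLOBAL` only affects new connections and is lost when mysqld restarts; a permanent change would also need a `wait_timeout` entry in the `[mysqld]` section of my.cnf.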

29 March 2007

dCache 1.7.0-33 released

Most of you will have seen this already, but v1.7.0-33 of the dCache server is now available, which should fix the data corruption problem that was announced on the user-forum list. Sites are recommended to install it as soon as possible. Get in contact if there are any problems.

Storage accounting project page

I've created a project page for the storage accounting system. This will
allow Dave Kant and myself to better keep track of the open issues.

http://savannah.cern.ch/projects/storage-account/


Could all sites please continue to check the published numbers and report any
inconsistencies to me?

As usual, the storage accounting page can be found here:

http://goc02.grid-support.ac.uk/storage-accounting/view.php?queryType=storage

Suggestions for improvements are welcome.

27 March 2007

DPM 1.6.4 Tagged

DPM version 1.6.4 has been tagged. The major change here is that support for secondary groups has been added. See the savannah patch for details.

Unfortunately there's another schema change in the offing. Details are in the twiki.

Some DPM srmPutDone errors found at Glasgow

See the ScotGrid Blog post for details.

14 March 2007

Re-enabling the new DPM GIP plugin

I spotted a minor issue with the DPM upgrade: rerunning YAIM will put the old DPM plugin, which doesn't do per-VO accounting properly, back in place.

After running YAIM you will have to re-enable the new plugin by hand, following the wiki instructions as before.

I promise to try to get the new plugin into the next gLite release to avoid this faff...

13 March 2007

Some Extra DPM 1.6.3 Upgrade Notes

Downloading and diffing YAIM versions 3.0.0-36 (which I ran) and 3.0.0-38 (the fixed version) reveals two things you have to do if you upgraded your DPM with v36:

  1. First, the srmv2.2 service was not chkconfiged to start again on boot, so run
    # chkconfig srmv2.2 on
  2. Secondly, the srmv2.2 service wasn't advertised in the information system so run
    # /opt/glite/yaim/scripts/run_function SITE-INFO.DEF config_gip
    # service globus-mds restart

Now you have a brand new shiny and advertised SRM v2.2 endpoint.

Of course, if you upgrade using v38 then you don't need to do anything.

12 March 2007

Glasgow Upgrade to DPM 1.6.3



I upgraded Glasgow's DPM to 1.6.3 this morning. The executive summary is that everything went well, and we're now happily running the new DPM with a shiny SRM v2.2 daemon.

The key thing about this upgrade is the need to update the database schema for the new SRM v2.2 services. This update requires the DPM and DPNS daemons to be stopped. The easiest way to achieve this is to do the update through YAIM - this will stop the daemons, update the schema and then restart them. Have a look at the config_DPM_upgrade function (in YAIM 3.0.0-36) - the important upgrade is upgradeC.

As a piece of extra insurance at Glasgow, I shut down the DPM and took an extra database dump before I ran YAIM, just in case anything went wrong.

  1. Enter downtime in the GOC - your DPM will probably be down for 10-30 minutes, depending on how big it is. This downtime only affects your SE though (it is probably safest to put in an hour, then come out early once things are working).
  2. Stop your DPM daemons. The right order is:

    service dpm-gsiftp stop
    service srmv2 stop
    service srmv1 stop
    service dpm stop
    service rfiod stop
    service dpnsdaemon stop

  3. Dump your database (instructions on the wiki).
  4. Update your RPMs. If you use APT then YAIM's install_node works. At Glasgow we just do yum update.
  5. Rerun YAIM:

    /opt/glite/yaim/scripts/configure_node /opt/glite/yaim/etc/site-info.def SE_dpm_mysql | tee /tmp/upgrade

  6. Now, when the script gets to the database schema upgrade beware that this takes a considerable length of time. If you look at the ganglia plot above you'll see that on Glasgow's DPM, which is a fast machine, it took 12 minutes (we have about 100 000 files in our DPM).
  7. As advised, we got the harmless duplicate key warning:

    Configuring config_DPM_upgrade
    INFO Checking for database schema version...
    INFO Database version used: 2.2.0
    INFO Upgrading database schema from 2.2.0 to 3.0.0 !
    INFO: Stopping DPM services.
    [...]
    failed to query and/or update the DPNS/DPM databases : DBD::mysql::db do failed: Duplicate key name 'G_PFN_IDX' at UpdateDpmDatabase.pm line 264.
    Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
    Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
    Mon Mar 12 12:01:03 2007 : Starting to update the DPNS/DPM database.
    Please wait...
    INFO Schema version upgrade: Now, may you see the "Duplicate key name 'G_PFN_IDX' at UpdateDpmDatabase.pm" error message, it is harmless. The indexes are already created.
    INFO Starting DPM services

  8. However, check this area of the upgrade very carefully for errors - if there is a problem with the update script running from YAIM you'll have to investigate and fix it, possibly running the dpm_support_srmv2.2 script by hand.
  9. Check all is well, e.g., lcg-cr a file into your DPM and then lcg-cp it back out.
  10. If you're really nervous (or extra careful) you could login to MySQL and check your schema version is 3.0.0 (select * from schema_version; on dpm_db and cns_db).
  11. Come out of downtime and pat yourself on the back.
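
The round-trip check in step 9 could be scripted roughly as below. This is a sketch rather than a tested procedure: the VO name, SE hostname and LFN path are placeholders to replace with your own, and `DRY_RUN` and the `run` helper are hypothetical conveniences that are not part of lcg-utils.

```shell
#!/bin/sh
# Sketch: copy a small file into the DPM with lcg-cr, fetch it back with
# lcg-cp and compare the two copies. VO, SE and LFN are placeholders.
VO="${VO:-dteam}"
SE="${SE:-se.example.org}"
LFN="${LFN:-lfn:/grid/$VO/dpm-upgrade-test-$$}"

run() {
    # In dry-run mode (the default) print the command, otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

roundtrip_test() {
    src="${1:-/tmp/dpm-test-in}"    # existing local file to upload
    dst="${2:-/tmp/dpm-test-out}"   # where the copy fetched back is written
    run lcg-cr --vo "$VO" -d "$SE" -l "$LFN" "file:$src" &&
    run lcg-cp --vo "$VO" "$LFN" "file:$dst" &&
    run cmp "$src" "$dst"
}
```

With a valid grid proxy, `DRY_RUN=0 roundtrip_test /etc/hosts /tmp/hosts.back` would perform the real transfer; the test file is left registered in the catalogue, so remember to delete it afterwards.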


It would be possible to do this update by hand as well. A rough outline would be:

  1. Downtime as above.
  2. Stop your DPM as above.
  3. Run the update script (/opt/lcg/share/DPM/dpm-support-srmv2.2/dpm_support_srmv2.2 --db-vendor MySQL --db localhost --user $DPM_DB_USER --pwd-file $tmpfile --dpns-db cns_db --dpm-db dpm_db --verbose) by hand.
  4. Check the db schema is updated.
  5. Restart DPM.
  6. Test file transfers.
  7. Come out of downtime.

I haven't tried this, so at the very least double-check these instructions against how YAIM does it.
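
The schema check (step 4 of the manual outline) might look something like this. It simply runs the `select * from schema_version;` query against dpm_db and cns_db, where 3.0.0 is the version expected after the upgrade. `DPM_DB_USER` is the same variable used in the update command; `DRY_RUN` is a hypothetical switch added for this sketch.

```shell
#!/bin/sh
# Sketch: show the schema version recorded in both DPM databases.
# DRY_RUN is a hypothetical switch: 1 (the default) only prints the
# commands, 0 runs them.
check_schema_versions() {
    for db in dpm_db cns_db; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "mysql -u $DPM_DB_USER -p -e 'select * from schema_version;' $db"
        else
            # -p makes the mysql client prompt for the password
            mysql -u "$DPM_DB_USER" -p -e 'select * from schema_version;' "$db"
        fi
    done
}
```

Run it as `DRY_RUN=0 check_schema_versions` with `DPM_DB_USER` set to your DPM database account.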

Finally, DPM disk servers and WN/UI clients can be upgraded very simply. Do it through YAIM, or just do a yum update and, on the disk servers, restart rfiod and dpm-gsiftp.
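
As a minimal sketch of the yum route on a disk server (the `run` helper and `DRY_RUN` switch are hypothetical additions for illustration, so that nothing is changed until you ask for it):

```shell
#!/bin/sh
# Sketch: update the RPMs on a DPM disk server, then restart its daemons.
run() {
    # In dry-run mode (the default) print the command, otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

upgrade_disk_server() {
    run yum update &&
    run service rfiod restart &&
    run service dpm-gsiftp restart
}
```

Calling `DRY_RUN=0 upgrade_disk_server` as root performs the update; on WN/UI clients only the yum update step applies.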