23 November 2009

100% uptime for DPM

(and anything else with a MySQL backend).

This weekend, with the ramp-up of jobs through the Grid as a result of some minor events happening in Geneva, we were informed of a narrow period during which jobs failed to access Glasgow's DPM.

There were no problems with the DPM, and it was working according to spec. However, the period was correlated with the 15 minutes or so that the MySQL backend takes to dump a copy of itself as backup, every night.

So, in the interests of improving uptime for DPMs to >99%, we enabled binary logging on the MySQL backend (and advise that other DPM sites do so as well, disk space permitting).

Binary logging (which is enabled by adding the string "log-bin" on its own line to the [mysqld] section of /etc/my.cnf, and restarting the service) allows, amongst other things (including "proper" up-to-the-second backups), a MySQL-hosted InnoDB database to be dumped without interrupting service at all, thus removing the short period of dropped communication.
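As a rough sketch of what the change might look like, the fragment below shows the relevant part of /etc/my.cnf. The base name "mysql-bin" is only an illustrative assumption. A bare "log-bin" with no value also works, in which case MySQL picks a default base name derived from the host name and writes the logs to the data directory, so no path strictly needs to be given.

```ini
# /etc/my.cnf (fragment): a sketch, not a complete configuration
[mysqld]
# Enable binary logging. With no value, MySQL chooses a default
# base name and stores the logs in the data directory.
log-bin

# Alternatively, give an explicit base name (hypothetical example;
# a full path may be used to put the logs on another disk):
# log-bin=mysql-bin
```

The non-blocking behaviour also depends on how the dump is taken: for InnoDB tables, mysqldump needs the --single-transaction option to read from a consistent snapshot instead of locking tables, and adding --master-data=2 records the matching binary log position in the dump as a comment. Whether your existing backup script already passes these options is worth checking.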

(Now any downtime is purely your fault, not MySQL's.)

1 comment:

Govind said...

What would be the exact syntax for it? Does it need a path to be defined?