Tuesday, May 20, 2008

Environmental Issues in the Datacenter

Well, my backup datacenter suffered a little setback yesterday.

Around 5pm, the primary AC tripped the in-line circuit breaker. A couple of hours later, the ambient temperature was right around 95F.

I got the Nagios alert at 7:10, and it takes me between half an hour and 45 minutes to get to the office. I got here at 7:45, shut down everything that wasn't absolutely critical, got the backup AC running, and then figured out what had happened to the primary. After getting that fixed, I concentrated on the disk arrays (we've got an XRAID in the rack, and I'm working on setting up the AX-45, which is still on the table). The blade enclosure had shut itself down half an hour before I got there.

I'm going to talk to my boss about filing an insurance claim to replace the drives in the array. I don't think I can trust them to go into the primary site now.

Any of you have this sort of problem? What do you do to help prevent it, or to recover from it?

[Update: I found this blog entry today, completely by accident. Irony, thy name is Everything Sysadmin.]