We had an interesting failure recently. The server for one of our larger clients, who have their own high-spec dedicated server arranged, went down. We got the alert by text, and swung into action.
Well, more accurately we had called the hosting providers and told them to fix it.
They gave no solid reason or explanation as to why the server had died, but within in an hour things were normal enough.
Or so we thought…
In fact, things in the background weren’t looking so great. The hosting provision is rather off-hand. Off the “here’s a box, now off you go lads!” variety. It came with just an OS and an SSH connection. Nothing, and I mean nothing was installed. No fancy control panels here. No web server even. Or ftp. You were expected to do it yourself. Right down to compiling code.
So really, when they brought the database back up I should have known better. They ran no checks. They simply loaded the service and closed the ticket.
Unfortunately a few days later we started getting support calls from the users:
We can’t see our categories or tags.
Publishing in advance never works.
And then more alarmingly:
One of the blogs has disappeared!
The latter was a worry. It was one of the most important blogs on this MU install. It needed to be restored.
What I found was alarming – any attempt to look at wp_x_options for that blog in the admin panel caused an error. PHP was throwing up errors everywhere. But when I queried the table it looked fine.
In the end it was connecting with mySQL Administrator, a wonderful tool, and trying to do a backup before starting anything technical that revealed the true extent of the troubles. And there it was – it couldn’t backup one table because it was completely corrupt. Do a repair on it and ‘Hey presto!’ fixed.
But that wasn’t the end of it. The other problems still existed. I thought that this rebuild would have been fine, but no. The Administrator app didn’t find anything obvious, and when I looked in the taxonomy tables to see what was wrong all seemed fine – if you did a select on just one table.
But if you ran a join, things fell apart and you received zero rows. This time I decided to run an extended check on the whole database and found that of the 116 tables, five had indexes with link problems. Now, if you don’t know, an index in a database is essentially a linked list – this is a fast way of allowing data to be added without having to shuffle things around too much. Break a link and that index falls apart – especially on joins that need to find everything on a table.
So a repair against all affected tables and we were done.
If you get this kind of weird behaviour, especially after a crash, it can be well worth looking through the tables for problems like this. Do make sure you know what you’re doing, however – mySQL Administrator is a powerful, quick and dangerous tool!