Ye Olde Hard Disk Failure and Subsequent Downtime


Jyosua

I put a notice on the main site detailing this as well, but the events are worth detailing in full, here.

Yesterday, shortly after arriving home around 5 PM PST, I got the brainy idea of rebooting the server. I was hoping it would take care of some of the lag we've been seeing when we get large influxes of new users. But after sending the reboot command, the server failed to restart: 40 minutes of nothing. Uh-oh. I started getting worried, so I contacted the data center and one of their guys started taking a look at it. At first, he thought it just needed to do a filesystem check and would start up afterwards, and he told me as much. That sounded like a plausible reason for the server to refuse to finish booting, so I bought it. An hour later, it turned out that the server was still hanging on boot, even after passing the mandatory filesystem check.

Well, as it turned out, the hard drive was failing. Thankfully (Sorta? XD), we've been down the road of horrible data loss before, and I had nightly backups. So, after verifying my backups were okay, the data center dude pulled the drive and gave us a fresh one (slightly bigger than before, too) with a clean OS install. Then came the backups. Unfortunately, the general support dude was much slower, and didn't finish restoring our backup files to the new disk until 3AM PST. As most of you know, I actually have a real engineering day job, so by this point I was already long asleep in bed. I could have probably done the file restore myself, much faster, but there were some connection issues that they had to fix that prevented me from doing that anyhow. Regardless, I saw the file restore was complete when I woke up this morning for work, but couldn't start truly fixing things until I got home tonight.

Now, you're probably wondering why it took me nearly 4 hours to finish fixing the server after I got home tonight. As it turns out, our nightly backups aren't just a straight-up disk mirror. I actually had to reconfigure all the programs we use and restore the database from the raw physical database files MySQL saves. This was a pain, and much more complicated than a fresh install. Thankfully, I figured it out and we should be back at 100% operating capacity. We might have lost about a day's worth of posts, since the last backup was the same morning as the server went down, but it shouldn't be much.

Finally, this has made me realize that I have no way of telling people what's going on when we have unanticipated outages like this. As a result, from now on I will always post updates on things like this to my Twitter account, @Jyosua. If you want to know what's going on when stuff like this occurs, follow me there.

As a final note, I'm still looking into possibly upgrading the server in the future, but it's still a tiny bit out of our price range at the moment. If ad revenue continues to improve as more people visit SF and proves to be stable, we should be able to afford it in the near future.

If ad revenue continues to improve as more people visit SF and proves to be stable, we should be able to afford it in the near future.

Cash in on that FE13 release/early release hype Jyo.

Thanks for the update and thanks for keeping this forum as good as you can. The lurker admin appears in the forum's time of need.

Now, you're probably wondering why it took me nearly 4 hours to finish fixing the server after I got home tonight. As it turns out, our nightly backups aren't just a straight-up disk mirror. I actually had to reconfigure all the programs we use and restore the database from the raw physical database files MySQL saves. This was a pain, and much more complicated than a fresh install. Thankfully, I figured it out and we should be back at 100% operating capacity. We might have lost about a day's worth of posts, since the last backup was the same morning as the server went down, but it shouldn't be much.

You have saved us quite a few headaches by going through a few of your own. On behalf of the rp board ... THANK YOU.

I was worried something was wrong with my account or my internet. Glad to see the whole thing's been resolved and everything's back up, though.

Jesus Christ... That scared me.

And it lost Sharpy's draft group of all of our draft picks. Good thing I have everything stored inside me if it involves me directly... more or less...

...BETTER GO CHECK THAT SPRITE CONTEST!

Edited by Ubel Engel
I'm pretty impressed that you worked so hard to get the site back up and did it so fast. You could have relaxed a bit and had it revive on the 1st of February, the Birth of SF. It only goes to show just how dedicated you are to maintaining this site. Thank you very much, Jyosua. For a Root Admin that almost never shows up, it's very reassuring to see that you still care immensely for the site, going through all that in so little time. We're very lucky to have someone like you in charge.

In the immortal words of Superbus, "[Josh's] HD shit the bed yesterday. And [...] it shit the bed HARD."

Thankfully, Josh is a pro at cleaning up shit and making things awesome again. :D

Ah... I was thinking something like this had happened when I suddenly started getting 404'd trying to access the good ol' forest.

Thanks for getting it back up~

So -that's- what happened. I was worried that something happened with the site due to the whole Canada thing.

But it's back up now, and my worry is dead.

Good bright-ish spot on my work day, too; raaaiiiin~...

Thankfully (Sorta? XD), we've been down the road of horrible data loss before, and I had nightly backups.

lol :smug:

Peeps were kinda worried but it looks good now. Good thing there wasn't that data loss. I won't forget that witch hunt...hooo...

This topic is now closed to further replies.