The Directory Server Took a Holiday

Joe Teibel

Our directory server software resides on a remote server (the machine) somewhere in Texas (I believe).

Since releasing G4.5 we have had zero problems with the server software crashing or the server machine going down.

Over the last two weekends, both have happened, each at the worst possible time for our users: Saturday morning. On Memorial Day weekend the machine itself went down, i.e. it restarted inexplicably. We try to take weekends off around here, so unfortunately I was not aware of either incident until the Tuesday following Memorial Day and today (after this weekend's crash).

Why has this happened? I wrote off the machine going down as an isolated event: machines don't work 100% of the time indefinitely, and unfortunately we have to deal with that reality. The server software crash this weekend should have been handled better. I have it set up to auto-restart the exe if it crashes (which it never has for the live servers), and I have seen the auto-restart script work, i.e. I forced a crash and watched it restart. However, this Saturday morning it didn't restart the server. I don't have an explanation for that, but I can assure you I will be working today to a) find a better restart solution and b) see if I can hook up a notification to my phone so if a software crash on the weekends happens, at least I should be able to get it up again sooner.

A small note of explanation: we have two versions of the server running for 4.5. Versions .036 and earlier use one, and .050 uses another. Only the .050 server crashed this weekend, which is why earlier versions still worked.

I'm sorry some of you couldn't enjoy our multiplayer feature this weekend - I will be working on it this week to make sure that doesn't happen again.

Have fun flying, guys.
 
Joe Teibel said:
I'm sorry some of you couldn't enjoy our multiplayer feature this weekend - I will be working on it this week to make sure that doesn't happen again.
And that is why Knife Edge rocks!

Thank you. :)
 
and b) see if I can hook up a notification to my phone so if a software crash on the weekends happens, at least I should be able to get it up again sooner.

Kewl, give us your number and we'll be sure to wake you up on Saturday. :D :D :D
 
I have put in place what I believe to be a more reliable monitoring system, so that if the server software crashes it should restart in a timely manner. I also hooked that up to the server machine's restart procedure, so that if the machine unexpectedly reboots it should restart the server software.

And finally, I added some functionality to the crash-notification pipeline to send a text message to my phone if the server does crash. Assuming I receive that notification, I can hop on a computer and verify that the server software has restarted.
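For anyone curious what a restart-and-notify watchdog can look like, here is a minimal Python sketch. It is not the actual monitoring setup, which isn't shown in this thread; the `watch` and `notify` names, the restart limit, and the delay are all assumptions made for illustration, and `notify` only prints where a real setup would send a text message.

```python
import subprocess
import time


def notify(message):
    # Hypothetical placeholder: a real setup might send an SMS or email here.
    print("NOTIFY:", message)


def watch(command, max_restarts=3, delay_seconds=5.0, notify_fn=notify):
    """Run `command`, restarting it after each crash.

    Returns the number of restarts performed. A zero exit code is treated
    as a clean shutdown and ends the loop without a notification.
    """
    restarts = 0
    while True:
        # Blocks until the server process exits.
        exit_code = subprocess.call(command)
        if exit_code == 0:
            return restarts
        notify_fn(f"server exited with code {exit_code} after {restarts} restart(s)")
        if restarts >= max_restarts:
            # Stop retrying so a server that dies instantly can't loop forever.
            return restarts
        restarts += 1
        time.sleep(delay_seconds)


# Example call (the executable name is hypothetical):
# watch(["directory_server.exe"], max_restarts=10, delay_seconds=5.0)
```

Starting the watchdog itself after a machine reboot would be left to the operating system (a startup task or service), which this sketch does not cover.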

Like I said in my original post, neither the machine nor the software had ever had a problem until Memorial Day weekend, so having both (machine and software) run into problems in a two-week span is somewhat of a mystery. However, if another issue arises, I'm confident that we now have better systems in place to get the server up and running again in a much shorter time than was experienced these last two weekends.

Thanks for your patience.
 
Joel,

One additional thing you might consider is to have another person set up as you now are, to get notified and fix it, just for that rare situation where you're not able to.

Jim L.
 
"Joel,"

Or you could set up the server with a Twitter account, and strap a laptop to your forehead, to get notified and fix it.
 