Part of being a good sysadmin is being flexible. Quite often when something breaks you’re solving a new problem, and the developers, users, and management turn to you to find a solution. One day you’ll be sorting out IP misconfiguration, the other you’ll be debugging Python libraries, and another you’ll be working around yet another limitation of your web server. You wind up reading a lot of documentation and learning a little about everything.
We had a particularly puzzling problem on one of our servers the other day. Somehow, its clock had been set eight hours fast1. We were running NTP, but it wasn’t set to jump the clock on start2, so even though NTP was synced to an upstream server, the clock was still a long ways off.
Normally, this is pretty easy to fix — you set the clock back, and you’re done. For us, though, it was a serious problem because uploads are ordered by commit time, which is dependent on the server’s clock. The guy who wrote our nagios time check was summarily flogged, and Alan and I set out to find a solution. We came up with a few ideas:
- Wait eight hours for real time to catch up
- Rewrite all the logs and database entries to have correct times
- Find a way to slow down what our daemons thought was the correct time in order to gradually bring their time in sync with real time
#1 was right out — we weren’t going to stop uploads for that long. #2 was heinously difficult and fraught with peril, so we went with #3. Alan’s quick googling found libfaketime, one of those great utilities that shouldn’t exist, but does. Libfaketime works as a LD_PRELOAD hack to change the way certain time library calls work. It can create a time offset and/or speed up and slow down the passage of time relative to the system clock3.
After two hours pondering and hacking, we put our plan into action. We set our server’s time back to normal and modified our daemons’ time to pretend to be six hours in the future, running at half-speed. Twelve hours later, the times synced up, and we undid the hack. Problem solved, and I’ve added one more unmentionable hack to my toolbox. :)
1: I’m not entirely sure, but I think the reason for this is a recent motherboard swap. We set our system clocks to UTC, and the motherboard probably arrived with its clock set to Pacific time.
2: Debian/Ubuntu admins note: this option is turned off by default in the openntpd package.
3: Interestingly, Erlang provides similar behavior by default. If erl detects a jump in the system clock, it will slowly adjust its concept of current time to match.