More than 20 years later, we’re still learning new and fascinating details about the legendary Valve FPS, Half-Life 2. Today, we’re hearing a story from former Valve designer Tom Forsyth about a game-breaking bug discovered in 2013 that seemingly time-traveled and infected the original version of the game.
I understand that’s a lot to take in, so let’s unpack this genuinely entrancing case of technological happenstance. As extensively detailed on Forsyth’s Mastodon, Forsyth and Valve programmer Joe Ludwig were working on porting Half-Life 2 to the Oculus Rift VR headset, and Forsyth encountered a bug very early in the game that locked him out of a room and thus away from the critical path. There was no way to go forward, which was weird enough, but what made it weirder was the fact that no one could remember encountering the bug in the original game. Forsyth even watched videos of the opening scene where he was encountering the bug, and it wasn’t there.
Obviously, the most urgent concern at the time was just getting the bug fixed before the game was to be shipped on Oculus Rift, but adding to the frenzy was the fact that Valve seemed to be dealing with a time-traveling glitch. It didn’t exist before, and now it did. “How can this possibly be? At this point people are freaking out – this isn’t a normal bug – it appears to have traveled backwards in time and infected the original!” said Forsyth.
Eventually, the developers were able to identify the source of the bug: a guard who was in the now inaccessible room was standing too close to the door, and his toe was colliding with the door as it swung open and causing it to swing back into a closed state. Now that the issue had been discovered, the fix was “easy” even if it “took a lot of work to find because people had to dust off old memories of how the debugging tools worked, etc.”
Still, the mystery lingered. How in the name of Gabe Newell did this bug from 2013 manage to find its way into then-nine-year-old code? And furthermore, why was the guard’s toe not preventing the door from opening in 2004? Or in any of the ensuing years until the bug was discovered?
“But why did this EVER work? The guard’s toe was in the way in the original version as well. As I say, we went back in time and compiled the original as-shipped source code – and the bug happened there as well. It’s always been there. Why didn’t the door slam closed again? How did this ever ship in the first place?”
Well, thankfully, there’s an answer to this riveting mystery: “good old floating point,” per Forsyth. I’m going to let an actual game designer do the talking for this part, but essentially, the game code wasn’t the problem. The precision of the game’s physics math depended on which floating-point instruction set the compiler targeted, and by pure coincidence, the older instruction set’s precision let the door swing open in the 2004 build, while the newer one used for Valve’s 2013 build did not.
“Half Life 2 was originally shipped in 2004, and although the SSE instruction set existed, it wasn’t yet ubiquitous, so most of HL2 was compiled to use the older 8087 or x87 maths instruction set,” said Forsyth. “That has a wacky grab-bag of precisions – some things are 32-bit, some are 64-bit, some are 80-bit, and exactly which precision you get in which bits of code is somewhat arcane.
“But ten years later in 2013, SSE had been standard in all x86 CPUs for a while – the OS depended on it being there, so you could rely on it. So of course by default the compilers use it – in fact you have to go out of your way to make them emit the old (slightly slower) x87 code. SSE uses a much more well-defined precision of either 32 or 64 bit according to what the code asks for – it’s much more predictable.”
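To get a feel for why the precision of intermediate results matters, here is a minimal Python sketch. It is not Valve's code, and it does not reproduce x87's 80-bit format; it simply uses 64-bit math as a stand-in for "wider intermediates" and rounds to 32 bits to mimic "narrower intermediates." The `angle` and `nudge` values and the `to_f32` helper are made up for illustration.

```python
import struct


def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


angle = 1.0          # hypothetical guard rotation, in radians
nudge = 2.0 ** -25   # hypothetical tiny push, smaller than 32-bit precision near 1.0

# "Narrow" path: every intermediate result is rounded to 32 bits.
# The nudge is lost as soon as the sum is rounded.
narrow = to_f32(to_f32(angle + nudge) - angle)

# "Wide" path: intermediates stay in 64 bits and are rounded only at the end.
# The nudge survives.
wide = to_f32((angle + nudge) - angle)

print(narrow)          # 0.0
print(wide == nudge)   # True
```

Same formula, same inputs, and the two paths disagree in the last few bits. In most games nobody would ever notice, but a physics engine feeds each frame's result into the next, so a difference this small can decide which side of a collision threshold an object lands on.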
Well, the side effect of SSE’s tidier 32- or 64-bit precision, apparently, was a guard’s foot that wouldn’t give way to a door colliding with it. In the original x87 code, the physics math worked out so that the guard’s boot pivoted exactly enough for the door to swing past it and open properly, but the newer SSE had “a whole bunch of tiny precisions” that were “very slightly different, and a combination of the friction on the floor and the mass of the objects means the guard still rotates from the collision, but now he rotates very slightly less far.
“So on the next frame of simulation, his toe is still in the way of the door,” said Forsyth. “The door isn’t allowed to just pass through his toe, so it does the only other option – it bounces back. I think by default it’s set to do so completely elastically, so the door bounces back with exactly the speed it came in at, slams shut, and locks again. And you’re stuck.”
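The bounce Forsyth describes can be sketched as a toy model. This is an illustration only, not the Source engine's physics: `door_step`, `CLEARANCE`, and the angle values are all hypothetical, and the fully elastic default (`restitution=1.0`) follows Forsyth's "I think" rather than any confirmed setting.

```python
def door_step(door_velocity: float, toe_angle: float,
              clearance_angle: float, restitution: float = 1.0) -> float:
    """Return the door's velocity after one simulation step.

    If the guard has not rotated far enough to clear the door, the door
    cannot pass through his toe, so its velocity is reflected instead.
    """
    if toe_angle < clearance_angle:
        # Toe still in the way: bounce back. With restitution 1.0 the
        # door leaves at exactly the speed it arrived.
        return -restitution * door_velocity
    # Toe has cleared: the door keeps swinging open.
    return door_velocity


CLEARANCE = 0.25       # hypothetical rotation needed to clear the door
opening_speed = 1.5    # hypothetical door speed, positive means opening

# Two guard rotations that differ only by a rounding-sized amount.
x87_like = door_step(opening_speed, toe_angle=0.2500001, clearance_angle=CLEARANCE)
sse_like = door_step(opening_speed, toe_angle=0.2499999, clearance_angle=CLEARANCE)

print(x87_like)  # 1.5
print(sse_like)  # -1.5
```

The collision check is a hard yes-or-no test, so a rotation that falls short by a hair produces the opposite outcome: the door reverses at full speed rather than opening slightly slower.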
That means, amazingly, the bug had existed in the game the entire time. The guard was always standing too close to the door, but because the compiler in the original build defaulted to the older x87 floating-point instructions, the game’s physics were ever-so-slightly different to what you’d see from a newer compiler that defaults to SSE, and that tiny discrepancy meant the difference between a game-crucial door opening and not opening.
“And there you have it,” Forsyth concluded. “The two biggest bug-farms in gamedev – doors and floating point – contrived to make a simple NPC placement bug into quite the time-travelling palaver.”