The Windows registry is one of those delightful features1 that came out over a decade ago and still manages to screw people over to this day, resulting in tears, bitterness, and a vast collection of horror stories. In fact, I’m surprised no one has published a short story anthology about the Registry yet. I can picture it now: Scary IT Stories to Tell in the Dark – Windows Registry Edition. Well, just in case such a book ever sees the light of day, here’s my story:
After much hair pulling and frustration, somebody noticed that the error always happened around the 30-second mark. Luckily, in a clutch miracle save, one of the people on our team had run into this exact problem years ago and immediately recognized the issue. The ReceiveTimeout key (technically a DWORD value) under HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings was the culprit! From the Microsoft page: “Internet Explorer imposes a time-out limit for the server to return data”. Well, this registry key controls that timeout value; once that time period has elapsed, IE aborts the connection, causing the error. Apparently, everyone who experienced the issue had that registry key defined, and it was set to only 30 seconds. We’re not sure how or why that key got added on some people’s machines and not on others, but that’s typical of a lot of keys in the registry. Case closed, problem solved!4 Now all that remains is to rant.
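Before the rant, for anyone who wants to check their own machine: running `reg query "HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings" /v ReceiveTimeout` from a command prompt should show whether the value is defined. Below is a minimal Node.js sketch of how that output could be interpreted. It assumes the value is a REG_DWORD holding milliseconds and that `reg query` prints it in hex. The helper names `parseRegDword` and `describeReceiveTimeout` are made up for illustration, and the sample text is hand-written to mimic the output rather than captured from a real machine.

```javascript
// Sketch: interpret the ReceiveTimeout value from `reg query` output.
// Assumption: the value is a REG_DWORD in milliseconds, printed in hex.

// Hypothetical helper: pull a REG_DWORD out of the text `reg query` prints.
// Returns the number, or null if the value is not listed.
function parseRegDword(output, valueName) {
  for (const line of output.split(/\r?\n/)) {
    const fields = line.trim().split(/\s+/);
    if (fields.length === 3 && fields[0] === valueName && fields[1] === 'REG_DWORD') {
      return parseInt(fields[2], 16); // radix 16 accepts the "0x" prefix
    }
  }
  return null;
}

// Hypothetical helper: turn the raw output into a human-readable summary.
function describeReceiveTimeout(output) {
  const ms = parseRegDword(output, 'ReceiveTimeout');
  return ms === null
    ? 'ReceiveTimeout is not defined'
    : `ReceiveTimeout is ${ms / 1000} seconds`;
}

// Hand-written sample that mimics the output on an affected machine.
const sample = [
  'HKEY_CURRENT_USER\\Software\\Microsoft\\Windows\\CurrentVersion\\Internet Settings',
  '    ReceiveTimeout    REG_DWORD    0x7530',
].join('\n');

console.log(describeReceiveTimeout(sample)); // ReceiveTimeout is 30 seconds
```

Had we known to look, a check like this across a few machines would have separated the affected users from the unaffected ones in minutes.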
The Windows Registry is one of those design decisions where the cure is worse than the disease itself. The original problem it was trying to solve was the plethora of .INI files scattered about the filesystem that stored application settings in Windows 3.x. Those .INI files were not entirely unlike the conf files found under /etc in UNIX, which are actually laid out in a fairly clean and straightforward way. Setting aside the obvious design flaw of the registry being a single point of failure where a rogue application or even a simple misclick can result in catastrophic failure, one of the other problems with the registry is that it is not transparent. I’m not talking about the file format of the registry either, although the fact that it is a non-human-readable binary format makes troubleshooting and maintenance a pain. The transparency I’m referring to is the fact that whenever a registry key comes into play, the end user has no way of knowing.
This is highlighted by the hours of wasted debugging that our team spent trying to track down this error. We simply did not know that the ReceiveTimeout key was what was causing the problem. IE merely displayed a generic error message with no troubleshooting information whatsoever. As a result, we couldn’t even google the error (a common troubleshooting tactic that can work wonders and yield a lot of insight into the issue).
Now I can understand hiding this complexity from an end user using a home PC who might get intimidated. However, why isn’t there more information available on Windows Server 2008? Shouldn’t there be some sort of utility that lets a user know when, where and how a registry key has come into play? This lack of transparency is exacerbated by the fact that the Registry is a monolithic store that holds not just settings used by the operating system itself, but also a myriad of settings used by the countless applications installed on the machine (on a home PC this problem is compounded by all the bloatware, malware, and spyware further polluting the registry with their dubious keys). Even if we had known the problem was somewhere in the registry, tracking down the problematic key among the hundreds of thousands of other keys would have been incredibly difficult. Talk about the proverbial needle in a haystack. It would have been much easier to narrow down our search in a system that used conf files. This is because there is a one-to-one mapping between a conf file and the application that uses it, allowing one to easily eliminate many, if not most, conf files from consideration, making life a whole lot easier.
Here’s hoping someday Microsoft finally does the right thing and dumps the Registry. That would make a great advertisement for their new Windows OS! I can see it now: “Hey, I’m Bob. The other day I was doing some software development and I got screwed over by the Registry. So then I was thinking, why not get rid of it? Well next thing I know, Windows 8 came out, and there was no more registry! They got rid of it! I’m a PC, and Windows 8 was my idea! Hooray!”
1. I’m being sarcastic, which is oftentimes hard to do on the internet.
2. The typical approach to tracking down a bug is to isolate the various variables, eliminating each one as a possible culprit. This narrows down the search, which matters a great deal when dealing with computer software. Not only does your application potentially span millions of lines, it also interacts with the OS (millions of lines), and other applications (also totally millions of lines), which all run on top of various hardware platforms. It would take forever to look at all the possible causes of a bug.
Unfortunately, this process of isolating variables gets trickier when the program does not fail in a deterministic fashion: the absence of buggy behavior under a given set of variables does not guarantee it will not appear later under those exact same conditions. Eliminating potential suspects becomes far more difficult. This is because you cannot prove an assertion simply by bringing up multiple examples where said assertion holds true. You can, however, disprove the assertion by bringing up just one example to the contrary. To borrow Nassim Taleb’s iconic example: all it takes is one black swan to disprove the statement “All swans are white”.
3. Using the IETester tool, which lets you emulate the various versions of IE, copy this HTML into a test.html file in your IIS webroot directory:
If you then browse to this file via http://localhost/test.html in IE 6 or IE 7, you will get a “page cannot be displayed” error. The right way to do this would be to call the function after the page has loaded. Something along these lines:
addEvent(window, 'load', appendme);
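Note that addEvent is not a built-in function and isn't defined anywhere in this post. A minimal sketch of what such a helper might look like follows, assuming it only needs to pick between the standard addEventListener and the attachEvent method that IE 6 through IE 8 expose. The `fakeOldIeWindow` object and the body of `appendme` are stand-ins invented so the sketch can run outside a browser.

```javascript
// Hypothetical addEvent helper; written with `var` so it also parses in old IE.
function addEvent(target, type, handler) {
  if (target.addEventListener) {
    // Standards-based browsers.
    target.addEventListener(type, handler, false);
  } else if (target.attachEvent) {
    // IE 6 through IE 8 only offer attachEvent, which wants an "on" prefix.
    target.attachEvent('on' + type, handler);
  } else {
    // Last resort: overwrite the on<type> property directly.
    target['on' + type] = handler;
  }
}

// Stand-in for old IE's window object (an assumption for this demo only).
var fakeOldIeWindow = {
  handlers: {},
  attachEvent: function (name, fn) { this.handlers[name] = fn; }
};

// Placeholder for the real function that modifies the page.
var appended = false;
function appendme() { appended = true; }

addEvent(fakeOldIeWindow, 'load', appendme);
console.log(appended); // false: nothing has run yet

// Simulate the page finishing its load.
fakeOldIeWindow.handlers.onload();
console.log(appended); // true
```

The point is simply that appendme does not run at registration time; it waits until the load event fires, by which point the document has finished loading.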
4. Well, not quite. The correct solution, of course, is to make sure our page returns in less than 30 seconds, which is already an eternity in computer time. That, and perhaps streaming down bits and pieces of the response using AJAX.