Three bad things: threads, garbage collection, and nondeterministic destructors

Three bad things: threads, garbage collection, and nondeterministic destructors

These three programming environment "features" all have one characteristic in common that makes them bad: non-repeatability. If you run the same program more than once, and it uses any of those three things, then chances are it won't run identically every time. Of course, if your program is written correctly, it ought to produce the same effective results every time, but the steps to produce those results might be different, and the output itself might be different, if effectively identical.

For example, in a threaded map/reduce operation, the output of each parallelized map() will reach the reduce() phase at different times. Supposedly, the output of reduce() should be the same regardless of the ordering, but it doesn't mean reduce() is performing the same calculations.

Imagine you're running a map/reduce on an original Pentium processor with the infamous FDIV bug, and your reduce() includes a division operation. Depending on the order of inputs, you might or might not trigger the bug, and your result might be different, and you'd be left wondering why. That's the problem with non-repeatability. Even without the FDIV bug, maybe your code is buggy, or maybe you're just introducing rounding errors or int overflows; the ordering can change the result, and debugging it is hard.

A more common problem is well-known to anyone using threads; if you don't put your locks in the right places, then your program won't be "correct", even if it seems to act correctly 99.999% of the time. Sooner or later, one of those race conditions will strike, and you'll get the wrong answer. And heaven help you if you're the poor sap who has to debug it then, because non-reproducible bugs are the very worst kind of bugs.

Garbage collection and non-determinism

But everyone knows that locking is the main problem with threads. What about the others?

Non-reproducible results from garbage collection go hand-in-hand with non-deterministic destructors. Just having a garbage collector thread run at random times can cause your program's GC delays to move around a bit, but those delays aren't too important. (Except in real-time systems, where GC is usually a pretty awful idea. But most of us aren't writing those.)

But non-deterministic destructors are much worse. What's non-deterministic destruction? It's when the destructor (or finalizer) of an object is guaranteed to run at some point - but you don't know what point. Of course, the point when it runs is generally the point when the GC decides to collect the garbage.

And that's when the non-repeatability starts to really become a problem. A destructor can do anything - it can poke at other objects, add or remove things from lists or freelists, send messages on sockets, close database connections. Anything at all, happening at a random time.

Smart people will tell you that, of course, a destructor can do anything, but because you don't know when it'll run, you should do as little in the destructor as possible. In fact, most objects don't even need destructors if you have garbage collection! Those people are right - mostly. Except "as little as possible" is still too much, and as soon as you have anything at all in your destructor, it starts spreading like a cancer.

In the .net world, you can see this problem being hacked around every time you see a "using" statement. Because destructors in .net are non-deterministic, some kinds of objects need to be "disposed" by hand - back to manual memory management. The most common example seems to be database handles, because some rather lame kinds of databases slurp huge amounts of RAM per handle, and your web app will grind to a halt if it produces too many queries in too short a time without explicitly freeing them up.

But no problem, right? You can just get into the habit of using using() everywhere. Well, sort of. Unfortunately, objects tend to get included into other objects (either using inheritance or just by including member objects). What if one of those member objects should be dispose()d explicitly when your container object is destroyed? Well, the containing object now needs to implement its own dispose() that calls its member objects' dispose(). But not all of them; only the members that actually have a dispose(). Which breaks encapsulation, actually, because if someone adds a dispose() to one of those member objects later, you'll have to go through all your containing objects and get them to call it. And if you have a List

2010-08-10 »