2010-08-10

Three bad things: threads, garbage collection, and nondeterministic destructors

These three programming environment "features" all have one characteristic in common that makes them bad: non-repeatability. If you run the same program more than once, and it uses any of those three things, then chances are it won't run identically every time. Of course, if your program is written correctly, it ought to produce the same effective results every time, but the steps to produce those results might be different, and the output itself might be different, even if it's effectively identical.

For example, in a threaded map/reduce operation, the output of each parallelized map() will reach the reduce() phase at different times. Supposedly, the output of reduce() should be the same regardless of the ordering, but that doesn't mean reduce() is performing the same calculations each time.

Imagine you're running a map/reduce on an original Pentium processor with the infamous FDIV bug, and your reduce() includes a division operation. Depending on the order of inputs, you might or might not trigger the bug, and your result might be different, and you'd be left wondering why. That's the problem with non-repeatability. Even without the FDIV bug, maybe your code is buggy, or maybe you're just introducing rounding errors or int overflows; the ordering can change the result, and debugging it is hard.
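
For instance, here's a tiny sketch of that last point (mine, not from any real map/reduce framework): floating-point addition isn't associative, so the order in which partial results arrive can change the final answer all by itself.

    from functools import reduce
    from operator import add

    # The same three partial results, arriving in two different orders.
    parts_a = [1e16, 1.0, -1e16]
    parts_b = [1e16, -1e16, 1.0]

    print(reduce(add, parts_a))  # 0.0: the 1.0 is absorbed into 1e16 and rounded away
    print(reduce(add, parts_b))  # 1.0: the big values cancel first, so the 1.0 survives

Same inputs, same reduce(), different answer - and no bug anywhere, except the assumption that ordering doesn't matter.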

A more common problem is well known to anyone using threads: if you don't put your locks in the right places, then your program won't be "correct", even if it seems to act correctly 99.999% of the time. Sooner or later, one of those race conditions will strike, and you'll get the wrong answer. And heaven help you if you're the poor sap who has to debug it then, because non-reproducible bugs are the very worst kind of bugs.
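
If you want to watch one happen, here's a deliberately racy sketch (my own; how many updates it loses depends on your interpreter and your luck, which is exactly the point):

    import threading

    counter = 0

    def add_one(value):
        # The function call is a spot where the interpreter may switch threads.
        return value + 1

    def bump(n):
        global counter
        for _ in range(n):
            counter = add_one(counter)  # read-modify-write with no lock

    threads = [threading.Thread(target=bump, args=(200000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(counter)  # typically less than 400000, and different on every run

Put a lock around the increment and the answer is always 400000. Leave it out and the answer is merely almost always right, which is much worse.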

Garbage collection and non-determinism

But everyone knows that locking is the main problem with threads. What about the others?

Non-reproducible results from garbage collection go hand-in-hand with non-deterministic destructors. Just having a garbage collector thread run at random times can cause your program's GC delays to move around a bit, but those delays aren't too important. (Except in real-time systems, where GC is usually a pretty awful idea. But most of us aren't writing those.)

But non-deterministic destructors are much worse. What's non-deterministic destruction? It's when the destructor (or finalizer) of an object is guaranteed to run at some point - but you don't know what point. Of course, the point when it runs is generally the point when the GC decides to collect the garbage.

And that's when the non-repeatability starts to really become a problem. A destructor can do anything - it can poke at other objects, add or remove things from lists or freelists, send messages on sockets, close database connections. Anything at all, happening at a random time.
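
To make that concrete, here's a sketch (mine, in Python terms, with a made-up host name) of a finalizer that does real work. Under refcounting it runs the instant the last reference disappears; under a tracing collector it runs whenever the collector feels like it.

    import socket

    class Connection:
        def __init__(self, host, port):
            self.sock = socket.create_connection((host, port))

        def __del__(self):
            # Real work in a finalizer: under a tracing GC this runs at some
            # unpredictable later time, in some unpredictable order relative
            # to every other finalizer.
            self.sock.close()

    def handle_request():
        conn = Connection("db.example.com", 5432)
        # ... use conn ...
        # In CPython, conn's refcount hits zero right here and the socket
        # closes immediately. On a GC-only runtime: eventually. Maybe.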

Smart people will tell you that, of course, a destructor can do anything, but because you don't know when it'll run, you should do as little in the destructor as possible. In fact, most objects don't even need destructors if you have garbage collection! Those people are right - mostly. Except "as little as possible" is still too much, and as soon as you have anything at all in your destructor, it starts spreading like a cancer.

In the .net world, you can see this problem being hacked around every time you see a "using" statement. Because destructors in .net are non-deterministic, some kinds of objects need to be "disposed" by hand - back to manual memory management. The most common example seems to be database handles, because some rather lame kinds of databases slurp huge amounts of RAM per handle, and your web app will grind to a halt if it produces too many queries in too short a time without explicitly freeing them up.

But no problem, right? You can just get into the habit of using using() everywhere. Well, sort of. Unfortunately, objects tend to get included into other objects (either using inheritance or just by including member objects). What if one of those member objects should be dispose()d explicitly when your container object is destroyed? Well, the containing object now needs to implement its own dispose() that calls its member objects' dispose(). But not all of them; only the members that actually have a dispose(). Which breaks encapsulation, actually, because if someone adds a dispose() to one of those member objects later, you'll have to go through all your containing objects and get them to call it. And if you have a List, the List won't know to call dispose() when you remove an object from the list. How could it?
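
Here's the shape of that mess, sketched in Python with close() standing in for .net's Dispose() (the class names are made up):

    class DatabaseHandle:
        def __init__(self):
            self.closed = False

        def close(self):
            if self.closed:      # the usual "already disposed" guard
                return
            self.closed = True
            # ... release the underlying handle here ...

    class Report:
        def __init__(self):
            self.db = DatabaseHandle()  # needs explicit closing
            self.title = "quarterly"    # plain memory; doesn't

        def close(self):
            # Report didn't need a close() until one of its members grew one.
            # Now it does - and so does anything that contains a Report, and
            # so on, all the way up the object graph.
            self.db.close()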

So then some people decide that, as a policy, just to be safe, every kind of object in their app should start off with a dispose(), and you should always call it, just in case you need to actually use it for something later.

Ha ha! And now you're back to manually destroying all your objects - dispose() doesn't get called automatically by the garbage collector - just in case. Only it's worse, because writing dispose() involves more boilerplate re-entrancy junk than a destructor would have!

In some sense, this is still "better" than a completely GC-less language, because at least if you forget to call dispose(), the symptoms are... harder to see. Which means you can ignore them, right? It'll come up as, say, database queries randomly failing (because you've used all your handles), but only under high load, and only when a certain kind of operation (the kind with the mis-implemented dispose()) is being used. But that's okay, you can just retry the query, and it'll probably work the next time, because the GC runs pretty frequently when you're under high load. Oh, and no valgrind-like tool can save you, because the objects are all being collected. Eventually. They're not technically leaks! Oh, thank goodness for that garbage collector, technically saving me from leaks.

(In case you're keeping score: Java has neither dispose() nor using(), nor deterministic destructors. You just have to fake it all up yourself.)

The same weirdness can happen with any kind of non-memory object handle, not just databases, of course. GC proponents like to tell you how "almost everything you allocate is just memory" as if that allows you to ignore the problem 99% of the time. But that's useless. Every program talks to the outside world eventually, so you inevitably end up with a lot of objects referencing a lot of real-life handles, and you make mistakes. One server program I wrote in C# had a strange bug where it wouldn't close connections for a random amount of time after they had finished - less delay under high load, more time under low load. Why? Because the socket wasn't explicitly disposed when the last asynchronous object was done with it. I had a bunch of objects floating around holding a reference to the socket, and all of them were triggering based on other things happening in an event loop; there was no "primary owner," and so there was no way to use the using() syntax, or to implicitly know when to call dispose(). We just wanted to dispose() when the last runner object was finally destroyed.

A solution (well, sort of)

Which brings us to the semantics people actually want for their objects: refcounting and deterministic destructors.

Even in C#, the solution to my dispose() problem was to implement a refcount. For each object that took a reference to my socket, I incremented the refcount by one; when one of those objects was destroyed (or rather, dispose()d, or it again wouldn't be deterministic), it reduced the socket's refcount by one. And when the socket refcount went to zero, it dispose()d the socket immediately.
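
Sketched in Python rather than C#, and with made-up names, the workaround looked roughly like this:

    class SharedSocket:
        def __init__(self, sock):
            self.sock = sock
            self.refs = 0

        def acquire(self):
            self.refs += 1
            return self.sock

        def release(self):
            self.refs -= 1
            if self.refs == 0:
                # Deterministic: the socket closes the moment the last user
                # lets go, not whenever the GC gets around to it.
                self.sock.close()

Every asynchronous helper calls acquire() when it starts and release() in its own cleanup path - manual refcounting, bolted onto a garbage-collected language.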

Of course, implementing refcounts in C# is much more painful than in C++ (where it's pretty common), because... there are no deterministic destructors. Ha ha! In C++, you can create a "refcounted smart pointer" object, which essentially has the behaviour of incrementing a refcount when it's created, and decrementing the refcount when it's destroyed, and deleting the pointed-to object when the refcount goes to zero. If you pass these smart pointers around on the stack or use them to store stuff in objects, smart pointers are created and destroyed automatically, and sooner or later your "real" objects are destroyed automatically as they should be - right away, as soon as the last user is done with them.

It's very elegant and a bit recursive; we use deterministic destructors to implement smart pointers so that we can have deterministic destruction. But without deterministic destructors, the smart pointer trick is out the window; your smart pointers would be dereferencing objects at random times when the GC cleans up the smart pointer! You're reduced to updating refcounts by hand, as if you were a lowly C programmer. Gross.
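
For the record, here's the shape of that trick, sketched in Python, where a deterministic __del__ stands in for the C++ destructor (CPython's own refcounting makes this redundant in real life, but it shows the mechanism):

    class Resource:
        def __init__(self):
            self.refs = 0

        def close(self):
            print("closed")

    class Ref:
        """A toy smart pointer: holding one keeps the resource alive."""
        def __init__(self, resource):
            self.resource = resource
            resource.refs += 1         # created: count goes up

        def __del__(self):             # runs deterministically in CPython
            self.resource.refs -= 1
            if self.resource.refs == 0:
                self.resource.close()  # last reference gone: clean up right now

    r = Resource()
    a = Ref(r)
    b = Ref(r)
    del a  # nothing happens yet
    del b  # "closed", immediately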

Now, let's say you're coding in perl, python, ruby, or (of all things!) Visual Basic. In those languages, all your objects have refcounts, and all the refcounts are updated automatically. If I write this in python:

    from os import getpid
    if x:
        open("filename", "w").write(str(getpid()))

Then you can create a file, write to it, and close it, all in one line. The closing is implicit, of course; it happens right away as soon as there are no more references to the file. Because, obviously, nothing else would make sense.

One of the reasons I'm writing this article at all is that people are underestimating just how valuable this behaviour is. Even the python developers themselves have declared deterministic destruction to be an "implementation detail" of the original python implementation, which is not guaranteed to be carried into other python implementations or even maintained in future versions of plain python. Of course, they had to say that, since people have ported python to run on the Java and .net virtual machines, which... lack deterministic destructors. So the above code will already act weird on IronPython, for example.

Instead, the python people, in recent versions, have introduced the "with" statement, which is really just using() with slightly fancier semantics. And that's very sad. Down that path lies the insanity that is .net, with every single object eventually needing to be manually "disposable," just in case.
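
For comparison, here's the with-statement version of the earlier one-liner. It closes the file at the end of the block - but only because you remembered to ask:

    from os import getpid

    with open("filename", "w") as f:
        f.write(str(getpid()))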

In python, sockets close, database queries get freed, and files get closed, all when you need them to. Plus, your program chews through less memory, because if you create temporary objects in a tight loop, they'll get freed right away instead of sometime later.

And now to get back to where we started. With refcounting, your objects are always destroyed in the same sequence, even if you add lines of code or initialize new objects in the middle.

When refcounts go wrong

So then why does anyone bother with non-deterministic garbage collection?

Well, first of all, refcounting runs extra code every time you assign a reference, which can theoretically slow down your program. "Real" GC only runs through all your pointers occasionally, which might be less work overall than doing bookkeeping on every single assignment. A GC might even be smart enough to aim for idle periods in your program - for example, finish an entire request, then GC the temporary objects while waiting for the next request to come in - which is theoretically very fast. The reality doesn't always match the theory, but it does sometimes, and I don't have any benchmarks to show you, so let's leave it at that: a pure GC can run faster than a refcounted scheme. Sometimes. Maybe.

But much more importantly, we come back to those pesky threads. If you have threads, then your refcounts have to be synchronized between threads; there are lots of ways to do that, but all of them involve having your processor cores communicate with each other, just in case they're sharing the same memory, every time you assign an object reference. And if refcounting was maybe, sometimes, moderately slower than GC before, well, when you add in thread synchronization, it gets much slower. So slow that reasonable people don't even think about trying it.

That's why Java and .net don't use refcounting, of course; they were heavily designed around threads. Microsoft COM uses refcounting, even though it was designed with threads in mind, but you don't notice because COM is horrendously slow anyhow and refcounting is the least of your worries. And presumably the python developers are still thinking about the day when python will maybe work well with threads; that's why they don't want to promise they'll never switch away from refcounting.

(Python's refcounting isn't a bottleneck right now, even with threads, because python doesn't do thread synchronization around its individual refcounts. It just has a single global interpreter lock that prevents any other threads from being in the interpreter at all, which is much faster in the common (not heavily threaded) case. That's why removing the GIL is so famously hard - because every way they've tried to do it, it makes single-threaded programs slower.)

So what does all this mean?

Well, it means we're headed for trouble. Intel and other chip manufacturers are insistent that we need to update our programs - and thus our programming languages - to do more stuff with threads. Watch out, they say, because soon, there'll be no way to make your program fast unless you use multiple cores simultaneously. And that means threads, which means garbage collection instead of refcounting, which means non-deterministic destructors.

And that sucks! Because let's face it, the reason programmers are switching to languages like python and ruby is that speed doesn't matter as much as a programmer's sanity. Threads, garbage collection, and non-deterministic destructors are bad for a programmer's sanity.

It's a step backwards. Intel is desperately selling us something we didn't actually need.

Last time Intel told us we had to switch programming paradigms and make our programs more complicated in order to move into the future, they were trying to sell us the Itanium. That foolishness bought AMD enough time to invent a sensible 64-bit processor that didn't require you to solve a bunch of previously-unsolved computer science problems just to make your programs faster. Intel is repeating the same trick here; they want us to make our programs harder to write, so that their processors don't have to be harder to design. I sympathize, but I'm not falling for it. Threads aren't the only way; certainly not threads as they've been done until now.

So far, the vast majority of the programs I write have been single-threaded, refcounted (or manually memory managed), and deterministic. I think I'll continue to do it that way, thanks.

And you'd better believe most other people will too.
