
2010-08-08 »

Preparations for a Tradeshow

Me: We wouldn't be so presumptuous as to assume we could just make a product without talking to actual users in the market first. What do you think we would do, just show up and tell you what you want?

Co-worker: You can't actually say that with a straight face, can you?

2010-08-10 »

Three bad things: threads, garbage collection, and nondeterministic destructors

These three programming environment "features" all have one characteristic in common that makes them bad: non-repeatability. If you run the same program more than once, and it uses any of those three things, then chances are it won't run identically every time. Of course, if your program is written correctly, it ought to produce the same effective results every time, but the steps to produce those results might be different, and the output itself might be different, even if effectively identical.

For example, in a threaded map/reduce operation, the output of each parallelized map() will reach the reduce() phase at different times. Supposedly, the output of reduce() should be the same regardless of the ordering, but that doesn't mean reduce() is performing the same calculations.
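
Here's a toy sketch of that effect in Python (not a real map/reduce, just the reordering part): summing the same floating-point numbers in two different orders gives totals that are effectively identical, but not bit-for-bit identical.

    import random

    values = [random.uniform(0, 1) for _ in range(100000)]

    total_in_order = sum(values)
    random.shuffle(values)               # simulate map() outputs arriving in a different order
    total_reordered = sum(values)

    print(total_in_order == total_reordered)      # usually False
    print(abs(total_in_order - total_reordered))  # tiny, but not zero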

Imagine you're running a map/reduce on an original Pentium processor with the infamous FDIV bug, and your reduce() includes a division operation. Depending on the order of inputs, you might or might not trigger the bug, and your result might be different, and you'd be left wondering why. That's the problem with non-repeatability. Even without the FDIV bug, maybe your code is buggy, or maybe you're just introducing rounding errors or int overflows; the ordering can change the result, and debugging it is hard.

A more common problem is well-known to anyone using threads; if you don't put your locks in the right places, then your program won't be "correct", even if it seems to act correctly 99.999% of the time. Sooner or later, one of those race conditions will strike, and you'll get the wrong answer. And heaven help you if you're the poor sap who has to debug it then, because non-reproducible bugs are the very worst kind of bugs.
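
A toy version of that race in Python - a deliberately unlocked read-modify-write, not anyone's real code:

    import threading

    counter = 0

    def worker():
        global counter
        for _ in range(100000):
            tmp = counter        # read
            counter = tmp + 1    # write; another thread may have updated counter in between

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(counter)   # usually less than 400000, and different on every run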

Garbage collection and non-determinism

But everyone knows that locking is the main problem with threads. What about the others?

Non-reproducible results from garbage collection go hand-in-hand with non-deterministic destructors. Just having a garbage collector thread run at random times can cause your program's GC delays to move around a bit, but those delays aren't too important. (Except in real-time systems, where GC is usually a pretty awful idea. But most of us aren't writing those.)

But non-deterministic destructors are much worse. What's non-deterministic destruction? It's when the destructor (or finalizer) of an object is guaranteed to run at some point - but you don't know what point. Of course, the point when it runs is generally the point when the GC decides to collect the garbage.

And that's when the non-repeatability starts to really become a problem. A destructor can do anything - it can poke at other objects, add or remove things from lists or freelists, send messages on sockets, close database connections. Anything at all, happening at a random time.
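
You can get a taste of this even in CPython, by defeating the refcounter with a reference cycle. A made-up toy, but the unpredictable timing is the real thing:

    import gc

    class Chatty:
        def __del__(self):
            print("finalized... eventually")

    a = Chatty()
    b = Chatty()
    a.partner = b
    b.partner = a      # reference cycle: refcounts alone will never free these
    del a, b
    print("garbage now, but not finalized yet")
    gc.collect()       # the finalizers run whenever the collector gets around to it
    print("now they're gone")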

Smart people will tell you that, of course, a destructor can do anything, but because you don't know when it'll run, you should do as little in the destructor as possible. In fact, most objects don't even need destructors if you have garbage collection! Those people are right - mostly. Except "as little as possible" is still too much, and as soon as you have anything at all in your destructor, it starts spreading like a cancer.

In the .net world, you can see this problem being hacked around every time you see a "using" statement. Because destructors in .net are non-deterministic, some kinds of objects need to be "disposed" by hand - back to manual memory management. The most common example seems to be database handles, because some rather lame kinds of databases slurp huge amounts of RAM per handle, and your web app will grind to a halt if it produces too many queries in too short a time without explicitly freeing them up.

But no problem, right? You can just get into the habit of using using() everywhere. Well, sort of. Unfortunately, objects tend to get included into other objects (either using inheritance or just by including member objects). What if one of those member objects should be dispose()d explicitly when your container object is destroyed? Well, the containing object now needs to implement its own dispose() that calls its member objects' dispose(). But not all of them; only the members that actually have a dispose(). Which breaks encapsulation, actually, because if someone adds a dispose() to one of those member objects later, you'll have to go through all your containing objects and get them to call it. And if you have a List, the List won't know to call dispose() when you remove an object from the list. How could it?
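
Here's the shape of that problem, sketched in Python rather than C# for brevity (all the names are made up):

    class DbHandle:
        def dispose(self):
            print("releasing scarce database handle")

    class Report:
        def __init__(self):
            self.db = DbHandle()     # member that needs explicit cleanup
            self.title = "weekly"    # member that's just memory, and doesn't

        def dispose(self):
            # the container has to know which of its members have a dispose()
            # and call each one; give another member a dispose() later, and
            # every containing class needs updating too
            self.db.dispose()

    r = Report()
    r.dispose()    # ...and somebody, somewhere, has to remember to call this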

So then some people decide that, as a policy, just to be safe, every kind of object in their app should start off with a dispose(), and you should always call it, just in case you need to actually use it for something later.

Ha ha! And now you're back to manually destroying all your objects - dispose() doesn't get called automatically by the garbage collector - just in case. Only it's worse, because writing dispose() involves more boilerplate re-entrancy junk than a destructor would have!

In some sense, this is still "better" than a completely GC-less language, because at least if you forget to call dispose(), the symptoms are... harder to see. Which means you can ignore them, right? It'll come up as, say, database queries randomly failing (because you've used all your handles), but only under high load, and only when a certain kind of operation (the kind with the mis-implemented dispose()) is being used. But that's okay, you can just retry the query, and it'll probably work the next time, because the GC runs pretty frequently when you're under high load. Oh, and no valgrind-like tool can save you, because the objects are all being collected. Eventually. They're not technically leaks! Oh, thank goodness for that garbage collector, technically saving me from leaks.

(In case you're keeping score: Java has neither dispose() nor using(), nor deterministic destructors. You just have to fake it all up yourself.)

The same weirdness can happen with any kind of non-memory object handle, not just databases, of course. GC proponents like to tell you how "almost everything you allocate is just memory" as if that allows you to ignore the problem 99% of the time. But that's useless. Every program talks to the outside world eventually, so you inevitably end up with a lot of objects referencing a lot of real-life handles, and you make mistakes. One server program I wrote in C# had a strange bug where it wouldn't close connections for a random amount of time after they had finished - less delay under high load, more time under low load. Why? Because the socket wasn't explicitly disposed when the last asynchronous object was done with it. I had a bunch of objects floating around holding a reference to the socket, and all of them were triggering based on other things happening in an event loop; there was no "primary owner," and so there was no way to use the using() syntax, or to implicitly know when to call dispose(). We just wanted to dispose() when the last runner object was finally destroyed.

A solution (well, sort of)

Which brings us to the semantics people actually want for their objects: refcounting and deterministic destructors.

Even in C#, the solution to my dispose() problem was to implement a refcount. For each object that took a reference to my socket, I incremented the refcount by one; when one of those objects was destroyed (or rather, dispose()d, or it again wouldn't be deterministic), it reduced the socket's refcount by one. And when the socket refcount went to zero, it dispose()d the socket immediately.
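
The same idea, sketched here in Python for brevity (hypothetical names; the real thing was C#):

    class SharedSocket:
        def __init__(self, sock):
            self.sock = sock
            self.refs = 0

        def addref(self):
            self.refs += 1

        def release(self):
            self.refs -= 1
            if self.refs == 0:
                self.sock.close()    # "disposed" the moment the last user lets go

Each runner object calls addref() when it stashes the socket and release() from its own dispose(), so the socket closes deterministically, the instant the last one is done.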

Of course, implementing refcounts in C# is much more painful than in C++ (where it's pretty common), because... there are no deterministic destructors. Ha ha! In C++, you can create a "refcounted smart pointer" object, which essentially has the behaviour of incrementing a refcount when it's created, and decrementing the refcount when it's destroyed, and deleting the pointed-to object when the refcount goes to zero. If you pass these smart pointers around on the stack or use them to store stuff in objects, smart pointers are created and destroyed automatically, and sooner or later your "real" objects are destroyed automatically as they should be - right away, as soon as the last user is done with them.

It's very elegant and a bit recursive; we use deterministic destructors to implement smart pointers so that we can have deterministic destruction. But without deterministic destructors, the smart pointer trick is out the window; your smart pointers would be releasing their objects at random times, whenever the GC happened to clean up the smart pointer! You're reduced to updating refcounts by hand, as if you were a lowly C programmer. Gross.

Now, let's say you're coding in perl, python, ruby, or (of all things!) Visual Basic. In those languages, all your objects have refcounts, and all the refcounts are updated automatically. If I write this in python:

    import os
    if x:
        open("filename", "w").write(str(os.getpid()))

Then you can create a file, write to it, and close it, all in one line. The closing is implicit, of course; it happens right away as soon as there are no more references to the file. Because, obviously, nothing else would make sense.

One of the reasons I'm writing this article at all is that people are underestimating just how valuable this behaviour is. Even the python developers themselves have declared deterministic destruction to be an "implementation detail" of the original python implementation, which is not guaranteed to be carried into other python implementations or even maintained in future versions of plain python. Of course, they had to say that, since people have ported python to run on the Java and .net virtual machines, which... lack deterministic destructors. So the above code will already act weird on IronPython, for example.

Instead, the python people, in recent versions, have introduced the "with" statement, which is really just using() with slightly fancier semantics. And that's very sad. Down that path lies the insanity that is .net, with every single object eventually needing to be manually "disposable," just in case.
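
For comparison, here's roughly what the with-statement version of the earlier file snippet looks like; the explicit scoping is the whole point, just like .net's using():

    import os

    if x:
        with open("filename", "w") as f:
            f.write(str(os.getpid()))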

In python, sockets close, database queries get freed, and files get closed, all when you need them to. Plus, your program chews through less memory, because if you create temporary objects in a tight loop, they'll get freed right away instead of sometime later.

And now to get back to where we started. With refcounting, your objects are always destroyed in the same sequence, even if you add lines of code or initialize new objects in the middle.
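
A tiny demonstration of that - with the caveat from above that this is CPython behaviour, not a language guarantee:

    class Noisy:
        def __init__(self, name):
            self.name = name

        def __del__(self):
            print("destroyed", self.name)

    def handle_request():
        a = Noisy("a")
        b = Noisy("b")
        # ...do some work...
        # on return, both refcounts hit zero and __del__ runs right away,
        # in the same order on every run

    handle_request()
    print("request finished")   # always comes after both "destroyed" lines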

When refcounts go wrong

So then why does anyone bother with non-deterministic garbage collection?

Well, first of all, refcounting runs extra code every time you assign a reference, which can theoretically slow down your program. "Real" GC only runs through all your pointers occasionally, which might be less work overall than checking every time you assign a reference. A GC might even be smart enough to aim for idle periods in your program - for example, finish an entire request, then GC the temporary objects while waiting for the next request to come in - which is theoretically very fast. The reality doesn't always match the theory, but it does sometimes and I don't have any benchmarks to show you, so let's leave it at that: a pure GC can run faster than a refcounted one. Sometimes. Maybe.

But much more importantly, we come back to those pesky threads. If you have threads, then your refcounts have to be synchronized between threads; there are lots of ways to do that, but all of them involve having your processor cores communicate with each other, just in case they're sharing the same memory, every time you assign an object reference. And if refcounting was maybe, sometimes, moderately slower than GC before, well, when you add in thread synchronization, it gets much slower. So slow that reasonable people don't even think about trying it.

That's why Java and .net don't use refcounting, of course; they were heavily designed around threads. Microsoft COM uses refcounting even though it was designed with threads in mind, but you don't notice because COM is horrendously slow anyhow and refcounting is the least of your worries. And presumably, since the python developers are still thinking about the day when python will maybe work well with threads, that's why they don't want to promise they'll never switch away from refcounting.

(Python's refcounting isn't a bottleneck right now, even with threads, because python doesn't do thread synchronization around its individual refcounts. It just has a single global interpreter lock that prevents any other threads from being in the interpreter at all, which is much faster in the common (not heavily threaded) case. That's why removing the GIL is so famously hard - because every way they've tried to do it, it makes single-threaded programs slower.)
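
You can even watch CPython's per-object refcounts move, if you're curious:

    import sys

    x = object()
    print(sys.getrefcount(x))   # includes a temporary ref for the getrefcount() call itself
    y = x
    print(sys.getrefcount(x))   # one higher
    del y
    print(sys.getrefcount(x))   # back down again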

So what does all this mean?

Well, it means we're headed for trouble. Intel and other chip manufacturers are insistent that we need to update our programs - and thus our programming languages - to do more stuff with threads. Watch out, they say, because soon, there'll be no way to make your program fast unless you use multiple cores simultaneously. And that means threads, which means garbage collection instead of refcounting, which means non-deterministic destructors.

And that sucks! Because let's face it, the reason programmers are switching to languages like python and ruby is that speed doesn't matter as much as a programmer's sanity. Threads, garbage collection, and non-deterministic destructors are bad for a programmer's sanity.

It's a step backwards. Intel is desperately selling us something we didn't actually need.

Last time Intel told us we had to switch programming paradigms and make our programs more complicated in order to move into the future, they were trying to sell us the Itanium. That foolishness gave AMD time to invent a sensible 64-bit processor that didn't require you to solve a bunch of previously-unsolved computer science problems just to make your programs faster. Intel is repeating the same trick here; they want us to make our programs harder to write, so that their processors don't have to be harder to design. I sympathize, but I'm not falling for it. Threads aren't the only way; certainly not threads as they've been done until now.

So far, the vast majority of the programs I write have been single-threaded, refcounted (or manually memory managed), and deterministic. I think I'll continue to do it that way, thanks.

And you'd better believe most other people will too.

2010-08-20 »

Why Northwestern Ontario is... in Ontario

    In another dispute, over which province should own what is now the northwest of Ontario, the [British] Judicial Committee sided with Ontario against Manitoba and Ottawa. The ruling still makes no sense. The people of the region still treat Winnipeg as their national capital. Why? Because it is their geographical capital. Toronto and the rest of Ontario belong to a distant, different world.

    -- John Ralston Saul, A Fair Country (p.162)

That was in the early 1900's, apparently.

I always wondered how the screwy shape of Ontario had come about; it figures. The actual Canadian federal government (Ottawa) thought it would make sense to lump us in with Manitoba, but somehow the British overseers thought otherwise.

If you're from Northwestern Ontario, here's a fun game you can play with your friends from Southern Ontario. First, take a paper map of Ontario. (I know, paper maps? What are those?) They're generally printed with Southern Ontario on one side and Northern Ontario on the other. In Southern Ontario, find two towns that are about an hour apart, and point them out on the map.

Now flip over the map and find two towns that look about the same distance apart, and ask your friend to estimate how far apart they are. See if they remember to check the map scale - most people don't realize that the Northern Ontario side is drawn at a much smaller scale, because the land is absolutely huge by comparison and has a much lower population density.

Now imagine you're working for the Ontario government - down in Toronto - and you still haven't realized this.

2010-08-26 »

The strange case of virtual machines on telephones

    "Look, theory says that a JIT can run as fast as, or maybe faster than, a statically compiled language. It might be slow right now, but it'll be much better when we get a real/better JIT. Plus, the new version is already a lot faster, and I'm looking forward to the next version, which they promise will have huge speed improvements."

    -- Every Java user since 1996

If you've been saying the above about your Android phone (or Blackberry), then you, too, have become part of the decade-and-a-half-long train wreck of computer science that is Java.

I'm often mystified at the rejection of reality displayed by the proponents of Java-like virtual machines. It seems a simple statement of fact: even after 14 years, Java is still much slower than native code, and you can see it clearly just by looking at any app for 10 seconds. And yet the excuses above keep coming. 14 years.

But then I think, I know how this delusion works. I've been guilty of it myself. At my first company, I pushed to have all our data interchange sent through an API that I designed - UniConf - which was unfortunately slower in almost all cases than not using it. The idea was that if only all our code could be 100% pure UniConf then we'd suddenly be able to realize tons of wonderful advantages.

But despite herculean efforts, the advantages never materialized. What materialized was a lot of slowness, a lot of excessive memory usage, and a lot of weird bugs that forced us to backtrack through seven layers of overly-generalized code to diagnose.

Luckily for me, lack of resources prevented my own madness from spreading too far. I'm much better now.1

But what would it be like if the madness had been successful? What if I had been responsible for a system that spread to millions of users worldwide, which in nearly every case made things visibly and obviously worse? What would that do to my psyche? I think it would be unbearable.

Which brings us to Java-like VMs on cell phones. I have a lot of sympathy here, because:

Java used to be a good idea. Really.

Java on cell phones has not always been obviously a bad idea. To see why, you have to understand a bit about how these systems evolved.

First of all, we have little visibility into Java's original reason for being. We know what people said, but we don't know if they said that for marketing or retroactive justification. What we do know is that the original sales push behind Java was applets for your web browser. Rich, client-side web applications.

Client-side web applications have exactly one super difficult critical requirement: security. You're downloading random apps from the Internet automatically and you want to run them automatically, and some of these apps will definitely be written by evil people and try to screw you, so you need a defense mechanism. Moreover, most people doing this will be doing it on Windows, which at the time meant Windows 95, which had no actual security whatsoever. Any native code could do anything it wanted. This situation persisted, mostly, up to and including Windows XP. (NT-based kernels have security, but the average person just ran everything as an administrator, negating literally all of it.)

So the typical user's operating system provided no strict memory protection or any other security features. This is where Java made perfect sense: if you can provably enforce security at the application layer, you can make a virtual machine that actually includes these missing security features, thus making it safe to run random applications on the Internet, and propelling us into the Internet Age. Sweet.

Java happened to fail at that, mostly due to slowness and crappiness and licensing, but the idea was sound, and it was a valiant and worthwhile effort that deserves our respect even if it didn't work out. Flash and Javascript won out in the end because they were somewhat better in some ways, but they both use VMs (whether interpreted or JITed), and rightly so.2

Unfortunately, nowadays the vast majority of Java apps never use any of Java's security features; they run as apps with full user rights, either on the client or on the server. So that advantage of the VM is gone... and the Java VM has no other advantages.3 But people, having been fooled once, kept going on the path they were already on.

Now ironically, the real problem was not natively compiled languages, but Windows (or to be generous to Microsoft, "the operating systems at the time"). Anybody who has studied computer science knows that modern processors capable of virtual memory were designed around the idea of keeping untrusted apps under control. Once upon a time, people used to actually share time on Unix machines. Lots of people on a few machines. And they were largely prevented from stomping on each other. The exceptions were security holes - fixable mistakes - and VMs have those too.

It is really not that hard to lock an application into a protected environment when your processor includes security features. Just google for chroot, BSD jail, AppArmor, SELinux. Yes, some of them are a little complex, but security is complex; nobody ever claimed Java's security architecture was simple.

Of course, if I had said that five years ago, you might not have believed me; you might have said those systems weren't secure enough, and that Java was somehow more secure in ways you couldn't quantify, but that application-level VM security is just better somehow, I mean, look at the virus situation on Windows. And I wouldn't be able to argue with you, because that's not even a logical argument, but it sounds vaguely convincing. And so the world went.

Then Apple came along and made the iPhone and its App Store and all the apps are native and the thing is still secure and apps can't stomp all over the system. Again, modulo security holes - fixable mistakes - which VMs don't eliminate. Here everybody was, going along with the above illogical argument in favour of VM security because they couldn't argue with it, and Apple just ignored them and showed it was all wrong. You can make native code secure. Of course you can. People did it in the 1980's. What on earth were we thinking?

But I'm getting ahead of the story a bit. Now I've told you why Android's use of a Java-like VM was demonstrably wrong (Apple demonstrated it) from the beginning, but first I wanted to tell you why Blackberries use Java, and lots of old cell phones used Java, and that wasn't obviously wrong.

The reason, of course, is that when Java was first applied to mobile phones, mobile phones didn't have processors capable of protected memory. Those processors were really low powered; security was impossible. Before Java, you could write custom native apps for a Blackberry... as long as you gave your source code to RIM to have them review it. Because native code could do anything, and there was physically no way to stop it once it got onto the device. Other phone manufacturers didn't even bother.

At the time, the first inexpensive embedded processors supporting protected memory were years in the future. If you could have a way to safely load third-party apps onto your phone... well, wow. You'd rule the world. You wouldn't just have a phone, you'd have a platform. This was not silliness, not at all. A Java VM was the first serious possibility of making a mobile phone into a serious, flexible, reconfigurable application platform.

It didn't work out very well, mostly because of Java's slowness and crappiness and licensing and (in the case of Java ME) horrendous lack of standardization. But GMail and Google Maps worked on my Blackberry, and millions of enterprise Blackberries are deployed running thousands of custom legacy enterprise apps you've never heard of that will make transitioning big established companies from Blackberry to iPhone virtually impossible for many years. In this case, pure thickheaded brute force did manage to win the day.

So okay, for the same reason that Java VMs started out as a good idea on Windows - namely, the platform itself lacked any security features - Java VMs made sense on phones. At first.

But embedded processors don't have those limitations anymore. They're serious processors now, with protected memory and everything. Most importantly, these processors were available and being used from the first day the first Google Phone was released. You no longer need a VM for security... but that means the VM doesn't provide any advantage at all.3

The fact that an Android phone has tolerable performance is, again, a triumph of pure thickheaded brute force. If you throw enough geniuses at a difficult technical problem, you might eventually solve that problem, even if the problem was stupid, and in this case, they mostly did.

But every step of the way, they're going to have this giant anchor of UniConf - sorry, Dalvik - tied around their neck, and Apple won't, and Apple's native apps will always run faster. It's going to be frustrating.

Maybe the speed won't matter. Maybe computers will get so fast that you just won't care anymore.

Java users have been saying that, too, since 1996.

Footnotes

1 I hope

2 Writing native desktop or server applications (ie. ones without crazy strict security requirements) using a Flash ("Adobe Air") or Javascript VM is kind of dumb for the same reasons set out in this article. There is one redeeming attribute of those systems, however: they already exist. If you have to have a VM for security on the web, then it makes sense to copy the runtime verbatim to the desktop/server, just because it's easier. Removing the VM would be possible and very nice, but it's just an optimization. Keeping the VM is easier, not harder, and thus is justifiable. (This doesn't really apply to Java since it never actually got popular for web apps.)

3 To pre-emptively refute a few common claims: "Write once run anywhere" doesn't actually work because the compiler was never the main problem; differences in OS semantics are the main problem, and you have to solve those equally for your apps in any language, even Java. Garbage collection can be and is frequently done in natively compiled languages. Introspection can be done in natively compiled languages. Digital signing of shared libraries can be implemented by any native shared library loader. Cross-language integration can be and is done all the time in native languages; in fact, VMs make this much harder, not easier, since now you have to rewrite all your languages. Sensible threading primitives (which some would say Java lacks anyway) can be implemented in any sensible language, natively compiled or not. Profile-driven optimization can be done in compiled languages. Support for multiple hardware architectures is just a recompile away - just ask any Mac developer. Provable memory protection (including prevention of all attempted null pointer dereferences) is doable and has been done in statically compiled languages. And before anyone asks, no, C/C++ does not do all these things; you need a good language. My point is that the good language needn't run in a VM; the VM is a red herring, a distraction.
