201903 - apenwarr

Programmer migration patterns

I made a little flow chart of mainstream programming languages and how programmers seem to move from one to another.

There's a more common kind of chart, which shows how the languages themselves evolved. I didn't want to show the point of view of language inventors, but rather language users, and see what came out. It looks similar, but not quite the same.

If you started out in language A, this shows which language(s) you most likely jumped to next. According to me. Which is not very scientific, but if you wanted science, you wouldn't be here, right? Perhaps this flow chart says more about me than it says about you.

Disclaimers: Yes, I forgot your favourite language. Yes, people can jump from any language to any other language. Yes, you can learn multiple languages and use the right one for the job. Yes, I have biases everywhere.

With that out of the way, I'll write the rest of this post with fewer disclaimers and state my opinions as if they were facts, for better readability.

I highlighted what are currently the most common "terminal nodes" - where people stop because they can't find anything better, in the dimensions they're looking for. The terminal nodes are: Rust, Java, Go, Python 3, Javascript, and (honourable mention because it's kinda just Javascript) node.js.

A few years ago, I also would have highlighted C as a terminal node. Maybe I still should, because there are plenty of major projects (e.g. OS kernels) that still use it and which don't see themselves as having any realistic alternative. But the cracks are definitely showing. My favourite example is "Fun with NULL pointers," in which the Linux kernel had a vulnerability caused by the compiler helpfully stripping out NULL pointer checks from a function because NULL values were "impossible." C is a mess, and the spec makes several critical mistakes that newer languages don't make. Maybe they'll fix the spec someday, though.

But let's go back a few steps. If we start at the top, you can see four main branches, corresponding to specializations where people seem to get into programming:

"Low level" programming, including asm and C.
"Business" or "learning" programming, starting with BASIC.
Numerical/scientific programming, such as Fortran, MATLAB, and R.
Scripting/glue programming, like shell (sh) and perl.

(We could maybe also talk about "database query languages" like SQL, except there's really only SQL, to my great dismay. Every attempt to replace it has failed. Database languages are stuck in the 1970s. They even still CAPITALIZE their KEYWORDS because (they THINK) that MAKES it EASIER to UNDERSTAND the CODE.)

(I also left out HTML and CSS. Sorry. They are real languages, but everybody has to learn them now, so there was nowhere to put the arrows. I also omitted the Lisp family, because it never really got popular, although a certain subgroup of people always wished it would. And I would have had to add a fifth category of programmer specialization, "configuring emacs.")

(And I skipped Haskell, because... well, I considered just depicting it as a box floating off to the side, with no arrows into or out of the box, but I figured that would be extraneous. It was a great meta-joke though, because Haskell precludes the concept of I/O unless you involve Monads.)

Anyway, let's go back to the 1990s, and pretend that the world was simple, and (1) low level programmers used C or asm or Turbo Pascal, (2) business programmers used VB, (3) numerical programmers used Fortran or R or MATLAB, and (4) glue programmers used sh or perl.

Back then, programming languages were kind of regimented like that. I hadn't really thought about it until I drew the chart. But you pretty obviously didn't write operating system kernels in perl, or glue in MATLAB, or giant matrix multiplications in Visual Basic.

How things have changed! Because you still don't, but now it's not obvious.

Language migration is mostly about style

Let's look at the section of the tree starting with asm (assembly language). Asm is an incredibly painful way to write programs, although to this day, it is still the best way to write certain things (the first few instructions after your computer boots, for example, or the entry code for an interrupt handler). Every compiled language compiles down to assembly, or machine language, eventually, one way or another, even if that happens somewhere inside the App Store or in a JIT running on your phone.

The first thing that happened, when we abstracted beyond asm, was a fork into two branches: the C-like branch and the Pascal-like branch. (Yes, Algol came before these, but let's skip it. Not many people would identify themselves as Algol programmers. It mostly influenced other languages.)

You can tell the Pascal-like branch because it has "begin...end". You can tell the C-like branch because it uses braces. C, of course, influenced the design of many languages in ways not shown in my chart. Because we're talking about programmers, not language designers.

Let's look at C first. Oddly enough, once people got started in C, they started using it for all kinds of stuff: it was one of the few languages where you could, whether or not it was a good idea, legitimately implement all four categories of programming problem. All of them were a bit painful (except low-level programming, which is what C is actually good at), but it was all possible, and it all ran at a decent speed.

But if you're a C programmer, where do you go next? It depends what you were using it for.

C++ was the obvious choice, but C++, despite its name and syntax, is philosophically not very C-like. Unless you're BeOS, you don't write operating system kernels in C++. The operating systems people stuck with C, at least until Rust arrived, which looks like it has some real potential.

But the business ("large programs") and numerical ("fast programs") people liked C++. Okay, for many, "liked" is not the right word, but they stuck with it, while there was nothing better.

For glue, many people jumped straight from C (or maybe C++) to python 2. I certainly did. Python 2, unlike the weirdness of perl, is a familiar, C-like language, with even simpler syntax. It's easy for a C programmer to understand how python C modules work (and to write a new python module). Calling a C function from python is cheaper than in other languages, such as Java, where you have to fight with a non-refcounting garbage collector. The python "os" module just gives you C system calls, the way C system calls work. You can get access to C's errno and install signal handlers. The only problem is python is, well, slow. But if you treat it as a glue language, you don't care about python's slowness; you write C modules or call C libraries or subprocesses when it's slow.
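To make the "os module just gives you C system calls" point concrete, here's a minimal sketch. It's written for python 3.8 or newer (not the python 2 of the story), and the path is a hypothetical one I picked only because it shouldn't exist. The helper name open_errno is mine, not part of any library.

```python
import errno
import os
import signal

# Hypothetical path, picked only because it should not exist on your machine.
MISSING_PATH = os.path.join(os.sep, "nonexistent", "apenwarr-example")


def open_errno(path):
    """Try the thin open(2) wrapper; return 0 on success or the C errno."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return e.errno
    os.close(fd)
    return 0


received = []


def on_signal(signum, frame):
    # Runs in the main thread shortly after the signal is delivered.
    received.append(signum)


err = open_errno(MISSING_PATH)
print(err == errno.ENOENT, os.strerror(err))

# Install a handler, send ourselves the signal, then put the old one back.
old_handler = signal.signal(signal.SIGINT, on_signal)
try:
    signal.raise_signal(signal.SIGINT)  # needs Python 3.8 or newer
finally:
    signal.signal(signal.SIGINT, old_handler)
print(received == [signal.SIGINT])
```

If you've written the same thing in C, nothing here should surprise you: a file descriptor, an errno, a handler function. That's the appeal.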

Separately, when Java came out, many C and C++ "business software" programmers were quick to jump to it. Java ran really slow (although unlike python, it was advertised as "theoretically fast"), but people happily paid the slowness price to get rid of C++'s long compile times, header file madness, portability inconveniences, and use-after-free errors.

I recall reading somewhere that the inventors of Go originally thought that Go would be a competitor for Java or C++, but that didn't really work out. Java is like that famous hotel, also probably from Menlo Park, where once you check in, you never check out. Meanwhile, people who still hadn't jumped from C++ to Java were not likely to jump to another language that a) also ran somewhat slower than C++ and b) also had garbage collection, a religious issue.

Where Go did become popular was with all those glue coders who had previously jumped to python 2. It turns out python's slowness was kind of a pain after all. And as computers get more and more insanely complicated, python glue programs tend to get big, and then the dynamic typing starts to bring more trouble than value, and pre-compiling your binaries starts to pay off. And python 2 uses plenty of memory, so Go gives a RAM improvement, not a detriment like when you move from C++. Go isn't much harder to write than python, but it runs faster and with (usually, somewhat) less RAM.

Nowadays we call Go a "systems" language because "glue" languages remind us too much of perl and ruby, but it's all the same job. (Try telling a kernel developer who uses C that Go is a "systems" language and see what they say.) It's glue. You glue together components to make a system.

The Hejlsberg factor

Let's look next at the Visual Basic and Pascal branches, because there's a weird alternate reality that you either find obviously right ("Why would I ever use something as painful as C or Java?") or obviously wrong ("Visual... Basic? Are you serious?")

Back in the 1980s and 1990s, some people still believed that programming should be approachable to new programmers, so personal computers arrived with a pre-installed programming language for free, almost always BASIC.

In contrast, when universities taught programming, they shunned BASIC ("It is practically impossible to teach good programming to students that have had a prior exposure to BASIC"), but also shunned C. They favoured Pascal, which was considered reasonably easy to learn, looked like all those historical Algol academic papers, and whose syntax could be used to teach a class about parsers without having to fail most of your students. So you had the academic branch and the personal computing branch, but what they had in common is that neither of them liked C.

BASIC on PCs (on DOS) eventually became Visual Basic on Windows, which until javascript came along was probably the most-used and most-loved programming language ever. (It is still the "macro" language used in Excel. There are a lot of Excel programmers, although most of them don't think they're programmers.)

Meanwhile, Pascal managed to migrate to PCs and get popular, mainly thanks to Turbo Pascal, which was probably the fastest compiler ever, by a large margin. They weren't kidding about the Turbo. They even got some C programmers to use it despite preferring C's syntax, just because it was so fast. (Turbo C was okay, but not nearly as Turbo. Faster than everyone else's C compiler, though.)

(Pascal in universities got more and more academic and later evolved into Modula and Ada. That branch would have probably died out if it weren't for the US military adopting Ada for high-reliability systems. Let's ignore Ada for today.)

At that point in history, we had two main branches of "business" developers: the BASIC branch and the Pascal branch. Then Windows arrived, and with it Visual Basic. Turbo Pascal for DOS was looking a bit old. Turbo Pascal for Windows was not super compelling. In order to compete, the inventor of Turbo Pascal, Anders Hejlsberg, created Delphi, a visual environment like Visual Basic, but based on the Turbo Pascal language instead, and with fewer execrable always-missing-or-incompatible-dammit runtime DLLs.

It was really good, but it wasn't Microsoft, so business-wise, things got tough. In an unexpected turn of events, eventually Hejlsberg ended up working at Microsoft, where he proceeded to invent the C# language, which launched the Microsoft .NET platform, which also had a Visual Basic .NET variant (which was terrible). This unified the two branches. Supposedly.

Unfortunately, as mentioned, VB.NET was terrible. It was almost nothing like Visual Basic; it was more like a slower version of C++, but with a skin of not-quite-Basic syntax on top, and a much worse UI design tool. C# also wasn't Delphi. But all those things were dead, and Microsoft pushed really hard to make sure they stayed that way. (Except Microsoft Office, which to this day still uses the original Visual Basic syntax, which they call "Visual Basic for Applications," or VBA. It might be more commonly used, and certainly more loved by its users, than all of .NET ever was.)

I actually don't know what became of Visual Basic programmers. Microsoft shoved them pretty hard to get them onto VB.NET, but most of them didn't go along. I wanted to draw the "where they really went" arrow in my diagram, but I honestly don't know. Perhaps they became web developers? Or maybe they write Excel macros.

I think it's interesting that nowadays, if you write software for Windows using Microsoft's preferred .NET-based platforms, you are probably using a language that was heavily influenced by Hejlsberg, whose languages were killed by Microsoft and Visual Basic before he killed them back.

Then he went on to write TypeScript, but let's not get ahead of ourselves.

A brief history of glue languages

The original glue language was the Unix shell, famous because it introduced the concept of "pipelines" that interconnect small, simple tools to do something complicated.

Ah, those were the days.

Those days are dead and gone
and the eulogy was delivered by Perl.
-- Rob Pike

It turns out to be hard to design small, simple tools, and mostly we don't have enough time for that. So languages which let you skip the small simple tools and instead write a twisted, gluey mess have become much more popular. (It doesn't help that sh syntax is also very flawed, especially around quoting and wildcard expansion rules.)

First came awk, which was a C-syntax-looking parser language that you could use in a shell pipeline. It was a little weird (at the time) to use a mini-language (awk) inside another language (sh) all in one "line" of text, but we got over it, which is a good thing because that's how the web works all the time now. (Let's skip over csh, which was yet another incompatible C-syntax-looking language, with different fatal flaws, that could be used instead of sh.)

Perl came next, "inspired" by awk, because awk didn't have enough punctuation marks. (Okay, just kidding. Kind of.)

Perl made it all the way to perl 5 with ever-growing popularity, then completely dropped the ball when they decided to stop improving the syntax in order to throw it all away and start from scratch with perl 6. (Perl 6 is not shown in my diagram because nobody ever migrated to it.)

This left room for the job of "glue" to fracture in several directions. If you thought perl syntax was ugly, you probably switched to python. If you thought perl syntax was amazing and powerful and just needed some tweaks, you probably switched to ruby. If you were using perl to run web CGI scripts, well, maybe you kept doing that, or maybe you gave up and switched to this new PHP thing.

It didn't take long for ruby to also grow web server support (and then Ruby on Rails). Python evolved that way too.

It's kind of interesting what happened here: a whole generation of programmers abandoned the command line - the place where glue programs used to run - and wanted to do everything on the web instead. In some ways, it's better, because for example you can hyperlink from one glue program to the next. In other ways it's worse, because all these modern web programs are slow and unscriptable and take 500MB of RAM because you have to install yet another copy of Electron and... well, I guess that brings us to the web.

Web languages

You will probably be unsurprised to see that my chart has pretty much everything in the whole "glue" branch converging on javascript. Javascript was originally considered a frontend-only language, but when node.js appeared, that changed forever. Now you can learn just one language and write frontends and backends and command-line tools. Javascript was designed to be the ultimate glue language, somehow tying together HTML, CSS, object-orientation, functional programming, dynamic languages, JITs, and every other thing you could make it talk to through an HTTP request.

But it's ugly. The emphasis on backward compatibility, which has been essential to the success of the web, also prevents people from fixing its worst flaws. Javascript was famously thrown together in 10 days in 1995. It's really excellent for 10 days of work, but there were also some mistakes, and we can't fix them.

This brings us to the only bi-directional arrow in my chart: from javascript to python 3, and back again. Let's call it the yin-yang of scripting languages.

Most of the other historical glue+web languages are fading away, but not python. At least not yet. I think that's because... it's sane. If you program long enough in javascript, the insanity just starts to get to you after a while. Maybe you need a pressure release valve and you switch to python.

Meanwhile, if you program in python long enough, eventually you're going to need to write a web app, and then it's super annoying that your frontend code is in a completely different language than the backend, with completely different quirks, where in one of them you say ['a','b','c'].join(',') and in the other you say ','.join(['a','b','c']) and you can never quite remember which is which.
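For the record, here's which is which, as a small python sketch (the javascript form appears only in a comment, and the variable names are mine). It also shows a second quirk in the same neighbourhood: python's join refuses non-strings instead of quietly converting them.

```python
parts = ["a", "b", "c"]

# Python: join is a method of the separator string.
joined = ",".join(parts)
print(joined)  # a,b,c

# JavaScript puts it on the array instead: ['a','b','c'].join(',')
# A Python list has no join method at all.
list_has_join = hasattr(parts, "join")
print(list_has_join)  # False

# Python's join raises on non-strings rather than coercing them.
try:
    ",".join([1, 2, 3])
    coerced = True
except TypeError:
    coerced = False
print(coerced)  # False
```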

One of them has a JIT that makes it run fast once it's started, while the other starts fast and runs slow.

One of them has a sane namespace system, and the other one... well. Doesn't.

I don't think python 3 can possibly beat javascript in the long run, but it's not obvious it'll lose, either.

Meanwhile, Hejlsberg, never quite satisfied with his alternate reality branch of programming, saw the many problems with javascript and introduced TypeScript. Also meanwhile, Microsoft has suddenly stopped being so pushy about native Windows apps and started endorsing the web and open source in a big way. This means that for the first time, Microsoft is shoving its own developers toward web languages, which means javascript. They have their TypeScript spin on it (which is a very nice language, in my opinion), but that branch, an alternate reality for decades now, is finally converging. It probably won't be long before it ends.

Will TypeScript actually win out over pure javascript? Interesting question. I don't know. It's pretty great. But I've bet on Hejlsberg languages before, and I always lose.

Epilogue: Python 2 vs Python 3

With all that said, now I can finally make a point about python 2 vs 3. They are very similar languages, yet somehow not the same. In my opinion, that's because they occupy totally different spots in this whole programmer migration chart.

Python 2 developers came from a world of C and perl, and wanted to write glue code. Web servers were an afterthought, added later. I mean, the web got popular after python 2 came out, so that's hardly a surprise. And a lot of python 2 developers end up switching to Go, because the kind of "systems glue" code they want to write is something Go is suited for.

Python 3 developers come from a different place. It turns out that python usage has grown a lot since python 3 started, but the new people are different from the old people. A surprisingly large fraction of the new people come from the scientific and numerical processing world, because of modules like SciPy and then TensorFlow. Python is honestly a pretty weird choice for high-throughput numerical processing, but whatever, those libraries exist, so that's where we go. Another triumph of python's easy integration with C modules, I guess. And python 3 is also made with the web in mind, of course.

To understand the difference in audience between python 2 and 3, you only need to look at the different string types. In python 2, strings were a series of bytes, because operating systems deal in bytes. Unix pipelines deal in bytes. Network sockets deal in bytes. It was a glue language for systems programs, and glue languages deal in bytes.

In python 3, strings are a series of unicode characters, because people kept screwing up the unicode conversions... when interacting with the web, where everything is unicode. People doing scientific numerical calculations don't care much about strings, and people doing web programming care a lot about unicode, so it uses unicode. Try to write systems programs in python 3, though, and you'll find yourself constantly screwing up the unicode conversions, even in simple things like filenames. What goes around, comes around.
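Here's a rough sketch of what that looks like in python 3. The filename bytes are hypothetical, chosen to be invalid UTF-8, which Unix is perfectly happy to store. I'm using the "surrogateescape" error handler explicitly; as far as I know, that's the same trick os.fsdecode and os.fsencode use on Unix-like systems, but treat that as my assumption.

```python
text = "caf\u00e9"           # 4 unicode characters
data = text.encode("utf-8")  # 5 bytes
print(len(text), len(data))  # 4 5

# Python 3 won't let you mix the two types by accident.
try:
    text + data
    mixable = True
except TypeError:
    mixable = False

raw_name = b"report-\xff.txt"  # hypothetical filename, not valid UTF-8

# A strict decode fails outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the bad byte through as a lone surrogate,
# so the original bytes can be recovered exactly.
name = raw_name.decode("utf-8", "surrogateescape")
round_tripped = name.encode("utf-8", "surrogateescape")
print(round_tripped == raw_name)  # True

# But the smuggled string can't be encoded normally, e.g. to print it.
try:
    name.encode("utf-8")
    printable = True
except UnicodeEncodeError:
    printable = False
print(mixable, strict_ok, printable)  # False False False
```

So the bytes survive, but only if every layer of your program remembers to use the same escape hatch. In python 2 you would have just passed the bytes along.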

2019-03-18