201007 - apenwarr

Everything here is my opinion. I do not speak for your employer.


2010-07-18 »

You can't make C++ not ugly, but you can't not try

...everything that's wrong with C++ comes down to that.

Background: I've been programming in C++ since about 1993; that's 17 years now. As late as 2009, I chose C++ to write our Windows client for EQL Data. If I were to make that decision today, I would still choose C++, because, quite simply, nothing else would work. (Okay, C would work, but it would be at least 5x as much effort. So no thanks. And for making plugins to legacy Windows apps, there's just nothing else out there.)

So, okay, I know a fair bit about C++. I've managed 30-person development teams building huge stuff in C++. Successfully. I have some context here.

And my context is: even if there's nothing better for the job, the truth is that C++ is incredibly ugly and misdesigned. C++ is a trap: they tell you that you can do anything you want in C++. Anything! C++ isn't a language, they say, it's a language construction kit! Build the language of your dreams in C++! And it'll be portable and scalable and fast and standardized!

And this is so close to true that even after using it for 17 years, I still almost believe it. I used to actually believe it. But see, some recent experience with the "amazing innovations" in other programming languages has convinced me otherwise.

First of all, if you haven't done much C++, you need to realize: most of the stuff in there is utter putrid boneheaded crap. This includes the RTTI and exceptions stuff; C++'s versions of those were enough to convince a whole generation of programmers that introspection and exceptions were outright evil and should be avoided. But as it turns out, that's only true in C++.

If you've heard anything about C++, you've probably heard that there's no standard string class and everybody rolls their own. That's not actually one of the bad things, in my opinion. As a person who's done a lot of coding in C++, I've actually come to understand that there really are good reasons to use different string objects at different times. In python, my current language of choice whenever it's appropriate, there's only one string class, and it's mostly okay, but every now and then you really want to just replace one character in the string with one other character (an O(1) operation) but you can't, so you instead construct a new string (an O(n) operation) and your program is vastly slower and less scalable. (Happily, python makes it pretty easy to create string-like objects, particularly using C extensions, so if you really need it, you can make it still go fast. But effectively that's just exactly creating your own string class, like so many people do in C++. See? Not always evil.)

The problem isn't interchangeable string classes. The problem is that the default C++ string class is so utterly godawfully stupid. No garbage collection? Check. No refcounting? Check. Need to allocate/free heap space just to pass a string constant to a function? Check. No support for null strings? Check. Horrendous mess of templates that makes tracing in a debugger utterly painful? Check. Horrendous mess of templates that makes non-ultramodern compilers unable to optimize them so that, for years, your toy homemade string class was 5x faster? Check. Totally unclear what character type it uses (actually you can use whatever you want at different times)? Check. Totally missing a sprintf-like formatter so you have to use something, anything, oh god please save me from iostreams just to produce a dynamic string? Check. Can't append to a string without allocating a whole new one? Check. Using the "+" append operator produces more temporary objects than you can count? Check. Using the "+" operator with two string constants gives a weird compiler error about adding pointers? Check.
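
For anyone who hasn't hit that last complaint: two string literals are arrays of char, not std::string objects. So "+" between them would be pointer arithmetic, and C++ rejects it. Here is a minimal sketch of the usual workarounds; the function names are made up for illustration:

```cpp
#include <string>

// "chicken" + " soup" does not compile: both operands are const char[N]
// arrays that decay to pointers, and C++ has no operator+ for two pointers.
// Wrapping either operand in std::string selects std::string's operator+.
std::string make_soup() {
    // std::string result = "chicken" + " soup";  // error: invalid operands
    std::string result = std::string("chicken") + " soup";
    return result;
}

// Adjacent string literals are concatenated by the compiler itself,
// with no operator involved at all.
const char *make_soup_literal() {
    return "chicken" " soup";
}
```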

In contrast, let's take, say, python's strings: refcounted, passed by reference, nullable, compatible with string constants, no templates, trivially easy debugging, always the same character type (although they changed it in python 3, sigh), include a sprintf-like operator, the + append operator works fine and multiple appended constants can be optimized at compile time (python's interpreter compiles to a metalanguage and can do basic optimizations like this). They even have an optimized non-constant append operator in newer versions of python that's more efficient than making a whole new copy every time.

How many of these string features required us to use an interpreted language? Precisely zero. An imaginary, fictional version of C++ could have had a string class with all these features and been just as fast and efficient. And I bet a lot fewer people would have written their own if that had been the case. There's actually no excuse for the crap that is C++ std::strings; they aren't better. They're just, somehow, the standard.

Another C++ problem that's close to my heart is function pointers. Not even lambdas or anonymous functions - let's not get all fancy, here. Just plain pointers to existing named functions. C++, being a superset of C, has function pointers, of course. And while the syntax for them has always been a little funny, they actually work fine and don't make you want to kill people too often. (Everywhere C function pointers are used they should always have a void *userdata parameter, and when people don't do that (like in qsort()), then you do want to kill things... but that's not C's fault, and sensible programmers can avoid that mistake.) So ok. C++ has function pointers.
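
Here's what the userdata convention looks like in practice. The names for_each_int, SumState and add_to_sum are hypothetical, made up for this sketch. By contrast, qsort()'s comparator only ever receives the two elements being compared, so any extra state it needs has to live in a global.

```cpp
#include <cstddef>

// A C-style callback following the convention above: the caller passes an
// opaque void *userdata, and the library hands it back on every call.
typedef void (*int_visitor)(int value, void *userdata);

void for_each_int(const int *items, std::size_t count,
                  int_visitor visit, void *userdata) {
    for (std::size_t i = 0; i < count; i++)
        visit(items[i], userdata);
}

// The state lives in the caller's struct, not in a global variable.
struct SumState {
    long total;
    int calls;
};

static void add_to_sum(int value, void *userdata) {
    SumState *state = static_cast<SumState *>(userdata);
    state->total += value;
    state->calls++;
}
```

Usage: `SumState s = {0, 0}; for_each_int(xs, 4, add_to_sum, &s);` leaves the running total in `s`, and two concurrent traversals never share state.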

But here's the thing: they utterly failed to extend this concept to include pointers to methods of an object.

Okay, that's not really true. In fact, it's little known that C++ - the language, not the insane libraries or templates - has built-in support for function pointers that call member functions.

The bad news is, this feature is so horrendously ill-conceived that absolutely nobody uses it for anything. Seriously. Nobody. I tried my best. The feature really is actually useless. The article I linked to tried desperately to make them look like maybe they have a purpose, but no. They just don't. (You can see the main problem in the linked article under the section "Member Function Pointers Are Not Just Simple Addresses." You might think, oh, of course not. They're a "this" pointer plus an address, right? Ha ha! Ha ha ha ha!! No they're not! They don't have a this pointer! You still have to provide your own this pointer when you call it! But it does store all kinds of crazy other stuff instead so it can do call-time vtable lookups on multiply-inherited objects! Ha ha!)
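
For reference, this is roughly what the built-in feature looks like (Counter is a made-up example class). Notice that the object comes from the caller at call time, not from the pointer:

```cpp
// A pointer to member function records *which* method to call, but not
// *which object* to call it on; the object is supplied at call time with
// the .* (or ->*) operator.
struct Counter {
    int count = 0;
    void add(int n) { count += n; }
    void sub(int n) { count -= n; }
};

// Declares CounterOp as "pointer to a member function of Counter taking int".
typedef void (Counter::*CounterOp)(int);

void apply(Counter &c, CounterOp op, int n) {
    // The parentheses are required: c.*op(n) would try to call op(n) first.
    (c.*op)(n);
}
```

Under the Itanium C++ ABI used by GCC and Clang, a CounterOp is two words: a function address (or vtable offset) plus a `this` adjustment for multiple inheritance. There is still no object pointer inside.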

Utterly useless. But the bad thing isn't so much that it's useless - although maybe someone should have noticed that and killed the feature before it somehow passed the standards committee. The bad thing is that there is an obvious way to do it that wouldn't have been useless: just make a member function pointer be a struct { obj, funcaddress }. Everybody knows that calling a member function obj.f(x,y,z) in C++ is actually done by calling f(obj,x,y,z). There would be nothing to it. Since you know 'this' at the time you create the function pointer, you can resolve the funcaddress from the name 'f' at that point - the same way you would when making any method call, including vtables, multiple inheritance, and everything - and the code receiving the pointer would always just run it as (*funcaddress)(obj, ...). So easy. Nothing to it. So very much terrible C++ code would never have been written if this feature existed.
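
Here is a rough sketch of the { obj, funcaddress } idea, written by hand because the language doesn't provide it. BoundIntMethod, Logger and the trampoline are all hypothetical. A compiler that supported the feature would generate the trampoline itself at the point where the pointer is created.

```cpp
// Hypothetical "bound method pointer": the object plus a plain function
// address. Calling it is just (*func)(obj, arg), exactly like a C callback
// with a userdata parameter.
struct BoundIntMethod {
    void *obj;
    void (*func)(void *obj, int arg);
    void operator()(int arg) const { func(obj, arg); }
};

struct Logger {
    int last = 0;
    void record(int v) { last = v; }
};

// Without language support, each binding needs a small trampoline that
// casts the void * back to the right type and makes the ordinary call.
static void logger_record_trampoline(void *obj, int arg) {
    static_cast<Logger *>(obj)->record(arg);
}

BoundIntMethod bind_record(Logger &l) {
    return BoundIntMethod{&l, logger_record_trampoline};
}
```

The code receiving a BoundIntMethod needs to know nothing about Logger. It just calls `m(42)`.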

But it doesn't. There are alternatives, of course - numerous ones, and all terrible, and all incompatible, because the language designers simply failed utterly to do their job. The boost (now TR1) one has the cutest syntax, but God help you if you make a typo using it, because you'll get pages of template gibberish.
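
In C++11 and later, the TR1 functor became std::function. Here is a minimal sketch of using it as a callback, with made-up Button and Window classes. The lambda form is usually easier to type correctly than the std::bind form shown in the comment.

```cpp
#include <functional>

// std::function (boost::function, via TR1) holds any callable with a
// matching signature: plain functions, lambdas, or bound member functions.
struct Button {
    std::function<void(int)> on_click;
    void click(int x) {
        if (on_click)          // an empty std::function must not be called
            on_click(x);
    }
};

struct Window {
    int clicks = 0;
    void handle_click(int x) { clicks += x; }
};

void connect(Button &b, Window &w) {
    b.on_click = [&w](int x) { w.handle_click(x); };
    // Equivalent with std::bind:
    // b.on_click = std::bind(&Window::handle_click, &w, std::placeholders::_1);
}
```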

Stop and think about that for a second. Template gibberish. For a simple function pointer! Every language not designed by idiots in the last 20 years, including Turbo Pascal, has some kind of function pointers. ASM has function pointers. C has function pointers. This isn't hard. It has nothing to do with making fancy type-independent efficient data structures, for which templates/generics are actually justified. It has to do with a trivial operation that's a basic part of every compiled language: pushing some parameters on the stack and jumping to an address.

While I'm here, no, strings are not "generic" data structures either. The fact that std::string is a template is also incredibly insulting.

Okay, one more example of C++ terribleness. This one is actually a tricky one, so I can almost forgive the C++ guys for not thinking up the "right" solution. But it came up again for me the other day, so I'll rant about it too: dictionary item assignment.

What happens when you have, say, a std::map of std::string and you do m[5] = "chicken"? Moreover, what happens if there is no m[5] and you do std::string x = m[5]?

Answer: m[5] "autovivifies" a new, empty string and stores it in location 5. Then it returns a reference to that location, which in the first example, you reassign using std::string::operator=. In the second example, the autovivified string is copied to x - and left happily floating around, empty, in m[5].
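
This behaviour is easy to check. So are the two lookups that leave the map alone: find(), which is closest to returning NULL, and at(), which throws std::out_of_range and is closest to python's exception. Here is a small sketch; size_after_read and lookup are hypothetical helpers:

```cpp
#include <map>
#include <stdexcept>  // std::out_of_range, thrown by std::map::at
#include <string>

// Reading through operator[] inserts a default-constructed value when the
// key is missing, so a "read" changes the map's size. The map is taken by
// value here so the caller's copy is left alone.
std::size_t size_after_read(std::map<int, std::string> m) {
    std::string x = m[5];   // inserts {5, ""} if 5 is absent
    (void)x;
    return m.size();
}

// find() returns end() for a missing key and never inserts; this wraps it
// to return nullptr, the "passive" NULL-style answer.
const std::string *lookup(const std::map<int, std::string> &m, int key) {
    auto it = m.find(key);
    return it == m.end() ? nullptr : &it->second;
}
```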

Ha ha! In what universe are these semantics reasonable? In what rational set of rules does the right-hand-side of an assignment statement get modified by default? Maybe I'm crazy - no, that's not it - but when I write m[5] and there's no m[5], I think there are only two things that are okay to happen. Either m[5] returns NULL (a passive indicator that there is no m[5], like you'd expect from C) or m[5] throws an exception (an aggressive indicator that there is no m[5], like you'd see in python).

Ah, you say. But look! If that happened, then the first statement - the one assigning to m[5] - wouldn't work! It would crash because you end up assigning to NULL!

Yes. Yes it would. In C++ it would, because the people who designed C++ are idiots.

But in python, it works perfectly (even for user-defined types). How? Simple. Python's parser has a little hack in it - which I'm sure must hurt the python people a lot, so much do they hate hacks - that makes m[5]= parse differently than just plain m[5].

The python parser converts o[x]=y directly into o.__setitem__(x,y). Whereas o[x] without a trailing equal sign converts directly into o.__getitem__(x). It's very sad that the parser has to do such utterly different things with two identical-looking uses of the square bracket operator. But the result is you get what you expect: __getitem__ throws an exception if there's no m[5]. __setitem__ doesn't. __setitem__ puts stuff into your object; it doesn't waste time pulling stuff out of your object (unless that's a necessary internal detail for your data structure implementation).
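
C++ code can approximate python's split between __getitem__ and __setitem__ with a proxy object returned from operator[]. That is a known idiom, not something the standard containers do. Here is a sketch assuming C++17, which added std::map::insert_or_assign; StrictMap is hypothetical:

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

class StrictMap {
public:
    // operator[] hands back a Proxy; what happens next depends on whether
    // the Proxy is assigned to or converted to a string.
    class Proxy {
    public:
        Proxy(std::map<int, std::string> &m, int key) : m_(m), key_(key) {}

        // m[key] = value: acts like __setitem__. Inserts or overwrites
        // without default-constructing a value first.
        Proxy &operator=(const std::string &value) {
            m_.insert_or_assign(key_, value);
            return *this;
        }

        // std::string x = m[key]: acts like __getitem__. at() throws
        // std::out_of_range for a missing key and never inserts.
        operator std::string() const { return m_.at(key_); }

    private:
        std::map<int, std::string> &m_;
        int key_;
    };

    Proxy operator[](int key) { return Proxy(data_, key); }
    std::size_t size() const { return data_.size(); }

private:
    std::map<int, std::string> data_;
};
```

The catch is that the proxy is not a std::string. For example, `auto x = m[5];` quietly holds a Proxy rather than a string, which is exactly the kind of surprise this rant is about.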

But even that isn't the worst thing. Here's what's worse: C++'s crazy autovivification stuff makes it slower, because you have to construct an object just so you can throw it away and reassign it. Ha ha! The crazy language where supposedly performance is all-important actually assigns to maps slower than python can! All in the name of having language purity, so we don't have to have stupid parser hacks to make [] behave two different ways!

...

"...Well," said the C++ people. "Well. We can't have that."

So here's what they invented. Instead of inventing a sensible new []= operator, they went even more crazy. They redefined things such that, if your optimizer is sufficiently smart, it can make all the extra crap go away.

There's something in C++ called the "return value optimization." Normally, if you do something like "MyObj x = f()", and f returns a MyObj, then what would need to happen is that f() constructs a new object and returns it as a temporary, then 'x' gets copy-constructed from f()'s return value, then we destroy f()'s return value. (And if x already exists and you assign to it instead, you also pay for x's default constructor and an operator= call.)

As you might imagine, when implementing the [] setter on a map, this would be kind of inefficient.

But because the C++ people so desperately wanted this sort of thing to be fast, they allowed the compiler to optimize out the creation of x and the copy operation; instead, they just tell f() to construct its return value right into x. If you think about it hard enough, you can see that, assuming the stars all align perfectly, m[5] = "foo" can benefit from this operation. Probably only if m.operator[] is inlined, but of course it is - it's a template! Everything in a template is inlined! Ha ha!
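
The return value optimization itself is easy to observe with a type that counts its copies and moves. This sketch assumes C++17, where the elision is guaranteed (rather than merely allowed) when the function returns an unnamed temporary. Tracked and make_tracked are made-up names:

```cpp
// Counts copies and moves so that elision is observable.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() {}
    Tracked(const Tracked &) { ++copies; }
    Tracked(Tracked &&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Returns a prvalue: in C++17 the object is constructed directly in the
// caller's variable, so neither constructor above runs.
Tracked make_tracked() {
    return Tracked();
}
```

Usage: after `Tracked x = make_tracked();`, both counters are still zero.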

So actually C++ maps are as fast as python maps, assuming your compiler writers are amazingly great, and a) implement the (optional) return-value optimization; b) inline the right stuff; and c) don't screw up their overcomplicated optimizer so that it makes your code randomly not work in other places.

Okay, cool, right? Isn't this a triumph of engineering - an amazingly world class optimizer plus an amazingly supercomplex specification that allows just the right combination of craziness to get what you want?

NO!

No it is not!

It is an absolute failure of engineering! Do you want to know what real engineering is? It's this:

map_set(m, 5, "foo");
char *x = map_get(m, 5);

That plain C code runs exactly as fast as the above hyperoptimized ultracomplex C++. And it returns NULL when m[5] doesn't exist, which C++ fails to do.
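
The C interface above needs very little behind it. Here is a minimal sketch, assuming int keys and string values; IntStrMap and MAP_CAPACITY are hypothetical, and a real map would hash and grow rather than scan a fixed array. It's written in the C-compatible subset of C++, so x is a const char * rather than char *, because C++ string literals are const.

```cpp
#include <cstddef>

// Hypothetical fixed-size int -> string map with the interface used above.
#define MAP_CAPACITY 64

struct IntStrMap {
    int keys[MAP_CAPACITY];
    const char *values[MAP_CAPACITY];  // stores pointers; caller owns strings
    std::size_t count;
};

// Returns 1 on success, 0 if the map is full.
int map_set(IntStrMap *m, int key, const char *value) {
    for (std::size_t i = 0; i < m->count; i++) {
        if (m->keys[i] == key) {
            m->values[i] = value;  // overwrite: nothing constructed first
            return 1;
        }
    }
    if (m->count == MAP_CAPACITY)
        return 0;
    m->keys[m->count] = key;
    m->values[m->count] = value;
    m->count++;
    return 1;
}

// Returns NULL for a missing key instead of inserting anything.
const char *map_get(const IntStrMap *m, int key) {
    for (std::size_t i = 0; i < m->count; i++)
        if (m->keys[i] == key)
            return m->values[i];
    return NULL;
}
```

Usage: `IntStrMap store = {}; map_set(&store, 5, "foo"); const char *x = map_get(&store, 5);`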

In the heat of the moment, it's easy to lose sight of just how much of C++ is absolutely senseless wankery.

And this, my friends, is the problem.

As with any bureaucracy, the focus slowly shifts from finding a simple, elegant way to solve your problem to just goddamn winning this one battle with the system so that you can get the bloody thing working at all. It would have been easy, at any time, for the C++ committee to have just added a new operator[]=. It would have been totally backward-compatible: any object without an operator[]= would keep working just like it always has.

But they couldn't do that. Doing that would be admitting defeat.

They could have made up a new syntax for sensible member function pointers, any time they wanted. Again, no concern about backwards compatibility - if you don't use it, it doesn't affect you.

They could have written a sensible string class. In fact, people did. Lots of people! But for some reason, they standardized on the non-sensible one. Now C++ users are forever cursed: either you use std::string, and pay endlessly for its suck, or you use your own string class, and be one of those people who constantly gets criticized for designing their own string class.

It is possible to write C++ that's not crap - in theory. This is because it's possible to write C that's not crap, and C programs will compile as C++. Then, you can add a sprinkle of the non-sucky parts of C++ - deterministic construction/destruction (RAII) is one of them - and you'll have a program that's undoubtedly better, more readable, and easier to debug than it would have been in pure C.
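
Here is a minimal sketch of that RAII sprinkle, wrapping C's FILE *. File and write_greeting are hypothetical names. The destructor closes the file on every path out of the function, which in plain C would mean an fclose() before each return or a goto to a cleanup label.

```cpp
#include <cstdio>

// A minimal RAII owner for a C FILE *.
class File {
public:
    File(const char *path, const char *mode) : f_(std::fopen(path, mode)) {}
    ~File() { if (f_) std::fclose(f_); }
    File(const File &) = delete;             // one owner per handle
    File &operator=(const File &) = delete;
    bool ok() const { return f_ != nullptr; }
    std::FILE *get() const { return f_; }
private:
    std::FILE *f_;
};

// Returns the number of bytes written, or -1. There are no fclose() calls:
// the File destructor runs whichever return statement is taken.
long write_greeting(const char *path) {
    File out(path, "w");
    if (!out.ok())
        return -1;
    int n = std::fprintf(out.get(), "hello\n");
    if (n < 0)
        return -1;
    return n;
}
```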

But you can't stop there. You should, but you can't. Nobody can. It would be superhuman. Because you'll see something that should be a little clearer, a little easier. Maybe it's string concatenation, maybe it's member function pointers, maybe it's operator[]. But you'll see it, and you'll start trying to solve it. And 1000 lines of code later, you'll have made your life - and the lives of everyone who has to maintain your programs - much worse.

For me it was function pointers. Over the years in wvstreams, I tried doing them so many different ways - using C-style function pointers with wrapper functions, using inheritance and virtual functions, using the insane C++ member function pointers, using templates and the insane C++ member function pointers. Finally, nowadays, function pointers in WvStreams use boost's new functor stuff, which has been standardized by TR1. And every single time I use one, I have to look up the syntax.

For my own library that I've spent the last 12 years building. I have to look up the syntax to declare a callback.

I should have just stuck with plain C function pointers.

Let this be a warning to you.

2010-07-21 »

How to design a replacement for C++

My last article on the ugliness that is C++ didn't actually receive this complaint, but it should have: I offered a lot of criticism, but no constructive criticism.

I feel a little guilty about it, so let me try to resolve that here with some actual, constructive advice to language designers, for anyone who cares to listen. (Maybe nobody cares to listen, and in fact this will be much less interesting than the blind ranting of my last article. Too bad. Stop reading now if you're bored.)

The first thing you need to know about C/C++ is that they're only barely worth fixing anyway.

C has too few features, and C++ has far too many awful ones. Reasonable people might disagree on which features C is missing and which C++ should lose. But most people would agree at least that C could be usefully extended, and C++ could be usefully simplified (and maybe have a few cleanups, like my earlier suggestions of operator[]=, a sensible method pointer, and sensible standard strings).

We also know that neither change will happen. The C people, having seen what happens when you extend your language willy-nilly (ie. C++) are deathly afraid of it and will never ever change again. The C++ people are well set on their path (ie. ultimate salvation is right around the corner if we can just add a little more crack to our templates) and will never let it go.

But anyway, that doesn't really matter. C and C++ both get the job done in their respective niches. And those niches are shrinking dramatically. Once upon a time, you'd surely write all your apps in C or C++; nowadays, almost everything is better off written in a language with more built-in stuff. My personal tool of choice nowadays (when appropriate; I'll get to that in a minute) is python for most stuff, with C modules added on for the parts that have to be fast. It works excellently, as judged by my favourite metrics of fewer lines of code, increased readability, and maximum performance.

You might prefer ruby or C# or something instead of python. That's fine, although python seems to be the winner so far when it comes to a super-easy and efficient C extension system. (C#, including mono, makes me especially angry because C extensions often run slower than native C#. There's a massive and stupid overhead required to escape from the runtime down into native space and it often outweighs the speed gained from C. Duh. In python the overhead of calling into a C module is essentially zero.)

To a large extent, the reason you can get away with using "higher level" languages like python or ruby or C# is that computers have gotten faster and have a lot more memory than they used to. You need the faster computer to run an interpreted language, and you need more memory because you have garbage collection instead of manual memory management. But we've got the horsepower now. Might as well use it.

That means C and C++ are on the decline and they're just going to get smaller. Good. The world will be a better place for it.

But there will always be programs that have to be written in a language like C and C++. That includes kernels, drivers, highly performance-sensitive code like game engines, virtual machines, some kinds of networking code, and so on. And for me in particular, it also includes new plugins to existing C-based legacy systems, including Microsoft Office.

These programs are never going to go away. So deciding that they will, forever, have to suffer with the limitations of either C or C++ is kind of disappointing. And yet there is still no language - not even the hint of a beginning of a language - that can seriously claim to replace them. Here are the key "features" you will absolutely need to avoid if you want any chance at replacing C.

Things you absolutely must not do if you want to replace C

Do not remove the ability to directly call into (and be called by) C and ASM without any wrapper/translation layers. When I want to call printf() from C or C++, I #include stdio.h and move on with my life. No other language makes it that easy. None. Zero. Do not be those other languages.
Do not remove the cpp preprocessor. Look, I realize you are morally opposed to preprocessors. Well fuck you too. Your moralizing is getting in my way. If you take it out, I can't #include stdio.h, and I can't implement awesome assert-like macros. (Note: see update below.)
Avoid garbage collection. Garbage collection is fine as a concept, but you will never, ever, be able to write a good kernel if you try to use garbage collection. This is non-negotiable. Also, plugins to existing C programs won't fly with garbage collection, because you won't be able to usefully mark-and-sweep through the majority of non-garbage-collected memory, and you can't safely pass gc'd vs. non-gc'd memory back and forth between C and your language. Maybe your language can have optional garbage collection, but optional has to mean globally disabled across the entire executable.
Avoid mandatory "system" threads. If you're writing a kernel, you're the guy implementing the threading system, so if your language requires threads, you're instantly dead in the water. Garbage collection often uses a separate mark-and-sweep thread, which is another reason gc just isn't an option. But it's even more insidious than that: what happens when you fork() a program that has threads? Do you even know? If the threads were created by the runtime, will it be sane even 1% of the time? You can't invent Unix if you can't fork().
Avoid a mandatory standard library. People can - and do - compile entire C programs without using any standard library functions at all. Think about a kernel, for example. Even memory allocation is undefined until the kernel defines it. Most modern languages are integrated with their standard library - ie. some syntax secretly calls into functions - and this destroys their suitability for some jobs that C can do.
Avoid dynamic typing. Dynamic typing always requires some sort of dictionary lookups and is, at best, slightly slower than static typing. To replace C in the cases where it refuses to die, you can't have a language that's almost as fast as C. It has to be as fast as C. Period. Google Go has some great innovations here with its static duck typing. Objective C is okay here because the dynamic typing is optional.
Avoid support for exception handling. It's just too complicated, and moreover, C people just hate exceptions so they will hate you, too. And since C doesn't know about exceptions, you will make a mess when C calls you, then you throw (but don't catch) an exception. Just leave it out.
Do not make it harder to do things in your language than they would be in C. Maybe this isn't even worth mentioning. But the upper bound on the lines of code it takes to do something should be whatever it would take in C. Making your language backward-compatible with C is one way (not the only way) to achieve this.
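
To make the assert-like macro point concrete, here is a minimal sketch of one. CHECK, CHECK_DISABLED and check_failed are hypothetical names. The macro stringizes its expression, captures the call site's __FILE__ and __LINE__, and disappears entirely (without evaluating its argument) when CHECK_DISABLED is defined. The stringizing in particular has no equivalent outside the preprocessor in C or C++.

```cpp
#include <cstdio>

// #expr turns the source text of the argument into a string literal;
// __FILE__ and __LINE__ expand at the place CHECK is used.
#ifdef CHECK_DISABLED
#define CHECK(expr) ((void)0)
#else
#define CHECK(expr) \
    ((expr) ? (void)0 : check_failed(#expr, __FILE__, __LINE__))
#endif

// Counts failures instead of aborting, so a test run can keep going.
static int check_failures = 0;

static void check_failed(const char *expr, const char *file, int line) {
    std::fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
    ++check_failures;
}
```

A failing `CHECK(2 + 2 == 5)` prints something like `demo.cpp:12: check failed: 2 + 2 == 5` to stderr.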

All this sounds terrible, right? Why even bother if you can't have these obvious features? But actually, there are a bunch of things you can add and make things much, much better than C without making your language unacceptable in C's niche.

Things you can add to your language to make it better than C without ruining your chances to replace it

Deterministic constructors/destructors (RAII). This is, quite probably, my favourite feature of C++ and the primary thing that makes me hate going back to C. (The lack of it is also what makes me hate almost every other high-level language. Python, thankfully, has this, although they claim that it's an implementation detail that could go away at any time. And IronPython can't do it. Bastards.) Deterministic constructors and destructors make smart pointers and automatic refcounting possible (and delightful!) and let you write things in one line of C++ that would take 10 lines of C. No exaggeration. And it compiles down to the same thing that C would, so there's no runtime cost.
Closures and anonymous functions. In fact, Apple has already added these in an incompatible variant of C. Maybe you like them, maybe you don't, maybe you think they're God's gift to programming and any language without them is an infidel. But adding them would be harmless, anyway. (Update 2010/07/21: I mean harmless in that it wouldn't bloat the compiled code; it compiles down to the same ASM as the equivalent verbose C code, and if you don't use it, you don't pay for it.)
Implicit user-defined typecasts. These are a tricky feature of C++ and some C people hate them because they hide stuff they think should be explicit. But you need this if you want to implement non-gross smart pointers and user-defined string objects.
Operator overloading. You have to be seriously tasteful about this one. If you don't think you can handle the pressure, leave it out. But in the name of God, at least make operator== do something sane by default.
Automatic vtable generation. It doesn't have to be full-on OOP, and you don't need multiple inheritance and any of that stuff. But a huge number of lines in C programs are taken up declaring things that are basically vtables. Make it better. Google Go has some great ideas here. This one feature is probably the only good thing about Objective C.
Some sort of generics so you can make type-safe containers. Note, I'm not saying templates here. C++ has made templates a dirty word; you want to copy precisely none of their template stuff. But C# (up to, but not including, C# 4.0) has some very nice (and highly optimizable in native code) generics ideas that you can steal. Also note: I'm not saying generics are necessary in a language that replaces C. C doesn't have them and it survives. Most attempts at a C replacement leave this out of version 1 and add it to version 2, and that's perfectly okay.
One-time declaration/definition of functions. In C or C++, you have to declare your stuff in a header file, then define it in an implementation file. Your header file then gets compiled over and over again by everyone who uses your functions. (In C++ it gets even worse: your templates have to be defined in the header, so compiling every file ends up compiling half of your bloody program.) This is awful, and is the primary reason compiling C and C++ is slow. The problem has also been completely solved since the 1990's. Check out Turbo Pascal sometime. C# and Java, for all their flaws, have also thoroughly solved this. (Update 2010/07/21: Just because you absolutely must not remove the preprocessor doesn't mean you have to use it for declaring functions. The preprocessor is valuable for macros, not for function declarations.)
Standardized string handling. Actually I don't think this is very important; much more important is the ability to keep letting people define their own string types. As I mentioned in my previous article, I disagree with the conventional wisdom that allowing user-defined string types was a major mistake of C++. Strings are often the slowest part of your program. Making them possible to optimize or replace is a good idea; adding some sugar to construct compile-time string literals directly in a user-defined data type would be even better. However, even so, having a decent default string type couldn't possibly make things worse (as long as you can ignore it when it gets in the way, ie. in a kernel).
Implicit pass-by-reference. I'm totally addicted to the way python passes objects by reference, and only by reference. (Pedants would say it actually "passes references by value." I know the difference. I don't care.) This is probably hard to pull off without garbage collection support, but if you can do it, you'll be my hero. At the very least, let us use reference syntax wherever we might normally use pointer syntax, because requiring us to manually dereference pointers all the time was a mistake. And once you've done that, maybe remove pointer syntax altogether, because it's kind of redundant in C++ to have both. (The only exception is that in C++, you can't reassign what a reference points to. But that's only because they're idiots. Just let me do that, and pointer syntax is entirely obsolete.)
Typesafe varargs. C++ totally failed at this, with utterly awful results (ie. lots and lots of templates that define every version of a function with 1 to n parameters). C varargs are great, but they're not typesafe, and while that's great sometimes, it's less great other times. A simple varargs syntax that coerces all the arguments into a particular type (presumably using your implicit user-defined typecasts from above) would be easy and highly useful.
Lots of other things. This is not a complete list of features you should add to your language. Go crazy! Language design is an act of creativity, and most language features will not make your language unacceptable as a C replacement. Just don't break any of the "must not do" rules up above.
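
On the vtable point: this is the kind of declaration C programs are full of, sketched here with hypothetical Shape and ShapeOps types. C++ generates the equivalent table automatically when you declare virtual functions. In C you write it, fill it in, and wire it up by hand.

```cpp
struct Shape;

// The hand-written vtable: one struct of function pointers per "interface".
struct ShapeOps {
    double (*area)(const Shape *self);
    const char *(*name)(const Shape *self);
};

// Every object carries a pointer to the shared table for its "class".
struct Shape {
    const ShapeOps *ops;
    double w, h;
};

static double rect_area(const Shape *s) { return s->w * s->h; }
static const char *rect_name(const Shape *) { return "rect"; }
static double tri_area(const Shape *s) { return 0.5 * s->w * s->h; }
static const char *tri_name(const Shape *) { return "triangle"; }

// One shared, read-only table per "class".
static const ShapeOps rect_ops = { rect_area, rect_name };
static const ShapeOps tri_ops  = { tri_area,  tri_name };

// Every "virtual call" is spelled out by hand.
double shape_area(const Shape *s) { return s->ops->area(s); }
const char *shape_name(const Shape *s) { return s->ops->name(s); }
```

Usage: `Shape r = { &rect_ops, 3.0, 4.0 };` makes `shape_area(&r)` dispatch to rect_area.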
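
On typesafe varargs: C++11, which arrived after this was written, added std::initializer_list. It covers roughly the "coerce every argument into one type" case described above. Each argument is converted to the element type at the call site, so a wrongly typed argument becomes a compile-time error instead of printf-style undefined behaviour. Here is a sketch with a hypothetical join() function:

```cpp
#include <initializer_list>
#include <string>

// Every element of parts has already been converted to std::string
// (here via std::string's implicit constructor from const char *).
std::string join(std::initializer_list<std::string> parts,
                 const std::string &sep = " ") {
    std::string out;
    bool first = true;
    for (const std::string &p : parts) {
        if (!first)
            out += sep;
        out += p;
        first = false;
    }
    return out;
}
```

The price is the extra braces at the call site: `join({"chicken", "soup"})` rather than `join("chicken", "soup")`.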

Current C and C++ alternatives and why they aren't popular Apple/NeXT have been single-handledly pushing Objective C since, I don't know, maybe the 1980's or at least the 90's. It makes none of the "must not do" errors (since its dynamically-typed objects are optional). I personally suspect the reasons for its slow adoption are simple: a) Objective C isn't enough better than C and adds nothing if you don't use its dynamic typing; and b) the syntax is infernally ugly. Think about this: for all we know, the Linux kernel is actually written using every feature in the kernel-compatible subset of Objective C. Basically it neither wins nor loses. Mu. The D language started out as a good idea, but they went crazy in version 2. Also, they require garbage collection, so they're instantly disqualified. Google Go has tons of great stuff inside and meets almost all of the above requirements. Unfortunately it is also garbage collected, so it's instantly disqualified. (This one hurts me deeply, because the other stuff looks so great. But I'm not disqualifying it because I'm subjective, I'm disqualifying it because it just won't do the job as long as it requires garbage collection.) C# is a rather nice language overall and, in fact, has very little in it that prevents it from being natively compiled. (Mono actually has a way to compile it natively nowadays, called their "AOT" (ahead of time) compiler.) However, it requires a big huge gunky runtime and garbage collection and at least one system thread and it parses XML at startup time - strace it and see! - so no luck. (I left XML out of the "must not do" list because I thought it was obvious. Don't make me regret it.) Java actually fails at every single point in this article. Okay, not really. But they did manage to botch most of it in rather spectacular ways. Any others that I've missed? Note that C++ meets all the above requirements. That's why it was able to replace C for so many things. 
The main reason C++ doesn't replace C for a bunch of other things is that it's just too crazy and it encourages you, as the developer, to also be crazy. See my previous rant for all about that. P.S. No, I am not planning to make my own C replacement language. When python isn't appropriate, I will continue using and complaining about C++, while desperately attempting to use it tastefully, if that is even possible. I will, however, switch to your language if it meets all my requirements. So you'll have at least one user. Update 2010/07/21: Wow, this hit the front page of news.ycombinator.com in less than 10 minutes. Thanks, guys. But I see there is some confusion about where I stand on C vs. C++ specifically, and why C++ is not the answer if my question is how to replace C. Good question! The problem with C is that it works but is missing stuff; the problem with C++ is that it tried to add stuff, but the result is hideous. That's a totally subjective evaluation of C++ (see my previous rant for some concrete examples) but it's one that a lot of people seem to agree with. The goal here is to identify the "necessary but not sufficient" rules for creating a C replacement that has a chance of winning. You may hate C++, but it met those criteria, and so it became massively popular, hideous or not. I just want more options; please make me a language that is necessary, sufficient, and not hideous. Update 2010/07/22: People are taking issue with the swearing - and my love of the C preprocessor - in point #2 above. I rephrased it slightly but I'm not removing the swearing. I almost never swear. But this time it matters: removing the preprocessor is treated as a moral issue, but it's also the reason your language design will never replace C, and as a moralizing language designer, you need to know that and make a conscious decision about it. Yes, the C preprocessor is used for all sorts of egregious hacks. I have bad news for you: that's what it's made for. 
A language that prevents me from making egregious hacks is a language that will, eventually, prevent me from doing my job. Egregious hacks can be used to create portability where there was none; functions where there were none; typedefs where there were none (I'm looking at you, C#). It lets you transparently replace a call to one function with a call to another. It lets you do "#define private public" (one of my favourite C++ tricks) when the maintainer of a library turns out to be an idiot. Yeah, you don't want your language to depend on these hacks in order to let you write a good program; C fails here, C++ fails worse. You want to keep these hacks to a minimum in production code. But if you think all production code should have zero hacks, you are an idealist, and your language design will never win in the niche where C and C++ win. (There is most certainly a niche for languages without hacks. That's not what this article is about.)
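Here is a minimal C sketch of three of the preprocessor hacks described above: redirecting every call to one function into a call to another, conjuring a type-generic "function" out of a macro, and supplying a typedef the platform doesn't provide. It assumes a hypothetical codebase where old code still calls checksum_v1; all of the names (checksum_v1, checksum_v2, MAX_OF, u64_compat, legacy_caller) are invented for illustration, and the _MSC_VER branch assumes an older Microsoft compiler that lacks <stdint.h>.

```c
#include <stddef.h>

/* Hypothetical "new" implementation that we want every caller to use. */
unsigned checksum_v2(const unsigned char *buf, size_t len)
{
    unsigned sum = 0;
    size_t i;
    for (i = 0; i < len; i++)
        sum = sum * 31u + buf[i];  /* simple rolling hash; unsigned wraparound is well-defined */
    return sum;
}

/* Hack 1: transparently replace calls to one function with calls to another.
 * Every checksum_v1(...) compiled after this line becomes checksum_v2(...),
 * and not a single call site has to be edited. */
#define checksum_v1(buf, len) checksum_v2((buf), (len))

/* Hack 2: a "function" where there was none. It works for any arithmetic
 * type, but evaluates one of its arguments twice, so avoid side effects. */
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

/* Hack 3: a typedef where there was none, chosen per compiler.
 * Assumes an older MSVC without <stdint.h>; elsewhere, unsigned long long
 * is guaranteed to be at least 64 bits wide. */
#ifdef _MSC_VER
typedef unsigned __int64 u64_compat;
#else
typedef unsigned long long u64_compat;
#endif

/* Old code, written against the old name, keeps compiling unchanged. */
unsigned legacy_caller(const char *s, size_t len)
{
    return checksum_v1((const unsigned char *)s, len);
}
```

A reader of legacy_caller sees checksum_v1 and silently gets checksum_v2. That invisibility is exactly what makes the trick useful for patching code you don't control, and also why it belongs in production code only sparingly.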

2010-07-24 »

A bit more language meditation

After the last couple of bits on C++, I thought I would offer something a little more melodramatic: human languages, as experienced in Montreal. Back in 1887.

Old Montreal, 1887, via the McCord Museum's Flickr Feed

Notice anything funny about this picture?

I'll give you a hint: Montreal is a primarily French-speaking city. According to the 2006 census, 13% of the population spoke English as a first language, compared to 54% with French as a first language.¹

...

...Yet every single sign in that photo is in English! If you looked at the same street today, you would see every single sign is in French.²

Why? Because of Bill 101 from 1977, informally known as the Quebec "language law." Among other things, that law says any public sign has to be primarily in French; no other language can be "more prominent" than French.

The mere existence of the language law is, itself, fascinating. It's a flagrant violation of the free speech rights guaranteed by the Canadian Charter of Rights and Freedoms.

But Canadians always hedge their bets, so in addition to free speech, the Charter of Rights and Freedoms also has a section called the Notwithstanding Clause.³ That clause basically says the government can enact any laws they want that violate your rights, as long as they comply with a few basic rules, such as refreshing said laws every five years. (Unlike other Canadian laws, rights-violating ones expire automatically.)

As you might imagine, laws that very literally violate human rights can cause a bit of a fuss. This one certainly does - and it has continued to do so since it was first brought in. The need to refresh it every few years guarantees that it gets back into the news every few years, which is both healthy and stressful.

I'm a native English speaker myself, so this law comes down to racism against me. But you know what? I think it's a good law. 54% of Montrealers speak French as a first language; almost all the rest (even me) can speak at least basic French when necessary.

As the story goes, the reason the rule was needed in the first place was this: while almost every French speaker had learned basic English - after all, the people bordering Quebec in every direction are largely anglophone, so there are lots of chances to learn - the much smaller English population didn't bother to learn French. Because if all the French people are willing to speak English anyway, why bother? And if you're making a sign - even if you're a French person making a sign - are you going to make one that 54% of people can understand, or one that 99% of people can understand? That's right. If you're a wise French business owner serving primarily French customers, you'll make your sign in... English.

Those lazy English people have a point. It really is a lot of work to learn French, just so you can speak French in this tiny little enclave of non-English on a whole continent of English. I find it completely believable that English people are so lazy; my own crappy French skills are testament to that. And quite simply, Quebec's French speaking majority called us on it. They demanded justice: they demanded the right to be served in the language of the majority.

And in order to give people that right - a right not guaranteed by the Charter of Rights and Freedoms - we had to violate another right, namely, the right not to talk to people in French. If it weren't for the magic of the Notwithstanding Clause, the government would be enforcing civil rights... but the wrong ones.

By the way, if you're American, and you come to Montreal and people pretend they can't speak English, you're absolutely right: they are pretending. Because you just rudely jabbed them with hundreds of years of cultural history.⁴ (Note: it's not a pretense in other parts of Quebec, where people often speak exclusively French. And obviously some people in Montreal really don't speak English, but it's fewer than it seems.)

So what does all this have to do with programming?

Well, what about that C or C++ or Java you use? Do you really use it because it's better? Or because it's the one thing everyone can understand, even though it's not actually the best choice for most people? If someone made a law forcing everyone to write stuff in a particular language - say, Objective C - in order to prevent the oppression that is, say, Flash, would that be a violation of your freedoms, or would someone out there actually be protecting you?

Okay. It's a stretch.

Footnotes

¹ I tried to look at the 1861 census (okay, it's a few years off, but whatever), but the statistics defeated me. They weren't surveying people's mother tongue at the time, though they did survey people's birthplaces. At least 48% of the population at the time was of "Canadian - French origin" birth, with about 26% from Britain/Ireland/United States. However, that doesn't account for an additional 25% of "Canadian - Not of French Origin." How much of that is anglophone? I don't know. Perhaps there were more anglophones than francophones in Montreal back in 1861? I don't know. How did it change by 1887? I don't know. This is the sort of information I would like to retrieve from Wolfram Alpha if only it weren't a useless piece of junk.

² You would also notice that Montreal's winter road conditions are about the same as ever.

³ There's also the "Limitations Clause," which says the government can violate your rights, but only if they're consistent and there's a good reason. And don't do it any more than necessary ("minimal impairment"). From the outside, it's hard to believe weird stuff like this works, but the emphasis on using power responsibly instead of blindly following the letter of the law is what Canada is all about.

⁴ Tip: I tell my American friends who visit Montreal that one simple change in behaviour will make their experience vastly more enjoyable. When you start a conversation, any conversation (just saying hi in a store, or ordering in a restaurant), do your utmost to start it in French. You know, Bonjour, parlez-vous anglais, mispronouncing stuff off the French side of the menu, whatever. It doesn't matter if you suck at French. You probably won't get more than 5 words out before the person switches to flawless English. Why? Because you acknowledged that they have rights. Imagine if some people from France flew to New York, walked into a restaurant, and refused to speak anything but French. Would you think that was cute? Acceptable? Remotely reasonable? Of course not. You'd think they were idiots. But if you knew some French, and they came in and tried their best at English, but had a terrible accent and awful grammar, you'd switch to French as a favour to them. Because they're not being idiots, and you're a nice person. Etiquette really is that easy.

I'm CEO at Tailscale, where we make network problems disappear.

Why would you follow me on twitter? Use RSS.

apenwarr on gmail.com