Welcome to 2008
I am now officially part of the "previous" generation of programmers. I know this because the new generation no longer has "vi vs. emacs" editor wars, while mine still does.
The new generation's wars are vi/emacs vs. IDEs.
Carry on then.
Welcome to 2008, Part 2: Dietary Information
Please note the recent important changes in dietary advice.
Red meats, which were previously associated with high blood pressure leading to heart conditions, are now okay. It's carbohydrates that are bad. Eat meat, but leave the potatoes at home.
Note that milk no longer "does a body good." In fact, it is now widely believed that people over the age of about 5 years lack the enzymes to digest it properly.
Corn syrup is not, apparently, at the heart of the American obesity problem. This and other exciting "facts" ("Contrary to its name, high fructose corn syrup is not high in fructose") can be found at the Corn Syrup Website.
Saturated fats, the so-called "bad" fats that are found in various greasy things like the no-longer-evil red meat, are no longer anything to worry about. Well, maybe they are, but we don't worry about them, because...
"Trans" fats must certainly be much worse, as evidenced by the large number of food packages which now proclaim that they don't contain any. Nobody knows what trans fats are or if they even exist, but because your favourite foods don't contain them, you should feel secure. Phew.
Update: My dad sends this critical additional information:
A thinly veiled rant
Some of the best advice I've ever heard was ostensibly to women about dating, but applies equally to everybody and all their relationships.
If you want to know how a person will treat you once he gets to know you, look at how he treats other people. If you want to know how someone talks about you behind your back, look at how he talks to you about other people.
It's really as simple as that, in life or in business. Don't ignore the signs. Sometimes understanding people is so easy that you can't believe what's obviously true is true.
Welcome to 2008, Part 3: Environmentalism Update
Please note the following changes in environmental terminology. Remember, if you get these mixed up, you'll look old-fashioned.
We used to refer to "the hole in the ozone layer." This hole was reputedly caused by certain chemicals (like our dear departed otherwise-non-toxic freon, now replaced by mildly toxic alternatives) which, when released into the atmosphere, would bind with ozone particles and take them out of circulation. The ozone layer is responsible for "absorbing" certain kinds of dangerous radiation from the sun and turning them into "harmless" heat.
At the same time, there were warnings about an excess of "greenhouse gases" and the related problem of acid rain. At the time, the majority of activism was toward reducing emissions of various nasty particles like carbon monoxide, methane, and sulphur. Natural Gas was described as the "clean alternative fuel", because all it releases (when burned efficiently) is carbon dioxide.
Greenhouse gases work like this: the sun's radiation is partly absorbed by the earth, and partly reflected back. Greenhouse gases tend to absorb more of the reflected light, trapping it in the atmosphere instead of letting it escape, thus increasing the temperature.
Ironically, ozone is a greenhouse gas. The "hole in the ozone layer" prevents certain types of radiation from being absorbed and safely converted into harmless heat. Other greenhouse gases absorb other wavelengths of radiation, converting it into dangerous heat. Got it? Good.
We don't talk about the ozone layer or greenhouse gases anymore. Instead, we talk about "carbon emissions," by which we mostly mean "carbon dioxide emissions." Carbon dioxide is what you produce when you breathe. After you clean up your artificial pollution-spewing devices, carbon dioxide is pretty much all that comes out. Other than its contribution as a greenhouse gas, it is harmless.
So the question is: why do we hear so much now about "carbon emissions" instead of "greenhouse gases" in general, or acid rain, or the ozone layer? Is it good news, and the other problems are mostly solved? Or do we as a society just fixate randomly on the most recent problem that someone famous has made a movie about?
I've written several times before about different kinds of statistical biases. I care a lot about that since, next to actual incorrect facts, the most common source of wrong decisions seems to be a misguided use of so-called statistics.
Here are two great articles about bias. The first is about the Anchor Bias:
- They spun a roulette wheel and when it landed on the number 10 they asked some people whether the number of African countries was greater or less than 10 percent of the United Nations. Most people guessed that estimate was too low. Maybe the right answer was 25 percent, they guessed.

The psychologists spun their roulette wheel a second time and when it landed on the number 65, they asked a second group whether African countries made up 65 percent of the United Nations. That figure was too high, everyone agreed. Maybe the correct answer was 45 percent.
Isn't that amazing?
I claim, by the way, that people like Ayn Rand and Richard Stallman have to exist simply because they help de-anchor-bias others. "100% of software should be free?! Holy cow, you're crazy. Maybe more like 90%."
Meanwhile, Peter Norvig, who is (if I understand correctly; I'm offline as I write this) one of the Google researchers working on their PageRank statistics, wrote a great article about different kinds of bias in both experimental design and the interpretation of results.
It's long, but scroll down to section I4 and find the surprising answer to this question (via Eliezer Yudkowsky):
- 1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammograms. 9.6% of women without breast cancer will also get positive mammograms. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
It is not a trick question, but my answer was completely wrong. Think about it, then follow the link and check your answer in section I4.
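(Spoiler warning: if you want to verify your answer afterwards without leaving your terminal, Bayes' rule gives it in a few lines. This is just a sketch of the arithmetic, using awk as a calculator.)

```shell
# P(cancer | positive test), by Bayes' rule, using the numbers quoted above.
awk 'BEGIN {
  prior = 0.01        # P(cancer) among forty-year-olds in routine screening
  sens  = 0.80        # P(positive | cancer)
  fpr   = 0.096       # P(positive | no cancer)
  p_pos = prior*sens + (1-prior)*fpr          # P(positive), all causes
  printf "P(cancer | positive) = %.1f%%\n", 100 * prior*sens / p_pos
}'
```

The positive tests are dominated by false positives from the 99% of women who don't have cancer, which is why most people's intuitive answer is an order of magnitude too high.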
Why I Never Hire Brilliant Men
And now for something completely different: an article from 1924 called "Why I Never Hire Brilliant Men."
I find it's a very pleasant read. The soft tone is something we've sadly lost in our modern world of hyper-sensationalized "Top 5 blah blah" blog headlines. I'd like to be able to write like he did; no such luck, but at least I haven't resorted to the "top 5" yet.
As a bonus, the article also makes many good points about hiring.
DemoCampCUSEC2 in Montreal
If all goes well, I'll be presenting at DemoCampCUSEC2 in Montreal. I was a little late signing up, but the organizers claim there's still time. I hope so.
I should be demonstrating my wild combination of Nitix, VMware, and a few other things, showing how to get an entire database-driven Windows application, including the Windows it runs on, deployed in 15 minutes or less. Come watch!
This post is not about Macbook Air
Yes, this is the so-called "blogosphere," and yes, people in said "blogosphere" tend to start meme-of-the-moment posts with statements like "I promised I wouldn't talk about such-and-such in this blog, but..."
I've done the same.
But not this time.
This time, I simply didn't write a post about the advantages and disadvantages of the feature selection in the Macbook Air. Or whether I plan to buy one, or how cool or not cool it is.
See how much restraint I have?
And that, as they say, is that
However, I can now safely say from firsthand experience that while some people are demonstrably evil, at least some VCs are actually not. I suppose anti-VC sentiments are a form of racism; evilness and incompetence are traits that turn out to be independent from VCness.
"git checkout" is faster than "cp -a"
It's true. I've determined this experimentally. And it makes sense, too: if you've used "git-repack" on your repository, then you have a nice, compressed, sequential file that contains all the data you're going to read. So you read through it sequentially, and write into the disk cache. Up to a certain size, there's no disk seeking necessary! And beyond that size, you're still only seeking occasionally to flush the write cache, so it's about as fast as it gets.
Compare to "cp -a", where for each file you have to read the directory entry, the inode, and the contents, each of which is in a different place on disk. The directory is sequential, so it's probably read all at once and doesn't need a seek. But you still have about two seeks per file copied, which is awful.
Even if your disk cache already contains the entire source repository, copying files requires more syscalls (= slow) than reading large sequential blocks of a single huge file. In other words, even with no disk access involved, git-checkout is still faster than "cp -a". Wow.
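If you want to try the comparison yourself, here's a minimal sketch (the paths are made up, and a three-file repository is far too small to show the effect; wrap each of the last two commands in time(1) on a tree with thousands of files to see the difference):

```shell
set -e
# Build a small throwaway repository to run both operations against.
rm -rf /tmp/gitdemo && mkdir -p /tmp/gitdemo/repo && cd /tmp/gitdemo/repo
git init -q .
for i in 1 2 3; do echo "file $i" > "file$i.txt"; done
git add .
git -c user.email=demo@example.org -c user.name=demo commit -qm demo
git repack -a -d -q   # all objects now live in one sequential pack file

# The two operations being compared:
cp -a /tmp/gitdemo/repo /tmp/gitdemo/copy                 # ~2 seeks per file
git checkout-index -a --prefix=/tmp/gitdemo/checkout/     # sequential pack reads
```

(`git checkout-index` is the plumbing command underneath `git checkout` that writes the index out to a directory tree.)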
In related news, check out this funny mailing list discussion from 2005, in which Linus defends his crazy ideas about merging. It reminds me of the famous "Linux is obsolete" discussion from back when Minix was clearly going to rule the world. Actually, it reminds me rather disturbingly of that, and the results we see now are very similar.
Here's an excellent discussion of some of the brilliant design behind git.
Yes, I have become a true believer. The UI consistency needs work, though. The feature list grew really really fast, and it shows.
Don't forget the muppets
This posting about entertaining your audience using muppets could not help but remind me of dcoombs's "Finger Puppet Theatre" guide to cryptography.
Voting and fairness
Some people think that in a democracy, getting more people out to vote is the most important thing. People who don't vote, they say, are "disenfranchised" - they don't think their vote will make a difference and they don't believe in the system - and this is a problem to be solved.
There is a problem here that needs to be solved, but I only wish it were so easy.
I rarely vote in national or provincial elections, but it's not because I'm disenfranchised. In fact, you could say I'm overly franchised: I feel in tune with other people to a great enough degree, and I have enough faith in the system that I believe it will work just as well with my help as without it. I test this theory with every election; I decide who I would like to win, and then I see if they win (both in my riding and "overall").
Usually the result is pretty close to what I see as the local optimum. Stephen Harper, for example, wasn't such a great choice of Prime Minister, but he was a better, more balanced choice than the other options we'd been offered. And pretending to be conservative for a while will give Canadians a much better understanding of what we really want next time around.
The reason this comes to mind is not the upcoming U.S. elections (if I were an American, I would certainly be severely disenfranchised, and I'd need much more than platitudes to fix it), but because I'm thinking about the culture of software companies.
Basically, I see most programmers like this: they want to program, and they don't want to be distracted. They want control over their destiny, but they don't want to micromanage their destiny. They want to be paid fairly, and that's more important than being paid hugely.
For programmers, neither selling your soul to a megacorporation in exchange for stability, nor entering into the nonsense of a startup in exchange for control and a 10% chance of striking it rich, is a good solution.
The thing that makes Canada's political system feel safe and fair is that I can vote, not that I do vote. It's much more about good options being available than about constantly trying to choose which one I want.
The question is, how to build a company around that principle?
Git is the next Unix
When I tried it, I realized something right away: what made git awesome was actually none of the things Linus had talked about, not really. Those things were more like... symptoms of the underlying awesomeness. Yes, git is fast. Yes, it is distributed. Yes, it is definitely not CVS. Those things are all great, but they miss the point.
What actually matters is that git is a totally new way to operate on data. It changes the game. git has been described as "concept-heavy", because it does so many things so differently from everything else. After some reflection, I realized that this is far truer than I could see at first. git's concepts are not only unusual, they're revolutionary.
Come on, revolutionary? It's just a version control system!
Actually it's not. Git was originally not a version control system; it was designed to be the infrastructure so that someone else could build one on top. And they did; nowadays there are more than 100 git-* commands installed along with git. It's scary and confusing and weird, but what that means is git is a platform. It's a new set of nouns and verbs that we never had before. Having new nouns and verbs means we can invent entirely new things that we previously couldn't do.
Git stores revision history, and it stores it in less space than any system I've ever seen or heard of. Often, in less space than the original objects themselves!
Git uses rsync-style hash authentication on everything, as well as a new "tree of hashes" arrangement I haven't seen before, to enforce security and reliability in amazing ways that make the idea of "guaranteed identical every time" not something to strive for, but something that's always irrevocably built in.
Git names everything using globally unique identifiers that nobody else will ever accidentally use, so that being distributed is suddenly trivial.
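You can poke at those claims directly with git's plumbing commands. A minimal sketch (paths are made up):

```shell
set -e
rm -rf /tmp/gitnouns && mkdir -p /tmp/gitnouns && cd /tmp/gitnouns
git init -q .
# "hash-object" turns any content into its globally unique name (a SHA-1
# hash of the content); -w also stores it in the object database:
id=$(echo "hello, git" | git hash-object -w --stdin)
echo "object id: $id"
# "cat-file" retrieves the content back by that name:
git cat-file blob "$id"
# Identical content always hashes to the identical name, on any machine,
# which is what makes being distributed trivial:
id2=$(echo "hello, git" | git hash-object --stdin)
test "$id" = "$id2" && echo "same name everywhere"
```

Those two commands are the new nouns and verbs: content-addressed objects, and a way to store and retrieve them by name.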
Git is actually the missing link that has prevented me from building the things I've wanted to build in the past.
I wanted to build a distributed filesystem, but it was too much work. Now it's basically been done... in userspace, cross-platform.
At NITI we built a file backup system around a pretty clever data structure that sped up file accesses. But we never got around to implementing sub-file deltas, because we couldn't figure out a structure that would do it both quickly and space-efficiently. With git, they did. To build your own backup system that's much better than ours, just store it in git instead.
On top of our backup system we made a protocol for synchronizing changes up to a remote repository. Our protocol was sort of okay; git's is much better, and it will surely improve a lot in the months ahead. (Currently git requires you to sync everything if you want to sync anything, but that's an implementation restriction, not a design or protocol restriction. See shallow clones for just the beginning of this.)
Someone else I know built a hash-indexed backup system to efficiently store incremental backups from a large number of systems on a single set of disks. Git does the same, only even better, and supports sub-file deltas too.
We made a diskless workstation platform called Expression Desktop (now very dead). Knowing disks were cheap and getting cheaper, we wanted to make it "diskful" eventually, automatically syncing itself from a central server... but able to guarantee that it matched the server's files exactly. We couldn't find a protocol to do it. git is that protocol.
I built a system on top of Nitix, called Versabox, that let you install a Linux system on top of a Nitix system without virtualization. I wanted a way to make it easy to install software into that Linux environment, then repackage the entire thing as an all-in-one installer kit, but have the archive contain both the original package and the new content; that way you could upgrade either part without touching the other. To do that I invented a new file format and tool, called versatar. It works, and we use it at my new company. But git would do it much better, and includes digital signatures too for free.
Numerous people have written diff and merge systems for wikis; TWiki even uses RCS. If they used git instead, the repository would be tiny, and you could make a personal copy of the entire wiki to take on the plane with you, then sync your changes back when you're done.
When Unix pipes were invented, suddenly it was trivially easy to do something that used to be really hard: connect the output of one program to the input of the next. Pipes were the fundamental insight that shaped the face of Unix. Programs didn't have to be monolithic.
With git, we've invented a new world where revision history, checksums, and branches don't make your filesystem slower: they make it faster. They don't make your data bigger: they make it smaller. They don't risk your data integrity; they guarantee integrity. They don't centralize your data in a big database; they distribute it peer to peer.
Much like Unix itself, git's actual software doesn't matter; it's the file format, the concepts, that change everything.
Whether they're called git or not, some amazing things will come of this.