When there's only one

...there's only one choice
Everything here is my opinion. I do not speak for your employer.

2008-06-24 »

Weird things about git, #1: premature optimization

I've been hanging out on the git mailing list for the last little while, and a few things have been striking me as weird. There are hundreds of people on that list, and lots of them are active contributors. Why? It's just a stupid version control system after all.

But there must be some reason. I've thought of a few. Here's one for today:

The git developers are completely obsessed with performance. Apparently nobody there has ever heard that "premature optimization is the root of all evil." (Incidentally, the ACM has a recent article explaining why that statement is a fallacy.)

It's enlightening just to watch the git developers optimize every possible part of their system: syscall overhead, memory allocation, path trees, file size, network turnarounds, etc. And even while they optimize the heck out of everything, they complain about how inefficient it all still is. Which it is, of course, if you follow the discussions.

Also interesting is that large parts of git are written in sh and Perl, and it's still fast: while the developers obsess about performance, they also know which parts actually matter to it. It helps to be a Linux kernel developer sometimes, I guess.

There's no doubt that git's optimization is premature: it started out way faster than svn, and it gets even faster with each release. Is that really necessary? Of course not. Everyone was doing just fine with things like svn. But life is just so much better when programs go unnecessarily fast. It's a very strange sensation; pain you didn't realize you were having just goes away.

I've been doing a lot of Windows development lately, and let me tell you, Windows developers have a whole lot of pain that they don't realize they have, because it's not overt. Windows development is mostly fine, really. But what kills it is all the little random delays, the crashes, and the giant memory-leaky .NET-based IDEs that make you point-and-click fifty things sequentially once a week because you don't have a way to write a script to do it for you. Just give me a text editor and 'make' any day, thanks.

What was my point again? Oh right, unnecessary optimization. While I've been writing this article, I've been waiting for svn to finish committing 20000 tiny files to a fresh svn repository. It's still not done. (Watching a strace of this is hilarious. Among other things, it's creating an XML file for every single one.)

...

And for comparison, git just did the same thing in 23 seconds. (About half that time was spent running my shell script to create 20000 files.)
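If you want to try something similar yourself, here's a rough sketch of that kind of experiment. It is not the shell script mentioned above, which isn't shown here; it's a Python stand-in, and the function names (`make_tiny_files`, `time_git_commit`), the flat directory layout, the file contents, and the throwaway committer identity are all assumptions made for illustration. It also assumes `git` is on your PATH.

```python
import os
import subprocess
import tempfile
import time


def make_tiny_files(root, count=20000):
    """Create `count` tiny text files directly under `root` (assumed flat layout)."""
    for i in range(count):
        path = os.path.join(root, "f%05d.txt" % i)
        with open(path, "w") as f:
            f.write("file %d\n" % i)
    return count


def time_git_commit(root):
    """Init a fresh repo in `root`, add everything, commit; return elapsed seconds."""
    def git(*args):
        # check=True raises CalledProcessError if git exits with an error
        subprocess.run(["git", *args], cwd=root, check=True,
                       stdout=subprocess.DEVNULL)

    start = time.time()
    git("init", "-q")
    git("add", ".")
    # Placeholder identity so the commit works on a machine with no git config.
    git("-c", "user.name=test", "-c", "user.email=test@example.com",
        "commit", "-q", "-m", "lots of tiny files")
    return time.time() - start
```

Calling `make_tiny_files(root)` and then `time_git_commit(root)` on an empty temporary directory (for example one from `tempfile.TemporaryDirectory()`) gives a number to compare against. Your timings will depend heavily on your disk and filesystem, so don't expect to reproduce the figures here.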

Who cares? How often do you create a repo with 20000 tiny files in it anyway? Almost never, of course. It's not an important optimization goal.

...

svn finally finished. See? That wasn't so bad.

I'm CEO at Tailscale, where we make network problems disappear.


apenwarr on gmail.com