2008-01-21
"git checkout" is faster than "cp -a"
It's true. I've determined this experimentally. And it makes sense, too: if you've used "git-repack" on your repository, then you have a nice, compressed, sequential file that contains all the data you're going to read. So you read through it sequentially, and the files you write out land in the disk cache first. Up to a certain size, there's no disk seeking necessary! And beyond that size, you're still only seeking occasionally to flush the write cache, so it's about as fast as it gets.
Compare to "cp -a", where for each file you have to read the directory entry, the inode, and the contents, each of which is in a different place on disk. The directory is sequential, so it's probably read all at once and doesn't need a seek. But you still have about two seeks per file copied, which is awful.
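To put a rough number on "awful", here's a back-of-the-envelope sketch in Python. The two seeks per file come from the paragraph above. The 10 ms per seek is purely my assumption for a spinning disk, and the function name is made up for illustration. It also ignores readahead and any luck with on-disk locality, so treat it as a cold-cache worst case, not a measurement.

```python
def estimated_seek_seconds(n_files, seeks_per_file=2, seek_ms=10.0):
    """Rough time spent just seeking when copying n_files one at a time.

    seeks_per_file=2 follows the inode + contents argument in the text;
    seek_ms=10.0 is an assumed figure, not something measured here.
    """
    return n_files * seeks_per_file * seek_ms / 1000.0


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} files: ~{estimated_seek_seconds(n):.0f} s of seeking")
```

With those assumed numbers, a 10,000-file tree spends around 200 seconds doing nothing but moving the disk head. A sequential read of one packed file of the same total size pays for almost none of that.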
Even if your disk cache already contains the entire source repository, copying files requires more syscalls (= slow) than reading large sequential blocks of a single huge file. In other words, even with no disk access involved, git-checkout is still faster than "cp -a". Wow.
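If you want to poke at the syscall argument yourself, here's a toy sketch in Python. To be loud about it: this is not how git packfiles actually work (real packs are compressed and delta-encoded, and git has its own index format). The "pack" here is just the files concatenated, with a hypothetical (name, offset, length) index, and all the function names are mine. It only shows the shape of the two access patterns: one open and one read per source file, versus one big sequential read.

```python
import os
import tempfile
import time


def make_tree(root, n_files=200, size=1024):
    """Create n_files small files under root and return their paths."""
    paths = []
    for i in range(n_files):
        path = os.path.join(root, f"f{i:04d}.txt")
        with open(path, "wb") as f:
            f.write(bytes([i % 256]) * size)
        paths.append(path)
    return paths


def pack(paths, pack_path):
    """Concatenate the files into one toy 'pack'; return (name, offset, length) entries."""
    index = []
    offset = 0
    with open(pack_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as f:
                data = f.read()
            out.write(data)
            index.append((os.path.basename(path), offset, len(data)))
            offset += len(data)
    return index


def copy_per_file(paths, dest):
    """cp-style: open and read every source file separately."""
    for path in paths:
        target = os.path.join(dest, os.path.basename(path))
        with open(path, "rb") as src, open(target, "wb") as dst:
            dst.write(src.read())


def checkout_from_pack(pack_path, index, dest):
    """checkout-style: one sequential read of the pack, then write the files out."""
    with open(pack_path, "rb") as f:
        blob = f.read()
    for name, offset, length in index:
        with open(os.path.join(dest, name), "wb") as dst:
            dst.write(blob[offset:offset + length])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        out_cp = os.path.join(tmp, "out_cp")
        out_pack = os.path.join(tmp, "out_pack")
        for d in (src, out_cp, out_pack):
            os.mkdir(d)

        paths = make_tree(src)
        index = pack(paths, os.path.join(tmp, "toy.pack"))

        start = time.perf_counter()
        copy_per_file(paths, out_cp)
        mid = time.perf_counter()
        checkout_from_pack(os.path.join(tmp, "toy.pack"), index, out_pack)
        end = time.perf_counter()

        print(f"per-file copy:  {mid - start:.4f} s")
        print(f"from toy pack:  {end - mid:.4f} s")
```

Both paths still have to create every destination file, so the difference is all on the read side. On a warm cache with a couple hundred tiny files the gap may be lost in the noise, and Python's own overhead muddies it further. For a real comparison, time "git checkout" against "cp -a" on an actual repository.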
In related news, check out this funny mailing list discussion from 2005, in which Linus defends his crazy ideas about merging. It reminds me of the famous "Linux is obsolete" discussion from back when Minix was clearly going to rule the world. Actually, it reminds me rather disturbingly of that, and the results we see now are very similar.
Here's an excellent discussion of some of the brilliant design behind git.
Yes, I have become a true believer. The UI consistency needs work, though. The feature list grew really really fast, and it shows.