100% Pure

accept no imitations
Everything here is my opinion. I do not speak for your employer.

2008-03-25 »

A tale of five merges, part 4: 'git rebase'

So far, we've talked about branch merging concepts in CVS, SVN, and git. Of the three, git's is by far the most convenient and magical... when it does what you want. When it doesn't do what you want, you find yourself in a bit of trouble.

Rebasing git branches

Last time, we brought up one example of a major problem with the "git merge" command: if you create BRANCH1 from MASTER, then BRANCH2 from BRANCH1, and then merge BRANCH2 into MASTER, you end up merging BRANCH1 into MASTER as well by accident. That's because "git merge" finds the most recent common ancestor of BRANCH2 and MASTER, which is the point on MASTER where BRANCH1 was created, and merges all the changes from there to BRANCH2. In between those two points lies all of BRANCH1.
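To see the problem for yourself, here's a rough sketch that builds a throwaway repository with the same shape and asks git what a merge would drag in. The branch names come from the text; the commit() helper, the file names, and the demo identity are made up for illustration.

```shell
#!/bin/sh
# Sketch only: a scratch repo showing what merging BRANCH2 into MASTER would include.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/MASTER   # call the first branch MASTER, as in the text

# Hypothetical helper: commit one new file named after its commit message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b BRANCH1
commit b1-work
git checkout -q -b BRANCH2
commit b2-work
git checkout -q MASTER
commit master-work

# The common ancestor git will merge from, and the commit BRANCH1 was created from.
merge_base=$(git merge-base MASTER BRANCH2)
fork_point=$(git rev-parse BRANCH1~1)

# Everything reachable from BRANCH2 but not from MASTER: this is what a merge brings in.
would_merge=$(git log --format=%s MASTER..BRANCH2)

if [ "$merge_base" = "$fork_point" ]; then
    echo "merge base is the point where BRANCH1 left MASTER"
fi
echo "a merge of BRANCH2 into MASTER would bring in:"
echo "$would_merge"
```

Both b2-work and b1-work show up in that list, which is the accidental BRANCH1 merge described above.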

This is exactly the problem that "git rebase" was designed to solve. Here's what you do:

  1. git checkout BRANCH2
  2. git rebase --onto MASTER BRANCH1

What the above command does is take every change between BRANCH1 and BRANCH2 and re-apply it on top of MASTER. Sound familiar? Yes, that's right, it's exactly what "cvs up -j BRANCH1 -j BRANCH2" would have done. And with "git rebase", you specify revisions just as manually as you would with CVS or SVN.(1)
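Here's a minimal sketch of those two commands running in a scratch repository, assuming the same three-branch layout. The commit() helper, file names, and demo identity are invented for the example; only the checkout and rebase lines come from the text.

```shell
#!/bin/sh
# Sketch only: run "git rebase --onto MASTER BRANCH1" in a throwaway repo.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/MASTER   # call the first branch MASTER, as in the text

# Hypothetical helper: commit one new file named after its commit message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b BRANCH1
commit b1-work
git checkout -q -b BRANCH2
commit b2-work
git checkout -q MASTER
commit master-work

# The two steps from the text.
git checkout -q BRANCH2
git rebase -q --onto MASTER BRANCH1

# What BRANCH2 now has that MASTER doesn't: only its own work.
after=$(git log --format=%s MASTER..BRANCH2)
echo "BRANCH2 now adds to MASTER:"
echo "$after"
ls
```

After the rebase, BRANCH2 sits on top of the latest MASTER, so master-work.txt is in the working tree, while b1-work.txt is not: BRANCH1's changes were left behind.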

Here's the problem: the rebase command generates entirely new patches onto an entirely new branch. The old BRANCH2 is gone(2), and the new BRANCH2 has a bunch of commits on it that look similar to a person, but as far as git knows, have nothing in common with the old ones. If you've ever shared your old BRANCH2 with anyone, they will no longer be able to merge cleanly to and from your new BRANCH2. If you use git-rebase, the only sane thing you can do is throw away all the pre-rebase versions of BRANCH2... and that's tricky, since people all over the world might have a copy. They might even have merged it with their own branches. Oops.
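A quick way to watch this happen, as a sketch under the same assumed layout: record the commit ID and the "git patch-id" of BRANCH2's tip before and after the rebase. The patch-id is a hash of the diff alone, so it stands in for "looks similar to a person." The commit() helper and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: same change, different commit, before and after a rebase.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/MASTER   # call the first branch MASTER, as in the text

# Hypothetical helper: commit one new file named after its commit message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b BRANCH1
commit b1-work
git checkout -q -b BRANCH2
commit b2-work
git checkout -q MASTER
commit master-work

# Identity of BRANCH2's tip commit, and a hash of just its diff, before the rebase.
old_tip=$(git rev-parse BRANCH2)
old_patch=$(git show BRANCH2 | git patch-id | cut -d' ' -f1)

git checkout -q BRANCH2
git rebase -q --onto MASTER BRANCH1

new_tip=$(git rev-parse BRANCH2)
new_patch=$(git show BRANCH2 | git patch-id | cut -d' ' -f1)

echo "commit before: $old_tip"
echo "commit after:  $new_tip"
echo "patch before:  $old_patch"
echo "patch after:   $new_patch"

# The old tip is not part of the new BRANCH2's history at all.
if git merge-base --is-ancestor "$old_tip" BRANCH2; then
    related=yes
else
    related=no
fi
echo "old tip is an ancestor of new BRANCH2: $related"
```

The two patch lines match and the two commit lines don't. Git's merge machinery only follows commit ancestry, so anyone still holding the old commit has something git treats as unrelated work.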

This combination of inconveniences, by the way, is the absolute most frustrating problem with using git-svn. Every time a new revision comes in from SVN, you need to git-rebase your patches on top of it, screwing up every other git branch in the repository that was built on the old versions of those patches.

The good news is that you can continue to work on your new BRANCH2, and when you merge it (with "git merge") back and forth with MASTER, at least you won't have a problem.

Interactive Rebase

Another concern that people run into with git's merging style is more about their sense of aesthetics. git makes it really easy to create local branches and check in frequently - it even encourages this, since it has such easy merging and branching and such fast checkins.

The problem is that if you take advantage of this, you end up checking in code every ten minutes, and sometimes it doesn't work. When you later want to prepare a set of patches for submission to an upstream maintainer, you don't want to send in 57 patches, none of which work independently. You probably want to send maybe three patches, each of which adds a separate feature.

For this, git introduces "git rebase -i", known as "interactive rebase." I won't go into detail, since the git-rebase man page is detailed enough. But the idea is that you can take your original patchset and shuffle it around and split and join patches however you want, making it look like that's what actually happened as you developed your project. So if you added feature #1, then feature #2, then realized there was a bug in feature #1 and fixed it, then fixed another bug in feature #2, you can join patches #1 and #3, and then #2 and #4, so it looks to the world like you just did everything perfectly the first time. As a bonus, when people are doing code reviews of your patches, they won't have to review code that you already knew was broken. That ought to save some time.
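The four-patch scenario above can be sketched without any hand editing, assuming a git new enough to have "git commit --fixup" and "git rebase -i --autosquash" (both were added to git after this post was written). Setting GIT_SEQUENCE_EDITOR=true just accepts the todo list that git proposes. The branch name "topic" and the file names are invented for the example.

```shell
#!/bin/sh
# Sketch only: fold two bugfix commits back into the features they fix.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/MASTER   # call the first branch MASTER, as in the text

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b topic

# Patch #1: feature 1 (with a bug we haven't noticed yet).
echo "feature 1" > f1.txt
git add f1.txt
git commit -q -m "feature 1"
f1=$(git rev-parse HEAD)

# Patch #2: feature 2.
echo "feature 2" > f2.txt
git add f2.txt
git commit -q -m "feature 2"
f2=$(git rev-parse HEAD)

# Patch #3: fix feature 1. --fixup marks it as belonging to that commit.
echo "feature 1, fixed" > f1.txt
git commit -q -a --fixup="$f1"

# Patch #4: fix feature 2.
echo "feature 2, fixed" > f2.txt
git commit -q -a --fixup="$f2"

before=$(git rev-list --count MASTER..topic)

# Let git reorder and squash the fixups; "true" accepts the proposed todo list as is.
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true git rebase -q -i --autosquash MASTER

count=$(git rev-list --count MASTER..topic)
subjects=$(git log --format=%s MASTER..topic)
echo "commits before: $before, after: $count"
echo "$subjects"
```

Running plain "git rebase -i MASTER" instead opens the same todo list in your editor, where you can reorder lines and change "pick" to "squash" or "fixup" by hand.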

Interactive rebase is an innovative feature that certainly isn't available in something like CVS or SVN, and it can be addictive. Of course, before you have it, it's hard to imagine that you'd need it. Personally, I find myself reshuffling my patches a bit unnecessarily just for the aesthetics of a clean repository where each patch does what it says. This is both fun and time-consuming, and it's debatable whether it's worth it or not.

To some degree, the reason interactive rebase is popular is that "git merge" history is too detailed. In SVN, you'd make your mess on a branch, get things all cleaned up, and then do a one-shot merge from your branch into the trunk. The merge would just say "added features X and Y." Nobody was forced to look at the inevitable long string of screw-ups that actually led to features X and Y getting added. Interactive rebase exists primarily to get back the feeling of cleanliness that SVN had and git took away.

Git rebase summary

For each merging system, we've been asking: (A) how automatic is it, (B) what happens to the change history, (C) what happens when you merge back and forth multiple times between two branches, and (D) what happens when you "cherry pick" individual changes, usually bugfixes, from one branch to another.

"git rebase" is unusual for a merging system, but it's basically a paraphrased version of old-style CVS and SVN merge commands. Thus, the answers it gives to these questions are roughly the same: (A) it's semi-automatic, as long as you provide the right revision numbers; (B) it's mostly used to make the change history look cleaner, at the expense of losing the detailed history git-merge could provide; (C) it actively complicates merging back and forth between branches, and produces more conflicts than CVS or SVN would have; and (D) it's highly compatible with patch cherry picking, since that's pretty much all it does.

In short, a combination of "git merge" and "git rebase" either gives you the best of both worlds, or the worst of both worlds, depending on your point of view. But having to choose between them, and trying to figure out what crazy thing happened when you chose the wrong one, is one of the scariest and most confusing parts of git as far as new users are concerned. In fact, for me, it's much harder to sort out the combination of "git merge" and "git rebase" than it is to figure out two-way merging in CVS... and that's saying a lot.

Next time: darcs and patch theory.

Footnotes

(1) One cool thing about "git rebase", however, is that it keeps your original patches as separate checkins. With CVS or SVN, you would get one big checkin with a merge of all your patches; with "git rebase", it keeps your patch series intact. There is no technical reason that CVS or SVN couldn't do the same trick, though.

(2) Usually you don't want to actually delete the old BRANCH2, just in case you screw up. Perhaps you'll call the new one BRANCH2a. But you'll still never be able to merge between BRANCH2 and BRANCH2a, even though they contain the same set of changes.

I'm CEO at Tailscale, where we make network problems disappear.

apenwarr on gmail.com