I love the smell of

in the morning
Everything here is my opinion. I do not speak for your employer.
March 2008
April 2008

2008-03-21 »

A tale of five merges, part 2: Subversion (svn)

Last time, I introduced the confusing but definitely workable concept of merging branches with CVS.

Of course, CVS is completely obsolete nowadays, having been wholeheartedly replaced by Subversion. SVN really does do everything CVS can do, plus more, and it includes awesome CVS import tools and clients for every platform, so if you're still using CVS, do yourself a favour and just switch to SVN right away. I mean it.(1)

As a refresher, the questions we're investigating for each merging system are: (A) how automatic is it, (B) what happens to the change history, (C) what happens when you merge back and forth multiple times between two branches, and (D) what happens when you "cherry pick" individual changes, usually bugfixes, from one branch to another.

Merging with 'svn merge'

If you understood what I said about CVS last time, it won't take you much work to understand merging in SVN. The key things to note are: - SVN numbers each commit automatically, and uses its commit numbers like CVS uses tags. - SVN tags, conversely, are almost nothing like CVS tags. (They're conceptually more like making new repositories altogether.) - 'svn merge' is essentially the same thing as 'cvs up -j -j', only with more annoying syntax.

I won't go into much detail here, as conceptually, merging in SVN is pretty much the same thing as merging in CVS, and is covered excellently in the SVN Book.

The strangest thing about merging with SVN is that even though SVN supports tags, they are mostly worthless for merging purposes. Instead, people use svn "tags" as branches and just merge based on revision ids. Where in CVS you would "cvs up -j BEFORE -j AFTER", you would now "svn merge -rBEFORE:AFTER that_other_repository", where BEFORE and AFTER are magic numbers and that_other_repository is the place in SVN where you've been working on your branch.(2)

As with CVS, repeated merging and bidirectional merging are both entirely possible, as long as you remember which revision numbers correspond to BEFORE and AFTER. Usually you do this by putting the revision numbers (by hand) into your commit messages, which is gross, but I guess no more gross than manually creating tags in CVS.

Merging with 'svnmerge'

'svnmerge' is a standalone program that is not officially part of svn. It is not to be confused with 'svn merge' (above), which is a command built into svn itself.

The purpose of 'svnmerge' is to get rid of the grossest parts of SVN merging, namely the fact that humans have to remember and type SVN revision numbers to do a merge, as well as repeatedly type the stupid that_other_repository path.

Simple advice: If you're using SVN, you should probably be using svnmerge for your merges.(3)

Unfortunately, one thing that svnmerge doesn't do easily is bidirectional merging. If you're working in a feature branch and occasionally merging into your branch from /trunk (the same thing as HEAD in CVS), you can use svnmerge to do it. But when you go to merge your feature branch back into /trunk, svnmerge will get horribly confused, and you'll have to do it using plain 'svn merge'. It's not the end of the world, but you have to know what you're doing.

On the other hand, svnmerge adds support for "cherry picking", or choosing to merge individual revisions from one branch into another. You can use this to backport bugfixes from /trunk into a patch release, for example. (Rumour has it that the SVN developers themselves use this technique a lot.)

How the SVN changelog tracks merges

SVN has what is called a "linearized" view of history. Every repository has a sequence of checkins, and each checkin affects one or more branches. So you can view the changes to a particular branch over time (by showing the sequence of checkins, and filtering out the ones that didn't affect your branch). You can also view the list of all the checkins, regardless of branch, and see a linear history of "what people did over time."

Perfect, right?

Well, not quite. The problem is that merge history isn't very easy to view. If you view the history of a branch, one of the checkins you see will be something like "merged changes from /branches/whatever, r3456:3573". If you're smart and use the 'svnmerge' tool, the changelog of r3456:3573, filtered for /branches/whatever, will be appended to the message. But if you want to know exactly what happened in r3458, then what happened in r3504, and then what happened in r3573... you'll have to ask about those revisions on /branches/whatever, not on your destination branch. And the only way you know that is by reading the freeform English message in the changelog. That's subversion's "linearized view" in action: as far as SVN is concerned, that branch just saw one checkin, the merge checkin. All those other details happened on some other, totally unrelated branch.

This is actually both good and bad, because it turns out that sometimes you want the detailed history, and sometimes you want the summary version.

For example, say I'm adding feature X in a branch called /branches/featureX. It takes me 57 commits, including a few merges to catch up with /trunk, before my changes are ready to go back into /trunk. Everyone else working on the project will be very interested to know that I "Merged /branches/featureX into the trunk" - as well as the precise set of changes that introduced - but they almost certainly don't care about my 57 commits in detail. For most humans, the preferred view of the trunk is SVN's linearized view.

That is, until it's time to track down which of those commits introduced a bug, or which lines of code were contributed by that guy who was copying source code from SCO, or who I should ask about this particular new function, or whatever. In SVN, you can do "svn annotate" to find out which revision a line was last modified in, and who did it... but it almost always turns out to have been modified by some guy who merged the changes from some branch, so you have to manually go look in that branch, and so on.

Next time, when we talk about git, we'll see how the opposite solution is bad for the opposite reasons.

SVN overall

Let's answer our original questions.

SVN: (A) is not very automatic at merges, albeit better than CVS, and 'svnmerge' makes it significantly better; (B) retains change history across all branches, but the detailed history of a merge must be traced manually to its source branch; (C) like CVS, supports back-and-forth merges with no more trouble than single merges; and (D) has okay support for cherry picking if you use 'svnmerge --bidirectional' carefully.

Next time: how 'git merge' works.

Footnotes

(1) Or switch to git, which someday will probably be the worldwide successor to svn. The road there is much bumpier, though, as git is much more confusing than SVN and does not (yet?) have a strict superset of SVN's features (it has some new things, but is missing some old things). I'll get to that later in this series.

(2) Ironically, that_other_repository must actually be just another path in your current repository, even though SVN forces you to specify the whole @#$#$ URL every single time. This wouldn't be so insulting except that I can't think of any good reason why it has to be in the same repository; the SVN merge algorithm should work just fine if the changes exist in some totally different SVN repository, and even the command-line syntax is designed to permit this.

(3) Rumour has it that Subversion 1.5, when it's released, will integrate svnmerge-style functionality into the main subversion system, as well as improving "svn log" and hopefully "svn annotate" so they can trace back through merges.

I'm CEO at Tailscale, where we make network problems disappear.

Why would you follow me on twitter? Use RSS.

apenwarr on gmail.com