A tale of five merges, part 3: 'git merge'
In the last two articles in this series, I talked about merging branches
in CVS and SVN, and how they're largely the same.
Today is the first of two articles about merging in git. git is most definitely not the
same as CVS and SVN.
Remember, the questions we're investigating for each merging system are: (A)
how automatic is it, (B) what happens to the change history, (C) what
happens when you merge back and forth multiple times between two branches,
and (D) what happens when you "cherry pick" individual changes, usually
bugfixes, from one branch to another.
Merging with 'git merge'
Compared to working with CVS or SVN, merging branches in git feels like
magic. First, you check out the branch that you want to merge into, say
MASTER. Then you type "git merge BRANCH", where BRANCH is the name
of the branch you want to merge from. And that's all.
...When it works.
Aha. You see, "git merge" is really just doing exactly what you were doing
in CVS or SVN by hand. That's why I went into CVS merging in such detail
before.
Here's what really happens: given the name BRANCH and the implicit name
MASTER (since that's the one you have checked out), git traces back in the
history of both until it finds the branchpoint, that is, the last commit
before the two diverged in the first place.(1) If we were using CVS, we'd call
that point BEFORE, and BRANCH would be called AFTER. git then simply takes
all the changes from BEFORE to AFTER and adds them to MASTER.
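To make "traces back until it finds the branchpoint" a little more concrete, here is a rough Python sketch of the idea. It is a toy model, not git's actual algorithm: the commit names (root, base, m1, b1, and so on) are made up, and the history is just a table mapping each commit to its parents. The function returns a list because, as footnote 1 hints, nonlinear history can have more than one candidate.

```python
# Toy commit graph: each commit name maps to the list of its parents.
# All names here are invented for illustration.
parents = {
    "root": [],
    "base": ["root"],   # last commit before MASTER and BRANCH diverged
    "m1": ["base"],     # commits made on MASTER after branching
    "m2": ["m1"],
    "b1": ["base"],     # commits made on BRANCH
    "b2": ["b1"],
}

def ancestors(commit):
    """Return the set containing commit and everything reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def branchpoint(a, b):
    """Return the shared ancestors of a and b that no other shared ancestor descends from."""
    common = ancestors(a) & ancestors(b)
    # Drop any shared commit that is itself an ancestor of another shared commit.
    return sorted(c for c in common
                  if not any(c != d and c in ancestors(d) for d in common))

before = branchpoint("m2", "b2")   # MASTER tip is m2, BRANCH tip is b2
print(before)
```

In this model, BEFORE is "base" and AFTER is "b2", which is all a CVS-style merge needs to know.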
Now here's where things get especially clever. Remember when we were
discussing bidirectional merges in CVS, and I pointed out that merging all
the changes from BEFORE to AFTER into HEAD was the same as merging all the
changes from BEFORE to HEAD into AFTER? The resulting tree is the same in
both cases, but of course, in CVS and SVN, the actual commit you make is
different in each case. After all, in one case, you're committing to the
HEAD branch, and in another, you're committing to the AFTER branch.
git is smarter than that. In git, there really is no difference between
the two operations. You generate a single commit that has two
parents: AFTER and HEAD (or BRANCH and MASTER). Now, it so happens that
git updates the MASTER branch pointer to point at your new commit in one
case, or the BRANCH branch pointer in the other case, but the commit itself
is exactly the same in both cases.
And that simple fact is why repeated merges and bidirectional merges work.
Next time you're merging MASTER and BRANCH, git will trace back, and the
most recent checkin that is shared between the two branches will be one of
the parents of that magical two-parent merge checkin. We use that one as
BEFORE, and merge the changes from BEFORE to BRANCH into MASTER. Or else we
merge the changes from BEFORE to MASTER into BRANCH; again, it's the same
thing.
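Here is a small Python sketch of why the second parent matters for the next merge. Again, this is a toy model with invented commit names, not git's real implementation. The same history is stored twice: once with a two-parent merge commit, and once with the merge recorded as an ordinary one-parent checkin, the way a linear history would have it.

```python
def ancestors(graph, commit):
    """Return the set containing commit and everything reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen

def branchpoint(graph, a, b):
    """Shared ancestors of a and b that no other shared ancestor descends from."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return sorted(c for c in common
                  if not any(c != d and c in ancestors(graph, d) for d in common))

# History with a two-parent merge commit (hypothetical names).
git_style = {
    "base": [],
    "m1": ["base"],
    "b1": ["base"],
    "merge1": ["m1", "b1"],   # first merge of BRANCH into MASTER
    "m2": ["merge1"],         # more work on MASTER afterwards
    "b2": ["b1"],             # more work on BRANCH afterwards
}

# Same history, but the merge checkin only remembers one parent.
linear_style = dict(git_style, merge1=["m1"])

next_base_git = branchpoint(git_style, "m2", "b2")
next_base_linear = branchpoint(linear_style, "m2", "b2")
print(next_base_git, next_base_linear)
```

With the two-parent commit, the next BEFORE is "b1", so only the new work on BRANCH gets merged. Without it, BEFORE falls all the way back to "base", and you are re-merging changes you already have.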
This concept of a two-parented commit is what makes git merging totally
different from SVN, and it's the difference between SVN's "linearized
history" (in which every commit has exactly one parent, even if we made it
that way artificially) and git's non-linear history.(2) It's
also the reason git merging can be automatic, while SVN's cannot.
Why it doesn't always work
The problem with git's automatic merging is, well, that it's automatic.
When it's working for you, it's amazingly powerful. But people who are used
to CVS or SVN merging sometimes feel like git is taking their power away.
For example, perhaps you don't want to merge all the changes from
BRANCH; perhaps you only want the changes up until sometime last week. With
'svnmerge' this is easy enough to do, but with git it requires multiple
steps, since git-merge won't accept arbitrary revisions on its command line,
only branch names.(4)
Or say you create BRANCH1 from MASTER, and start to add feature #1. Then
you create BRANCH2 from BRANCH1, and add (unrelated) feature #2. Now you
decide that feature #2 is ready to merge back into MASTER, but feature #1 is
not. No problem, right? You check out MASTER and then type "git merge
BRANCH2".
Oops! git has just merged all the history of BRANCH1 *and* BRANCH2
back into MASTER, which is not what you wanted at all! Unfortunately,
because git's merging is fully automatic, there is no easy way to get the
behaviour you wanted.(3) With CVS, on the other hand, what you
would do is obvious: "cvs up -j BRANCH1 -j BRANCH2", to merge all the
changes from BRANCH1 to BRANCH2 into MASTER. With SVN, it would be similar,
except that BRANCH1 and BRANCH2 involve revision numbers and URLs.
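The difference comes down to who gets to choose BEFORE. Here is a rough Python sketch of "take the changes from BEFORE to AFTER and add them to MASTER", working on whole files rather than lines, which is much cruder than what CVS, SVN or git really do. The trees and file names are invented for the example.

```python
def changes(before, after):
    """Per-file changes from one tree to another; None marks a deleted file."""
    out = {}
    for path in set(before) | set(after):
        if before.get(path) != after.get(path):
            out[path] = after.get(path)
    return out

def apply_changes(target, before, after):
    """Add the BEFORE->AFTER changes to target; list files both sides changed differently."""
    result = dict(target)
    conflicts = []
    for path, new in changes(before, after).items():
        if target.get(path) != before.get(path) and target.get(path) != new:
            conflicts.append(path)      # target changed this file too, differently
        elif new is None:
            result.pop(path, None)      # file was deleted between BEFORE and AFTER
        else:
            result[path] = new
    return result, sorted(conflicts)

# Hypothetical trees: each maps a file name to its contents.
master = {"core.c": "v1", "README": "hello"}
branch1 = dict(master, **{"feature1.c": "wip"})
branch2 = dict(branch1, **{"feature2.c": "done"})

# Manual choice of BEFORE=BRANCH1, AFTER=BRANCH2: only feature #2 arrives.
manual, manual_conflicts = apply_changes(master, branch1, branch2)

# Automatic choice of BEFORE=the branchpoint (MASTER here): feature #1 comes along too.
automatic, automatic_conflicts = apply_changes(master, master, branch2)
print(sorted(manual), sorted(automatic))
```

The merge machinery is identical in both calls. The only thing that changes is the BEFORE tree, and that is exactly the knob git's automatic merge takes away.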
In git, the solution to this sort of problem is to use the "git rebase"
command, which we'll discuss in more detail next time. But any use of
git-rebase or git-filter-branch, both of which are very useful,
fundamentally changes the commit IDs on the modified branch. "git push" and
"git pull" stop working as expected. Essentially, these commands generate
entirely new commits for the part of the tree they change, so now there are
two sets of commits in your repository that do the same thing in two
different ways, with nothing tying them together. Worse, after a rebase, the
point where your two branches diverged might no longer be known to git -
they no longer have a single commit ID in common - so it just gives
up.(5) And trying to merge the new MASTER back into the original
BRANCH2 later will result in a ton of conflicts.
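Why do the commit IDs have to change? A simplified way to think about it: a commit's ID is a hash that covers its parent's ID, so the same change on top of a different parent is a different commit. The Python sketch below assumes a toy ID made from only the parent ID and a change description. Real git hashes more than that (the tree, author, dates and so on), and the helper names are invented.

```python
import hashlib

def commit_id(parent_id, change):
    """Toy commit ID: a hash over the parent's ID and the change text."""
    return hashlib.sha1(f"{parent_id}\n{change}".encode()).hexdigest()[:12]

def build(start_id, changes):
    """Stack changes on top of start_id in order; return the resulting commit IDs."""
    ids, parent = [], start_id
    for change in changes:
        parent = commit_id(parent, change)
        ids.append(parent)
    return ids

root = commit_id("", "initial")
master = build(root, ["master work"])
branch1 = build(root, ["feature #1"])
branch2 = build(branch1[-1], ["feature #2"])    # stacked on BRANCH1

# "Rebasing" feature #2 onto MASTER: identical change, different parent.
rebased = build(master[-1], ["feature #2"])

print(branch2[0], rebased[0])
```

The two printed IDs differ even though the change is the same, and nothing in the new commit points back at the old one. That is the "two sets of commits with nothing tying them together" problem in miniature.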
The really bad news is that git-svn, which is otherwise a great tool to help
with migrating between git and svn, uses git-rebase pretty heavily. That
confuses git's automated merging features altogether, and git's
manual merging features are not as straightforward as SVN's, so
things get confusing very fast.
Git merge and cherry picking
git includes another command called "git cherry-pick". Its job is to take
the changes from one specific commit and apply them to the current
branch.
Unfortunately, the existence of git cherry-pick is incompatible with git
merge's view of the world. git's history may not be linear, but at
least it's continuous: branches may swerve into and out of each other
as they pass through time, but individual changes aren't supposed to simply
jump from the middle of one tree into the middle of another.
Since there's no way to represent what really happened, a cherry-picked
patch instead produces an entirely new commit with exactly one parent (the
destination branch), so a future git-merge from that branch has no idea the
patch already existed, and usually produces a conflict.
Cherry picking in SVN with 'svnmerge' definitely works better than this.
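Here is the cherry-pick problem as a small Python sketch, using a toy parent table with invented commit names. The copy of the bugfix on MASTER has one parent and no link to the original, so when a later merge works out which BRANCH commits MASTER is missing, the original bugfix is still on the list.

```python
# Toy history (hypothetical names): a bugfix made on BRANCH, then cherry-picked to MASTER.
parents = {
    "base": [],
    "b1": ["base"],
    "fix": ["b1"],          # the original bugfix on BRANCH
    "b2": ["fix"],
    "m1": ["base"],
    "fix_copy": ["m1"],     # cherry-picked copy on MASTER: exactly one parent
}
patch = {"fix": "bugfix patch", "fix_copy": "bugfix patch"}   # same change text

def ancestors(commit):
    """Return the set containing commit and everything reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

# Commits a later merge of BRANCH (tip b2) into MASTER (tip fix_copy) must bring over.
to_merge = ancestors("b2") - ancestors("fix_copy")
print(sorted(to_merge))
```

The history alone gives no hint that "fix" and "fix_copy" carry the same patch, so the merge has to apply the change on top of a tree that already contains it.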
How the git changelog tracks merges
Last time around, I talked about how SVN's "linear" change history tracks
merges, which is simple enough: every time you do a merge, it puts a single
checkin into the destination branch which contains all the changes. And if
you want to know the details of every patch from the source branch, well,
you'll have to go look at the source branch yourself.
git definitely doesn't do that. Instead, if you ask for "git log" it will
show you the complete set of changes in all branches leading up to the
current commit. That is, all of the checkins to either of the
merged branches show up in the log, with nothing in particular in the "git
log" output identifying which patch came from where.(6) This
is technically correct, but is not actually what most users wanted to
know.
Most people who ask for the history of the MASTER branch want to know the
"simplified" form of its history: Added feature X; added feature Y; fixed
feature X; added feature Z. But instead, they confusingly see all
57 individual commits making up "added feature X", even though if they were
ignoring your featureX branch all along, "added feature X" was really a
one-time event for them.(7)
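The two views of history can be sketched in a few lines of Python. This is a toy model with invented commit names: one function follows every parent, the way plain "git log" covers everything reachable, and the other follows only the first parent of each commit, which is the idea behind the "git log --first-parent" option mentioned in footnote 7. It ignores the ordering real git applies to its output.

```python
# Toy history (hypothetical names): featureX was developed in three commits, then merged.
parents = {
    "base": [],
    "x1": ["base"], "x2": ["x1"], "x3": ["x2"],   # the featureX branch
    "m1": ["base"],
    "merge_x": ["m1", "x3"],    # "added feature X"; first parent is MASTER
    "m2": ["merge_x"],
}

def full_log(tip):
    """Every commit reachable from tip through any parent."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def first_parent_log(tip):
    """Walk back from tip following only the first parent of each commit."""
    order, c = [], tip
    while c is not None:
        order.append(c)
        c = parents[c][0] if parents[c] else None
    return order

everything = full_log("m2")
simplified = first_parent_log("m2")
print(sorted(everything), simplified)
```

In the simplified view, the whole featureX branch collapses into the single "merge_x" entry, which is the "one-time event" most readers of MASTER's history are looking for.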
A related problem is that git doesn't have a way of naming branches
globally. So if you merge from the "featureX" branch, you're not really
merging from featureX; you're merging from
2bed2ad4845eceb8ee650f34e476a60f9fbecc7c. As far as the computer is
concerned, that's the same thing, but when it comes to tracing back through
your history, it's not actually what you wanted to know.
Worse still, git doesn't remember at branch time where you diverged;
it only figures it out at merge time. That means there is no
equivalent to SVN's "svn log --stop-on-copy", which shows only the changes
you added in this branch.(8)
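The workaround in footnote 8, "git log BEFORE..MYBRANCH", can be read as a set difference: everything reachable from MYBRANCH minus everything reachable from BEFORE. Here is a toy Python sketch of that reading, with invented commit names. Note that you have to supply BEFORE yourself, which is the whole complaint.

```python
# Toy history (hypothetical names).
parents = {
    "base": [],
    "before": ["base"],                 # the commit MYBRANCH was created from
    "m1": ["before"],                   # later work on MASTER
    "f1": ["before"], "f2": ["f1"],     # work done on MYBRANCH
}

def ancestors(commit):
    """Return the set containing commit and everything reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def commit_range(before, tip):
    """Commits reachable from tip but not from before."""
    return ancestors(tip) - ancestors(before)

only_mine = commit_range("before", "f2")
print(sorted(only_mine))
```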
git-merge overall
Now let's go back to the questions we were trying to answer in the first
place:
git-merge: (A) does merges fully automatically, when it works, but
complicates manual merges; (B) retains the history of each change, but that
often turns out to be more than you wanted; (C) easily supports
back-and-forth merges with no more trouble than single merges; and (D) does
not really support cherry picking without introducing future merge
conflicts.
Phew!
Next time: how 'git rebase' tries to solve some of the things people don't
like about git merge.
Footnotes
(1) This is actually greatly oversimplified. Since git history
is nonlinear, just tracing back through it is a bit complicated. On top of
that, you can play tricks with selecting different branchpoints for
different files, and so on.
(2) I guess if SVN's history is "linear", then git's history is
"helix shaped," with endless splitting and remerging of histories.
(3) git proponents would say that what you should have done is
branch BRANCH2 from MASTER in the first place. But not everyone is lucky
enough to get everything right the first time.
(4) There seems to be no actual reason for this restriction.
Perhaps it will go away eventually.
(5) That's the source of the mysterious note that "You should
understand the implications of using git rebase on a repository that you
share" in the git-rebase man page. Of course, the man page doesn't really
tell you what the implications are. It just says it "will cause problems."
(6) The "gitk" and "gitweb" programs attempt to represent this
graphically, which helps. The "git show-branch" program is also somewhat
useful here. But all of these are much harder to understand than "svn log".
(7) There is actually a "--squash" feature of git-merge that
tries to resolve this problem. Essentially it creates an entirely new
commit with only one parent - the destination branch - and doesn't tie in at
all to the history of the other branch. In other words, it does exactly
what "svn merge" would do, but that has exactly the opposite set of
tradeoffs, as discussed last time, and makes bidirectional merges very ugly.
Alternatively, "git log --first-parent" displays only the first parent of
each merge, which is closer to what most users actually want.
(8) If you manually keep track of which branch you came from, you
can use "git log BEFORE..MYBRANCH" to see the list of changes between BEFORE
and MYBRANCH. But if you're going to manually keep track of BEFORE, you
might as well be using CVS. "git show-branch" also attempts to help here,
but if you ever use "git rebase" its output will be hopelessly cluttered and
confusing.
March 20, 2008 17:26