Merging in bzr: The git approach vs the bzr approach

(or, why explicit merges, a mainline, and merge direction are important)

This article compares the merge styles of bzr and git. It visually demonstrates what happens when using explicit merges with a mainline versus fast-forward merges without a mainline. The difference is partly technical (git has no concept of a mainline) and partly cultural (git people prefer a different pattern than bzr people).

Note: The bzr and git examples below are both implemented using bzr and displayed with bzr-gtk. This article is about visually comparing two different development styles and the norms of two different cultures, not about the actual tool implementations.

Bzr-style development

In bzr, the usual development process looks something like this:
  1. Branch.
  2. Hack, hack, hack. Commit frequently.
  3. Submit a branch (URL or bundle) for review.
  4. Hack, hack, hack. Fix review issues.
  5. Merge from trunk to resolve conflicts.
  6. Submit a branch (URL or bundle) for review.
  7. Merge into trunk.
Merges are explicit by default in bzr.

Below is what the history will look like over time while following this pattern in bzr.

Git-style development

In git, the usual development process looks something like this:
  1. Branch.
  2. Hack, hack, hack. Commit frequently.
  3. Rewrite the branch history, breaking it into a small number of fully-functional logical chunks.
  4. Submit a series of patches for review.
  5. Hack, hack, hack. Fix review issues.
  6. Rewrite the branch history, breaking it into a small number of fully-functional logical chunks.
  7. Rebase on top of trunk to resolve conflicts.
  8. Submit a series of patches for review.
  9. Fast-forward trunk to the submitted branch head.
Merges are implicit in git whenever possible; git calls this a "fast-forward" merge. No merge is recorded unless there was parallel history, and when there are N parallel branches, N-1 merges are recorded. The exception is when the changes are rebased before merging, in which case no merges are recorded at all. History then simply flows forward in a straight line.
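
The fast-forward rule above can be sketched as a toy model. This is only an illustration under my own assumptions, not how git or bzr is implemented: the commit graph is a plain dict of parent lists, and the names parents, is_ancestor, and merge are made up for this example.

```python
# Toy commit graph: each revision maps to the list of its parents.
# All names here are hypothetical and exist only for this illustration.

def is_ancestor(parents, old, new):
    """Return True if `old` is reachable from `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        rev = stack.pop()
        if rev == old:
            return True
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(parents[rev])
    return False

def merge(parents, target_head, source_head, fast_forward=True):
    """Merge source into target and return the new target head.

    With fast_forward=True, the target simply jumps to the source head
    when there is no parallel history. Otherwise a merge revision with
    two parents is recorded.
    """
    if fast_forward and is_ancestor(parents, target_head, source_head):
        return source_head  # no merge is recorded
    merge_rev = f"merge({target_head},{source_head})"
    parents[merge_rev] = [target_head, source_head]
    return merge_rev

parents = {"r1": [], "a1": ["r1"], "a2": ["a1"]}
implicit_head = merge(parents, "r1", "a2")                      # fast-forward
explicit_head = merge(parents, "r1", "a2", fast_forward=False)  # explicit merge
print(implicit_head)
print(explicit_head)
```

In the implicit case the trunk head just becomes the branch head and nothing new is recorded; in the explicit case a new revision with two parents appears.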

Below is what the history will look like over time while following this git-style pattern.

Every project starts with a single point in history:

The graph has just one point, and it is called Revision 1.
Just like the bzr style, git-style history starts with a single point:

The main difference is that git doesn't use revision numbers. I show the number here anyway because it's much easier to refer to than a checksum.
The normal way to make a change in bzr is to branch from trunk, hack on your changes for a while, and then merge back into trunk:

In this example, the developer has implemented feature "a" and it took 5 revisions. When they merge, trunk goes from Revision 1 to Revision 2, and the changes in the feature branch are assigned longer numbers based on their branch point (r1), their merge order (first merged), and a simple counter (1 to 5).
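
The numbering described above can be written out as a tiny sketch. It only mirrors the three ingredients named in the text (branch point, merge order, counter); the function name dotted_revnos is my own invention, and real bzr computes these numbers from the revision graph rather than from arguments like these.

```python
def dotted_revnos(branch_point, merge_order, commit_count):
    """Build identifiers of the form <branch point>.<merge order>.<counter>."""
    return [
        f"{branch_point}.{merge_order}.{counter}"
        for counter in range(1, commit_count + 1)
    ]

# Feature "a": branched from r1, merged first, five commits.
feature_a = dotted_revnos(branch_point=1, merge_order=1, commit_count=5)
print(feature_a)  # ['1.1.1', '1.1.2', '1.1.3', '1.1.4', '1.1.5']
```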

The commands to do this are:
  bzr branch trunk feature-a
  cd feature-a
  hack ; bzr commit ; hack ; bzr commit ; ...
  cd ../trunk
  bzr merge ../feature-a ; bzr commit 
The natural thing to do with a branch is to develop it. Just like bzr, the developer branches from trunk, hacks on it for a while, and then merges back into trunk. Here is the resulting graph:

But this looks different from bzr! In git, merges aren't normally explicit like they are in bzr. Instead, git does "fast-forward" merges by default whenever possible. So, instead of moving from Revision 1 to Revision 2, trunk's head moves from Revision 1 all the way to Revision 6 in one leap, with nothing recorded in between.

The commands to do this fast-forward merge in bzr are:
  bzr branch trunk feature-a
  cd feature-a
  hack ; bzr commit ; hack ; bzr commit ; ...
  bzr push ../trunk
The difference here is that the developer pushes to trunk instead of merging from the feature branch to trunk.
Since we're in a distributed development environment, someone else was adding a feature at the same time. Just like the author of feature "a", the author of feature "b" branched, hacked, and then merged back into trunk. This time, it took three revisions:

After merging, trunk goes from Revision 2 to Revision 3. The individual changes on the branch are assigned numbers as before, except this time it's 1.2.* instead of 1.1.* because it was merged second. The development of feature "a" and feature "b" may have happened simultaneously or a-first or b-first; it doesn't matter. The order of merges determines the order the branch details are displayed. Merging them into trunk in a different order would change the graph, but developing them in a different order would not make any difference visually.

The commands to do this are the same as last time, except with feature-b instead of feature-a.
Just like last time, another developer has been working on another feature at the same time. They branch, hack, and then send changes back to trunk using git's default merge algorithm. This is the resulting graph:

The graph may look a little different:
  • There have been two branches and two merges so far, but it appears as if there was only one.
  • If you're looking closely, you might notice that trunk's head just moved from Revision 6 to Revision 5. That's right: adding a change made the latest revision number go down, not up.
  • The most recent changes appear at the bottom, with the older changes at the top. They don't flow in a consistent order. This is a side effect of merging in the wrong direction when showing leaf nodes in merge order.
Why is this so different? Because this was a "fast-forward" merge instead of an explicit merge. Fast-forward merges tend to hijack the mainline, which results in a complete re-numbering of top-level revisions.

The commands to do this in bzr are:
  bzr branch trunk feature-b
  cd feature-b
  hack ; bzr commit ; hack ; bzr commit ; ...
  bzr merge ../trunk ; bzr commit
  bzr push ../trunk
It's worth mentioning that, in a more git-like style, the developer might choose to simply rebase branch b on top of trunk instead of merging, and then the history would show no merges at all. It would flow as a single line of points as in the previous step, with serialized revisions for two branches instead of just one.
A third developer was working on feature "c" at the same time. Or it could be that the first developer made two branches in parallel; it doesn't matter. After branching, hacking, and merging, this is the resulting graph:

Unsurprisingly, trunk moves from Revision 3 to Revision 4, and the merged branch changes get assigned longer identifiers of 1.3.*. The graph expands horizontally by one more level but all the actual changes appear at the top.

Note that this graph contains four columns. From left to right, they are: the mainline, branch C, branch B, branch A.

The commands to do this are again the same as last time, except with feature-c instead of feature-b.
When a third developer works on this project simultaneously, things are pretty much the same as in bzr. They branch, hack, and then merge ... except the merge uses git's default algorithm:

This graph doesn't look much like either of the previous two. Several things are odd again:
  • Trunk's head went from Revision 5 to Revision 3, going backward again. And not even by a consistent amount -- the trunk head went from 1 to 6 to 5 to 3 since it was created.
  • The visual structure of the graph has been completely rearranged since last time.
  • Again, the newest changes appear at the bottom, with the oldest changes at the top. The revisions aren't remotely in chronological order, as they go r1, r1.2.*, r1.1.*, r2, then r3. The branches chronologically zig-zag downward while the mainline trunk revisions count upward.
  • It still shows one branch fewer than actually happened.
  • The merges tend to float to the top while the actual development changes sink to the bottom.
Note that this graph contains only three columns. From left to right: branch C, branch A, branch B. There is no separate path for the mainline.
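
The jumps in trunk's head number (1, 6, 5, 3) can be reproduced with a small model. This is a sketch under stated assumptions: the mainline is taken to be the chain of first (left-hand) parents, revision names like a1 and merge_b are made up, and feature "c" is assumed to have a single commit, which is what the numbers above imply.

```python
def mainline(parents, head):
    """Follow first parents from head back to the root; return oldest-first."""
    chain = []
    rev = head
    while rev is not None:
        chain.append(rev)
        rev = parents[rev][0] if parents[rev] else None
    return chain[::-1]

parents = {"r1": []}
trunk_heads = ["r1"]  # trunk's head after each step

# Feature "a": five commits, then trunk is fast-forwarded to a5.
previous = "r1"
for i in range(1, 6):
    parents[f"a{i}"] = [previous]
    previous = f"a{i}"
trunk_heads.append("a5")

# Feature "b": three commits from r1, merge trunk INTO the branch,
# then push the branch over trunk. The branch tip is the first parent.
parents.update({"b1": ["r1"], "b2": ["b1"], "b3": ["b2"]})
parents["merge_b"] = ["b3", "a5"]
trunk_heads.append("merge_b")

# Feature "c": one commit from r1 (assumed), same merge-then-push pattern.
parents["c1"] = ["r1"]
parents["merge_c"] = ["c1", "merge_b"]
trunk_heads.append("merge_c")

head_numbers = [len(mainline(parents, head)) for head in trunk_heads]
print(head_numbers)  # [1, 6, 5, 3]
```

Each push replaces trunk's first-parent chain with the pusher's own branch, so the head number is just the length of whichever branch landed last.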

The commands for this are essentially the same as for feature "b".

Usage tips

The bzr community has produced some best practices for development, partly based on the merge patterns above.
  • Don't merge trunk into your branch and then push over trunk. This results in a fast-forward merge, which causes problems. It hijacks trunk, reshapes the history graph, renumbers revisions, and interferes with bisect tools.
  • Make sure merges to trunk are just merges, with no additional changes.
  • If your branch doesn't merge cleanly with trunk, be sure to use the double-merge pattern explained below. Merge from trunk, fix the conflicts, and then merge back to trunk.
  • Never push to trunk unless you are the owner and you are pushing a locally-updated copy of trunk.
  • If you combine the above tips, all merges to trunk will end up being trivial... so why not let a bot do it? Examples are Tarmac, PQM, etc. This also means that the development patterns will actually be enforced.
For large projects with many developers, the above tips can be applied recursively, using multiple nested levels of "trunk". Each mini-trunk should use the same disciplines as the main trunk.

Double-merge pattern

As a side note, one question comes up a lot with DVCS tools: what if trunk has changed significantly since I branched? What if my branch can't merge cleanly back into the current version of trunk?

The solution in bzr is to do a double merge. This means merging from trunk, fixing conflicts, and then merging immediately back to trunk again. The result looks like this:

The way git handles this is to instead rebase the branch on top of the current trunk. This makes it appear as if the branch and the development happened much later in time than it actually did.
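
To see why the double merge leaves trunk intact, here is a toy model of it. The history is hypothetical (trunk moved from r1 to r2 after the feature branched at r1), the revision names are made up, and the mainline is assumed to be the chain of first (left-hand) parents.

```python
def mainline(parents, head):
    """Follow first parents from head back to the root; return oldest-first."""
    chain = []
    rev = head
    while rev is not None:
        chain.append(rev)
        rev = parents[rev][0] if parents[rev] else None
    return chain[::-1]

# Hypothetical history: the feature branched at r1, then trunk gained r2.
parents = {
    "r1": [],
    "r2": ["r1"],
    "f1": ["r1"],
    "f2": ["f1"],
}

# Merge 1, made in the feature branch: merge from trunk, fix conflicts there.
parents["f_merge"] = ["f2", "r2"]
# Merge 2, made in trunk: a trivial merge of the already-resolved branch.
parents["r3"] = ["r2", "f_merge"]

double_merged = mainline(parents, "r3")
print(double_merged)  # ['r1', 'r2', 'r3']

# For contrast: pushing the branch over trunk after merge 1 instead.
pushed = mainline(parents, "f_merge")
print(pushed)  # ['r1', 'f1', 'f2', 'f_merge']
```

With the second merge made in trunk, trunk's own revisions stay on the mainline and the head simply advances by one.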


One other thing to note about the two merge styles presented is that they affect how well bisect tools work. A bisect tool digs through a project's history in a reasonably efficient manner (binary search, hence the name bisect) to find the exact change which introduced a bug. It is one of a developer's most powerful tools for determining the cause of a bug so the bug can be fixed.
Using the bzr style, the idea is that every revision on the mainline is "cooked". This means it has been reviewed and passes whatever base level of QA the project uses. This could be a simple build bot, a suite of automated tests, or even a full manual test suite. The non-mainline revisions are assumed to be "raw" development versions which haven't had much direct attention, and are thus unlikely to be fully functional.

This means the bisect tool only needs to care about the mainline revisions, at least until it narrows down the problem to a specific branch. Then it dives into the branch to find the exact commit.
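
That two-phase search can be sketched as follows. This is a minimal illustration, not the algorithm of any particular bisect tool: first_bad, the merged table, and the location of the bug (1.2.2) are all assumptions made up for the example, and it assumes the bug persists in every later revision once introduced.

```python
def first_bad(revisions, is_bad):
    """Binary search for the first bad revision.

    Assumes revisions are ordered oldest-first, the last one is bad,
    and every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid + 1
    return revisions[lo]

mainline = ["1", "2", "3", "4"]
# Branch revisions brought in by each mainline merge, oldest-first.
merged = {
    "2": ["1.1.1", "1.1.2", "1.1.3", "1.1.4", "1.1.5"],
    "3": ["1.2.1", "1.2.2", "1.2.3"],
}
# Hypothetical bug introduced in 1.2.2, so everything containing it is bad.
bad = {"1.2.2", "1.2.3", "3", "4"}

def is_bad(rev):
    return rev in bad

# Phase 1: search only the "cooked" mainline revisions.
bad_merge = first_bad(mainline, is_bad)
# Phase 2: dive into the branch that merge brought in. The merge itself
# is kept as the last candidate in case the merge introduced the bug.
culprit = first_bad(merged.get(bad_merge, []) + [bad_merge], is_bad)
print(bad_merge, culprit)  # 3 1.2.2
```

Only the second phase ever touches "raw" revisions, and only those of one branch.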

The overall result here is that developers can submit their original development branch upstream without additional effort, and the project's revision history will show the actual process the developer went through to create the branch. This includes original commit messages, timestamps, mistakes and their fixes, changes made after review, etc. One can read the history to see the overall progression of each new feature, from the first draft all the way to the finished product. It tells the development story in a very human manner.
The git development style is very different in this respect. Since there is no mainline, all paths through history are considered equally valid. The bisect tool has no way to determine which revisions are "cooked" or "raw", and the "raw" versions are often broken in ways that make them unusable for bisection purposes.

This presents a problem. The bisect tool doesn't know which revisions are suitable for bisection.

So, how to solve this problem? Simple. Ensure that all revisions in the entire history graph are bisectable. What this means is that developers must rewrite their changes before submitting upstream, ensuring that each and every intermediate change (not just the end result) passes the project's QA standards.

The overall result here is that the project history tends to be very clean, with absolutely no extra or unnecessary revisions. It looks as if each change was made perfectly, intentionally, and quickly, on the first attempt. The changes appear to emerge fully-formed with no evidence of the process which created them.
Last modified: April 23, 2012 @ 11:38 MDT
Copyright (C) 1996-2024 Selene ToyKeeper