
How to programmatically detect incorrect (not rebased) merge bubbles in master?


Our dev process states that every developer should first rebase their topic branch onto master and then merge it with the --no-ff flag to create a merge bubble. This results in a clean, easy-to-follow history graph. However, developers sometimes accidentally merge their pull request via the GitHub user interface instead of the Git CLI. GitHub does not rebase before the actual merge, which results in the following "messy" history graph:

*   G
|\
| * F
|/
*   E Merge pull request #123 from feature-three
|\
| * D
* |   C
|\ \
| |/
|/|
| * B
|/
*   A
|\
| * ...
|/
*   ...
...

Had the process been followed, we would see this history graph:

*   G
|\
| * F
|/
*   E' Merge branch 'feature-three'
|\
| * D
|/
*   C'
|\
| * B
|/
*   A
|\
| * ...
|/
*   ...
...

(I have renamed commits E and C to E' and C' because they would have different hashes.)

We need to programmatically get a list of such broken merge commits, like E.

git rev-list --merges --format='%h %p' A..G | grep -v '^commit' produces the following output:

G E F
E C D
C A B

Here the first column is the merge commit, the second is its first parent (also a merge commit), and the third is its second parent, a commit from the topic branch. However, the parent relation of the broken merge looks fine: the command gives the same output when run on the fixed Git history (second diagram), so this output alone cannot identify the broken merge.

There must be another way to detect the broken merge commits in a project. Please note that a solution which requires git to be executed on every commit in the history is not preferred as some projects have 50k+ commits, and a single git execution there takes more than 2 seconds.


Solution

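  • Your problem reduces to a simple graph rule, in two parts:

      1. Every commit on master's first-parent chain is a merge commit with exactly two parents.
      2. Following a merge's second parent back through the topic branch's single-parent commits must end at that merge's first parent.

    This second rule identifies commit D for you: D's first and only parent is A, when its parent should have been C.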

    So, just read the graph. You can do this directly (e.g., in GitPython or something), or use git rev-list --topo-order --parents master to read the graph into text and then read that with a program.

    As Nitsan Avni suggested in a comment, if you can prevent the accidental merges in the first place, that's the best way to go, because fixing these after the fact is a form of history rewriting, which requires all users to adapt to the rewritten history.
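
The graph-reading approach can be sketched in Python as follows. This is only a minimal sketch under some assumptions: it reads the text produced by git rev-list --parents (one commit per line, followed by its parents), walks master's first-parent chain, and for each merge follows the second parent back along first parents until it reaches the mainline. If it lands anywhere other than the merge's first parent, the topic branch was not rebased. The function names (parse_rev_list, first_parent_chain, find_unrebased_merges, load_graph) are made up for this illustration, and topic branches that themselves contain merges are handled only by following their first parents.

```python
import subprocess


def parse_rev_list(text):
    """Map each commit to its list of parents.

    Expects the output format of `git rev-list --parents`:
    one line per commit, "<commit> <parent1> <parent2> ...".
    """
    graph = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            graph[fields[0]] = fields[1:]
    return graph


def first_parent_chain(graph, tip):
    """Return the commits reached from tip by following first parents only."""
    chain = []
    commit = tip
    while commit in graph:
        chain.append(commit)
        parents = graph[commit]
        if not parents:
            break
        commit = parents[0]
    return chain


def find_unrebased_merges(graph, tip):
    """Return mainline merges whose topic branch does not start at the
    merge's first parent (hypothetical helper, not part of Git)."""
    mainline = first_parent_chain(graph, tip)
    on_mainline = set(mainline)
    broken = []
    for merge in mainline:
        parents = graph[merge]
        if len(parents) < 2:
            continue  # not a merge commit
        base, commit = parents[0], parents[1]
        # Walk down the topic branch until we hit the mainline
        # or leave the part of the history that was loaded.
        while commit in graph and commit not in on_mainline:
            topic_parents = graph[commit]
            if not topic_parents:
                break
            commit = topic_parents[0]
        if commit != base:
            broken.append(merge)
    return broken


def load_graph(rev="master"):
    """Read the history of rev with two git invocations (assumes the
    current directory is inside the repository)."""
    def git(*args):
        result = subprocess.run(["git", *args], capture_output=True,
                                text=True, check=True)
        return result.stdout

    tip = git("rev-parse", rev).strip()
    return parse_rev_list(git("rev-list", "--parents", rev)), tip
```

Calling graph, tip = load_graph("master") and then find_unrebased_merges(graph, tip) runs git only twice regardless of history size, which avoids the per-commit git executions the question rules out. The rest is dictionary lookups over the parsed text.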