Our development process says that every developer should first rebase their topic branch onto master and then merge it with the --no-ff flag, creating a merge bubble. This results in a clean, easy-to-follow history graph.
However, developers sometimes accidentally merge their pull request through the GitHub user interface instead of the Git CLI. That merge does not rebase the branch first, which results in the following "messy" history graph:
* G
|\
| * F
|/
* E Merge pull request #123 from feature-three
|\
| * D
* | C
|\ \
| |/
|/|
| * B
|/
* A
|\
| * ...
|/
* ...
...
Had the process been followed, we would see this history graph:
* G
|\
| * F
|/
* E' Merge branch 'feature-three'
|\
| * D
|/
* C'
|\
| * B
|/
* A
|\
| * ...
|/
* ...
...
(I have marked commits E and C with a prime, as E' and C', because they would have different hashes.)
We need to programmatically get a list of such broken merge commits, like E.
git rev-list --merges --format='%h %p' A..G | grep -v '^commit'
produces the following output:
G E F
E C D
C A B
The first column is the merge commit, the second is its first parent (also a merge commit), and the third is its second parent, a commit from the topic branch. However, the parent relation of the broken merge looks fine: the command gives the same output when run on the fixed history (second diagram), so we cannot use it to recognize the broken merge.
There must be another way to detect the broken merge commits in a project. Please note that a solution which requires running git once for every commit in the history is not practical: some projects have 50k+ commits, and a single git invocation there takes more than 2 seconds.
Your problem reduces to two simple graph rules:
Every commit on master's first-parent chain should be a merge commit. (This holds even for the "bad" graph, but it may be worth checking: if you allow hotfixes to go into master without a merge bubble, any such commits should be identified and then exempted from the other rule. Note that the root commit presumably won't follow this rule, and there may be some cut-off point before which commits don't follow any rules and you should stop looking.)
Each such merge commit will then have a second parent. That commit and the chain of its parents should be simple (single-parent) commits until they reach a commit that is also reachable from the merge's first parent; this traces through the merge bubble itself. The crucial test is that the commit where this side of the bubble rejoins master should be the other parent of the merge that brought you down this side of the bubble, that is, the merge's first parent.
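A minimal sketch of the first rule, assuming the history is already loaded into a Python dict that maps each commit to its list of parents (first parent first). The names first_parent_chain and non_merge_mainline_commits, the stop cut-off parameter, and the toy graph (the "bad" graph from the question, truncated at A) are illustrative assumptions, not part of any library.

```python
from typing import Dict, List, Optional

Parents = Dict[str, List[str]]  # commit -> parents, first parent first


def first_parent_chain(parents: Parents, tip: str, stop: Optional[str] = None) -> List[str]:
    """Follow first parents from tip until the root, or until the hypothetical cut-off stop (excluded)."""
    chain: List[str] = []
    commit: Optional[str] = tip
    while commit is not None and commit != stop:
        chain.append(commit)
        ps = parents.get(commit, [])
        commit = ps[0] if ps else None
    return chain


def non_merge_mainline_commits(parents: Parents, tip: str, stop: Optional[str] = None) -> List[str]:
    """Rule 1: list commits on master's first-parent chain that are not merges."""
    return [c for c in first_parent_chain(parents, tip, stop)
            if len(parents.get(c, [])) < 2]


# Toy version of the "bad" graph from the question; A stands in for the cut-off point.
bad: Parents = {
    "G": ["E", "F"], "F": ["E"],
    "E": ["C", "D"], "D": ["A"],
    "C": ["A", "B"], "B": ["A"],
    "A": [],
}
```

With the cut-off, the bad graph passes rule 1 (it only fails the second rule), while a hotfix H committed straight on top of G would be reported. Without a cut-off, the toy root A is reported too, which is the root-commit exception mentioned above.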
This second rule identifies commit D for you: D's first and only parent is A, when its parent should have been C.
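The second rule can be sketched over the same kind of in-memory parent map (commit to parents, first parent first). The names broken_merges and first_parent_chain, the stop cut-off, and the two toy graphs are hypothetical. One deliberate simplification: instead of computing, for every merge, everything reachable from its first parent, this sketch walks down the side branch until it hits any commit on master's first-parent chain (computed once up front), which avoids a separate reachability walk per merge; in a correctly rebased bubble that commit is exactly the merge's first parent.

```python
from typing import Dict, List, Optional

Parents = Dict[str, List[str]]  # commit -> parents, first parent first


def first_parent_chain(parents: Parents, tip: str, stop: Optional[str] = None) -> List[str]:
    """Follow first parents from tip until the root, or until the cut-off stop (excluded)."""
    chain: List[str] = []
    commit: Optional[str] = tip
    while commit is not None and commit != stop:
        chain.append(commit)
        ps = parents.get(commit, [])
        commit = ps[0] if ps else None
    return chain


def broken_merges(parents: Parents, tip: str, stop: Optional[str] = None) -> List[str]:
    """Rule 2: a bubble's side branch must rejoin master at the merge's first parent."""
    # Every commit on master's first-parent chain, all the way down to the root.
    mainline = set(first_parent_chain(parents, tip))
    broken: List[str] = []
    for merge in first_parent_chain(parents, tip, stop):
        ps = parents.get(merge, [])
        if len(ps) != 2:
            continue  # not a two-parent merge; that is rule 1's concern
        first, commit = ps
        # Walk down the side branch while it consists of simple commits.
        while commit not in mainline:
            side = parents.get(commit, [])
            if len(side) != 1:
                break  # a merge inside the bubble, or the history ran out
            commit = side[0]
        # A correct bubble stops exactly at the merge's first parent.
        if commit != first:
            broken.append(merge)
    return broken


# Toy versions of the two graphs from the question, truncated at A.
bad: Parents = {"G": ["E", "F"], "F": ["E"], "E": ["C", "D"], "D": ["A"],
                "C": ["A", "B"], "B": ["A"], "A": []}
good: Parents = {"G": ["E'", "F"], "F": ["E'"], "E'": ["C'", "D"], "D": ["C'"],
                 "C'": ["A", "B"], "B": ["A"], "A": []}
```

On these graphs, broken_merges(bad, "G", stop="A") returns ["E"], because D's side walk lands on A instead of C, and broken_merges(good, "G", stop="A") returns an empty list.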
So, just read the graph. You can do this directly (e.g., with GitPython), or run git rev-list --topo-order --parents master to dump the graph as text and then read that with a program.
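A minimal sketch of the text-based route, assuming Python 3.7+ and a git binary on the PATH. It runs git once for the whole history (rather than once per commit, which the question rules out) and parses each output line, a commit hash followed by its parent hashes with the first parent first, into a dict. The function names and the repo parameter are hypothetical.

```python
import subprocess
from typing import Dict, List


def parse_rev_list_parents(text: str) -> Dict[str, List[str]]:
    """Turn 'git rev-list --parents' output into {commit: [parent1, parent2, ...]}."""
    parents: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            parents[fields[0]] = fields[1:]
    return parents


def load_parents(ref: str = "master", repo: str = ".") -> Dict[str, List[str]]:
    """Read the whole commit graph reachable from ref with a single git invocation."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--topo-order", "--parents", ref],
        capture_output=True, text=True, check=True,
    )
    return parse_rev_list_parents(result.stdout)
```

For example, load_parents("master") inside a repository returns a map that a checker for the two rules above can walk entirely in memory, so the cost of git is paid once regardless of how many commits the project has.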
As Nitsan Avni suggested in a comment, if you can prevent the accidental merges in the first place, that's the best way to go, because fixing these after the fact is a form of history rewriting, which requires all users to adapt to the rewritten history.