I work on a large software team that does large monthly releases. We use a branch-to-release model (see diagram).
This model solves a lot of problems, but it has some risks to manage. Before I release branch 1.1 to production, I need to check that all the commits in 1.0 are also in 1.1.
I can do this with the following command:
git log --cherry-pick --right-only --pretty="%h %ce %B" --no-merges release/1.1/master...release/1.0/master > missingcommits.log
Then I email this list to each of the developers concerned and ask them to do a careful second check.
This works fairly well, but I'm concerned about it picking up some false positives.
Of course, if you have committed the same code separately on two different branches, as two different commits, it will fall afoul of this scan.
In theory, if you have cherry-picked your commit from 1.0 to 1.1, it should not show up in this scan (i.e., the same change is in both branches).
My own commits behave as expected: I commit on one branch, cherry-pick across, and the commit no longer shows up in this scan. So I think the approach should work.
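That check can be reproduced in a throwaway repository. The following is a minimal POSIX sh sketch, assuming only that git is installed; the branch names mirror the question, while the file app.txt, its contents, and the commit messages are made up for illustration. It uses %s (subject) instead of %B so each result stays on one line.

```shell
set -eu

# Throwaway repository; nothing outside it is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/release/1.0/master
git config user.name "Demo Dev"
git config user.email "dev@example.com"

# Shared history: both release branches start from this commit.
printf 'line 1\nline 2\n' > app.txt
git add app.txt
git commit -q -m "Initial import"
git branch release/1.1/master

# A fix lands on 1.0 ...
printf 'line 1\nline 2 (fixed)\n' > app.txt
git commit -q -am "Fix line 2"
fix=$(git rev-parse HEAD)

# ... and is cherry-picked cleanly onto 1.1.
git checkout -q release/1.1/master
git cherry-pick "$fix" >/dev/null

# The scan from the question: commits on 1.0 (right side) that have no
# patch-equivalent commit on 1.1 (left side).
missing=$(git log --cherry-pick --right-only --no-merges \
  --pretty="%h %ce %s" release/1.1/master...release/1.0/master)
echo "missing: ${missing:-<none>}"
```

With a clean cherry-pick the scan should report nothing, even though the commit on 1.1 has a different hash from the one on 1.0.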
When I email each developer a list of just their 'missing commits in the new release branch', some of them come back to me and say:
No, I definitely did a cherry-pick to move my code over.
This could be:
(a) defensive behaviour,
(b) a cherry-pick gone wrong,
(c) my misunderstanding of git, or
(d) a genuine problem with this process.
My question is: Will git log --cherry-pick --right-only --no-merges ignore all commits correctly cherry-picked between branches?
There are two possible "errors" of sorts: false positives, where the scan lists a commit whose change is already on the new branch, and false negatives, where the scan fails to list a commit whose change is missing. A single underlying mechanism makes both possible. However, the especially dangerous one, the false negative, is pretty rare.
To understand what's going on, start by reading through the git patch-id documentation. Every commit has its own unique hash ID, but two different commits whose git diff or git show output (each commit compared to its own parent(s)) is sufficiently similar will have the same patch-ID. In other words, the patch-ID is an attempt to identify a cherry-picked commit.
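To see the patch-ID at work, the sketch below (same assumptions: a throwaway repository, hypothetical file and branch names) cherry-picks a commit onto a branch where an extra line was first added at the top of the file, so the hunk sits at different line numbers. It then pipes each commit through git show and git patch-id --stable and compares the first field of the output, which is the patch-ID.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Dev"
git config user.email "dev@example.com"

# Ten-line file; printf repeats its format once per argument.
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > app.txt
git add app.txt
git commit -q -m "Initial import"
base=$(git rev-parse HEAD)

# Original fix: edit line 8.
sed 's/^line 8$/line 8 (fixed)/' app.txt > app.tmp && mv app.tmp app.txt
git commit -q -am "Fix line 8"
orig=$(git rev-parse HEAD)

# Hypothetical branch "other": add a header far above line 8, which shifts
# every line number by one, then cherry-pick the fix.
git checkout -q -b other "$base"
{ echo "header"; cat app.txt; } > app.tmp && mv app.tmp app.txt
git commit -q -am "Add header"
git cherry-pick "$orig" >/dev/null
picked=$(git rev-parse HEAD)

patch_id() {
  # git patch-id reads a diff and prints "<patch-id> <commit-id>".
  git show "$1" | git patch-id --stable | cut -d' ' -f1
}
orig_pid=$(patch_id "$orig")
picked_pid=$(patch_id "$picked")
echo "commits:   $orig vs $picked"
echo "patch-IDs: $orig_pid vs $picked_pid"
```

The two commit hashes differ, but the two patch-IDs should match: the hunk's line numbers are not hashed, and the three context lines on each side of the change are the same on both branches.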
This attempt is imperfect. As the documentation says, the patch-ID is essentially a checksum of the diff with the line numbers and any whitespace in the diff body proper stripped. If someone cherry-picks a commit and it goes in smoothly, the new commit and the original will normally have the same patch-ID. (The unchanged context lines around each change are part of that diff, so if those nearby lines differ on the target branch, even a clean cherry-pick can end up with a different patch-ID.) But if there is some problem, for instance if the commit requires some manual conflict resolution, the new commit will probably have a different patch-ID. That gives you the false positive: a bit of a waste of time, because someone has to check that the commit is there when it has merely changed form a bit.
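A sketch of that false positive, under the same assumptions (throwaway repository, made-up file app.conf and commit messages): 1.1 has already changed the line that the 1.0 fix touches, so the cherry-pick stops with a conflict, and the developer resolves it by hand to the 1.0 value. Setting GIT_EDITOR=true simply accepts the prepared commit message when continuing.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/release/1.0/master
git config user.name "Demo Dev"
git config user.email "dev@example.com"

echo "timeout = 30" > app.conf
git add app.conf
git commit -q -m "Initial config"
git branch release/1.1/master

# 1.0 gets a fix.
echo "timeout = 60" > app.conf
git commit -q -am "Raise timeout to 60"
fix=$(git rev-parse HEAD)

# Meanwhile 1.1 has changed the same line.
git checkout -q release/1.1/master
echo "timeout = 45" > app.conf
git commit -q -am "Tune timeout for 1.1"

# The cherry-pick stops with a conflict; resolve it manually to the 1.0 value.
git cherry-pick "$fix" >/dev/null 2>&1 || true
echo "timeout = 60" > app.conf
git add app.conf
GIT_EDITOR=true git cherry-pick --continue >/dev/null

missing=$(git log --cherry-pick --right-only --no-merges \
  --pretty="%h %ce %s" release/1.1/master...release/1.0/master)
echo "missing: ${missing:-<none>}"
```

Afterwards 1.1 contains timeout = 60, yet the scan still lists "Raise timeout to 60", because the resolved commit's diff (45 to 60) differs from the original's (30 to 60). In this situation a developer can truthfully say they did cherry-pick.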
The false negative means that a commit that changed something important, maybe the placement of a closing brace, gets missed. This happens when, e.g., diff #1 says "move a close brace from line A to line B" and diff #2 says "move a different close brace from line C to line D". The patch-ID strips the line numbers, and a close brace is just a close brace. Indentation might change between the two diffs as well, but that too is ignored. If the few context lines around each change also look the same once whitespace is removed, the two diffs produce the same patch-ID, even though diff #1 affects loop #1 and diff #2 affects an entirely different loop #2. From a semantic-analysis point of view the two patches should be considered different, but git patch-id, which does no such analysis, considers them the same.
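The false negative can be sketched the same way. Instead of moving braces, this hypothetical example uses a deliberately repetitive file (job.txt, made up) in which two identical limit = 1 lines are each surrounded by identical step lines, so changing either one yields the same diff apart from its line numbers. 1.0 raises the first limit; 1.1, independently, raises the second.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/release/1.0/master
git config user.name "Demo Dev"
git config user.email "dev@example.com"

# Hypothetical helper: $1 and $2 are the values on lines 4 and 12.
# Each "limit" line has three identical "step" lines on either side.
write_job() {
  printf '%s\n' step step step "limit = $1" \
    step step step step step step step "limit = $2" step step step > job.txt
}

write_job 1 1
git add job.txt
git commit -q -m "Initial job"
git branch release/1.1/master

# On 1.0: raise the FIRST limit (line 4).
write_job 2 1
git commit -q -am "Raise first limit"

# On 1.1: an unrelated commit that raises the SECOND limit (line 12).
git checkout -q release/1.1/master
write_job 1 2
git commit -q -am "Raise second limit"

missing=$(git log --cherry-pick --right-only --no-merges \
  --pretty="%h %ce %s" release/1.1/master...release/1.0/master)
first_limit_on_11=$(git show release/1.1/master:job.txt | sed -n 4p)
echo "missing: ${missing:-<none>}"
echo "1.1 line 4: $first_limit_on_11"
```

The scan should report nothing, even though line 4 of job.txt on 1.1 still reads limit = 1: the 1.0 commit is treated as already present because an unrelated 1.1 commit has the same patch-ID.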
Such false negatives can occur. If you have a good test suite, a missed cherry-pick of this kind will likely show up as a test failure, but that is definitely not something Git guarantees.