I've run git filter-branch --index-filter 'git rm --cached --ignore-unmatched badfiles/ badfiles2/' --prune-empty (per here) to remove a bunch of files in preparation for moving the remaining files to another repository. --prune-empty gets rid of any resulting empty commits, but it doesn't act on merges, which makes sense.
Now the history for this particular repo looks pretty ugly: there are a bunch of merges that don't actually add anything, and some merges that are just merges of other merges that didn't add any changes (in the rewritten history; they may have been 'useful' before the filter-branch).
Consider this annotated snippet (generated with git log --graph --oneline --shortstat):
* 575e3b5 Merge pull request #68 from chris/feature # KEEP THIS MERGE!
|\
| * 5dbc3f1 Actual feature changes
| | 2 files changed, 2 insertions(+), 2 deletions(-)
| * 35abc98 Cleanup/prep
|/
| 2 files changed, 22 insertions(+), 16 deletions(-)
* c3b3d86 Merge pull request #46 from org/topic_branch-mods # USELESS-C
|\
* \ 892de05 Merge pull request #47 from org/topic_branch # USELESS-B
|\ \
| |/
|/|
| * e738d4b Merge branch 'master' into topic_branch # USELESS-A
| |\
| |/
|/|
* | 4182dac CommitMsg #40 #SQUASHED-PR
| | 2 files changed, 15 insertions(+), 6 deletions(-)
* | 3b42762 CommitMsg
|/
| 2 files changed, 29 insertions(+), 14 deletions(-)
* c4e62ba CommitMsg
| 2 files changed, 39 insertions(+), 16 deletions(-)
* c2bb13f CommitMsg
4 files changed, 241 insertions(+)
I'd like to shorten this to (obviously with different IDs as appropriate):
* 575e3b5 Merge pull request #68 from chris/feature # KEEP THIS MERGE!
|\
| * 5dbc3f1 Actual feature changes
| | 2 files changed, 2 insertions(+), 2 deletions(-)
| * 35abc98 Cleanup/prep
|/
| 2 files changed, 22 insertions(+), 16 deletions(-)
* 4182dac CommitMsg #40 #SQUASHED-PR
| 2 files changed, 15 insertions(+), 6 deletions(-)
* 3b42762 CommitMsg
| 2 files changed, 29 insertions(+), 14 deletions(-)
* c4e62ba CommitMsg
| 2 files changed, 39 insertions(+), 16 deletions(-)
* c2bb13f CommitMsg
4 files changed, 241 insertions(+)
So I'd like to get rid of the 'USELESS' merges, which are all 'empty' merges (no merge changes), but I'd like to preserve the history/grouping associated with the also-'empty' KEEP merge at the top, which groups those commits together into one 'changeset'.
Or looking at another example in the traditional simplified-sideways-history:
A -- B -- C -- D ==> A -- B --- D'
\----\--/ / \-E-/
\----E
I have tried solutions to remove 'empty' merges (like this), but those remove all empty merges, and I want to keep the 'useful' empty merges as displayed in the examples...
As far as I can tell, the 'useless' empty merges don't contain any commits that aren't all the way to the left/top in the history. Is there a way to filter those out cleanly? I guess I don't really even know how to describe/define those...
Note that the given example was intentionally simple. For what it's worth, later in the history this repo looks like this, all of which I'd like to prune:
* 3d37e42 Merge pull request #239 from jim/topic-dev
|\
| * 05eaf9e Merge pull request #7 from org/master
| |\
| |/
|/|
* | 1576482 Merge pull request #193 from john/master
|\ \
| * \ 187100e Merge branch 'master' of github.com:org/repo into master
| |\ \
| * \ \ 067cc55 Merge branch 'master' of github.com:org/repo into master
| |\ \ \
| * \ \ \ a69e3d2 Merge branch 'master' of github.com:org/repo into master
| |\ \ \ \
| | |/ / /
* | | | | 0ce6813 Merge pull request #212 from jim/feature
|\ \ \ \ \
| | |_|_|/
| |/| | |
| * | | | 0f5352e Merge pull request #5 from org/master
| |\ \ \ \
| |/ / / /
OK, I don't think this is perfect, but it does solve the problem in this particular case; there are cases where it doesn't quite clean up as much as it perhaps could, but it's a step if anyone is interested:
git filter-branch --commit-filter '
    if ! git rev-parse --verify "$GIT_COMMIT^2" 1>/dev/null 2>&1 ||
       [ "$(git log --no-merges "$GIT_COMMIT^2" "^$GIT_COMMIT^1" --oneline | wc -l)" -gt 0 ];
    then
        #echo take $GIT_COMMIT >&2
        # Pick one:
        git_commit_non_empty_tree "$@"  # Drop empty commits
        #git commit-tree "$@"           # Keep empty commits
    else
        #echo "breakup $GIT_COMMIT ($*)" >&2
        skip_commit "$1" "$2" "$3"      # (quietly) only keep the first parent
    fi' -f HEAD
If 1) the commit doesn't have a second parent (git rev-parse returns an error if the referenced commit ($GIT_COMMIT^2) doesn't exist) OR 2) the second parent ($GIT_COMMIT^2) contains non-merge commits that the first parent ($GIT_COMMIT^1) does not (see here), the commit is kept (if it is non-empty; use git commit-tree if you want to keep empties). If the second parent exists and doesn't add anything useful, we skip the commit and intentionally pass only the first parent. I'm not sure this is 'legit', but it drops the second parent from the history, and it worked in my case... (see caveats below)
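The keep/drop test described above can be tried out without rewriting anything. This is a minimal sketch, assuming it is run inside the repository. adds_commits and classify_merges are hypothetical helper names (not part of git or filter-branch), and git rev-list --count stands in for the git log ... | wc -l pipeline:

```shell
# adds_commits COMMIT: succeed if COMMIT should be kept, i.e. it is not a
# merge, or its second parent brings in at least one non-merge commit that
# is not already reachable from its first parent.
adds_commits() {
    # No second parent: not a merge, nothing to decide.
    git rev-parse --verify --quiet "$1^2" >/dev/null || return 0
    [ "$(git rev-list --no-merges --count "$1^2" "^$1^1")" -gt 0 ]
}

# classify_merges [REV]: print "keep" or "drop" for every merge commit
# reachable from REV (defaults to HEAD). Read-only; nothing is rewritten.
classify_merges() {
    git rev-list --merges "${1:-HEAD}" | while read -r commit; do
        if adds_commits "$commit"; then verdict=keep; else verdict=drop; fi
        printf '%s %s\n' "$verdict" "$(git log -1 --format='%h %s' "$commit")"
    done
}
```

Running classify_merges HEAD prints one line per merge. Note that a merge reported as keep can still vanish from the rewritten history if the only path to it runs through the second parent of a dropped merge.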
From the bottom-up:
* 575e3b5 Merge pull request #68 from chris/feature # KEEP THIS MERGE!
|\
| * 5dbc3f1 Actual feature changes
| | 2 files changed, 2 insertions(+), 2 deletions(-)
| * 35abc98 Cleanup/prep
|/
| 2 files changed, 22 insertions(+), 16 deletions(-)
* c3b3d86 Merge pull request #46 from org/topic_branch-mods # USELESS-C
|\
* \ 892de05 Merge pull request #47 from org/topic_branch # USELESS-B
|\ \
| |/
|/|
| * e738d4b Merge branch 'master' into topic_branch # USELESS-A
| |\
| |/
|/|
* | 4182dac CommitMsg #40 #SQUASHED-PR
| | 2 files changed, 15 insertions(+), 6 deletions(-)
* | 3b42762 CommitMsg
|/
| 2 files changed, 29 insertions(+), 14 deletions(-)
* c4e62ba CommitMsg
| 2 files changed, 39 insertions(+), 16 deletions(-)
* c2bb13f CommitMsg
4 files changed, 241 insertions(+)
It kept everything through SQUASHED-PR (note that commit ID 4182dac and its parents are retained, as their history didn't change). It decided USELESS-A should stick around because its second parent (4182dac) contains commits its first parent (c4e62ba) did not contain, but then it looked at USELESS-B, whose second parent (including USELESS-A) doesn't add anything useful, so it dropped it (again, including USELESS-A). Then USELESS-C was just useless, so it got dropped, and KEEP had 'something useful' in the second parent, so it was retained. So you end up with:
* 63b4d39 Merge pull request #68 from chris/feature # KEEP THIS MERGE!
|\
| * 9a5570d Actual feature changes
| | 2 files changed, 2 insertions(+), 2 deletions(-)
| * a251317 Cleanup/prep
|/
| 2 files changed, 22 insertions(+), 16 deletions(-)
* 4182dac CommitMsg #40 #SQUASHED-PR
| 2 files changed, 15 insertions(+), 6 deletions(-)
* 3b42762 CommitMsg
| 2 files changed, 29 insertions(+), 14 deletions(-)
* c4e62ba CommitMsg
| 2 files changed, 39 insertions(+), 16 deletions(-)
* c2bb13f CommitMsg
4 files changed, 241 insertions(+)
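Since the dropped merges are supposed to carry no changes of their own, the rewritten tip should still point at the same tree as the original tip. Below is a sketch of a sanity check; same_tree is a hypothetical helper name:

```shell
# same_tree A B: succeed when revisions A and B point at identical trees.
# Returns 2 if either revision cannot be resolved.
same_tree() {
    st_a=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
    st_b=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
    [ "$st_a" = "$st_b" ]
}
```

For example, same_tree refs/original/refs/heads/master master compares the backup ref that filter-branch normally leaves behind with the rewritten branch. This assumes the branch is called master and that the backup has not been removed. A failure would suggest that one of the skipped merges did contain changes after all.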
"$1" "$2" "$3"
is all that skip_commit receives (the tree and the first parent), in this case leaving off "$4" "$5" (the second parent), which would otherwise be included in "$@". If your commit has more than two parents, you may have to adjust this to account for that; it shouldn't be too hard, but I'm not fixing it right now for a hypothetical, and you may want to choose specific parents to drop anyway.
If other commits were made on top of USELESS-A before it got merged into USELESS-B (which arguably wouldn't be useless then), USELESS-A will not get pruned/dropped, so you may still have some ugliness.
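For the caveat about commits with more than two parents, it may help to recall that the commit-filter receives its arguments as a tree ID followed by one "-p <parent>" pair per parent. Below is a sketch of dropping one chosen parent from that list; without_parent is a hypothetical helper, untested against a real octopus merge:

```shell
# without_parent N TREE [-p PARENT]...
# Print the commit-filter argument list with the N-th parent (1-based) removed.
without_parent() {
    drop=$1
    tree=$2
    shift 2
    out=$tree
    i=0
    while [ "$#" -ge 2 ]; do
        i=$((i + 1))
        # Each parent arrives as a "-p <id>" pair; copy it unless it is
        # the one being dropped.
        [ "$i" -eq "$drop" ] || out="$out $1 $2"
        shift 2
    done
    printf '%s\n' "$out"
}
```

For an ordinary two-parent merge, without_parent 2 "$@" yields the same three words as "$1" "$2" "$3". To use it inside the commit-filter, the function would have to be defined within the filter string itself, and which parents to drop remains a per-repository decision.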