Take, for example, this commit:
> git show 9c1cb268c2895fca1a8036c82f5448816cb95510
commit 9c1cb268c2895fca1a8036c82f5448816cb95510
Author: Foo Bar <foo@bar.com>
Date: Thu Jul 21 17:10:24 2022 -0400
Renamed files.
diff --git a/baz.py b/bag.py
similarity index 100%
rename from baz.py
rename to bag.py
diff --git a/run_baz.bat b/run_bag.bat
similarity index 80%
rename from run_baz.bat
rename to run_bag.bat
index e660f46..e06d1f0 100644
--- a/run_baz.bat
+++ b/run_bag.bat
@@ -17,6 +17,6 @@
[...]
I would like to extract the information about renaming and similarity index in a structured and reliable way (e.g. robust to filenames with spaces in them).
The output above appears to be generated by this code.
What libgit2 function or other computer-friendly interface can I use to access this? I understand that this information is not stored directly in the Git repository (which in essence only stores snapshots of the files) but is instead generated on demand for human consumption.
It seems that git-diff-tree is what you're looking for:
$ git log --stat --summary -1 HEAD
commit 9de8e454e8ccabf332ea249cbe56185f1c6ba465
Author: Test <test@example.com>
Date: Fri Dec 16 22:16:02 2022 -0800
b
a => b | 0
1 file changed, 0 insertions(+), 0 deletions(-)
rename a => b (100%)
$ git ls-tree HEAD
100644 blob d6459e005434a49a66a3ddec92279a86160ad71f b
$ git ls-tree HEAD^
100644 blob d6459e005434a49a66a3ddec92279a86160ad71f a
$ git diff-tree --find-copies-harder -r -M HEAD^ HEAD
:100644 100644 d6459e005434a49a66a3ddec92279a86160ad71f d6459e005434a49a66a3ddec92279a86160ad71f R100 a b
d6459e005434a49a66a3ddec92279a86160ad71f is the blob ID of the file b, which was renamed from a. It appears in both the source and destination columns because the content did not change. R100 tells you that it was renamed with 100% similarity.
A larger example from the VLC repository, which contains a copy as well as renames:
$ git log --stat --summary -1 --find-copies-harder 126fb1184cbc3e9716a903655f71fe45b85d2fb6
commit 126fb1184cbc3e9716a903655f71fe45b85d2fb6
Author: Felix Paul Kühne <fkuehne@videolan.org>
Date: Sun Jun 10 11:36:51 2018 +0200
macosx: split windows file to have one class per file
extras/package/macosx/VLC.xcodeproj/project.pbxproj | 30 +++++++++++++++++------
modules/gui/macosx/Makefile.am | 3 ++-
modules/gui/macosx/VLCFSPanelController.h | 2 +-
modules/gui/macosx/VLCMainWindow.h | 2 +-
modules/gui/macosx/{Windows.h => VLCVideoWindowCommon.h} | 26 ++------------------
modules/gui/macosx/{Windows.m => VLCVideoWindowCommon.m} | 174 +++-------------------------------------------------------------------------------------------------------------------------------
modules/gui/macosx/{VLCApplication.h => VLCWindow.h} | 28 +++++++++++++++------
modules/gui/macosx/VLCWindow.m | 195 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
po/POTFILES.in | 6 +++--
9 files changed, 252 insertions(+), 214 deletions(-)
rename modules/gui/macosx/{Windows.h => VLCVideoWindowCommon.h} (80%)
rename modules/gui/macosx/{Windows.m => VLCVideoWindowCommon.m} (85%)
copy modules/gui/macosx/{VLCApplication.h => VLCWindow.h} (65%)
create mode 100644 modules/gui/macosx/VLCWindow.m
$ git diff-tree --find-copies-harder -r 126fb1184cbc3e9716a903655f71fe45b85d2fb6^ 126fb1184cbc3e9716a903655f71fe45b85d2fb6
:100644 100644 fe0ee17e5f995686fa5c1ec342b6f4048774b17a 595a93fb5264c1c21c570fa1b8dfb26adfe42fba M extras/package/macosx/VLC.xcodeproj/project.pbxproj
:100644 100644 ace16e1aca6e96a2558f238ec4196a5434c538fc e10209ff496ad3b63ddda86dc6f8d5a394d58e74 M modules/gui/macosx/Makefile.am
:100644 100644 5b86759b0bdf0cafd6770c018432577d966f426b 78366fd14655727bfde06f0fca60c95da38c7462 M modules/gui/macosx/VLCFSPanelController.h
:100644 100644 ae7fce903d8514938ed9c7757f69d6fdaff66f15 b9bd614444e372027db0ba2ada92f79439ef1cfa M modules/gui/macosx/VLCMainWindow.h
:100644 100644 07e698ecff1bc6397e48de836f3a346dd22c27c6 8d18a3b6e40d7f119306cf225c4f89fd80b31dfa R080 modules/gui/macosx/Windows.h modules/gui/macosx/VLCVideoWindowCommon.h
:100644 100644 300c99d108fddea5c442a110c5f2af62129a955d 6a7e47ad16c057dd9aa5e62a5c354252f72a5b50 R085 modules/gui/macosx/Windows.m modules/gui/macosx/VLCVideoWindowCommon.m
:100644 100644 704ed7e87b797a86fdc3c64bbf098f65f505dfb8 551e161c65a203d70bba75d83183afb2c0384985 C065 modules/gui/macosx/VLCApplication.h modules/gui/macosx/VLCWindow.h
:000000 100644 0000000000000000000000000000000000000000 a92418f1667b626ce68c8ac75a90812bfe5e6f4b A modules/gui/macosx/VLCWindow.m
:100644 100644 96b9f6761755a35a97f2bb9848ae25ec15b625ff d8537d7e7b8338ade96b998a293d6176e994ab1a M po/POTFILES.in
Here, R085 means a rename with similarity index 85%, and C065 means a copy with similarity index 65%.
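To collect every rename and copy of a commit without dealing with quoting at all, one option is to add `-z` to `git diff-tree`: per the git diff-format documentation, paths are then printed verbatim and fields are terminated by NUL bytes. The Python sketch below assumes that layout (`:meta STATUS\0path\0[path\0]`); `parse_raw_z` and `diff_tree_renames` are hypothetical helper names, and paths are decoded as UTF-8 with `surrogateescape` so that undecodable bytes survive.

```python
import subprocess


def parse_raw_z(data: bytes) -> list:
    """Parse `git diff-tree -r -z` raw output into a list of dicts."""
    fields = data.split(b"\0")  # output ends with NUL, so the last field is b""
    records = []
    i = 0
    while i < len(fields) and fields[i]:
        meta = fields[i].decode("ascii")
        if not meta.startswith(":"):
            raise ValueError(f"unexpected field: {meta!r}")
        src_mode, dst_mode, src_sha, dst_sha, status = meta[1:].split(" ")
        letter, score = status[0], status[1:]
        # Renames (R) and copies (C) carry a source and a destination path.
        n_paths = 2 if letter in ("R", "C") else 1
        paths = [
            p.decode("utf-8", "surrogateescape")
            for p in fields[i + 1 : i + 1 + n_paths]
        ]
        if len(paths) != n_paths:
            raise ValueError("truncated diff-tree output")
        records.append({
            "status": letter,
            "similarity": int(score) if score else None,  # "R085" -> 85
            "src_path": paths[0],
            "dst_path": paths[-1],
            "src_mode": src_mode,
            "dst_mode": dst_mode,
            "src_sha": src_sha,
            "dst_sha": dst_sha,
        })
        i += 1 + n_paths
    return records


def diff_tree_renames(old_rev: str, new_rev: str, repo: str = ".") -> list:
    """Run git diff-tree with rename/copy detection and keep R and C entries."""
    out = subprocess.run(
        ["git", "-C", repo, "diff-tree", "-r", "-z", "-M",
         "--find-copies-harder", old_rev, new_rev],
        check=True,
        capture_output=True,
    ).stdout
    return [r for r in parse_raw_z(out) if r["status"] in ("R", "C")]
```

Inside a VLC clone, calling `diff_tree_renames("126fb1184cbc3e9716a903655f71fe45b85d2fb6^", "126fb1184cbc3e9716a903655f71fe45b85d2fb6")` would be expected to report the R080, R085 and C065 entries shown above. Keeping the parser separate from the `subprocess` call makes it testable on captured output.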
Thankfully, git also quotes and escapes problematic characters in paths:
$ dir
a.txt b.txt c\ c.txt d\td.txt e\ne.txt
$ printf "%15s\n" * | hexdump -C
00000000  20 20 20 20 20 20 20 20  20 20 61 2e 74 78 74 0a  |          a.txt.|
00000010  20 20 20 20 20 20 20 20  20 20 62 2e 74 78 74 0a  |          b.txt.|
00000020  20 20 20 20 20 20 20 20  63 20 63 2e 74 78 74 0a  |        c c.txt.|
00000030  20 20 20 20 20 20 20 20  64 09 64 2e 74 78 74 0a  |        d.d.txt.|
00000040  20 20 20 20 20 20 20 20  65 0a 65 2e 74 78 74 0a  |        e.e.txt.|
$ git diff-tree --find-copies-harder -r -M HEAD^ HEAD
:100644 100644 f57a61465c88a387f1f9f2c6d8ef2fc6664da7ea f57a61465c88a387f1f9f2c6d8ef2fc6664da7ea R100 a.txt "a\ta.txt"
:100644 100644 ae6ead370af774b6c9341805dd8d40a139fb8ac6 68005d4639648fcef9afc82108baa5979f5bcabf C089 b.txt "b\nb.txt"
:100644 100644 9f429e1ad06183294f3b2c586481f919fe4d1875 80eb29a464aba1d866b8248843aaa0db11bd98e2 R079 "d\td.txt" d.txt
:100644 100644 8aedf86ace0da13975de33a62e27f092a254b045 8aedf86ace0da13975de33a62e27f092a254b045 C100 "e\ne.txt" "e/e\te.txt"
In the example above, the filenames contain tabs and newlines, so git prints them inside double quotes with C-style escapes; those paths need to be unquoted and unescaped before use.
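Without `-z`, a quoted path can be turned back into the real filename by reversing git's C-style quoting. This Python sketch is based on my understanding of that quoting: single-character escapes such as `\t`, `\n`, `\"` and `\\`, plus three-digit octal `\ooo` for other bytes (with the default `core.quotePath`, non-ASCII bytes are written this way). Spaces are not escaped, which is fine because fields are TAB-separated. `unquote_git_path` is a hypothetical name.

```python
# Single-character escapes used by git's C-style path quoting.
_SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A,
    "v": 0x0B, "f": 0x0C, "r": 0x0D, '"': 0x22, "\\": 0x5C,
}


def unquote_git_path(path: str) -> str:
    """Undo git's C-style quoting of a path taken from non -z output."""
    if not (len(path) >= 2 and path.startswith('"') and path.endswith('"')):
        return path  # unquoted paths are printed verbatim
    body = path[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
        elif i + 1 < len(body) and body[i + 1] in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[body[i + 1]])
            i += 2
        else:
            # Any other escaped byte appears as exactly three octal digits.
            octal = body[i + 1 : i + 4]
            if len(octal) != 3 or any(c not in "01234567" for c in octal):
                raise ValueError(f"bad escape in {path!r}")
            out.append(int(octal, 8))
            i += 4
    # Octal escapes may spell out UTF-8 bytes; keep undecodable bytes intact.
    return out.decode("utf-8", "surrogateescape")
```

For instance, `"e/e\te.txt"` from the last line above becomes `e/e<TAB>e.txt`, while an unquoted path such as `d.txt` is returned unchanged.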