I want to programmatically list the name and last modification time of every file in a certain revision.
Running git log
for every file, as suggested here, is very slow.
Is there a faster way to accomplish this?
Running the script below on a non-trivial repo (SDL) takes 59s on my machine.
#!/usr/bin/env python
import datetime
import subprocess
import time
commit = "HEAD"
start = time.time()
file_names = subprocess.check_output(["git", "ls-tree", "--name-only", "-r", commit], text=True).strip().split("\n")
print(f"[{time.time() - start:.4f}] git ls-tree finished")
file_times = [
    datetime.datetime.fromisoformat(
        subprocess.check_output(
            ["git", "log", "-1", "--pretty=format:%cI", commit, "--", name],
            text=True,
        ).strip()
    )
    for name in file_names  # one git invocation per file: this is the slow part
]
print(f"[{time.time() - start:.4f}] git info finished")
The basic idea is to postprocess a single git log --name-status
pass, carrying along whatever per-commit info you want, and record the first (i.e. most recent) occurrence of each name you're interested in. The all-files version:
git log --name-status --pretty=%ci | awk -F$'\t' '
!NF { next }                    # skip the blank line after each date
NF==1 { stamp=$0; next }        # a date line (no tabs): remember it
!seen[$2]++ { print stamp,$0 }  # first (newest) occurrence of each path
' | sort -t$'\t' -k2,2
and as always, season to taste. Are you running on spinning rust? On the SDL default checkout with a cheap SSD this takes 0.548s, so more than a hundred times faster. But then, it's doing 1500+ times fewer walks through history, so there's that.
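If you'd rather stay in Python, the same single-pass idea can be sketched like this. This is my own rough translation of the awk approach, not tested against every repo shape: the function name is mine, and for rename lines (R100 followed by old and new name) it takes the last tab field, i.e. the new name.

```python
import subprocess

def last_modified_times(log_output):
    """Parse `git log --name-status --pretty=%cI` output into
    {path: iso_timestamp}, keeping the first (most recent) hit per path."""
    times = {}
    stamp = None
    for line in log_output.splitlines():
        if not line:
            continue  # blank separator line after each date
        if "\t" not in line:
            stamp = line  # a %cI date line: no tabs in it
            continue
        # name-status lines look like "M\tpath"; renames "R100\told\tnew".
        path = line.split("\t")[-1]
        times.setdefault(path, stamp)  # only the newest occurrence sticks
    return times

# Usage sketch: one git invocation over the whole history, e.g.
#   out = subprocess.check_output(
#       ["git", "log", "--name-status", "--pretty=%cI", "HEAD"], text=True)
#   print(last_modified_times(out))
```

Like the awk version, this walks history exactly once, so it should land in the same sub-second ballpark rather than one git process per file.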