I have a 300 MB Git repository. The total size of my currently checked-out files is 2 MB, and the total size of the rest of the Git repository is 298 MB. This is basically a code-only repository that should not be more than a few MB.
I suspect someone accidentally committed some large files (video, images, etc.), and then removed them... but not from Git, so the history still contains useless large files. How can I find the large files in the Git history? There are more than 400 commits, so going one by one is not practical.
Note: my question is not about how to remove the file, but how to find it in the first place.
I've found this script very useful in the past for finding large (and non-obvious) objects in a Git repository:
#!/bin/bash
#set -x
# Shows you the largest objects in your repository's pack file.
# Written for OS X.
#
# @see https://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs
# Set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';
# List all objects including their size, sort by size, take top 10
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`
echo "All sizes are in kB. The pack column is the compressed size of the object inside the pack file."
output="size,pack,SHA,location"
allObjects=`git rev-list --all --objects`
for y in $objects
do
# Extract the uncompressed size (bytes, converted to kB); awk tolerates the padded type column
size=$(( $(echo "$y" | awk '{print $3}') / 1024 ))
# Extract the compressed size inside the pack (bytes, converted to kB)
compressedSize=$(( $(echo "$y" | awk '{print $4}') / 1024 ))
# Extract the SHA-1 hash value
sha=$(echo "$y" | awk '{print $1}')
# Find the object's location in the repository tree
other=$(echo "${allObjects}" | grep "$sha")
#lineBreak=`echo -e "\n"`
output="${output}\n${size},${compressedSize},${other}"
done
echo -e $output | column -t -s ', '
That will give you the object name (SHA1sum) of the blob, and then you can use a script like this one:
... to find the commit that points to each of those blobs.
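As a rough sketch of both steps (listing the largest blobs, then finding the commits that touch one), something like the following may work. It assumes Git 2.16 or later, since it relies on `git log --find-object`, and it uses `git cat-file --batch-check` rather than `verify-pack`, so sizes are uncompressed bytes. The function names `largest_blobs` and `commits_for_blob` are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helpers; run them from inside the repository you want to inspect.

# Print the N largest blobs reachable from any ref, largest first,
# one per line as: <size-in-bytes> <sha> <path>
largest_blobs() {
  local count="${1:-10}"
  # rev-list emits "<sha> <path>"; %(rest) carries the path through cat-file
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob"' |      # keep blobs only, drop commits and trees
    cut -d' ' -f2- |          # strip the leading type column
    sort -k1,1nr |            # sort numerically by size, descending
    head -n "$count"
}

# Print the commits (on any ref) that add or remove the given blob,
# one per line as: <short-hash> <date> <subject>
commits_for_blob() {
  git log --all --find-object="$1" --format='%h %ad %s' --date=short
}

# Example usage:
#   largest_blobs 5
#   commits_for_blob <sha-from-the-list-above>
```

Because `--find-object` reports commits where the number of occurrences of the blob changes, a file that was added and later deleted should show up twice: once for the commit that introduced it and once for the commit that removed it.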