We plan to migrate one of our last big CVS repositories into a Git repository.
For the migration we are using cvs2svn's cvs2git. Because this CVS repository has grown over ~12 years, it holds 31GB of data.
I couldn't find any solution to drop all history older than a specified period of time (2 years for example).
Do you know of any tool, command, or workaround for this?
Thanks and greetings, Andreas
Solution as suggested by Dmitry Oksenchuk: After editing grafts, I wrote a Bash script to clean up the messed-up tags and branches:
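#!/bin/bash
# Usage: run inside the repository and pass the commit that is the new root.
NEW_ROOT_REF=${1:?usage: $0 <new-root-commit>}
# Tags and branches that contain the new root commit are kept.
git tag --contains "$NEW_ROOT_REF" | sort > TAGS_TO_KEEP.tmp
echo "Keep Tags:"
wc -l < TAGS_TO_KEEP.tmp
# "git branch" prefixes each name with a two-character marker ("* " for the
# current branch); strip it so that only branch names end up in the lists.
git branch --contains "$NEW_ROOT_REF" | sed 's/^[*+ ] //' | sort > BRANCHES_TO_KEEP.tmp
echo "Keep Branches:"
wc -l < BRANCHES_TO_KEEP.tmp
git tag -l | sort > TAGS_ALL.tmp
echo "All Tags:"
wc -l < TAGS_ALL.tmp
git branch -l | sed 's/^[*+ ] //' | sort > BRANCHES_ALL.tmp
echo "All Branches:"
wc -l < BRANCHES_ALL.tmp
# Remove tags that do not contain the new root (lines found only in TAGS_ALL.tmp)
COUNTER=0
for drop in $(comm -23 TAGS_ALL.tmp TAGS_TO_KEEP.tmp); do
    git tag -d "$drop"
    COUNTER=$((COUNTER + 1))
done
echo "Dropped tags: $COUNTER"
# Remove branches that do not contain the new root
COUNTER=0
for drop in $(comm -23 BRANCHES_ALL.tmp BRANCHES_TO_KEEP.tmp); do
    git branch -D "$drop"
    COUNTER=$((COUNTER + 1))
done
echo "Dropped branches: $COUNTER"
# Clean up
rm TAGS_ALL.tmp TAGS_TO_KEEP.tmp BRANCHES_ALL.tmp BRANCHES_TO_KEEP.tmp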
In a well-formed Git repo, the depth of the history is usually not an issue. The Linux repo has more than 500k commits and works fine. This year we migrated a ~15-year-old CVS repo (5GB of ,v files) to Git. The Git repo takes ~200MB and contains ~70k commits.
We faced two major problems: binary files and the number of tags.
Binary files
In CVS, all revisions of binary files are stored on the server and only the current revision is transferred on checkout. So storing binary files in CVS is not a problem at all; you just need enough disk space on the server. With Git the situation is different. When you clone a Git repo, all revisions of binary files are transferred to your local clone. Even if a file doesn't exist in the most recent commit, all its historical revisions are in your local repo. We managed to shrink the Git repo from ~700MB to ~200MB by removing unnecessary binary files from the history. The important point here is to base your decision on the size of a file in Git, not in CVS. Git packs objects using zlib compression and delta compression, so the history of the same file can take up very different amounts of disk space in Git and in CVS. To find candidates, you can use the "Find large files" plugin in Git Extensions.
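If Git Extensions is not at hand, the same question (which files are large in Git, as opposed to in CVS) can be approached from the command line. The following is only a sketch: `largest_blobs` is a hypothetical helper name, and it relies on the `%(objectsize:disk)` placeholder of `git cat-file --batch-check`, which reports the compressed on-disk size of an object. For deltified objects that number depends on which revision Git happened to pick as the delta base, so treat it as a rough indication.

```shell
#!/bin/bash
# Sketch: list the largest blobs in the whole history of the current repo.
# "largest_blobs" is a hypothetical helper name, not a Git command.
# Output columns: on-disk size (bytes), uncompressed size (bytes), path.
largest_blobs() {
    local count=${1:-20}
    # rev-list prints "<hash> <path>"; cat-file echoes the path via %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize:disk) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -rn |
        head -n "$count"
}
```

Running `largest_blobs 10` inside the migrated repository prints the ten biggest blobs, including files that no longer exist in the most recent commit.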
Number of tags
We have more than 20k build tags in the CVS repo. With that many tags, both Git Extensions and SourceTree work extremely slowly (especially when they need to load all the tags into a drop-down list). git push with Git 1.9.5 was also very slow because of a performance regression that was fixed in Git 2.3.0. Currently we keep only the build tags for the most recent 2 years (~7k tags) in Git and periodically archive older tags.
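How the older tags get archived is not described here. As one possible starting point, the sketch below lists tags created before a cutoff date. It assumes that age is the only selection criterion, `old_tags` is a hypothetical helper name, and the `creatordate` field of `git for-each-ref` needs Git 2.7 or newer.

```shell
#!/bin/bash
# Sketch: print the names of all tags created before a cutoff date.
# "old_tags" is a hypothetical helper name; the cutoff format is YYYY-MM-DD.
# creatordate is the tagger date for annotated tags and the committer date
# of the tagged commit for lightweight tags.
old_tags() {
    local cutoff=$1
    git for-each-ref --format='%(creatordate:short) %(refname:short)' refs/tags |
        # Appending "" forces a string comparison of the ISO dates.
        awk -v cutoff="$cutoff" '($1 "") < (cutoff "") { print $2 }'
}
```

The resulting list could then be pushed to a separate archive repository and removed from the working repository with `git tag -d`.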
Dropping old history
If you still need it, it's much easier and safer to drop old history in Git than in CVS or during migration.
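1. Mark the commit that should become the new root in the grafts file: echo %commit_hash% >.git/info/grafts
2. Remove the tags and branches that do not contain the new root (see git tag --contains and git branch --contains)
3. Make the cut permanent by rewriting the history: git filter-branch --tag-name-filter cat -- --all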
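The grafts and filter-branch commands above can be combined roughly as follows. This is a sketch, not a tested migration procedure: `truncate_history` is a hypothetical helper name, the tag and branch cleanup is only indicated by a comment, and it should be tried on a throwaway clone first. Recent Git versions print a deprecation warning for the grafts file, and `FILTER_BRANCH_SQUELCH_WARNING` only suppresses the notice that newer versions of `git filter-branch` show before starting.

```shell
#!/bin/bash
# Sketch: make the given commit the new root of the history.
# "truncate_history" is a hypothetical helper name, not a Git command.
truncate_history() {
    local new_root git_dir
    new_root=$(git rev-parse --verify "$1^{commit}") || return 1
    git_dir=$(git rev-parse --git-dir) || return 1
    mkdir -p "$git_dir/info"
    # A grafts line holding only a commit hash declares that commit parentless.
    echo "$new_root" > "$git_dir/info/grafts"
    # Tags and branches that do not contain the new root should be deleted
    # at this point, otherwise they keep the old history alive.
    # filter-branch honors grafts, so rewriting all refs makes the cut permanent;
    # "--tag-name-filter cat" keeps the tag names unchanged.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch --tag-name-filter cat -- --all || return 1
    # The graft refers to the pre-rewrite hash and is no longer needed.
    rm -f "$git_dir/info/grafts"
}
```

filter-branch keeps backups of the original refs under refs/original/, so the repository does not shrink until those refs are removed and the unreachable objects are garbage-collected.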
Alternatively, you can parse the git-dump.dat file (the output of cvs2git in git fast-import format) and remove old commits, tags, and branches from there.