I have been tasked with migrating our Team Foundation Server (TFS) repositories into the agency GitHub Enterprise (GHE) and keeping the entire changelog intact. I am using the git-tfs tool with the following syntax to create a local copy of the primary source branch:
git tfs clone --all --with-labels <server>:8080/tfs/ $/<branch>
The process takes about 30 hours and when that completes I have a directory structure of ~45 GB that contains a ~6 GB .git repository sub-structure. When I attempt to push this to our agency GHE I get errors regarding large files, because the agency doesn't have Large File Storage enabled and has no plans to enable it.
I have brought this to the attention of my superiors and been instructed to "remove the large files and make the upload." I ran an audit of all files >20 MB as instructed and have a spreadsheet I can copy/paste into Notepad++ for scripting the removal process.
I have attempted a git rm and then a git commit -m on the larger files, but am learning that this doesn't work as the changelog still tracks the large files. The git push to GHE command simply threw back the same errors I was seeing before.
My research has led me to several solutions, such as BFG Repo-cleaner and git filter-repo. Both tools require a --mirror copy of the repository, which git-tfs doesn't support. Git-tfs only supports a --bare option and the documentation for git clone doesn't help me understand the difference. I understand that both are just the repository directory and not the raw file structure, but not much more. I also do not understand how to push a mirrored local copy that doesn't have a file structure into GHE.
I've raised these issues to my leadership and been instructed to:
1. git-tfs clone TFS to local
2. git clone --mirror the local copy to a secondary local copy
I'm unclear on several things.
You shouldn't need a mirror to use git-filter-repo, as it can work on an existing repo, and git-tfs should have left you with a working Git repo. If you can, I would back up the entire repo (the 45 GB directory, including its 6 GB .git folder), and then you can wash your hands of the TFS portion. You now have a Git repo that you can play around with, and if things go badly you can simply delete it and restore it from the backup.
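A minimal sketch of that backup step, assuming a POSIX shell with cp -a available (Git Bash on Windows should do). The function name backup_repo and the paths in the usage comment are hypothetical, not anything from your setup:

```shell
#!/bin/sh
# backup_repo SRC DEST: copy a repository (working tree and .git) verbatim,
# then run a basic integrity check on the copy.
backup_repo() {
    src=$1
    dest=$2
    # Refuse to overwrite an existing backup.
    if [ -e "$dest" ]; then
        echo "backup target already exists: $dest" >&2
        return 1
    fi
    cp -a "$src" "$dest" || return 1
    # fsck exits non-zero if the copied object store is corrupt.
    git -C "$dest" fsck --no-dangling >/dev/null
}

# Example (hypothetical paths):
#   backup_repo ./tfs-migrated ./tfs-migrated.backup
```

Restoring is then just a matter of deleting the working copy and copying the backup back into place.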
Once it's backed up, I would try using git-filter-repo to remove the large files. Even if you don't have a fresh clone, you can use the --force option. There is also an option for removing files larger than a certain size, and in your case you might use: --strip-blobs-bigger-than 20M. Note that git-filter-repo is much faster than other history-rewriting options (and also much faster than git-tfs), so it's pretty common to do multiple passes. For example, you could first strip out all the large files, and then you might do another pass to remove some passwords or other undesirable changes (or entire commits).
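As a rough sketch of one such pass, the helpers below wrap the filter-repo call and list any blobs in history that are still above a byte threshold, using only standard git plumbing (rev-list and cat-file). The function names and the path in the usage comment are made up for illustration, and this assumes git-filter-repo is installed and on your PATH:

```shell
#!/bin/sh
# list_big_blobs REPO BYTES: print "size path" for every blob reachable from
# any ref whose size exceeds BYTES, largest first. Files that were only
# "git rm"-ed still show up here, which is why the push keeps failing.
list_big_blobs() {
    git -C "$1" rev-list --objects --all |
        git -C "$1" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v limit="$2" '$1 == "blob" && $2 > limit { sub(/^blob /, ""); print }' |
        sort -rn
}

# strip_big_blobs REPO SIZE: rewrite history, dropping blobs larger than SIZE
# (for example 20M). --force is needed because the git-tfs output is not a
# fresh clone. Run this only on a repo you have already backed up.
strip_big_blobs() {
    (cd "$1" && git filter-repo --strip-blobs-bigger-than "$2" --force)
}

# Example (hypothetical path); 20971520 bytes = 20 MiB:
#   list_big_blobs  ./tfs-migrated 20971520   # what would be removed
#   strip_big_blobs ./tfs-migrated 20M
#   list_big_blobs  ./tfs-migrated 20971520   # should now print nothing
```

Running the listing before and after gives you a quick confirmation that the pass did what you expected before you attempt the push again.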
For your specific questions, the fact that you don't actually need a mirror makes your first 2 questions irrelevant. Once you have your repo the way you want it, then you just push it out into whatever Git host you'd like, such as GitHub Enterprise. For your third question:
How do I perform an audit of the changelog to see what was modified to ensure that history is preserved?
The way I've done this in the past is with the following checks: