The scenario is the following:
I have a big CVS repository that I want to convert to 14 distinct git repositories.
The cvs2git part of the process is fine and produces one big repository, repo.git.
For each of the 14 git repositories, I clone the main repository and run the following command:
git filter-branch -d /tmp/rep --tag-name-filter cat --prune-empty --subdirectory-filter "sub/directory" -- --all
However, for some of the repositories I first have to run another git filter-branch command that rewrites the commits to move a file from one directory to another. I use the --tree-filter option for this. Here is an example of the command line:
script_tree_filter="if test -f rep/to/my/file && test -d another/rep ; then echo Moving my file ; mv rep/to/my/file another/rep; fi"
git filter-branch -d /tmp/rep --tag-name-filter cat --prune-empty --tree-filter "$script_tree_filter" -- --all
At the end of the process (14,500 commits; it takes about 1 hour!), I clean the refs and run git gc:
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now
At the end I get a repository of 1.2 GB (which is obviously still too big), and looking at the commits, I can see that a lot of old ones are still present. They concern files and directories that should no longer be here after the --subdirectory-filter command.
In the commit history, there is a discontinuity between the unwanted commits and the good ones, as seen in gitk --all.
I am pretty certain that those commits are still present because some of them are tagged. If this is the case, is it possible to remove those tags without removing the ones on the good commits?
If the tags are not the reason, any idea what is?
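One way to test the tag hypothesis is to list the tags that no branch contains. The following is only a sketch: `orphan_tags` is a hypothetical helper name, and it assumes that a "bad" tag is one whose commit is not reachable from any local or remote-tracking branch, which may not match every repository.

```shell
# List the tags whose commit is not contained in any branch.
# Run from inside the repository to inspect.
orphan_tags() {
    git for-each-ref --format='%(refname)' refs/tags |
    while read -r ref; do
        # ^{commit} peels annotated tags down to the commit they point at.
        # Empty output from `git branch --contains` means no branch reaches it.
        if [ -z "$(git branch --all --contains "$ref^{commit}" 2>/dev/null)" ]; then
            printf '%s\n' "${ref#refs/tags/}"
        fi
    done
}
```

After reviewing the list, the tags could be removed with something like `orphan_tags | xargs -n 1 git tag -d`, followed by the same reflog expire and git gc steps as above.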
For more information, the refs directory (in the git repository obtained by subdirectory-filter) contains only empty directories:
$ ls -R refs/
refs/:
heads original tags
refs/heads:
refs/original:
refs
refs/original/refs:
heads tags
refs/original/refs/heads:
refs/original/refs/tags:
refs/tags:
I've found that the branches and tags are listed in the packed-refs file of the git repository:
d0c675d8f198ce08bb68f368b6ca83b5fea70a2b refs/tags/v03-rev-04
95c3f91a4e92e9bd11573ff4bb8ed4b61448d8f7 refs/tags/v03-rev-05
There are 817 tags and 219 branches listed in the file.
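An empty refs directory does not mean the refs are gone: git can store them in packed-refs instead of as loose files, and git for-each-ref reads both. A small sketch for counting them without parsing the file by hand (`count_refs` is a hypothetical helper name):

```shell
# Count tags and branches, whether they are loose or packed.
# Run from inside the repository to inspect.
count_refs() {
    # for-each-ref prints one line per ref; tr strips the padding
    # that some wc implementations add.
    printf 'tags: %s\n' "$(git for-each-ref refs/tags | wc -l | tr -d ' ')"
    printf 'branches: %s\n' "$(git for-each-ref refs/heads | wc -l | tr -d ' ')"
}
```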
I managed to solve my problem by changing the way I used cvs2git: instead of converting the whole CVS base and then using the subdirectory-filter command, I converted each of the submodules I wanted separately. In my case, this meant launching 18 different cvs2git commands:
Before
cvs2git --blobfile=blob --dump=dump /path/to/cvs/base
# Module 1
git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter "path/to/module1" -- --all
# Module 2
git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter "path/to/module2" -- --all
Now
# Module 1
cvs2git --blobfile=blob_module1 --dump=dump_module1 /path/to/cvs/base/path/to/module1
# Module 2
cvs2git --blobfile=blob_module2 --dump=dump_module2 /path/to/cvs/base/path/to/module2
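The per-module conversion can be wrapped in a small function. This is a sketch under assumptions: `convert_module` and the `<name>.git` output naming are hypothetical, the cvs2git options are only the ones shown above (a real run may need more, such as an options file), and the import step uses git fast-import on the blob file followed by the dump file.

```shell
# Convert one CVS module into its own bare git repository.
# $1: path to the module inside the CVS repository
# $2: short name used for the output files (hypothetical naming scheme)
convert_module() {
    module_path=$1
    name=$2

    # Same cvs2git options as above, with one blob/dump pair per module.
    cvs2git --blobfile="blob_$name" --dump="dump_$name" "$module_path" || return 1

    # Load the blob file, then the dump file, into a fresh bare repository.
    git init --quiet --bare "$name.git" || return 1
    cat "blob_$name" "dump_$name" | git --git-dir="$name.git" fast-import --quiet
}
```

For example, `convert_module /path/to/cvs/base/path/to/module1 module1` would leave the result in module1.git.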
Each repository now has a perfect history.
Why didn't the previous method work? My guess is that cvs2git was confused by all the submodules (some of them had their directory renamed during their history).
@Michael @CharlesB Thank you for taking your time to answer and help me.