
Migrate SVN to Git with filtered history


I would like to migrate a project from SVN to Git and retain history.

But because there are some configuration files with production passwords and other sensitive data in the SVN repository, I would like to exclude those few files from the migrated history.

How would I go about doing this?


Solution

  • The easiest solution would be to migrate your SVN repository to Git on your local machine and then remove the files that contain the sensitive data before you push the migrated history to a remote repository.

    For example:

    # Migrate the SVN project into a local repo
    git svn clone svn://server/svnroot \
        --authors-file=authors.txt \
        --no-metadata \
        -s your_project
    
    cd your_project   
    
    # Remove the 'passwd.txt' file from the history of the local repo
    git filter-branch --force --index-filter \
        'git rm --cached --ignore-unmatch passwd.txt' \
        --prune-empty --tag-name-filter cat -- --all
    

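    As long as you haven't pushed the local Git repository to a remote location, you can safely remove any file from the entire history using git filter-branch. Keep in mind that git filter-branch stores backups of the original refs under refs/original/, so the old commits stay in your local repository until those refs are deleted. Once the files are removed, it's safe to publish the rewritten branches and tags anywhere you want.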
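
    To double-check the result before publishing, something like the following could be used. It is only a sketch: the function names file_in_history and purge_old_objects are made up for this example, and the cleanup steps follow the checklist in the git filter-branch documentation (delete the backup refs under refs/original/, expire the reflogs, then garbage-collect). Both functions assume they are run from inside the migrated repository.

    ```shell
    # Hypothetical helper: succeeds if any commit reachable from any ref
    # still touches the given path.
    file_in_history() {
        [ -n "$(git log --all --format=%H -- "$1")" ]
    }

    # Hypothetical helper: drop the backups that git filter-branch left behind
    # and physically remove the objects that are no longer reachable.
    purge_old_objects() {
        # Delete every backup ref under refs/original/
        git for-each-ref --format='%(refname)' refs/original/ |
            while read -r ref; do
                git update-ref -d "$ref"
            done
        # Forget the reflog entries that still point at the old commits
        git reflog expire --expire=now --all
        # Remove the now unreachable objects from the object store
        git gc --prune=now --quiet
    }
    ```

    Run purge_old_objects first and file_in_history passwd.txt afterwards: git log --all also walks the backup refs, so the check would still find the file while refs/original/ exists. If file_in_history still succeeds after the purge, some ref was not rewritten and the repository should not be published yet.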

    An alternative to git filter-branch is a tool called BFG Repo-Cleaner, which uses its own (supposedly faster) implementation to remove files from the history of a Git repository. With 10,000 commits it may be worth considering, since the running time of git filter-branch grows at least linearly with the number of commits it has to process.
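
    A BFG run could look roughly like the sketch below. The function name bfg_delete_file and the BFG_JAR variable (the location of the downloaded jar) are assumptions made for this example, and the file is assumed to sit at the top level of the repository. BFG's --delete-files option matches file names rather than full paths, and by default BFG leaves the contents of the latest commit untouched, so the sketch first removes the file with an ordinary commit.

    ```shell
    # Assumed location of the BFG jar; adjust to wherever you downloaded it
    BFG_JAR=${BFG_JAR:-bfg.jar}

    # Hypothetical wrapper: bfg_delete_file <file-name> <repo-dir>
    bfg_delete_file() {
        name=$1
        repo=$2
        # BFG protects the latest commit by default, so remove the file there first
        if git -C "$repo" ls-files --error-unmatch "$name" >/dev/null 2>&1; then
            git -C "$repo" rm --quiet "$name" &&
                git -C "$repo" commit --quiet -m "Remove $name" || return 1
        fi
        # Rewrite the older commits so that they no longer contain the file
        java -jar "$BFG_JAR" --delete-files "$name" "$repo" || return 1
        # Expire the reflogs and prune the objects BFG made unreachable
        git -C "$repo" reflog expire --expire=now --all &&
            git -C "$repo" gc --prune=now --aggressive --quiet
    }
    ```

    With the migrated repository from the example above, the call would be bfg_delete_file passwd.txt your_project. As with git filter-branch, it is worth checking the rewritten history before pushing it anywhere.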