git version-control bazaar migrate git-fast-import

Converting big bzr repository to git, what to expect?


I'm trying to convert some old Bazaar repositories to git, and while everything seems to go through smoothly, I'm a bit unsure whether it really went as well as it claimed.

My Bazaar repository is structured like so:

I'm using the fast-export/fast-import method for migrating between bzr and git.

Initially, I migrate the "trunk", with --export-marks, like so:

bzr fast-export --export-marks=../$1/marks.bzr ../$1/trunk | git fast-import --export-marks=../$1/marks.bzr --export-marks=../$1/marks.git

With $1 being the name of the

then iterate all other folders in the "repo" directory and call:

bzr fast-export --marks=../$1/marks.bzr  --git-branch=$nick ../$1/$b/.. | git fast-import --import-marks=../$1/marks.git --export-marks=../$1/marks.git

with $nick being the branch nickname of bzr, and $1/$b being the directory name of the branch.

As I said, it processes all the expected directories, but after completion, when I do a:

git branch

It shows just 20 something branches, where the original Bazaar repository had 80+.

Now, just looking at "master" in git, it all seems to be there, and the missing 60 branches could easily be branches that are already merged into trunk. I'm not really sure the fast-export/fast-import tools are clever enough to say "bah - you won't need this", but maybe they are.

Does anyone have any experience with this?

Am I only supposed to be left with "master" and any branch that has unmerged commits in it after migrating from bzr to git?

Finally, for the sake of history, is there any way to force all branches to be converted over, even if they are technically defunct?


Solution

  • It seems the fast-import/export tools are indeed clever enough to say "bah - you won't need this". It's not rocket science, though: just as git branch -d knows when it is safe to delete a branch, git fast-import can know that an incoming branch is a replica.
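
    To illustrate what "safe to delete" means on the Git side, here is a minimal sketch that builds a throwaway repository and asks Git which branches are fully contained in master. The branch names and commit messages are made up for the demo, and it assumes a Git recent enough to support git branch --format.

    ```shell
    #!/bin/sh
    # Throwaway repository; all names below are hypothetical.
    repo=$(mktemp -d)
    git init -q "$repo"
    cd "$repo" || exit 1
    git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of git defaults
    git config user.name "Example"
    git config user.email "example@example.invalid"

    git commit -q --allow-empty -m "first"
    git branch merged-feature                 # tip is a commit that master already contains
    git commit -q --allow-empty -m "second"
    git checkout -q -b unmerged-feature
    git commit -q --allow-empty -m "work that exists only on this branch"
    git checkout -q master

    # Branches whose tip is reachable from master, i.e. the ones "git branch -d" would delete
    merged=$(git branch --merged master --format='%(refname:short)')
    echo "$merged"
    ```

    The list should include merged-feature but not unmerged-feature, which is the same kind of containment test the rest of this answer relies on.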

    But probably you'd like to be really sure, and I agree. I put together a simple (if inefficient) script to find the list of unique bzr branches:

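    #!/bin/sh
    # List the branches of a shared repository that are not fully merged elsewhere.
    
    paths=$(bzr branches -R)
    
    # Note: $paths is split on whitespace, so branch paths must not contain spaces.
    for path1 in $paths; do
        merged=
        for path2 in $paths; do
            test "$path1" = "$path2" && continue
            # is path1 part of path2 ? (exit status 0 means nothing is missing)
            if bzr missing -d "$path1" "$path2" --mine >/dev/null; then
                # is path2 part of path1 ?
                if bzr missing -d "$path1" "$path2" --other >/dev/null; then
                    echo "# $path1 == $path2"
                else
                    merged=1
                    break
                fi
            fi
        done
        test "$merged" || echo "$path1"
    done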
    

    Run this inside a Bazaar shared repository. It finds all branches and then compares each branch against all the others. If A is in B, there are two possibilities: either B is also in A, which means A == B, or A is redundant.

    The script filters out branches that are fully merged into at least one other branch. However, if there are multiple branches that are identical, it prints all of those, with additional lines starting with # to indicate that they are identical.
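
    To make the filtering rule concrete without needing Bazaar, here is a toy model of the same loop. Each "branch" is just a sorted file of revision ids, and the hypothetical helper contained_in stands in for the bzr missing checks. The branch names and revision ids are invented for illustration.

    ```shell
    #!/bin/sh
    # Toy model: each "branch" is a sorted file of revision ids (made-up data).
    dir=$(mktemp -d)
    printf '%s\n' r1 r2 r3 > "$dir/trunk"
    printf '%s\n' r1 r2    > "$dir/feature"
    printf '%s\n' r1 r2 r3 > "$dir/copy"

    # Succeeds when every revision of branch $1 also appears in branch $2.
    contained_in() {
        test -z "$(comm -23 "$dir/$1" "$dir/$2")"
    }

    # Same structure as the bzr script: skip branches merged into another one,
    # report identical branches with a leading "#".
    unique_branches() {
        for a in trunk feature copy; do
            merged=
            for b in trunk feature copy; do
                test "$a" = "$b" && continue
                if contained_in "$a" "$b"; then
                    if contained_in "$b" "$a"; then
                        echo "# $a == $b"
                    else
                        merged=1
                        break
                    fi
                fi
            done
            test "$merged" || echo "$a"
        done
    }

    result=$(unique_branches)
    echo "$result"
    ```

    Here feature is dropped because it is fully contained in trunk, while trunk and copy are both printed, each preceded by a # line noting that they are identical.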

    Your example commands with bzr fast-export ... | git fast-import ... seem to have some unnecessary options. Following the examples at the very end of bzr fast-export -h, I recommend using these steps instead:

    1. Create a brand new Git repo:

      git init /tmp/gitrepo
      
    2. Go inside your Bazaar shared repo:

      cd /path/to/bzr/shared/repo
      
    3. Migrate your main branch (trunk?) to be the master:

      bzr fast-export --export-marks=marks.bzr trunk/ | \
        GIT_DIR=/tmp/gitrepo/.git/ git fast-import --export-marks=marks.git
      
    4. Migrate all branches:

      bzr branches -R | while read -r path; do
          nick=$(basename "$path")
          echo "migrating $nick ..."
          # Append to the log so that earlier iterations are not overwritten
          bzr fast-export --import-marks=marks.bzr -b "$nick" "$path" | \
            GIT_DIR=/tmp/gitrepo/.git git fast-import --import-marks=marks.git \
            >>/tmp/migration.log 2>&1
      done
      

    You may notice that the last step does not skip trunk, which you already migrated. That doesn't matter, as it won't be imported again anyway. Also note that even if branchA is fully merged into branchB, it will be created in Git if it is seen first. If branchB is seen first, then branchA won't be created in Git ("bah - you won't need this").
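
    For readers wondering what the marks files in steps 3 and 4 actually contain, here is a small sketch that feeds a hand-written stream to git fast-import and prints the resulting marks file. The repository location, file name and committer identity are invented for the demo; only the git side is shown, since it does not need Bazaar installed.

    ```shell
    #!/bin/sh
    # Feed a tiny hand-written stream to git fast-import and inspect the marks file.
    # Everything here (paths, file name, committer) is hypothetical demo data.
    repo=$(mktemp -d)
    marks="$repo/marks.git"
    git init -q "$repo"

    {
        # A blob (mark :1) holding the 6 bytes "hello\n"
        printf 'blob\nmark :1\ndata 6\nhello\n'
        # A commit (mark :2) on master that stores the blob as hello.txt
        printf 'commit refs/heads/master\nmark :2\n'
        printf 'committer Example <example@example.invalid> 0 +0000\n'
        printf 'data 6\nfirst\n'
        printf 'M 100644 :1 hello.txt\n\n'
    } | GIT_DIR="$repo/.git" git fast-import --quiet --export-marks="$marks"

    # Each line of the marks file has the form ":<mark> <object id>"
    cat "$marks"
    tip=$(GIT_DIR="$repo/.git" git rev-parse refs/heads/master)
    ```

    The marks file is a plain lookup table from stream marks to Git object ids, which is what allows a later run with --import-marks to refer to commits that were imported earlier.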

    I could not find a way to force creating identical branches when importing to Git. I don't think it's possible.
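
    If you at least want a record of which Bazaar branches ended up without a Git branch of the same name, comparing the two name lists is enough. The sketch below uses made-up sample names; in practice the first list would presumably come from bzr branches -R (passed through basename) and the second from git branch.

    ```shell
    #!/bin/sh
    # Hypothetical sample data standing in for the real bzr and git branch lists.
    work=$(mktemp -d)
    printf '%s\n' trunk feature-a feature-b hotfix | sort > "$work/bzr.txt"
    printf '%s\n' master feature-b | sort > "$work/git.txt"

    # Lines only in the first file: bzr branches with no git branch of the same name
    missing=$(comm -23 "$work/bzr.txt" "$work/git.txt")
    echo "$missing"
    ```

    Note that trunk shows up in the output only because it was imported under the name master, so it needs to be discounted by hand.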