
Create proper Git repositories from a Catchall (dirty) one


Let's call my-dirty-repository an existing Git repository containing many unrelated scripts. It is a catchall repository that needs to be split up and cleaned properly.

As a Minimal, Complete, and Verifiable example, let's say this repository only contains:

script1.sh
script2.sh

Both have been updated independently by various commits, across several branches.

The aim is to create 2 fully independent Git repositories, each containing ONLY the history of the files (references) it keeps.

Let's call them my-clean-repository1 and my-clean-repository2, the first one having only history about script1, and the second having only history about script2.
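
To experiment with this, a throwaway repository matching the description above can be built with a few commands. This is only a sketch: the temporary path, the branch name feature-script2, the committer identity and the commit messages are all made up for illustration.

```shell
#!/bin/bash
# Sketch: builds a throwaway catchall repository with two unrelated scripts.
# Path, branch name, identity and commit messages are illustrative assumptions.
set -e

repo="$(mktemp -d)/my-dirty-repository"
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

echo 'echo "script1 v1"' > script1.sh
git add script1.sh
git commit -q -m "Add script1"

echo 'echo "script2 v1"' > script2.sh
git add script2.sh
git commit -q -m "Add script2"

# A second branch that only touches script2.
git checkout -q -b feature-script2
echo 'echo "script2 v2"' > script2.sh
git commit -q -am "Update script2 on a branch"

# Back on the initial branch (whatever its name is), update script1 independently.
git checkout -q -
echo 'echo "script1 v2"' > script1.sh
git commit -q -am "Update script1"

echo "Sample repository created in $repo"
```

The result has four commits spread over two branches, which is enough to check whether a splitting approach keeps only the history of one script.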

I tried 3 ways to reach my needs, without success:

I'm pretty sure there is a way to do this properly.


Solution

  • Edit: I created a dedicated tool, cloneToCleanGitRepositories, to answer this need.

    It is a more complete version of the older script that follows.


    @mkasberg thank you for your advice about interactive rebase, which is very interesting for simple histories.

    I tried it, and it resolved my issue for some of the scripts for which I wanted a clean, dedicated, independent Git repository.

    However, it was not enough for most of them, so I tried another solution based on Git's filtering system (git filter-branch).

    Finally, I wrote this little script:

    #!/bin/bash
    ##
    ## Author: Bertrand Benoit <mailto:contact@bertrand-benoit.net>
    ## Description: Creates a clean Git repository for each file in the root of the specified source Git repository, rewriting history accordingly.
    ## Version: 1.0
    
    [ $# -lt 2 ] && echo -e "Usage: $0 <source repository> <dest root directory>" >&2 && exit 1
    
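    SOURCE_REPO="$1"
    [ ! -d "$SOURCE_REPO" ] && echo -e "Specified source Git repository '$SOURCE_REPO' does not exist." >&2 && exit 1
    DEST_ROOT_DIR="$2"
    [ ! -d "$DEST_ROOT_DIR" ] && echo -e "Specified destination root directory '$DEST_ROOT_DIR' does not exist." >&2 && exit 1
    
    # Resolves both to absolute paths, because the loop below changes the working directory.
    SOURCE_REPO=$( cd "$SOURCE_REPO" && pwd )
    DEST_ROOT_DIR=$( cd "$DEST_ROOT_DIR" && pwd )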
    
    sourceRepoName=$( basename "$SOURCE_REPO" )
    
    # For each file in root of the source git repository.
    for refToManage in $( find "$SOURCE_REPO" -maxdepth 1 -type f ); do
      echo -ne "Managing $refToManage ... "
    
      refFileName=$( basename "$refToManage" )
      newDestRepo="$DEST_ROOT_DIR/$refFileName"
    
      # Creates the repository if not existing.
      logFile="$newDestRepo/logFile.txt"
      echo -ne "creating new repository: $newDestRepo, Log file: $logFile ... "
      if [ ! -d "$newDestRepo" ]; then
        mkdir -p "$newDestRepo"
        cd "$newDestRepo"
        ! git clone -q "$SOURCE_REPO" && echo -e "Error while cloning source repository to $newDestRepo." >&2 && exit 2
      fi
      cd "$newDestRepo/$sourceRepoName"
    
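      # Removes all other resources.
      # grep -vxF excludes only the exact file name (literal, whole-line match) from the removal list.
      # xargs -r (GNU extension) skips git rm when a commit contains nothing else to remove.
      FILTER='git ls-tree -r --name-only --full-tree "$GIT_COMMIT" | grep -vxF "'$refFileName'" | tr "\n" "\0" | xargs -0 -r git rm -f --cached -r --ignore-unmatch'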
      ! git filter-branch -f --prune-empty --index-filter "$FILTER" -- --all >"$logFile" 2>&1 && echo -e "Error while cleaning new git repository." >&2 && exit 3
    
      # Cleans remote information to ensure there is no push to the source repository.
      ! git remote remove origin >>"$logFile" 2>&1 && echo -e "Error while removing remote." >&2 && exit 2
    
      echo "done"
    done
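
The index filter builds the list of paths to remove by filtering the kept file name out of the git ls-tree output, so how strictly that name is matched decides what survives. The sketch below compares substring/regex matching (plain grep -v) with exact matching (grep -v -x -F) outside of Git; the path list is hypothetical.

```shell
#!/bin/bash
# Hypothetical path list, as `git ls-tree -r --name-only` might print it.
paths=$(printf '%s\n' script1.sh script1.sh.bak lib/script1.sh script2.sh)
keep="script1.sh"

# Loose matching: the name is a regex matched anywhere in the line,
# so every path that merely contains it is kept as well.
removed_loose=$(printf '%s\n' "$paths" | grep -v "$keep")

# Exact matching: -F treats the name literally, -x requires a whole-line match,
# so only the root-level file itself is kept.
removed_exact=$(printf '%s\n' "$paths" | grep -vxF "$keep")

printf 'removed (loose):\n%s\n' "$removed_loose"
printf 'removed (exact):\n%s\n' "$removed_exact"
```

With these sample names, loose matching removes only script2.sh, while exact matching also removes script1.sh.bak and lib/script1.sh. Which behaviour is wanted depends on whether same-named files in subdirectories should follow the script into its new repository.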
    

    Usage:

    mkdir /tmp/cleanRepoDest
    createCleanGitRepo.sh ~/_gitRepo/Scripts /tmp/cleanRepoDest
    

    In the destination directory, the script creates a new clean Git repository for EACH file in the root directory of the specified source Git repository. In each one, the history is clean and relates only to the kept script.

    In addition, it removes the remote, so that changes cannot be pushed back to the source repository by mistake.

    This way, it is easy to 'migrate' from a big dirty catchall Git repository to several clean ones :-)
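
One way to check a generated repository is to list every path touched by the commits reachable from its branches and tags. The helper below is a minimal sketch; the function name list_history_paths and the example paths are assumptions, not part of the script above.

```shell
#!/bin/bash
# Prints every distinct path touched by any commit reachable from a branch or tag
# of the repository given as first argument.
list_history_paths() {
  git -C "$1" log --branches --tags --pretty=format: --name-only | sort -u | sed '/^$/d'
}

# Usage (example paths; adjust to an actual generated repository):
#   list_history_paths /tmp/cleanRepoDest/script1.sh/Scripts
#   git -C /tmp/cleanRepoDest/script1.sh/Scripts remote
```

For a repository created for script1.sh, the first command would be expected to print only script1.sh, and the second to print nothing once the remote is removed. Note that git filter-branch keeps backups of the rewritten refs under refs/original/, so the old commits stay reachable from those refs until they are deleted.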