Tags: git, github, bfg-repo-cleaner

How do I remove a big (>100 MB) file from a GitHub repository and push successfully?


I am in the same situation as described here: I inadvertently added a big file that I don't want, then made additional commits of other work on top of it, not knowing the push would fail:

Am I supposed to run BFG on the mirrored repo or the original?


ATTEMPT #1 Tried this to remove the file:

git rm bigfile
git commit bigfile
git push

No luck. The push was still stuck on trying to upload the big file even though the later commit deleted it:

$ git push

Username for 'https://github.com':
Password for 'https://traildreaming@github.com':
Counting objects: 210, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (66/66), done.
Writing objects: 100% (210/210), 5.72 MiB | 1.47 MiB/s, done.
Total 210 (delta 147), reused 203 (delta 140)
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: eedddea1fcb95663492e16c14fc3a250
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File doc/image.eps is 591.70 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/traildreaming/myrepo.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/traildreaming/myrepo.git'
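For illustration, here is a minimal sketch of why the later deletion does not help. It uses a throwaway repository and a 4 KB stand-in file named bigfile (both are assumptions for the demo, not the real repo or the real 591.70 MB file). Deleting a file in a new commit leaves the earlier commit, and the blob it references, in history, so the push still has to upload it.

```shell
# Throwaway demo repository; "bigfile" is a 4 KB stand-in, not the real file.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo

head -c 4096 /dev/zero > bigfile
git add bigfile
git commit -q -m "add bigfile"
git rm -q bigfile
git commit -q -m "remove bigfile"

# The work tree no longer contains the file ...
ls
# ... but the first commit still references its blob, so a push has to upload it.
git rev-list --objects --all | grep bigfile
```

The last command prints the blob id next to the path, which shows the object is still reachable from history even though the work tree is empty.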

ATTEMPT #2 Tried the instructions for https://rtyley.github.io/bfg-repo-cleaner/

But it does not see my big files which are preventing me from doing a push:

$ git clone --mirror https://github.com/traildreaming/myrepo.git

Cloning into bare repository 'myrepo.git'...
Username for 'https://github.com':
Password for 'https://traildreaming@github.com':
remote: Counting objects: 20471, done.
remote: Total 20471 (delta 0), reused 0 (delta 0), pack-reused 20471
Receiving objects: 100% (20471/20471), 812.92 MiB | 4.00 MiB/s, done.
Resolving deltas: 100% (14464/14464), done.
Checking connectivity... done.

$ cp -fr myrepo.git myrepo.git.bac

note2@Travel-2015-11 /cygdrive/c/Users/note2/Data/git/tmpmirror
$ java -jar ../bfg-1.12.12.jar --strip-blobs-bigger-than 100M myrepo.git

Using repo : C:\Users\note2\Data\git\tmpmirror\myrepo.git

Scanning packfile for large blobs: 20471
Scanning packfile for large blobs completed in 103 ms.
Warning : no large blobs matching criteria found in packfiles - does the repo need to be packed?
Please specify tasks for The BFG :
bfg 1.12.12

ATTEMPT #3 Trying this resulted in "remote: error:" messages:

$ git clone --mirror ../../myrepo/.git

Cloning into bare repository 'myrepo.git'...
done.

$ java -jar bfg-1.12.12.jar --strip-blobs-bigger-than 100M tmpmirror/myrepo/myrepo.git

Using repo : C:\Users\note2\Data\git\tmpmirror\myrepo\myrepo.git

Scanning packfile for large blobs: 12545
Scanning packfile for large blobs completed in 66 ms.
Found 1 blob ids for large blobs - biggest=620441479 smallest=620441479
Total size (unpacked)=620441479
Found 1322 objects to protect
Found 4 commit-pointing refs : HEAD, refs/heads/master, refs/remotes/origin/HEAD, refs/remotes/origin/master

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit b68c0cbc (protected by 'HEAD')

Cleaning
--------

Found 2769 commits
Cleaning commits:       100% (2769/2769)
Cleaning commits completed in 1,485 ms.

Updating 1 Ref
--------------

        Ref                 Before     After
        ---------------------------------------
        refs/heads/master | b68c0cbc | 49823acc

Updating references:    100% (1/1)
...Ref update completed in 18 ms.

Commit Tree-Dirt History
------------------------

        Earliest                                              Latest
        |                                                          |
        ...........................................................D

        D = dirty commits (file tree fixed)
        m = modified commits (commit message or parents changed)
        . = clean commits (no changes to file tree)

                                Before     After
        -------------------------------------------
        First modified commit | 0ef7f866 | e3d74aee
        Last dirty commit     | 338d2b46 | 01ca7b80

Deleted files
-------------

        Filename                     Git id
        ------------------------------------------------
        image.eps | e12fe50b (591.7 MB)


In total, 50 object ids were changed. Full details are logged here:

        C:\Users\note2\Data\git\tmpmirror\myrepo\myrepo.git.bfg-report\2016-06-11\15-59-30

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

$ git reflog expire --expire=now --all && git gc --prune=now --aggressive

Counting objects: 20681, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (20114/20114), done.
Writing objects: 100% (20681/20681), done.
Total 20681 (delta 14625), reused 3226 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ git push

Counting objects: 210, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (82/82), done.
Writing objects: 100% (210/210), 1.81 MiB | 0 bytes/s, done.
Total 210 (delta 147), reused 185 (delta 124)
remote: error: refusing to update checked out branch: refs/heads/master
remote: error: By default, updating the current branch in a non-bare repository
remote: error: is denied, because it will make the index and work tree inconsistent
remote: error: with what you pushed, and will require 'git reset --hard' to match
remote: error: the work tree to HEAD.
remote: error:
remote: error: You can set 'receive.denyCurrentBranch' configuration variable to
remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into
remote: error: its current branch; however, this is not recommended unless you
remote: error: arranged to update its work tree to match what you pushed in some
remote: error: other way.
remote: error:
remote: error: To squelch this message and still keep the default behaviour, set
remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
To /cygdrive/c/Users/note2/Data/git/tmpmirror/myrepo/../../myrepo/.git
 ! [remote rejected] master -> master (branch is currently checked out)
error: failed to push some refs to '/cygdrive/c/Users/note2/Data/git/tmpmirror/myrepo/../../myrepo/.git'

Solution

  • Here is how I got it to work after "git push" got stuck because I had added and committed a big file and then kept committing other work while away from the internet:

    I downloaded bfg*jar from:
    https://rtyley.github.io/bfg-repo-cleaner/

    cd tmpmirror; mkdir myrepo; cd myrepo; git clone --mirror ../../myrepo/.git
    java -jar bfg-1.12.12.jar --strip-blobs-bigger-than 100M myrepo.git
    cd myrepo.git; git reflog expire --expire=now --all && git gc --prune=now --aggressive
    git push https://github.com/traildreaming/myrepo
    cd ../../..
    mv myrepo myrepo_old
    git clone https://github.com/traildreaming/myrepo
    cd myrepo
    

    If BFG prints the following message instead and cleans nothing, try again with the extra steps listed after it:

    $ java -jar ../../bfg-1.12.13.jar --strip-blobs-bigger-than 100M myrepo.git
    
    Using repo : [DIR]\tmpmirror\myrepo\myrepo.git
    
    Scanning packfile for large blobs: 20681
    Scanning packfile for large blobs completed in 135 ms.
    Warning : no large blobs matching criteria found in packfiles - does the repo need to be packed?
    Please specify tasks for The BFG :
    bfg 1.12.13
    Usage: bfg [options] [<repo>]
    
      -b <size> | --strip-blobs-bigger-than <size>
            strip blobs bigger than X (eg '128K', '1M', etc)
    

    With the extra "git repack" step added, the sequence becomes:

    cd tmpmirror; mkdir myrepo; cd myrepo; git clone --mirror ../../myrepo/.git
    cd myrepo.git; git repack; cd ..
    java -jar bfg-1.12.12.jar --strip-blobs-bigger-than 100M myrepo.git
    cd myrepo.git; git reflog expire --expire=now --all && git gc --prune=now --aggressive
    git push https://github.com/traildreaming/myrepo
    cd ../../..
    mv myrepo myrepo_old
    git clone https://github.com/traildreaming/myrepo
    cd myrepo
    

    Then continue working in the newly cloned repo. Thanks to the advice at "Am I supposed to run BFG on the mirrored repo or the original?" to use "git push https://github.com/traildreaming/myrepo" and not "git push": a plain "git push" from the mirror goes back to the local repo it was cloned from, not to GitHub.
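As an optional sanity check before the push, the sketch below lists any blob still reachable from a ref that exceeds a given size. The helper name blobs_over is hypothetical (it is not part of git or BFG). It relies only on git rev-list --objects and git cat-file --batch-check, and treats GitHub's 100 MB limit as 100 * 1024 * 1024 bytes, which is an assumption.

```shell
# Hypothetical helper: print "size type sha path" for every blob reachable
# from any ref whose size in bytes exceeds the limit given as $1.
blobs_over() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)' |
    awk -v limit="$1" '$2 == "blob" && $1 + 0 > limit + 0'
}

# Intended use, from inside the cleaned mirror (myrepo.git), after the
# "git reflog expire ... && git gc ..." step and before "git push":
#   blobs_over $((100 * 1024 * 1024))
# No output means no blob over the limit is left in any ref's history.
```

Running "git remote -v" in the same directory shows where a plain "git push" would go. In a mirror cloned from ../../myrepo/.git that is the local repo, which is why the GitHub URL is passed to "git push" explicitly.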