Tags: git, git-checkout, invalid-characters, git-history

How to remove all files whose path contains a colon (:) from Git repo history?


I have iSCSI node files with colons in their names stored in a Git repository on Debian 10 Linux.

Example:

'iscsi/nodes/iqn.2000-01.com.synology:NAS01-DS916.nas/ff11::111:11ff:ff1f:1ff1,3260,1/default'
'iscsi/send_targets/1.2.3.4,3260/iqn.2000-01.com.synology:NAS01-DS916.nas,ff11::111:11ff:ff1f:1ff1,3260,1,default'

But checkout fails on Windows, because the colon is an invalid character in Windows filenames.

I get the following Git errors when checking out on Windows:

error: invalid path 'iscsi/nodes/iqn.2000-01.com.synology:NAS01-DS916.nas/ff11::111:11ff:ff1f:1ff1,3260,1/default'
...
error: invalid path 'iscsi/send_targets/1.2.3.4,3260/iqn.2000-01.com.synology:NAS01-DS916.nas,ff11::111:11ff:ff1f:1ff1,3260,1,default'

Questions:

1) How can I list all paths that contain a colon (:) across the full Git repo history?

2) How can I remove from the Git repo history all files whose path contains at least one colon (:)?

SOLUTION for 1):

Works1:

git log --all --name-only -m --pretty= -- '*:*' | sort -u

Works2 (only for the named branch master):

git ls-tree -r master --name-only | grep ":"

Works3: Finally I used this to list files with colons in filename:

git log --format="reference" --name-status --diff-filter=A "*:*" >/opt/git_repo_files_w_colons.txt
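The difference between listing one branch and listing the whole history is easiest to see on a throwaway repository. The following is a minimal sketch, assuming a Linux shell (where colons are legal in filenames); the scratch directory, the branch name `side` and the file names are made up for illustration. It commits a colon file on a side branch only and then runs both kinds of listing.

```shell
#!/bin/sh
# Scratch repo; directory, branch and file names are hypothetical.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git config user.name "demo"
git config user.email "demo@example.invalid"

echo ok > plain.txt
git add plain.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

# The colon file only ever exists on a side branch.
git checkout -q -b side
mkdir -p nodes
echo data > 'nodes/iqn.example:disk1'
git add nodes
git commit -q -m "add colon file on side branch"
git checkout -q "$main_branch"

# One branch only (like Works2): the colon path is not found.
branch_paths=$(git ls-tree -r "$main_branch" --name-only | grep ':' || true)

# All refs, whole history (like Works1): the colon path is found.
history_paths=$(git log --all --name-only -m --pretty= -- '*:*' | sort -u)

echo "branch:  ${branch_paths:-<none>}"
echo "history: $history_paths"
```

Works3 has no `--all` either, so like Works2 it only walks the history of the current branch.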

UPDATE1 for 2):

I got

Aborting: Refusing to destructively overwrite repo history since
this does not look like a fresh clone.
  (expected freshly packed repo)
Note: when cloning local repositories, you need to pass
      --no-local to git clone to avoid this issue.
Please operate on a fresh clone instead.  If you want to proceed
anyway, use --force.

when executing

git filter-repo --invert-paths --path-match "*:*"

UPDATE2 for 2):

Clone a copy of the repo:

git clone --no-local /source/repo/path/ /target/path/to/repo/clone/
# Cloning into '/target/path/to/repo/clone'...
# remote: Enumerating objects: 9534, done.
# remote: Counting objects: 100% (9534/9534), done.
# remote: Compressing objects: 100% (4776/4776), done.
# remote: Total 9534 (delta 4216), reused 8042 (delta 3136), pack-reused 0
# Receiving objects: 100% (9534/9534), 7.40 MiB | 17.08 MiB/s, done.
# Resolving deltas: 100% (4216/4216), done.

Remove the files with colon from repo history:

git filter-repo --invert-paths --path-match "*:*"
# Parsed 591 commits
# New history written in 0.47 seconds; now repacking/cleaning...
# Repacking your repo and cleaning out old unneeded objects
# HEAD is now at 501102d daily autocommit
# Enumerating objects: 9534, done.
# Counting objects: 100% (9534/9534), done.
# Delta compression using up to 8 threads
# Compressing objects: 100% (3696/3696), done.
# Writing objects: 100% (9534/9534), done.
# Total 9534 (delta 4216), reused 9534 (delta 4216), pack-reused 0
# Completely finished after 1.33 seconds.

Checking still shows filenames with colon:

git log --format="reference" --name-status --diff-filter=A "*:*"
# A    iscsi/nodes/iqn.2000-01.com.synology:NAS01-DS916.nas/ff11::111:11ff:ff1f:1ff1,3260,1/default
# ...

Unfortunately, filter-repo ran without errors, but the log still lists filenames containing a colon :-(
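
A likely explanation, going by the git-filter-repo manual: `--path-match` is a synonym for `--path` and compares paths literally, so "*:*" would only match a file or directory literally named `*:*`. Glob matching is what `--path-glob` is for. A minimal sketch under that assumption follows; the function name `remove_colon_paths` is hypothetical, git-filter-repo must be installed, and it should only be run on a fresh clone.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Assumes git-filter-repo is installed and "$1" is a fresh clone.
remove_colon_paths() {
    # --path-glob does glob matching; --path and --path-match compare literally.
    git -C "$1" filter-repo --invert-paths --path-glob '*:*'
}

# Example call (path taken from the clone step above):
# remove_colon_paths /target/path/to/repo/clone/
```

Because filter-repo rewrites every commit that touched a matching path, commit hashes in the clone will differ from the source repository afterwards.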


Solution

  • Link #1 did not provide a solution for listing paths that contain colons.

    Check:

    git ls-tree -r master --name-only | grep ":"
    

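    The approach suggested there was to reset all files without a ":" and delete the rest. Note that this works on the current commit only; it adds at most a new commit on top of the branch and leaves earlier history untouched: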

    git ls-tree -r master --name-only | grep -v ":" | xargs git reset HEAD
    git commit -m "deleting all files with a colon in the name"
    git restore -- .
    

    The OP klor reports listing those files with a git log pretty format "reference" (which is <abbrev-hash> (<title-line>, <short-author-date>)):

    git log --format="reference" --name-status --diff-filter=A "*:*" >/opt/git_etc_repo_files_w_colons.txt
    

    The OP suggested:

    # Clone the repository, so that the rewrite is executed on a safe copy:
    git clone --no-local /source/repo/path/ /target/path/to/repo/clone/
    # Cloning into '/target/path/to/repo/clone'...
    # remote: Enumerating objects: 9534, done.
    # remote: Counting objects: 100% (9534/9534), done.
    # remote: Compressing objects: 100% (4776/4776), done.
    # remote: Total 9534 (delta 4215), reused 8043 (delta 3136), pack-reused 0
    # Receiving objects: 100% (9534/9534), 7.41 MiB | 16.78 MiB/s, done.
    # Resolving deltas: 100% (4215/4215), done.
    
    cd /target/path/to/repo/clone/
    
    # List the files with colon from repo history into a list file:
    git log --all --name-only -m --pretty= -- '*:*' | sort -u >/opt/git_repo_files_w_colons.txt
    
    # Remove the files with colon from repo history:
    git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_colons.txt
    # Parsed 591 commits
    # New history written in 0.74 seconds; now repacking/cleaning...
    # Repacking your repo and cleaning out old unneeded objects
    # HEAD is now at e5fdf93 daily autocommit
    # Enumerating objects: 9347, done.
    # Counting objects: 100% (9347/9347), done.
    # Delta compression using up to 8 threads
    # Compressing objects: 100% (3696/3696), done.
    # Writing objects: 100% (9347/9347), done.
    # Total 9347 (delta 4078), reused 9345 (delta 4076), pack-reused 0
    # Completely finished after 1.59 seconds.
    
    # List files with colon to check result:
    git log --format="reference" --name-status --diff-filter=A "*:*"
    # Empty result, so git filter-repo was successful, filenames with colon were removed!
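
The final check above runs `git log` without `--all`, so it only covers the history of the currently checked-out branch. A slightly stricter variant is sketched here; the function name `assert_no_colon_paths` is hypothetical. It adds `--all` so that every branch and tag is walked, and returns a non-zero status if any colon path is left.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if no ref in repo "$1" has ever
# recorded a path containing a colon.
assert_no_colon_paths() {
    leftover=$(git -C "$1" log --all --name-only -m --pretty= -- '*:*' | sort -u)
    if [ -n "$leftover" ]; then
        printf 'paths with colons remain:\n%s\n' "$leftover" >&2
        return 1
    fi
}

# Example call on the rewritten clone:
# assert_no_colon_paths /target/path/to/repo/clone/ && echo "clean"
```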