
Zsh recursive globbing pattern with exclusion does not exclude files in the current directory


I want to list all .mkv files recursively, excluding the AV1-encoded ones.

But the exclusion pattern does not work as I expected:

/Volumes/X/C/New/studio/成长的烦恼 copy ❯ tree
.
├── 成zhang的fan恼-剧场版2000-欢乐家庭 _[HighQ-AV1 (Modified)-svt_av1_10bit-rf_28.00].mkv
├── 成zhang的fan恼-剧场版2000-欢乐家庭.mkv
├── 成zhang的fan恼-剧场版2004-西弗一家的归来 _[HighQ-AV1 (Modified)-svt_av1_10bit-rf_28.00].mkv
└── 成zhang的fan恼-剧场版2004-西弗一家的归来.mkv

1 directory, 4 files
/Volumes/X/C/New/studio/成长的烦恼 copy ❯ ls **/*.mkv~**/*AV1*
成zhang的fan恼-剧场版2000-欢乐家庭 _[HighQ-AV1 (Modified)-svt_av1_10bit-rf_28.00].mkv
成zhang的fan恼-剧场版2000-欢乐家庭.mkv
成zhang的fan恼-剧场版2004-西弗一家的归来 _[HighQ-AV1 (Modified)-svt_av1_10bit-rf_28.00].mkv
成zhang的fan恼-剧场版2004-西弗一家的归来.mkv
/Volumes/X/C/New/studio/成长的烦恼 copy ❯

However, it will work if I move those files into a subfolder:

/Volumes/X/C/New/studio/成长的烦恼 copy ❯ tree
.
└── New Folder With Items
    ├── 成zhang的fan恼-剧场版2000-欢乐家庭 _[HighQ-AV1 (Modified)-svt_av1_10bit-rf_28.00].mkv
    ├── 成zhang的fan恼-剧场版2000-欢乐家庭.mkv
    ├── 成zhang的fan恼-剧场版2004-西弗一家的归来 _[HighQ-AV1 (Modified)-svt_av1_10bit-rf_28.00].mkv
    └── 成zhang的fan恼-剧场版2004-西弗一家的归来.mkv

2 directories, 4 files
/Volumes/X/C/New/studio/成长的烦恼 copy ❯ ls **/*.mkv~**/*AV1*
New Folder With Items/成zhang的fan恼-剧场版2000-欢乐家庭.mkv
New Folder With Items/成zhang的fan恼-剧场版2004-西弗一家的归来.mkv

Now I'm confused, and so is Grok.

In other words, the recursive globbing pattern **/*something seems to fail when it comes after ~, and only for files directly under ./, which is strange to me.

Does anyone know what is wrong here?

(Using zsh in the macOS Terminal.)


Solution

  • tl;dr - use this:

    setopt extendedglob
    print -l **/(*.mkv~*AV1*)
    

    The recursive **/ operator is actually shorthand for (*/)#, i.e. the segment */, grouped with (...), repeated zero or more times (#). The main difference is that **/ does not require extendedglob to be set, whereas the # operator does.

    However, the shorthand is only recognized in certain contexts. Where it isn't, the zsh manual says that "the ‘*’ operators revert to their usual effect." The manual doesn't cover the right-hand side of ~ explicitly, but that is what happens there.

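    Therefore **/*.mkv~**/*AV1* is really the same as **/*.mkv~*/*AV1* (outside of recursive globbing, ** is just two * in a row). It recursively finds the *.mkv files, then excludes any path that matches */*AV1*, i.e. a path containing a / with AV1 somewhere after it. Files in the current directory have no / in their path, so only files in subdirectories can match that pattern and be excluded.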

    Changing the pattern to the non-shortcut form will get you a bit closer:

    print -l **/*.mkv~(*/)#*AV1*
    

    Now every file with AV1 in its name is excluded, including those in the base directory. However, this gives the same results as **/*.mkv~*AV1*: because * can also match / here, it excludes any path containing AV1, so every file inside a directory with AV1 in its name is dropped as well. That's usually not the intended result.

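    This happens because the pattern to the right of ~ does not treat / as a special character. In a 'glob' pattern the shell walks the filesystem one /-separated segment at a time, so / is special; the exclusion, by contrast, is a 'match' pattern tested against the whole generated path as a string, where / is simply another character that * can match.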

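    With this difference in mind, we can restructure the pattern as "any directory depth (**/)", where / is special, followed by "a final component that matches *.mkv but not *AV1* (*.mkv~*AV1*)". Because the grouped exclusion only ever sees a single file name, AV1 in a directory name no longer matters.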

    With parens added for grouping, we end up with this:

    print -l **/(*.mkv~*AV1*)
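To see the patterns above side by side, here is a minimal reproduction sketch, assuming zsh is on your PATH. It builds a scratch directory with hypothetical ASCII stand-ins for the files in the question, plus a hypothetical "AV1 rips" directory to expose the directory-name pitfall, and prints what each pattern matches with a tag in front. The zsh part runs under zsh -f, which skips startup files so your own options can't change the outcome; the small POSIX sh wrapper only exists so the script can be pasted into any shell.

```shell
#!/bin/sh
# Sketch: compare the globbing patterns from this answer in a scratch directory.
# Assumes zsh is on PATH. All file and directory names are hypothetical.
# zsh -f skips startup files, so personal options cannot change the result.
run_demo() {
zsh -f <<'EOF'
setopt extendedglob            # the ~ and # operators need EXTENDED_GLOB
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir sub 'AV1 rips'
touch top.mkv 'top _[HighQ-AV1].mkv' \
      sub/inner.mkv 'sub/inner _[HighQ-AV1].mkv' \
      'AV1 rips/plain.mkv'

# Print every match on its own line, prefixed with a tag naming the pattern.
show() { local tag=$1 f; shift; for f in "$@"; do print -r -- "$tag: $f"; done }

show short **/*.mkv              # recursive shorthand
show long  (*/)#*.mkv            # long form, should list the same files
show p1 **/*.mkv~**/*AV1*        # pattern from the question
show p2 **/*.mkv~*/*AV1*         # what p1 effectively means
show p3 **/*.mkv~(*/)#*AV1*      # drops AV1 anywhere in the path
show p4 **/(*.mkv~*AV1*)         # drops AV1 in the file name only

cd / && rm -rf -- "$tmp"
EOF
}

out=$(run_demo)
printf '%s\n' "$out"
```

With these five files, short and long list the same five paths; p1 and p2 both still include top _[HighQ-AV1].mkv, reproducing the problem from the question; p3 keeps only top.mkv and sub/inner.mkv; and p4 keeps those two plus AV1 rips/plain.mkv. The order of lines within each tag follows zsh's glob sorting.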