
Asterisk glob expansion (*) unexpected behaviour in bash on WSL's 9P mounts


I'm observing strange behavior from my script only in WSL on 9P mounts (Windows disks, like /mnt/c/...).

Glob expansion of the asterisk (*) in a for loop yields duplicate names on a later invocation if the loop feeds a pipe that writes to a temporary file.

#!/bin/bash

# save true file number
true_f_num=$(for fn in *; do echo; done | wc -l)

while :; do
    # count of iterations to outer loop pipe
    echo
    
    # save current file list in file
    echo -n '' >.list
    
    for fn in *; do
        echo "$fn" >>.list
    
    # line below causes unexpected behaviour
    # on 9P mounts (like /mnt/c/...)
    # (WSL2, ubuntu 22.04)
    done | cat > .tmp; rm .tmp
    
    # check 
    [ $(cat .list | wc -l ) -ne $true_f_num ] && break

done | wc -l

Here is a script gist to test and illustrate that. It is expected to run forever. But on WSL's 9P mounts it stops after some random number of iterations.

I can't understand the mechanism behind this. I assumed that glob expansion happens once, and that only afterwards does output start flowing to the pipe (subshell), which is what changes the contents of the directory. And how can this affect the next iteration? The file was there, but it's not anymore. Does some invisible fifo appear in the directory?

Please let the wise explain what is happening here.


Solution

  • [The example script] is expected to run forever. But on WSL's 9P mounts it stops after some random number of iterations.

    Presumably that's because [ $(cat .list | wc -l ) -ne $true_f_num ] evaluates to true on some iteration, and indeed, you observe in comments that after the script terminates, you find a duplicate name in the .list file left behind. That would do it, unless you're also missing one name.
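To confirm which case you are in (a duplicated name versus a duplicated name plus a missing one), the leftover list can be inspected directly. The following is a minimal sketch; the helper names list_duplicates and list_has_count are made up for illustration and are not part of the original script. It assumes one name per line, so file names containing newlines would confuse it.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script):
# print every line that occurs more than once in the list file $1.
list_duplicates() {
    # sort groups identical lines; uniq -d prints one copy of each repeated line
    sort -- "$1" | uniq -d
}

# Hypothetical helper: succeed only if file $1 has exactly $2 lines.
list_has_count() {
    [ "$(wc -l < "$1")" -eq "$2" ]
}

# Example use against the file the script leaves behind, if it exists.
if [ -f .list ]; then
    list_duplicates .list
fi
```

Running list_duplicates .list right after the script stops shows which names were repeated; comparing sort -u .list with a fresh listing would show whether any name is missing as well.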

    File .list is created and populated this way:

        echo -n '' >.list
    
        for fn in *; do
            echo "$fn" >>.list
            # [...]
        done | cat > .tmp
    

    A rm .tmp follows, but that is not part of managing .list, and it is not executed until after the above pipeline completes. However, because .tmp is removed after each execution of the pipeline, it is reasonable to assume that it does not exist before control reaches the pipeline.

    Because the for command appears in a pipeline, it is executed in a subshell. The pattern (*) within is expanded, in the subshell, producing one word per file, before that command is executed. Then the subshell's standard output is redirected into the pipe, and finally, the loop body is executed once for each word in the expansion of *.
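The subshell behaviour is easy to observe on any file system. The sketch below is illustrative only (the variable names are made up): a counter incremented inside a loop that feeds a pipe is not changed in the parent shell, whereas the same loop without a pipe does change it.

```shell
#!/bin/bash
# A loop on the left-hand side of a pipe runs in a subshell,
# so its assignments do not reach the parent shell.
count=0
for w in a b c; do
    count=$((count + 1))   # modifies the subshell's copy of count
done | cat

echo "count after pipeline: $count"

# For contrast: the same loop without a pipe runs in the current shell.
count2=0
for w in a b c; do
    count2=$((count2 + 1))
done

echo "count2 without pipe: $count2"
```

The first message reports 0 and the second reports 3, which is consistent with the for command, including its expansion of *, being evaluated inside the subshell.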

    At the same time, the standard input of cat is redirected from the pipe, its standard output is redirected to .tmp, first creating that file if necessary, and then cat is launched.

    NOTE WELL, then, that it is unspecified and not generally predictable when .tmp will be created relative to the expansion of the * pattern in the for command. It could be before, during, or after.
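If that race is indeed what triggers the problem, one way to take it out of the picture is to expand the pattern before the pipeline, and therefore before the redirection to .tmp, is set up. The following is only a sketch under that assumption: snapshot_names and names are hypothetical, and it has not been confirmed that this avoids the behaviour on 9P mounts.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script).
# Writes the names matched by * in directory $1 to $1/.list. The pattern is
# expanded once, before the pipeline and its "> .tmp" redirection exist.
# The ( ... ) body keeps the cd from affecting the caller.
snapshot_names() (
    cd -- "$1" || exit 1

    names=( * )                          # expansion happens here, once
    printf '%s\n' "${names[@]}" > .list

    # The pipeline now only consumes the already-expanded words.
    for fn in "${names[@]}"; do
        echo "$fn"
    done | cat > .tmp
    rm .tmp
)
```

Calling snapshot_names . in the test directory would replace the body of the outer loop. As in the original script, an empty directory leaves the literal * in the list because nullglob is not set.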

    Observations:

    How can this affect the next iteration?

    I think each iteration of the outer loop has its own, independent opportunity to trigger the bug. And I think it's likely to be a bug in pathname expansion, so not an issue of one iteration of the inner loop affecting another, either.

    Does some invisible fifo appear in the directory?

    Unlikely.
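The pipes that the shell creates for | are anonymous kernel objects with no directory entry, so there is nothing for * to pick up. If you want to rule it out empirically anyway, a check along these lines could be run while the script is looping. It is a sketch only, and list_fifos is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script):
# print any named pipes (FIFOs) directly inside directory $1, hidden ones included.
list_fifos() (
    cd -- "$1" || exit 1
    # dotglob: let * match dotfiles too; nullglob: expand to nothing if no match
    shopt -s nullglob dotglob
    for entry in *; do
        if [ -p "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
)
```

An empty result from list_fifos /mnt/c/path/to/testdir (the path here is a placeholder) would support the conclusion that no FIFO is involved.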