bash pipe windows-subsystem-for-linux glob plan-9

Asterisk glob expansion (*) unexpected behaviour in bash on WSL's 9P mounts

I'm observing strange behavior from my script only in WSL on 9P mounts (Windows disks, like /mnt/c/...).

Glob expansion of the asterisk (*) in a for loop results in duplicates if there is a pipe that writes to a temporary file.

#!/bin/bash

# save true file number
true_f_num=$(for fn in *; do echo; done | wc -l)

while :; do
    # count of iterations to outer loop pipe
    echo
    
    # save current file list in file
    echo -n '' >.list
    
    for fn in *; do
        echo "$fn" >>.list
    
    # line below causes unexpected behaviour
    # on 9P mounts (like /mnt/c/...)
    # (WSL2, ubuntu 22.04)
    done | cat > .tmp; rm .tmp
    
    # check 
    [ $(cat .list | wc -l ) -ne $true_f_num ] && break

done | wc -l

Here is a script gist to test and illustrate that. It is expected to run forever. But on WSL's 9P mounts it stops after some random number of iterations.

I can't understand the mechanism behind this. I assumed that glob expansion happens once, and that only afterwards does the loop's output start being passed to the pipe (subshell), which changes the contents of the directory. And how can this affect the next iteration? The file was there, but it isn't anymore. Does some invisible FIFO appear in the directory?

Please let the wise explain what is happening here.

Solution

[The example script] is expected to run forever. But on WSL's 9P mounts it stops after some random number of iterations.

Presumably that's because [ $(cat .list | wc -l ) -ne $true_f_num ] eventually evaluates to true, and indeed, you note in comments that after the script terminates, you find a duplicate name in the .list file left behind. That would do it, unless you're also missing one name.
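
To confirm which case you are in, the leftover list can be checked for repeated names directly. The following is only a minimal sketch: it uses a made-up sample list in a temporary file (list_file, and the names alpha, beta and gamma, are hypothetical) rather than your real .list, so substitute the path of the file the script left behind.

```bash
#!/bin/bash
# Hypothetical sample data standing in for the .list file left behind.
list_file=$(mktemp)
printf '%s\n' alpha beta beta gamma >"$list_file"

# Names that occur more than once (uniq -d needs sorted input).
dupes=$(sort "$list_file" | uniq -d)

# Total lines versus distinct lines: a gap means duplicates are present.
total=$(wc -l <"$list_file")
unique=$(sort -u "$list_file" | wc -l)

echo "duplicated names: $dupes"
echo "total=$total unique=$unique"

rm -f "$list_file"
```

If the distinct count still equals the true file count while the total is larger, a name was duplicated and none was lost.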

File .list is created and populated this way:

    echo -n '' >.list

    for fn in *; do
        echo "$fn" >>.list
        # [...]
    done | cat > .tmp

A rm .tmp follows, but that is not part of managing .list, and it is not executed until after the above pipeline completes. However, because .tmp is removed after each execution of the pipeline, it is reasonable to assume that it does not exist before control reaches the pipeline.

Because the for command appears in a pipeline, it is executed in a subshell. The pattern (*) within is expanded, in the subshell, producing one word per file, before that command is executed. Then the subshell's standard output is redirected into the pipe, and finally, the loop body is executed once for each word in the expansion of *.
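
The subshell behaviour is easy to see in isolation. In this small sketch (count is a hypothetical variable, not part of the original script), the loop changes a variable, but because the loop is one side of a pipeline, the change happens in the subshell's copy and is lost when the pipeline finishes.

```bash
#!/bin/bash
# Illustration only: "count" is a hypothetical variable.
count=0

for word in a b c; do
    count=$((count + 1))    # increments the subshell's copy only
done | cat >/dev/null

# The parent shell's copy was never touched by the loop.
echo "count after the pipeline: $count"
```

The same isolation applies to the expansion of * in the example: it is performed by the subshell that runs the loop, not by the shell that runs the rest of the script.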

At the same time, the standard input of cat is redirected from the pipe, its standard output is redirected to .tmp, first creating that file if necessary, and then cat is launched.

NOTE WELL, then, that it is unspecified and not generally predictable when .tmp will be created relative to the expansion of the * pattern in the for command. It could be before, during, or after.
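
If the timing overlap is indeed the trigger, one way to sidestep it would be to perform the expansion in the parent shell, before any part of the pipeline has started. The following is an untested-on-WSL sketch under that assumption; the demo directory, the file names and the files array are all hypothetical.

```bash
#!/bin/bash
# Hypothetical workaround sketch; demo directory and file names are made up.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch one two three

# Expand the glob in the parent shell, before the pipeline (and .tmp) exists.
files=(*)

# Write the list from the stored array rather than from inside the loop.
printf '%s\n' "${files[@]}" >.list

for fn in "${files[@]}"; do
    :   # per-file work would go here
done | cat >.tmp
rm .tmp
```

Here the directory is read once, at a point where .tmp has already been removed and not yet recreated, so the creation of .tmp cannot coincide with the expansion. Whether this actually avoids the behaviour on 9P mounts is something you would have to verify.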

Observations:

By default, pathname expansion omits files whose names start with . unless the . is explicitly matched by the pattern. Therefore, in the above code, .tmp is not expected to appear in the expansion of *, regardless of the relative timing of its creation.
In the example command, pathname expansion is expected to be performed once, before any iterations of the loop body are performed.
Pathname expansion is not expected to yield duplicate pathnames, because that would not be consistent with the Bash or POSIX specifications.
Unless file .list is modified contemporaneously by something else not disclosed, the appearance of duplicate lines within it indicates that at least one of the previous two expectations is violated. This would be a bug.
It seems most likely to me that such a bug would be associated with the creation of file .tmp at an inopportune time during the evaluation of the pathname expansion. I can think of at least one specific form of implementation flaw that might give rise to such an issue. This is of course speculative, but I think it plausible.
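
For comparison, this is what the expectations above look like on a filesystem that behaves normally. It is a minimal sketch with a hypothetical temporary directory and made-up file names: it checks that * skips dotfiles and that the expansion contains no repeated names. Running the same check in a directory under /mnt/c might help narrow down where the behaviour diverges.

```bash
#!/bin/bash
# Hypothetical scratch directory with three regular names and two dotfiles.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a b c .tmp .list

# With default shell options, * does not match names starting with a dot.
names=(*)
echo "expanded to ${#names[@]} names: ${names[*]}"

# Any name printed here would be a duplicate within a single expansion.
dupes=$(printf '%s\n' "${names[@]}" | sort | uniq -d)
echo "duplicates within one expansion: ${dupes:-none}"
```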

How can this affect the next iteration?

I think each iteration of the outer loop has its own, independent opportunity to trigger the bug. And I think it's likely to be a bug in pathname expansion, so not an issue of one iteration of the inner loop affecting another, either.

Does some invisible fifo appear in the directory?

Unlikely.