Tags: bash, while-loop, pipe, piping

Why does piping input to "read" only work when it is fed into a "while read ..." construct?


I've been trying to read input into environment variables from program output like this:

echo first second | read A B ; echo $A-$B 

And the result is:

-

Both A and B are always empty. I read that bash executes piped commands in a subshell, which basically prevents one from piping input to read. However, the following:

echo first second | while read A B ; do echo $A-$B ; done

seems to work; the result is:

first-second

Can someone please explain the logic here? Is it that the commands inside the while ... done construct are actually executed in the same shell as echo, and not in a subshell?


Solution

  • How to loop over stdin and get the result stored in a variable

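    Under bash (and many other shells, though not zsh or ksh, which run the last element of a pipeline in the current shell), when you pipe something to another command via |, you implicitly create a fork: each side of the pipe runs in a subshell, a child of the current session. A subshell can't modify the current session's environment, so variables assigned there are lost when it exits. That is why your first example prints only "-": read sets A and B in a subshell, and the following echo runs in the parent shell. In your second example, echo is inside the while loop, so it runs in the same subshell as read and can see the values.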

    So this:

    TOTAL=0
    printf "%s %s\n" 9 4 3 1 77 2 25 12 226 664 |
      while read -r A B; do
          ((TOTAL+=A-B))
          printf "%3d - %3d = %4d -> TOTAL= %4d\n" "$A" "$B" "$((A-B))" "$TOTAL"
        done
    echo final total: $TOTAL
    

    won't give the expected result:

      9 -   4 =    5 -> TOTAL=    5
      3 -   1 =    2 -> TOTAL=    7
     77 -   2 =   75 -> TOTAL=   82
     25 -  12 =   13 -> TOTAL=   95
    226 - 664 = -438 -> TOTAL= -343
    echo final total: $TOTAL
    final total: 0
    

    The computed TOTAL can't be reused in the main script.
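    A minimal sketch applying this to the two one-liners from the question, assuming the code runs as a bash script with the lastpipe option off (the default). In the second form, echo sits inside the while loop and therefore in the same subshell as read; a plain { ...; } group behaves the same way.

```bash
#!/usr/bin/env bash
# Demo of the question's two one-liners, run as a script
# (assumes the lastpipe option is NOT enabled, which is bash's default).

# 1) read runs in a subshell; its assignments vanish when that subshell exits,
#    so the echo in the parent shell sees empty A and B.
echo first second | read -r A B
echo "plain pipe, parent shell: $A-$B"

# 2) The whole while ... done loop runs in the subshell, and so does the echo
#    inside it: read and echo share the same environment.
echo first second | while read -r A B; do echo "inside loop: $A-$B"; done

# 3) A plain command group behaves the same way: no loop is needed.
echo first second | { read -r A B; echo "inside group: $A-$B"; }

# After the pipelines, the parent shell still has no A or B.
echo "parent shell again: $A-$B"
```

    Run as a script, the first and last echo print an empty "-" pair, while the two that share the subshell with read print "first-second".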

    Inverting the fork

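    By using Process Substitution, Here Documents or Here Strings, you can invert the fork: the loop (or read) then runs in the current shell, and only the producer, if there is one, runs in a subshell: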

    Here strings

    read A B <<<"first second"
    echo $A
    first
    
    echo $B
    second
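    Since the question is about reading a program's output, the here string can also be fed by a command substitution. A minimal sketch, where printf merely stands in for the real program (an assumption for illustration): the program runs in a subshell, but read itself runs in the current shell.

```bash
#!/usr/bin/env bash
# The command substitution runs printf in a subshell, but read runs in the
# current shell, so A and B survive.
# printf stands in for "some program"; replace it with your own command.
read -r A B <<< "$(printf '%s %s\n' first second)"
echo "$A-$B"
```

    Only the first line of output reaches read this way; multi-line output needs a loop.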
    

    Here Documents

    while read A B;do
        echo $A-$B
        C=$A-$B
      done << eodoc
    first second
    third fourth
    eodoc
    first-second
    third-fourth
    

    Outside of the loop:

    echo : $C
    : third-fourth
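    A here document can carry program output too: with an unquoted delimiter (EOF rather than 'EOF'), the body is expanded, so a command substitution can supply the lines. A minimal sketch, again with printf standing in for the real program and count added as a hypothetical extra variable:

```bash
#!/usr/bin/env bash
# With an unquoted delimiter, the here document body is expanded, so the
# command substitution below provides the input lines.
# printf stands in for the real program; count is just an illustration.
count=0
while read -r A B; do
    C=$A-$B
    count=$((count + 1))
done <<EOF
$(printf '%s %s\n' first second third fourth)
EOF
# The loop ran in the current shell, so both variables are still set here.
echo "last pair: $C, lines read: $count"
```

    This prints "last pair: third-fourth, lines read: 2". Note that the closing EOF must start at the beginning of its line.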
    

    Process Substitution

    TOTAL=0
    while read -r A B; do
        ((TOTAL+=A-B))
        printf "%3d - %3d = %4d -> TOTAL= %4d\n" "$A" "$B" "$((A-B))" "$TOTAL"
      done < <(
        printf "%s %s\n" 9 4 3 1 77 2 25 12 226 664
    )
      9 -   4 =    5 -> TOTAL=    5
      3 -   1 =    2 -> TOTAL=    7
     77 -   2 =   75 -> TOTAL=   82
     25 -  12 =   13 -> TOTAL=   95
    226 - 664 = -438 -> TOTAL= -343
    
    # and finally out of loop:
    echo $TOTAL
    -343
    

    Now you can use $TOTAL in your main script.
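    The question's original one-liner can be rewritten the same way. A minimal sketch: echo now runs in the subshell created by the process substitution, and read runs in the current shell, so A and B stay set.

```bash
#!/usr/bin/env bash
# Process substitution: echo runs in a subshell, read runs in the current
# shell. This is the reverse of "echo ... | read A B".
read -r A B < <(echo first second)
echo "$A-$B"
```

    Note the space in "< <(...)": without it, bash would read "<<" as the start of a here document.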

    Piping to a command list

    But if you only need to work on stdin, you can put the loop and everything that uses its result into a command list that runs entirely inside the fork:

    printf "%s %s\n" 9 4 3 1 77 2 25 12 226 664 | {
        TOTAL=0
        while read -r A B; do
            ((TOTAL+=A-B))
            printf "%3d - %3d = %4d -> TOTAL= %4d\n" "$A" "$B" "$((A-B))" "$TOTAL"
        done
        echo "Out of the loop total:" $TOTAL
      }
    

    This will give:

      9 -   4 =    5 -> TOTAL=    5
      3 -   1 =    2 -> TOTAL=    7
     77 -   2 =   75 -> TOTAL=   82
     25 -  12 =   13 -> TOTAL=   95
    226 - 664 = -438 -> TOTAL= -343
    Out of the loop total: -343
    

    Note: $TOTAL still can't be used in the main script (after the closing curly bracket }).
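    If the value is needed after the closing bracket, one workaround (not part of the original answer) is to have the command list print only its result and capture that with a command substitution. A minimal sketch, with total as a hypothetical variable name:

```bash
#!/usr/bin/env bash
# The whole loop runs inside the fork and prints only the final result;
# the parent shell captures that output with a command substitution.
TOTAL=$(
    printf "%s %s\n" 9 4 3 1 77 2 25 12 226 664 | {
        total=0
        while read -r A B; do
            ((total += A - B))
        done
        echo "$total"
    }
)
echo "final total: $TOTAL"
```

    Anything else the list writes to stdout would end up in TOTAL as well, so progress messages like the per-line printf above would have to go to stderr (>&2).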

    Using lastpipe bash option

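    As @CharlesDuffy correctly pointed out, bash (4.2 and later) has an option, lastpipe, that changes this behaviour by running the last command of a pipeline in the current shell. However, it only takes effect when job control is disabled, so in an interactive session we first have to turn job control off: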

    shopt -s lastpipe           # Set *lastpipe* option
    set +m                      # Disabling job control
    TOTAL=0
    printf "%s %s\n" 9 4 3 1 77 2 25 12 226 664 |
      while read -r A B; do
          ((TOTAL+=A-B))
          printf "%3d - %3d = %4d -> TOTAL= %4d\n" "$A" "$B" "$((A-B))" "$TOTAL"
        done
    
      9 -   4 =    5 -> TOTAL=    5
      3 -   1 =    2 -> TOTAL=    7
     77 -   2 =   75 -> TOTAL=   82
     25 -  12 =   13 -> TOTAL=   95
    226 - 664 = -438 -> TOTAL= -343
    
    echo final total: $TOTAL
    final total: -343
    

    This works, but I personally don't like it: it is not standard, and it doesn't make the script more readable. Also, disabling job control seems an expensive price to pay for this behaviour.

    Note: job control is enabled by default only in interactive sessions, so set +m is not required in normal scripts.

    So code that relies on lastpipe but omits set +m behaves differently when run as a script than when run in an interactive console. That doesn't make it easy to understand or debug.
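    For completeness, a minimal sketch of the lastpipe approach as a standalone script, assuming bash 4.2 or later. Job control is already off in a non-interactive shell, so set +m is not needed here.

```bash
#!/usr/bin/env bash
# In a non-interactive script job control is off, so enabling lastpipe alone
# makes the last element of a pipeline (the while loop) run in this shell.
# Requires bash 4.2 or later.
shopt -s lastpipe

TOTAL=0
printf "%s %s\n" 9 4 3 1 77 2 25 12 226 664 |
    while read -r A B; do
        ((TOTAL += A - B))
    done
echo "final total: $TOTAL"
```

    Pasted into an interactive console without set +m, the same lines would leave TOTAL at 0.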