parallel-processing gnu-parallel

How does the waiting behaviour of gnu parallel work in pipes?


I am using parallel from apt install parallel on WSL.

I'm looking to understand the mechanism that explains the output timing of the pipelines below.

Specific Questions

  1. seq 10 | parallel -j 1 'sleep 1; echo {}' | parallel -j 4 'echo "prefix" {}'
  2. seq 10 | parallel -j 4 'sleep 1; echo {}' | parallel -j 1 'echo "prefix" {}'
  3. seq 10 | parallel -j 4 'sleep 1; echo {} | echo "prefix" {}'

Why does option 1 wait 6 seconds, then print one line per second up to prefix 5, and then dump prefix 6 to 10 all at once?

Why does option 2 wait 1 second, then output 1-3, 4-7 and 8-10 over the next 3 seconds?

Why does option 3 wait 1 second, then output 1-4, 5-8 and 9-10 over the next 3 seconds?

General Questions

What are the differences between the designs below?

  1. input | parallel command1 | parallel command2 (maps to option 1 and 2 above)
  2. input | parallel 'command1 | command2' (maps to option 3 above)
  3. input | parallel command1 | command2

I don't have a specific use case at the moment, but I am trying to understand the considerations involved when deciding among the 3 designs.


Solution

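  • You are hitting some crappy design that has since been improved.

    Today, output is delayed by one job (i.e. output from job 1 is printed when job 2 has finished). The two runs below compare version 20210922 (old behaviour) with version 20211022 (improved behaviour) as the second parallel in the pipe: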

    $ seq 10 | parallel -j 1 'sleep 1; echo {}' | parallel-20210922 -j 4 'echo "prefix" {}' | timestamp -d
    4.445 prefix 1
    4.451 prefix 2
    4.456 prefix 3
    5.552 prefix 4
    6.564 prefix 5
    7.686 prefix 6
    8.817 prefix 7
    9.948 prefix 8
    11.082 prefix 9
    11.086 prefix 10
    
    $ seq 10 | parallel -j 1 'sleep 1; echo {}' | parallel-20211022 -j 4 'echo "prefix" {}' | timestamp -d
    2.199 prefix 1
    3.295 prefix 2
    4.409 prefix 3
    5.527 prefix 4
    6.541 prefix 5
    7.663 prefix 6
    8.793 prefix 7
    9.924 prefix 8
    11.066 prefix 9
    11.070 prefix 10
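
The timestamp -d command used above is not a standard tool, so here is a minimal sketch of a stand-in for reproducing the measurements. Both function names are hypothetical and not part of GNU Parallel: stamp prefixes each line with the elapsed time (it assumes GNU date, which supports %N), and delay_one is only a toy model of the "delayed by one job" behaviour described above, not how GNU Parallel is implemented.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; neither is part of GNU Parallel.

# stamp: prefix each line read from stdin with the seconds elapsed since
# stamp started. Assumes GNU date, which supports %N (nanoseconds).
stamp() {
  local start now elapsed line
  start=$(date +%s.%N)
  while IFS= read -r line; do
    now=$(date +%s.%N)
    # bash has no floating-point arithmetic, so awk does the subtraction
    elapsed=$(awk -v s="$start" -v n="$now" 'BEGIN { printf "%.3f", n - s }')
    printf '%s %s\n' "$elapsed" "$line"
  done
}

# delay_one: toy model of "output is delayed by one job". Each line is held
# back until the next line arrives; the last line is released at end of input.
delay_one() {
  local prev="" have_prev=0 line
  while IFS= read -r line; do
    if [ "$have_prev" -eq 1 ]; then
      printf '%s\n' "$prev"
    fi
    prev=$line
    have_prev=1
  done
  if [ "$have_prev" -eq 1 ]; then
    printf '%s\n' "$prev"
  fi
}
```

With these defined, the original pipeline can be measured with seq 10 | parallel -j 1 'sleep 1; echo {}' | parallel -j 4 'echo "prefix" {}' | stamp. To see the toy model on its own, try for i in 1 2 3; do sleep 1; echo "$i"; done | delay_one | stamp. The last two lines should appear almost together, much like prefix 9 and prefix 10 in the runs above.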