
Echoing only items that contain chr1 (and not chr10) and feeding them into parallel


I have the following files in my folder:

11111-chr1A1.txt
11111-chr1C2.txt
11111-chr1D3.txt
11111-chr114.txt
11111-chr10A1.txt
11111-chr10-C2.txt
11111-chr1003.txt
11111-chr10-4.txt

I need to feed them into parallel in chunks, one chunk per chr{number}. The chr{number} match has to be exact, and that is what causes problems here.

For example,

    parallel "echo {1}"

should output, for the first chunk:

11111-chr1A1.txt
11111-chr1C2.txt
11111-chr1D3.txt
11111-chr114.txt

Then the second chunk:

11111-chr10A1.txt
11111-chr10-C2.txt
11111-chr1003.txt
11111-chr10-4.txt

I tried:

    for i in {1..10}
    do
        parallel "echo {1/}" ::: *chr"$i"*txt
    done

For i=1 this outputs every file at once, because the glob *chr1*txt also matches all the chr10 files: chr1 is a prefix of chr10, so the two patterns overlap.
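A quick way to see the overlap is to count what each glob matches. The sketch below recreates the eight sample names in a throwaway temporary directory (an assumption made only so the example is self-contained) and counts the matches for chr1 and chr10 using the same glob shape as the loop above.

```shell
#!/usr/bin/env bash
# Demo: the glob *chr1*txt also matches every chr10 file.
shopt -s nullglob

# Throwaway directory holding the sample names from the question
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch 11111-chr1A1.txt 11111-chr1C2.txt 11111-chr1D3.txt 11111-chr114.txt \
      11111-chr10A1.txt 11111-chr10-C2.txt 11111-chr1003.txt 11111-chr10-4.txt

# Same glob shape as the loop in the question
for n in 1 10; do
  files=( *chr"$n"*txt )
  echo "chr$n: ${#files[@]} files"
done
```

With the sample names, chr1 matches all eight files while chr10 matches only its own four, which is exactly why the i=1 iteration prints everything.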

If needed, creating a CSV file beforehand is fine, for example with all the chr1 files in the first column and all the chr10 files in the second, and then feeding it into parallel one column at a time.
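Along the lines of that CSV idea, the grouping can be computed up front without moving anything. The sketch below is a hypothetical approach, not taken from the question or the answer: it walks the numbers from 10 down to 1 so that longer tags claim files first (the same precedence the Solution assumes), and writes one list file per group. The `lists` directory name and the throwaway demo directory are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one list file per chr{N} group; longer tags win.
shopt -s nullglob

# Throwaway directory with the sample names (only to make the demo self-contained)
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch 11111-chr1A1.txt 11111-chr1C2.txt 11111-chr1D3.txt 11111-chr114.txt \
      11111-chr10A1.txt 11111-chr10-C2.txt 11111-chr1003.txt 11111-chr10-4.txt

declare -A claimed      # file name -> group that claimed it (needs bash 4+)
mkdir -p lists
for n in {10..1}; do    # 10 first, so chr10 claims 11111-chr1003.txt before chr1
  for f in *chr"$n"*txt; do
    [[ -n ${claimed[$f]:-} ]] && continue   # already taken by a longer tag
    claimed[$f]=chr$n
    printf '%s\n' "$f" >> "lists/chr$n.txt"
  done
done

# Summarise the non-empty groups
for list in lists/*.txt; do
  echo "$(basename "$list" .txt): $(wc -l < "$list") files"
done
```

In a real folder you would drop the demo setup lines. Each list can then be fed to GNU parallel as its own chunk with `::::`, which reads arguments from a file, e.g. `parallel echo :::: lists/chr1.txt` and then `parallel echo :::: lists/chr10.txt`.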


Solution

  • I assume you want longer matches to take precedence over shorter ones (so 11111-chr1003.txt belongs to chr10, not chr1).

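    # Split files into groups, each in its own dir.
    # -j1 is important: chr{10..1..1} lists chr10 first, and running one job
    # at a time guarantees the *chr10* files are moved before *chr1* runs.
    # The ls test skips numbers with no matching files (chr2..chr9 here),
    # so no empty out/chrN dirs are created.
    parallel -j1 'if ls -d -- *{}* >/dev/null 2>&1; then mkdir -p out/{} && mv -- *{}* out/{}; fi' ::: chr{10..1..1}

    do_group() {
       cd "$1" || return
       parallel echo ::: *
    }
    export -f do_group

    # Run each dir separately; --tag prefixes each output line with its dir
    parallel --tag do_group ::: out/*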
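Because `-j1` makes the split step run one job at a time, the same step can also be written as a plain bash loop. The sketch below is an alternative written for illustration, not the answer's own code: it uses `nullglob` so that numbers with no matching files (chr2 to chr9 in the sample) are skipped and no empty out/chrN directory is created. Like the answer, it moves the files, so the demo works on a throwaway copy of the sample names.

```shell
#!/usr/bin/env bash
# Sequential equivalent of the answer's "parallel -j1" split step (sketch).
shopt -s nullglob

# Throwaway copy of the sample names so that moving them is harmless
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch 11111-chr1A1.txt 11111-chr1C2.txt 11111-chr1D3.txt 11111-chr114.txt \
      11111-chr10A1.txt 11111-chr10-C2.txt 11111-chr1003.txt 11111-chr10-4.txt

# Longest tag first: chr10 must claim 11111-chr1003.txt before chr1 sees it
for n in {10..1}; do
  files=( *chr"$n"* )
  (( ${#files[@]} )) || continue          # no files for this number: skip
  mkdir -p "out/chr$n"
  mv -- "${files[@]}" "out/chr$n/"
done

ls out        # only the non-empty groups exist: chr1 and chr10
```

Each out/chrN directory can then be processed as in the answer with `parallel --tag do_group ::: out/*`.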