bash loops

Loop over two sets of files based on a specific pattern


I have two sets of files:

#1

Axiom_AgDivMS2_apple.r1.annot.csv
Axiom_AgDivMS2_favabean.r1.annot.csv
Axiom_AgDivMS2_gardenpea.r1.annot.csv
Axiom_AgDivMS2_pear.r1.annot.csv
Axiom_AgDivMS2_white_lupin.r1.annot.csv

#2

feverole_Axiom_AgDivMS2_apple.tag        lupin_Axiom_AgDivMS2_apple.tag        poire_Axiom_AgDivMS2_apple.tag        pois_Axiom_AgDivMS2_apple.tag        pomme_Axiom_AgDivMS2_apple.tag
feverole_Axiom_AgDivMS2_favabean.tag     lupin_Axiom_AgDivMS2_favabean.tag     poire_Axiom_AgDivMS2_favabean.tag     pois_Axiom_AgDivMS2_favabean.tag     pomme_Axiom_AgDivMS2_favabean.tag
feverole_Axiom_AgDivMS2_gardenpea.tag    lupin_Axiom_AgDivMS2_gardenpea.tag    poire_Axiom_AgDivMS2_gardenpea.tag    pois_Axiom_AgDivMS2_gardenpea.tag    pomme_Axiom_AgDivMS2_gardenpea.tag
feverole_Axiom_AgDivMS2_pear.tag         lupin_Axiom_AgDivMS2_pear.tag         poire_Axiom_AgDivMS2_pear.tag         pois_Axiom_AgDivMS2_pear.tag         pomme_Axiom_AgDivMS2_pear.tag
feverole_Axiom_AgDivMS2_white_lupin.tag  lupin_Axiom_AgDivMS2_white_lupin.tag  poire_Axiom_AgDivMS2_white_lupin.tag  pois_Axiom_AgDivMS2_white_lupin.tag  pomme_Axiom_AgDivMS2_white_lupin.tag

I need to match each file in set #2 (*_apple.tag, *_favabean.tag, *_gardenpea.tag, *_pear.tag and *_white_lupin.tag) to its corresponding file in set #1. I cannot show all the files here.

That is, the files matching "*_apple.tag" should be paired only with Axiom_AgDivMS2_apple.r1.annot.csv, because they share the "apple" pattern.

In the *.csv names, the tag is the text between "Axiom_AgDivMS2_" and ".r1.annot.csv":

for i in *.csv
do
    a=${i%.r1.annot.csv}    # strip the trailing ".r1.annot.csv"
    b=${a#*_*_}             # strip the leading "Axiom_AgDivMS2_"
    echo "$b"
done

apple
favabean
gardenpea
pear
white_lupin

For example, for "apple" I should get these combinations:

Axiom_AgDivMS2_apple.r1.annot.csv feverole_Axiom_AgDivMS2_apple.tag
Axiom_AgDivMS2_apple.r1.annot.csv lupin_Axiom_AgDivMS2_apple.tag
Axiom_AgDivMS2_apple.r1.annot.csv poire_Axiom_AgDivMS2_apple.tag
Axiom_AgDivMS2_apple.r1.annot.csv pois_Axiom_AgDivMS2_apple.tag
Axiom_AgDivMS2_apple.r1.annot.csv pomme_Axiom_AgDivMS2_apple.tag

So far my loop produces every combination, not only the necessary ones:

for N in *.csv; do
    for S in *.tag; do
        echo "${N} ${S}"
    done
done

EDIT:

In this case, all the *.csv files have the same structure, Axiom_AgDivMS2_*.r1.annot.csv, and all the tag files have the structure *_Axiom_AgDivMS2_*.tag. That structure could change for another project, but what matters is the pattern that links the *.csv and *.tag files. Every *.tag file will necessarily match a *.csv file, and vice versa. All the *.tag files will also have the same number of combinations (which is the number of *.csv files).

Any help?


Solution

  • OP's update makes a good start by extracting the 'tag' from the *.csv file names (i.e., the assignment to the b variable). We'll build on this, with different variable names:

    for file1 in *.csv
    do
        [[ ! -f "${file1}" ]] && continue             # in case there are no files ending in *.csv
    
        tag="${file1%.r1.annot.csv}"
        tag="${tag#*_*_}"
    
        for file2 in *_"${tag}".tag                   # wrap ${tag} in double quotes in case of embedded white space
        do
            [[ ! -f "${file2}" ]] && continue         # again, just in case there are no files ending in _${tag}.tag
    
            echo "${file1} ${file2}"
        done
    done
    

    This generates:

    Axiom_AgDivMS2_apple.r1.annot.csv feverole_Axiom_AgDivMS2_apple.tag
    Axiom_AgDivMS2_apple.r1.annot.csv lupin_Axiom_AgDivMS2_apple.tag
    Axiom_AgDivMS2_apple.r1.annot.csv poire_Axiom_AgDivMS2_apple.tag
    Axiom_AgDivMS2_apple.r1.annot.csv pois_Axiom_AgDivMS2_apple.tag
    Axiom_AgDivMS2_apple.r1.annot.csv pomme_Axiom_AgDivMS2_apple.tag
    
    Axiom_AgDivMS2_favabean.r1.annot.csv feverole_Axiom_AgDivMS2_favabean.tag
    Axiom_AgDivMS2_favabean.r1.annot.csv lupin_Axiom_AgDivMS2_favabean.tag
    Axiom_AgDivMS2_favabean.r1.annot.csv poire_Axiom_AgDivMS2_favabean.tag
    Axiom_AgDivMS2_favabean.r1.annot.csv pois_Axiom_AgDivMS2_favabean.tag
    Axiom_AgDivMS2_favabean.r1.annot.csv pomme_Axiom_AgDivMS2_favabean.tag
    
    Axiom_AgDivMS2_gardenpea.r1.annot.csv feverole_Axiom_AgDivMS2_gardenpea.tag
    Axiom_AgDivMS2_gardenpea.r1.annot.csv lupin_Axiom_AgDivMS2_gardenpea.tag
    Axiom_AgDivMS2_gardenpea.r1.annot.csv poire_Axiom_AgDivMS2_gardenpea.tag
    Axiom_AgDivMS2_gardenpea.r1.annot.csv pois_Axiom_AgDivMS2_gardenpea.tag
    Axiom_AgDivMS2_gardenpea.r1.annot.csv pomme_Axiom_AgDivMS2_gardenpea.tag
    
    Axiom_AgDivMS2_pear.r1.annot.csv feverole_Axiom_AgDivMS2_pear.tag
    Axiom_AgDivMS2_pear.r1.annot.csv lupin_Axiom_AgDivMS2_pear.tag
    Axiom_AgDivMS2_pear.r1.annot.csv poire_Axiom_AgDivMS2_pear.tag
    Axiom_AgDivMS2_pear.r1.annot.csv pois_Axiom_AgDivMS2_pear.tag
    Axiom_AgDivMS2_pear.r1.annot.csv pomme_Axiom_AgDivMS2_pear.tag
    
    Axiom_AgDivMS2_white_lupin.r1.annot.csv feverole_Axiom_AgDivMS2_white_lupin.tag
    Axiom_AgDivMS2_white_lupin.r1.annot.csv lupin_Axiom_AgDivMS2_white_lupin.tag
    Axiom_AgDivMS2_white_lupin.r1.annot.csv poire_Axiom_AgDivMS2_white_lupin.tag
    Axiom_AgDivMS2_white_lupin.r1.annot.csv pois_Axiom_AgDivMS2_white_lupin.tag
    Axiom_AgDivMS2_white_lupin.r1.annot.csv pomme_Axiom_AgDivMS2_white_lupin.tag
    

    NOTE: the blank lines between groups were added manually for readability; the loop itself does not print them.
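
Since OP mentions that the file-name structure could change for another project, the same loop can be wrapped in a function that takes the fixed parts of the *.csv name as arguments. The following is only a sketch under that assumption: pair_files and its prefix and suffix parameters are illustrative names, not something from the original post. It also anchors the inner glob on the prefix, so a tag that happens to be the tail of another tag (for example a hypothetical "lupin" next to "white_lupin") would not pick up the longer tag's files.

```shell
# Sketch only: pair_files, prefix and suffix are illustrative names.
# Usage: pair_files [csv_prefix] [csv_suffix]
pair_files() {
    local prefix="${1:-Axiom_AgDivMS2_}"   # fixed text before the tag in *.csv names
    local suffix="${2:-.r1.annot.csv}"     # fixed text after the tag in *.csv names
    local csv tag tagfile

    for csv in "${prefix}"*"${suffix}"
    do
        [ -f "${csv}" ] || continue        # glob matched nothing, skip the literal pattern

        tag="${csv#"${prefix}"}"           # drop the leading prefix
        tag="${tag%"${suffix}"}"           # drop the trailing suffix

        # Require "_<prefix><tag>.tag" so that only the exact tag matches
        for tagfile in *_"${prefix}${tag}".tag
        do
            [ -f "${tagfile}" ] || continue
            printf '%s %s\n' "${csv}" "${tagfile}"
        done
    done
}
```

Calling pair_files with no arguments in the directory holding the files should print the same pairs as the loop above; for a differently named project, the two fixed parts would be passed explicitly, e.g. pair_files 'Other_Project_' '.annot.csv' (hypothetical names).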