linux bash for-loop grep samtools

Extracting lines from file using grep in a for loop, exporting to new file with variable in file name


I have a file containing a list of strings, and I am trying to loop over that list and, for each string, extract every line of another file that contains it. I also want to write the results of each grep to a new file with the variable in the file name.

Here is what I have:

file="variables.txt"
listofvariables=$(cat ${file})

for variable in ${listofvariables}
do
    samtools view sample.bam | \
    grep "'${variable}'" \
    > sample.${variable}.bam
done

What this code does is simply make a blank file for every variable. Why isn't grep extracting the lines that contain that variable and putting them into those files?

For reference, here is what the variables.txt file looks like:

mmu-let-7g-5p
mmu-let-7g-3p
mmu-let-7i-5p
mmu-let-7i-3p
mmu-miR-1a-1-5p
mmu-miR-1a-3p
mmu-miR-15b-5p
mmu-miR-15b-3p
mmu-miR-23b-5p
mmu-miR-23b-3p

And here is what the samtools view output looks like:

7238520-1_CATAAT.mmu-miR-125b-5p    0   chr1    11301523    60  75M *   0   0CAGGTGTTTTCTCAGGCATTTGGATTTCTATAGAATCATAGTATTAAAATTTCAAAGTAATAACATTGCTTTTTA    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:75 YT:Z:UU NH:i:1
1422982-2_CCCCGC.mmu-miR-132-3p 0   chr1    11301726    60  97M *   0   0   AAGTCTGTTTTTATGTGAGTGTTCCTGTGAAACTGAGGTCTGATGACTCTTCCTTAAGCAATTACAACTTCATTAGCATACATAAGGTTCAATTAAA   IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:97 YT:Z:UU NH:i:1
5675450-1_CCCCGC.mmu-miR-132-3p 0   chr1    11301726    60  97M *   0   0   AAGTCTGTTTTTATGTGAGTGTTCGTGTGAAACTGAGGTCTGATGACTCTTCCTTAAGCAATTACAACTTC^C

For those who may be unfamiliar, samtools view simply prints the contents of the .bam file as text. You can think of it like cat.

Thanks in advance!


Solution

  • Since ...

    What this code does is simply make a blank file for every variable.

    ... you know that your variables file is being read correctly, and your for loop is correctly iterating over the results. That the resulting files are empty indicates that grep is not finding any matches to your pattern.

    Why not? Because the pattern in your grep command ...

        grep "'${variable}'" \
    

    ... doesn't mean what you appear to think it means. You have taken some pains to get literal apostrophes (') into the pattern, but these have no special meaning in that context. Your pattern does not match any lines because in the data, there are no apostrophes around the appearances of the target strings.

    This would be better:

        grep -F -e "${variable}" \
    

    The -F option tells grep to treat the pattern as a fixed string, so that nothing within it is interpreted as a regex metacharacter. The -e option marks the argument that follows as the pattern, even if, for example, it begins with a - character. The double quotes remain, as they are required to keep the shell from performing word splitting on the expanded result, and the inner apostrophes are gone, since they were causing the main problem.
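
To make the points above concrete, here is a minimal sketch, not a definitive implementation. The first part counts matches for the two pattern forms against a shortened copy of one record from the question. The second part wraps the corrected grep in a helper that reads records on standard input, so samtools runs only once. The function name split_by_pattern, its two arguments, and the .sam suffix are my own choices and not part of the original question (the suffix reflects that samtools view prints text by default, not BAM). The sketch also assumes variables.txt holds exactly one pattern per line.

```shell
#!/usr/bin/env bash

# Part 1: why the original pattern matched nothing.
variable="mmu-miR-132-3p"
record="1422982-2_CCCCGC.mmu-miR-132-3p 0 chr1 11301726"

# Inside double quotes, apostrophes are ordinary characters in the pattern,
# so grep looks for 'mmu-miR-132-3p' including the apostrophes.
quoted_hits=$(printf '%s\n' "$record" | grep -c "'${variable}'" || true)
plain_hits=$(printf '%s\n' "$record" | grep -c -F -e "${variable}" || true)
echo "with apostrophes: ${quoted_hits}, without: ${plain_hits}"

# Part 2: one output file per pattern (hypothetical helper).
split_by_pattern() {
    local list_file=$1   # file with one pattern per line
    local prefix=$2      # output name prefix, e.g. "sample"
    local records        # temporary copy of stdin, scanned once per pattern

    records=$(mktemp) || return 1
    cat > "$records"

    while IFS= read -r variable || [ -n "$variable" ]; do
        [ -n "$variable" ] || continue   # skip blank lines
        # -F: fixed string; -e: the next argument is the pattern.
        # "|| true" keeps a no-match result from counting as a failure.
        grep -F -e "$variable" "$records" > "${prefix}.${variable}.sam" || true
    done < "$list_file"

    rm -f "$records"
}

# Usage (requires samtools, so it is left commented out here):
#   samtools view sample.bam | split_by_pattern variables.txt sample
```

Reading the list with while read instead of expanding $(cat ...) in a for loop avoids word splitting and glob expansion on the patterns, which is harmless for the names shown in the question but safer in general.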