Tags: bash, alignment, sequence, dna-sequence, gatk

Getting a GATK "Invalid argument value" error that I don't understand


Hello bash programmers, I am using GATK and trying to loop through my BAM files and do local realignment using my target intervals and known indels. Below is the code I am trying. I am hoping someone can help me understand the error and correct my code.

# do the local realignment.
echo "local realignment..."

for file in `ls -d adp/map/*marked_duplicates.bam`
do
java -jar ~/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar \
-T IndelRealigner \
-R ~/flybase/fb-r5.57/dmel-all-chromosome-r5.57.fasta \
-I $file \
-known adp/map/*indel_intervals.vcf \
-targetIntervals adp/map/*target_intervals.list \
-o ${file}_realigned_reads.bam
done

wait

# Create a new index file.
echo "indexing the realigned bam file..."

for file in `ls -d adp/map/*realigned_reads.bam`
do
~/software/samtools-1.2/samtools index $file
done

The error is below. From what I have found when looking it up, it appears to be a problem with my code, but I am not seeing it.

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.3-0-g37228af):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Invalid argument value 'adp/map/360M_F_L002.recal.bam.sorted.bam_marked_duplicates.bam_target_intervals.list' at position 10.
##### ERROR Invalid argument value 'adp/map/517_F_L002.recal.bam.sorted.bam_marked_duplicates.bam_target_intervals.list' at position 11.
##### ERROR Invalid argument value 'adp/map/517M_F_L002.recal.bam.sorted.bam_marked_duplicates.bam_target_intervals.list' at position 12.
##### ERROR Invalid argument value 'adp/map/900_F_L002.recal.bam.sorted.bam_marked_duplicates.bam_target_intervals.list' at position 13.
##### ERROR Invalid argument value 'adp/map/900M_F_L002.recal.bam.sorted.bam_marked_duplicates.bam_target_intervals.list' at position 14.


Solution

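  • At least part of the problem is the * in your commands. GATK doesn't deal well with globs: the shell expands each one into several separate file names before GATK runs, so only the first name is attached to the argument and the rest arrive as stray values. That is what the "Invalid argument value ... at position N" lines are reporting. To pass multiple values to an argument, specify the argument once per value.

    That is, instead of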

    -known adp/map/*indel_intervals.vcf
    

    you need to pass each file with its own -known flag:

    -known adp/map/first_file.indel_intervals.vcf
    -known adp/map/second_file.indel_intervals.vcf
    

    There may be other issues as well. For instance, I'm not certain that -targetIntervals can take multiple files as input. Also, that's a very old version of GATK; you might want to upgrade to 3.8.
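As a rough illustration of the advice above, the loop could build each command in a bash array so that every VCF gets its own -known flag and -targetIntervals receives exactly one file. This is only a sketch under assumptions: `build_realign_cmd` and the `GATK_JAR`, `REF` and `cmd` names are made up for this example, and the per-sample intervals file name (`<bam>_target_intervals.list`) is inferred from the file names in the error message, so adjust it if your files are named differently.

```shell
# Paths taken from the question; "$HOME" is used because "~" does not expand inside quotes.
GATK_JAR="$HOME/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar"
REF="$HOME/flybase/fb-r5.57/dmel-all-chromosome-r5.57.fasta"

# Hypothetical helper: build_realign_cmd BAM [VCF...]
# Fills the global array "cmd" with one IndelRealigner command line.
build_realign_cmd() {
    local bam=$1
    shift
    cmd=(java -jar "$GATK_JAR" -T IndelRealigner -R "$REF" -I "$bam")

    # One -known flag per VCF instead of a single flag followed by a glob.
    local vcf
    for vcf in "$@"; do
        cmd+=(-known "$vcf")
    done

    # Assumption: each BAM has its own intervals file named after the BAM.
    cmd+=(-targetIntervals "${bam}_target_intervals.list"
          -o "${bam}_realigned_reads.bam")
}

for bam in adp/map/*marked_duplicates.bam; do
    [ -e "$bam" ] || continue          # skip if the glob matched nothing
    build_realign_cmd "$bam" adp/map/*indel_intervals.vcf
    "${cmd[@]}"                        # run the assembled command
done
```

Here the glob is still expanded by the shell, but each resulting file name is paired with its own flag before GATK sees it. Replacing `"${cmd[@]}"` with `printf '%s\n' "${cmd[@]}"` prints the arguments instead of running them, which is a cheap way to check the command first.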