I would like to run multiple rules one after another using Snakemake. However, when I run this script, the bam_list rule runs before the samtools_markdup rule and gives me an error that it cannot find the input files, which obviously have not been generated yet. How can I solve this problem?

    rule all:
        input:
            expand("dup/{sample}.dup.bam", sample=SAMPLES),
            "dup/bam_list"

    rule samtools_markdup:
        input:
            sortbam = "rg/{sample}.rg.bam"
        output:
            dupbam = "dup/{sample}.dup.bam"
        threads: 5
        shell:
            """
            samtools markdup -@ {threads} {input.sortbam} {output.dupbam}
            """

    rule bam_list:
        output:
            outlist = "dup/bam_list"
        shell:
            """
            ls dup/*.bam > {output.outlist}
            """

Snakemake is following your directions: you asked for dup/bam_list, and the bam_list rule can produce it without any inputs, so nothing tells Snakemake to wait for samtools_markdup. I think what you mean to have is:

    rule all:
        input:
            "dup/bam_list"

    rule samtools_markdup:
        input:
            sortbam = "rg/{sample}.rg.bam"
        output:
            dupbam = "dup/{sample}.dup.bam"
        threads: 5
        shell:
            """
            samtools markdup -@ {threads} {input.sortbam} {output.dupbam}
            """

    rule bam_list:
        input:
            expand("dup/{sample}.dup.bam", sample=SAMPLES)
        output:
            outlist = "dup/bam_list"
        shell:
            """
            ls dup/*.bam > {output.outlist}
            """

Now bam_list will wait until all the samtools_markdup jobs have completed. As an aside, I expect the contents of dup/bam_list to be identical to expand("dup/{sample}.dup.bam", sample=SAMPLES), so if you use the file later in the workflow you can probably just use the expand output instead.
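If you do keep the list file, it can be written from the declared inputs rather than from `ls`: `ls dup/*.bam` lists every `.bam` file present in `dup/` (including any stale ones) in name order, while `expand` keeps the order of `SAMPLES`. Below is a minimal plain-Python sketch of that idea, assuming a single `sample` wildcard; `expand_single` is a hypothetical, simplified stand-in for Snakemake's `expand`, and `format_bam_list`/`write_bam_list` are illustrative helpers. In a Snakefile, the same logic would go in a `run:` block that reads `input` and writes `output.outlist`.

```python
from pathlib import Path


def expand_single(pattern, samples):
    """Hypothetical, simplified stand-in for Snakemake's expand() when the
    pattern has exactly one wildcard named 'sample'."""
    return [pattern.format(sample=s) for s in samples]


def format_bam_list(paths):
    """Return one path per line, keeping the given order (unlike ls sorting)."""
    return "".join(f"{p}\n" for p in paths)


def write_bam_list(paths, outfile):
    """Write the list file, creating its parent directory if needed."""
    out = Path(outfile)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(format_bam_list(paths))


if __name__ == "__main__":
    SAMPLES = ["A", "B"]  # hypothetical sample names
    bams = expand_single("dup/{sample}.dup.bam", SAMPLES)
    # Prints the list that would go into dup/bam_list, one path per line.
    print(format_bam_list(bams), end="")
```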