
Snakemake: integrating multiple command lines in a rule


My first command, "bcftools query -l {input.invcf} | head -n 1", prints the name of the first individual in the VCF file (e.g. IND1). I want to pass that output to GATK SelectVariants through the -sn option (-sn IND1). How can I integrate the first command into the Snakemake rule so that its output can be used in the next one?

rule selectvar:
    input:
        invcf="{family}_my.vcf"
    params:
        ind= ???
        ref="ref.fasta"
    output:
        out="{family}.dn.vcf"
    shell:
        """
        bcftools query -l {input.invcf} | head -n 1 > {params.ind}
        gatk --java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants -R {params.ref} -V {input.invcf} -sn {params.ind} -O {output.out}
        """

Solution

  • There are several options, but the easiest is to store the result in a temporary bash variable:

    rule selectvar:
       ...
       shell:
            """
            myparam=$(bcftools query -l {input.invcf} | head -n 1)
            gatk -sn "$myparam" ...
            """
    

    As noted by @dariober, modifying the pipefail behaviour can lead to unexpected results; see the example in this answer.
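
The variable-capture pattern can be tried outside Snakemake with a minimal sketch like the one below. It assumes a hypothetical `list_samples` function standing in for `bcftools query -l {input.invcf}`, and it turns on `set -euo pipefail`, the bash strict mode that Snakemake applies to shell blocks by default as far as I know. The `sed -n '1p'` variant is only a suggestion: unlike `head -n 1`, it reads its whole input, so the upstream command is not cut off early when its output is large.

```shell
#!/usr/bin/env bash
# Strict mode, assumed here to mirror Snakemake's default for shell blocks.
set -euo pipefail

# Hypothetical stand-in for `bcftools query -l {input.invcf}`:
# prints one sample name per line.
list_samples() {
    printf 'IND1\nIND2\nIND3\n'
}

# Pattern from the answer: capture the first sample name in a variable.
first=$(list_samples | head -n 1)

# Variant that consumes the whole input instead of closing the pipe early,
# so the upstream command never receives SIGPIPE.
first_safe=$(list_samples | sed -n '1p')

echo "head: $first"
echo "sed:  $first_safe"
```

Inside a Snakemake `shell:` block, a plain `$first` works as written, but the braced form has to be escaped as `${{first}}`, because Snakemake treats single braces as its own placeholders.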