
Handling multiple inputs for command in Snakemake


I'm working on a project that uses Snakemake to run svaba, a variant caller, on genome data. svaba run can take multiple sample files, but it requires a flag in front of each file.

For example: svaba -g.... -t s1.bam -t s2.bam -t s3.bam

How do I go about setting this up in Snakemake? Here is some mock-up code. There are probably some syntax errors, but the idea is there:

SAMPLES = ['1', '2', '3', '4']
rule svaba_run:
    input:
        ref="references/hg19.fa", 
        bam=expand("sample{sample}.bam", sample=SAMPLES)
    output:
        indels="test.svaba.indel.vcf",
        sv="test.svaba.sv.vcf"
    shell:
        "svaba run -g {input.ref} -t {input.bam}"

Right now this would just try to run the command like so:

svaba run -g references/hg19.fa -t sample1.bam sample2.bam sample3.bam sample4.bam

How do I get this to run with the '-t' flag in front of each sample?


Solution

  • Since a Snakefile can contain regular Python code, you can build the string you need in a params entry by joining the list of input files with the flag you need, like so:

    SAMPLES = ['1', '2', '3', '4']
    rule svaba_run:
        input:
            ref="references/hg19.fa", 
            bam=expand("sample{sample}.bam", sample=SAMPLES)
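        params:
            # join() puts " -t " only between file names, so the shell
            # command below supplies the first -t itself
            sample_bams=" -t ".join(f"sample{sample}.bam" for sample in SAMPLES)
        output:
            indels="test.svaba.indel.vcf",
            sv="test.svaba.sv.vcf"
        shell:
            "svaba run -g {input.ref} -t {params.sample_bams}"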
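
The string-building step can be checked outside Snakemake in plain Python. The sketch below is only an illustration: the helper name flagged_args is hypothetical (it is not part of Snakemake or svaba), and the file names are the ones assumed in the question. It puts the flag in front of every path, so no separate leading -t is needed, and it quotes each path in case a file name contains spaces.

```python
import shlex


def flagged_args(flag, paths):
    """Return a shell-ready string with `flag` placed before every path.

    Hypothetical helper for illustration; not a Snakemake or svaba API.
    """
    # shlex.quote leaves simple names untouched and quotes anything
    # that the shell would otherwise split or interpret.
    return " ".join(f"{flag} {shlex.quote(str(p))}" for p in paths)


SAMPLES = ["1", "2", "3", "4"]
bams = [f"sample{s}.bam" for s in SAMPLES]

# Build the full command line the rule is meant to run.
command = f"svaba run -g references/hg19.fa {flagged_args('-t', bams)}"
print(command)
# svaba run -g references/hg19.fa -t sample1.bam -t sample2.bam -t sample3.bam -t sample4.bam
```

If the helper is defined at the top of the Snakefile, it could probably be used from a params function such as sample_bams=lambda wildcards, input: flagged_args("-t", input.bam), with the shell line reduced to "svaba run -g {input.ref} {params.sample_bams}". That way the flags are derived from the rule's actual inputs rather than from a second copy of the file name pattern.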