python snakemake gatk

Snakemake: repeating a parameter before each of multiple inputs with a single output (GATK CombineGVCFs problem)


I have written a rule for CombineGVCFs in GATK4. The rule is as follows:

all_gvcf = get_all_gvcf_list()

rule cohort:
  input:
    all_gvcf_list = all_gvcf,
    ref="/data/refgenome/hg38.fa",
    interval_list = prefix+"/bedfiles/hg38.interval_list",
  params:
    extra = "--variant",
  output:
    prefix+"/vcf/cohort.g.vcf",
  shell:
    "gatk CombineGVCFs -R {input.ref} {params.extra} {input.all_gvcf_list} -O {output} --tmp-dir=/data/tmp -L {input.interval_list}"

all_gvcf is the list of all GVCF files to be combined. The problem is that I need a --variant flag before every input file. The command I'm getting right now is as follows:

gatk CombineGVCFs -R /data/refgenome/hg38.fa --variant /data/prjna644607/vcf/SRR12165216_HC.g.vcf /data/prjna644607/vcf/SRR12165217_HC.g.vcf /data/prjna644607/vcf/SRR12165218_HC.g.vcf /data/prjna644607/vcf/SRR12165219_HC.g.vcf -O /data/prjna644607/vcf/cohort.g.vcf --tmp-dir=/data/tmp -L /data/prjna644607/bedfiles/hg38.interval_list

The command I want is as follows:

gatk CombineGVCFs -R /data/refgenome/hg38.fa --variant /data/prjna644607/vcf/SRR12165216_HC.g.vcf --variant /data/prjna644607/vcf/SRR12165217_HC.g.vcf --variant /data/prjna644607/vcf/SRR12165218_HC.g.vcf --variant /data/prjna644607/vcf/SRR12165219_HC.g.vcf -O /data/prjna644607/vcf/cohort.g.vcf --tmp-dir=/data/tmp -L /data/prjna644607/bedfiles/hg38.interval_list

How can I add this extra "--variant" flag before every input? I tried adding it inside the get_all_gvcf_list() function, but then Snakemake reports that the input files are not found.


Solution

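  • Found the problem. A lambda function in params can build the repeated flag:

    params:
      extra=lambda wildcards, input: ' -V '.join(input.all_gvcf_list)

    Because join only places ' -V ' between the file names, the shell command also needs a '-V' directly before {params.extra} so that the first file gets its flag too (-V is the short form of --variant). That solves the problem.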
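A variation on the same idea, as a minimal sketch rather than the original poster's exact code: prefixing every file inside the function avoids having to remember the extra leading flag in the shell command. The helper name variant_args and the params key variants are hypothetical; shlex.quote is added here only as an assumption that some paths might contain spaces.

```python
import shlex


def variant_args(paths, flag="--variant"):
    """Return 'flag path flag path ...' with each path shell-quoted.

    Hypothetical helper: every path gets its own flag, so the shell
    command does not need a separate leading flag.
    """
    return " ".join(f"{flag} {shlex.quote(str(p))}" for p in paths)


# Assumed wiring inside the Snakefile (not taken from the original post):
#   params:
#     variants=lambda wildcards, input: variant_args(input.all_gvcf_list),
#   shell:
#     "gatk CombineGVCFs -R {input.ref} {params.variants} -O {output} "
#     "--tmp-dir=/data/tmp -L {input.interval_list}"

if __name__ == "__main__":
    gvcfs = [
        "/data/prjna644607/vcf/SRR12165216_HC.g.vcf",
        "/data/prjna644607/vcf/SRR12165217_HC.g.vcf",
    ]
    print(variant_args(gvcfs))
```

The input section still lists only real file paths, so Snakemake can check that they exist; the flags are added in params, which Snakemake does not treat as files.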