I have written a Snakemake rule for CombineGVCFs in GATK4. The rule is as follows:
all_gvcf = get_all_gvcf_list()
rule cohort:
    input:
        all_gvcf_list = all_gvcf,
        ref="/data/refgenome/hg38.fa",
        interval_list = prefix+"/bedfiles/hg38.interval_list",
    params:
        extra = "--variant",
    output:
        prefix+"/vcf/cohort.g.vcf",
    shell:
        "gatk CombineGVCFs -R {input.ref} {params.extra} {input.all_gvcf_list} -O {output} --tmp-dir=/data/tmp -L {input.interval_list}"
all_gvcf is the list of all gVCF files to be combined. The problem is that I need to add the --variant parameter before every input file, but it only appears before the first one. The command I'm getting right now is as follows:
gatk CombineGVCFs -R /data/refgenome/hg38.fa --variant /data/prjna644607/vcf/SRR12165216_HC.g.vcf /data/prjna644607/vcf/SRR12165217_HC.g.vcf /data/prjna644607/vcf/SRR12165218_HC.g.vcf /data/prjna644607/vcf/SRR12165219_HC.g.vcf -O /data/prjna644607/vcf/cohort.g.vcf --tmp-dir=/data/tmp -L /data/prjna644607/bedfiles/hg38.interval_list
The command I want to get is as follows:
gatk CombineGVCFs -R /data/refgenome/hg38.fa --variant /data/prjna644607/vcf/SRR12165216_HC.g.vcf --variant /data/prjna644607/vcf/SRR12165217_HC.g.vcf --variant /data/prjna644607/vcf/SRR12165218_HC.g.vcf --variant /data/prjna644607/vcf/SRR12165219_HC.g.vcf -O /data/prjna644607/vcf/cohort.g.vcf --tmp-dir=/data/tmp -L /data/prjna644607/bedfiles/hg38.interval_list
How can I add the extra "--variant" flag before every input? I tried adding it inside the get_all_gvcf_list() function, but then Snakemake reports that the input files are not found.
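I found the solution. It turns out I can write a lambda function for the parameter as follows:
    params:
        extra=lambda wildcards, input: ' -V '.join(input.all_gvcf_list)
and put '-V' before {params.extra} in the shell command (-V is the short form of --variant). The join only inserts ' -V ' between the files, so the leading '-V' supplies the flag for the first one. That solves the problem.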
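
To see what the join produces outside of Snakemake, here is a minimal Python sketch. The helper name variant_args is hypothetical (it is not part of Snakemake or GATK), and the two paths are just taken from the command above. It prefixes every path with the flag itself, so no extra '-V' is needed in front of it.

```python
def variant_args(gvcf_paths, flag="-V"):
    """Return one string with `flag` placed before every path.

    Hypothetical helper for illustration only.
    """
    return " ".join(f"{flag} {path}" for path in gvcf_paths)


gvcfs = [
    "/data/prjna644607/vcf/SRR12165216_HC.g.vcf",
    "/data/prjna644607/vcf/SRR12165217_HC.g.vcf",
]

# Same result as writing '-V' in the shell string followed by ' -V '.join(...)
print(variant_args(gvcfs))
print("-V " + " -V ".join(gvcfs))
```

If a helper like this were defined in the Snakefile, the parameter could presumably be written as extra=lambda wildcards, input: variant_args(input.all_gvcf_list), with plain {params.extra} in the shell command. One small difference: for an empty list this version returns an empty string instead of leaving a dangling '-V'.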