I am trying to run a GenotypeGVCFs
on many vcf files. The command line wants every single vcf
files be listed as:
java-jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
-R my.fasta \
-V bob.vcf \
-V smith.vcf \
-V kelly.vcf \
-o {output.out}
How to do this in snakemake? This is my code, but I do not know how to create a wildcards for -V.
workdir: "/path/to/workdir/"
SAMPLES=["bob","smith","kelly]
print (SAMPLES)
rule all:
input:
"all_genotyped.vcf"
rule genotype_GVCFs:
input:
lambda w: "-V" + expand("{sample}.vcf", sample=SAMPLES)
params:
ref="my.fasta"
output:
out="all_genotyped.vcf"
shell:
"""
java-jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R {params.ref} {input} -o {output.out}
"""
You are putting the cart before the horse. Wildcards are needed for rule generalization: you can define a pattern for a rule where wildcards are used to define generic parts. In your example there are no patterns: everything is defined by the value of SAMPLES
. This is not a recommended way to use Snakemake; the pipeline should be defined by the filesystem: which files are present on your disk.
By the way, your code will not work, as the input
shall define the list of filenames, while in your example you are (incorrectly) trying to define the strings like "-V filename"
.
So, you have the output: "all_genotyped.vcf"
. You have the input: ["bob.vcf", "smith.vcf", "kelly.vcf"]
. You don't even need to use a lambda here, as the input doesn't depend on any wildcard. So, you have:
rule genotype_GVCFs:
input:
expand("{sample}.vcf", sample=SAMPLES)
output:
"all_genotyped.vcf"
...
Actually you don't even need input
section. If you know for sure that the files from SAMPLES
list exist, you may skip it.
The values for -V
can be defined in params:
rule genotype_GVCFs:
#input:
# expand("{sample}.vcf", sample=SAMPLES)
output:
"all_genotyped.vcf"
params:
ref = "my.fasta",
vcf = expand("-V {sample}", sample=SAMPLES)
shell:
"""
java-jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R {params.ref} {params.vcf} -o {output}
"""
This should solve your issue, but I would advise you to rethink your solution. The use of SAMPLE
list smells. Alternatively: do you really need Snakemake if you have all dependencies defined already?