In the second rule I would like to select, from the VCF file containing bob, clara and tim, only the first genotype in the dictionary (i.e. bob), so that the second rule outputs bob.dn.vcf. Is this possible in Snakemake?
d = {"FAM1": ["bob.bam", "clara.bam", "tim.bam"]}
FAMILIES = list(d)

rule all:
    input:
        expand some outputs

wildcard_constraints:
    family = "|".join(FAMILIES)

rule somerulename:
    input:
        lambda w: d[w.family]
    output:
        vcf="{family}/{family}.vcf"
    shell:
        """
        some python command line which produces a single vcf file with bob, clara and tim
        """

rule somerulename:
    input:
        invcf="{family}/{family}.vcf"
    params:
        ref="someref.fasta"
    output:
        out="{family}/{bob}.dn.vcf"
    shell:
        """
        gatk --java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants -R {params.ref} -V {input.invcf} -O {output.out}
        """
There are at least two options. The first is to hard-code the output path:

rule somerulename:
    output:
        out="FAM1/bob.dn.vcf"
The second is to keep the wildcards but restrict them with wildcard_constraints:

rule somerulename:
    output:
        out="{family}/{bob}.dn.vcf"
    wildcard_constraints:
        family="FAM1",
        bob="bob",
Then request the concrete files in rule all:

rule all:
    input: "FAM1/bob.dn.vcf", "FAM2/alice.dn.vcf"
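
If you would rather not type the target paths by hand, they could be derived from the dictionary itself. Below is a minimal sketch in plain Python (which can sit at the top of a Snakefile). The helper names first_sample and dn_targets are hypothetical, and it assumes that the sample name equals the BAM file name without its extension and that the first BAM listed for each family is the one you want.

```python
from pathlib import Path

# Mapping from the question: family -> BAM files.
# Assumption: the first entry of each list is the sample to extract.
d = {"FAM1": ["bob.bam", "clara.bam", "tim.bam"]}


def first_sample(family, families=d):
    """Return the name of the first BAM listed for a family, without '.bam'.

    Hypothetical helper; assumes sample name == BAM file stem.
    """
    return Path(families[family][0]).stem


def dn_targets(families=d):
    """Build one '<family>/<first sample>.dn.vcf' path per family.

    Hypothetical helper; the path layout follows the rules in the question.
    """
    return [f"{fam}/{first_sample(fam, families)}.dn.vcf" for fam in families]


print(dn_targets())  # ['FAM1/bob.dn.vcf']
```

rule all could then use input: dn_targets() instead of a hand-written list. Note that the paths alone do not make GATK drop clara and tim: as far as I know SelectVariants takes a --sample-name (-sn) argument for that, so the shell command would probably also need something like -sn {wildcards.bob}, assuming the sample name inside the VCF really is "bob".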