bioinformatics snakemake rna-seq

WildcardError in Snakefile


I've been trying to run the following Snakefile:

configfile: "config.yaml"

WORK_TRIM = config["WORK_TRIM"]
WORK_KALL = config["WORK_KALL"]

rule all:
  input: 
    expand(WORK_KALL + "quant_result_{condition}", condition=config["conditions"])


rule kallisto_quant:
    input:
      fq1 = WORK_TRIM + "{sample}_1_trim.fastq.gz",
      fq2 = WORK_TRIM + "{sample}_2_trim.fastq.gz",
      idx = WORK_KALL + "Homo_sapiens.GRCh38.cdna.all.fa.index"
    
    output:
      WORK_KALL + "quant_result_{condition}"
    
    shell:
      "kallisto quant -i {input.idx} -o {output} {input.fq1} {input.fq2}"

However, I keep getting an error like this:

WildcardError in line 13 of /home/user/directory/Snakefile:
Wildcards in input files cannot be determined from output files:
'sample'

Just to explain briefly, kallisto quant produces three output files: abundance.h5, abundance.tsv and run_info.json. Each set of files needs to go into its own newly created condition directory. I don't understand what is going wrong. I'd appreciate any help with this.


Solution

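  • Snakemake does not have enough information here. It fills in wildcards by matching the requested output file, and your output only contains {condition}, while the inputs use {sample}. Nothing tells Snakemake which sample belongs to which condition, hence the WildcardError for 'sample'.

    Say "condition" is either "control" or "treated", with samples "C" and "T", respectively. You need to tell Snakemake about the association control: C, treated: T. You could do this with input functions, for instance lambda functions. For example: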

    cond2samp = {'control': 'C', 'treated': 'T'}
    
    rule all:
      input: 
        expand("quant_result_{condition}", condition=cond2samp.keys())
    
    
    rule kallisto_quant:
        input:
          fq1 = lambda wc: "%s_1_trim.fastq.gz" % cond2samp[wc.condition],
          fq2 = lambda wc: "%s_2_trim.fastq.gz" % cond2samp[wc.condition],
          idx = "Homo_sapiens.GRCh38.cdna.all.fa.index"
        output:
          "quant_result_{condition}"
        shell:
          "kallisto quant -i {input.idx} -o {output} {input.fq1} {input.fq2}"
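
The lambdas above can also be written as one named input function, which is easier to test outside Snakemake. The following is a minimal sketch in plain Python, not a definitive implementation: the function name `trimmed_fastqs` and the `WORK_TRIM` value `"trim/"` are hypothetical placeholders, and `SimpleNamespace` only stands in for the wildcards object that Snakemake would pass.

```python
from types import SimpleNamespace

# Hypothetical values; in the real workflow these would come from config.yaml.
WORK_TRIM = "trim/"
cond2samp = {"control": "C", "treated": "T"}


def trimmed_fastqs(wildcards):
    """Map the {condition} wildcard to its pair of trimmed FASTQ files."""
    try:
        sample = cond2samp[wildcards.condition]
    except KeyError:
        # Fail with a readable message if a condition has no sample assigned.
        raise ValueError(f"No sample configured for condition {wildcards.condition!r}")
    return {
        "fq1": f"{WORK_TRIM}{sample}_1_trim.fastq.gz",
        "fq2": f"{WORK_TRIM}{sample}_2_trim.fastq.gz",
    }


# SimpleNamespace stands in for the wildcards object Snakemake would pass.
paths = trimmed_fastqs(SimpleNamespace(condition="treated"))
print(paths["fq1"])  # trim/T_1_trim.fastq.gz
print(paths["fq2"])  # trim/T_2_trim.fastq.gz
```

In a Snakefile, a function that returns a dict like this would typically be passed through `unpack(trimmed_fastqs)` in the `input:` section, next to the `idx` entry, so that `{input.fq1}` and `{input.fq2}` remain available in the shell command. Since `-o` names a directory, wrapping the output in Snakemake's `directory()` flag may also be worth considering.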