conda snakemake

Creating conda environment in snakemake rule


I want to create a conda environment for a tool, activate it, and use the tool in a Snakemake rule. I have set it up as follows:

Snakemake rule:

rule fastqc:
    input:
        #fastq=expand("fastq_dir/{sample}_R{pair}_001.fastq.gz", sample=config["samples"], pair=config["fastq_pairs"])
        fastq_path = lambda wildcards: get_full_path(wildcards.sample)
    output:
        #html=expand("fastqc_dir/{sample}_R{pair}_001_fastqc.html", sample=config["samples"], pair=config["fastq_pairs"]),
        #zip=expand("fastqc_dir/{sample}_R{pair}_001_fastqc.zip", sample=config["samples"], pair=config["fastq_pairs"])
        os.path.join("{output_dir}", "{sample}_fastqc.html")
    params:
        outdir="{output_dir}",
        sample_adapter=os.path.join("../data/adapters", "{sample}.txt")
    log:
        log_file=os.path.join("{output_dir}", "local_log", \
        "run_FastQC_{sample}.log")
    resources:
        threads = 4,
        mem_mb = 24000,
        runtime = "2h"
    benchmark:
        os.path.join("{output_dir}", "cluster_log", "run_FastQC_{sample}.benchmark.log")
    conda:
        "envs/fastqc.yaml"
    shell:
        """
        conda activate fastqc
        fastqc {input} \
        --threads {resources.threads} \
        --outdir {params.outdir} \
        --kmers 7 \
        --adapters {params.sample_adapter} \
        &> {log.log_file} 
        """

The conda environment file (envs/fastqc.yaml) is:

name: fastqc
channels:
- conda-forge
- bioconda
dependencies:
- fastqc=0.12.1-0
prefix: ./.conda_myproject/envs

When I run Snakemake, my jobs fail with the following error:

EnvironmentNameNotFound: Could not find conda environment: fastqc

Indeed, when I check whether the environment was created in the indicated location, the fastqc environment is not there. Instead, I see an environment named:

f2b1d4b45d38fce47f79239411ceb3a4_

It is located inside .snakemake/conda/ within my working directory.

I have tried this many times and it fails every time. I want the conda environment installed inside the project directory rather than in my home directory. Could you help me figure this out? Thank you!


Solution

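  • You don't need to activate the environment yourself. When Snakemake is run with --use-conda, it creates the environment from the file given in the conda: directive and activates it for the rule's shell command (see the examples in the tutorial). The environment is stored under .snakemake/conda/ with a hash as its name, which is the f2b1d4b45d38fce47f79239411ceb3a4_ directory you found. No environment called fastqc is ever created, so conda activate fastqc fails. Without that line, this should work: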

    rule fastqc:
        input:
            #fastq=expand("fastq_dir/{sample}_R{pair}_001.fastq.gz", sample=config["samples"], pair=config["fastq_pairs"])
            fastq_path = lambda wildcards: get_full_path(wildcards.sample)
        output:
            #html=expand("fastqc_dir/{sample}_R{pair}_001_fastqc.html", sample=config["samples"], pair=config["fastq_pairs"]),
            #zip=expand("fastqc_dir/{sample}_R{pair}_001_fastqc.zip", sample=config["samples"], pair=config["fastq_pairs"])
            os.path.join("{output_dir}", "{sample}_fastqc.html")
        params:
            outdir="{output_dir}",
            sample_adapter=os.path.join("../data/adapters", "{sample}.txt")
        log:
            log_file=os.path.join("{output_dir}", "local_log", \
            "run_FastQC_{sample}.log")
        resources:
            threads = 4,
            mem_mb = 24000,
            runtime = "2h"
        benchmark:
            os.path.join("{output_dir}", "cluster_log", "run_FastQC_{sample}.benchmark.log")
        conda:
            "envs/fastqc.yaml"
        shell:
            """
            fastqc {input} \
            --threads {resources.threads} \
            --outdir {params.outdir} \
            --kmers 7 \
            --adapters {params.sample_adapter} \
            &> {log.log_file} 
            """
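
  • Because Snakemake chooses the environment's name and location itself, the name: and prefix: entries in envs/fastqc.yaml appear to have no effect and can be dropped. Below is a minimal sketch of the file. It assumes a plain version pin is what you want: conda writes a build pin as name=version=build, so the original fastqc=0.12.1-0 may not resolve as intended.

    ```yaml
    # envs/fastqc.yaml
    # No `name:` or `prefix:` here: Snakemake builds the environment under
    # .snakemake/conda/<hash> and activates it for the rule on its own.
    channels:
      - conda-forge
      - bioconda
    dependencies:
      # Assumption: pinning the version only is enough; a build pin would be
      # written as name=version=build.
      - fastqc=0.12.1
    ```

    With this file in place, running snakemake --use-conda --cores 4 (the core count is only an example) should build the environment on first use. By default it ends up in .snakemake/conda/ inside the working directory, so it already stays within the project. The --conda-prefix option can point Snakemake at a different directory if needed.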