bioinformatics snakemake rna-seq

FastQC and trimming for RNA-seq data


Could anyone suggest what is wrong with this Snakefile? I have already defined the all rule and the inputs using wildcards. I am trying to learn Snakemake, so I would also appreciate any useful resources for reading more about it. Thank you for your help.

configfile: "config.json"
  
SAMPLES = config["samples"]

# Define the expected outputs

fastqc_outputs = expand(f"{config['data']}/fastqc/{{sample}}_1_fastqc.html", sample=SAMPLES) + \
                  expand(f"{config['data']}/fastqc/{{sample}}_1_fastqc.zip", sample=SAMPLES) + \
                  expand(f"{config['data']}/fastqc/{{sample}}_2_fastqc.html", sample=SAMPLES) + \
                  expand(f"{config['data']}/fastqc/{{sample}}_2_fastqc.zip", sample=SAMPLES)

trimmed_outputs = expand(f"{config['data']}/trimmed/{{sample}}_1P.fastq.gz", sample=SAMPLES) + \
                  expand(f"{config['data']}/trimmed/{{sample}}_2P.fastq.gz", sample=SAMPLES)

trimmed_fastqc_outputs = expand(f"{config['data']}/trimmed_fastqc/{{sample}}_1_fastqc.html", sample=SAMPLES) + \
                  expand(f"{config['data']}/trimmed_fastqc/{{sample}}_1_fastqc.zip", sample=SAMPLES) + \
                  expand(f"{config['data']}/trimmed_fastqc/{{sample}}_2_fastqc.html", sample=SAMPLES) + \
                  expand(f"{config['data']}/trimmed_fastqc/{{sample}}_2_fastqc.zip", sample=SAMPLES)
# Rule all, specifying the final target files
rule all:
    input:
        fastqc_outputs,
        trimmed_outputs,
        trimmed_fastqc_outputs

rule fastqc:
    input:
        f"{config['data']}/{{sample}}_1.fastq.gz",
        f"{config['data']}/{{sample}}_2.fastq.gz"
    output:
        f"{config['data']}/fastqc/{{sample}}_1_fastqc.html",
        f"{config['data']}/fastqc/{{sample}}_1_fastqc.zip",
        f"{config['data']}/fastqc/{{sample}}_2_fastqc.html",
        f"{config['data']}/fastqc/{{sample}}_2_fastqc.zip"
    params:
        threads = 4  # Adjust based on your system
    shell:
        fastqc -t {params.threads} {input}

rule trimmomatic:
    input:
        f"{config['data']}/{{sample}}_1.fastq.gz",
        f"{config['data']}/{{sample}}_2.fastq.gz"
    output:
        f"{config['data']}/trimmed/{{sample}}_1P.fastq.gz",
        f"{config['data']}/trimmed/{{sample}}_1U.fastq.gz",
        f"{config['data']}/trimmed/{{sample}}_2P.fastq.gz",
        f"{config['data']}/trimmed/{{sample}}_2U.fastq.gz"
    params:
        threads = 4,
        adapters = "contams_forward_rev.fa",  # Update with actual path
        minlen = 36,  # Adjust based on your needs
        leading = 3,
        trailing = 3,
        slidingwindow = "4:20"
    shell:

        trimmomatic PE -threads {params.threads} \
        {input[0]} {input[1]} \
        {output[0]} {output[1]} {output[2]} {output[3]} \
        ILLUMINACLIP:{params.adapters}:2:30:10 LEADING:{params.leading} TRAILING:{params.trailing} \
        SLIDINGWINDOW:{params.slidingwindow} MINLEN:{params.minlen}'


rule fastqc_trimmed:
    input:
        f"{config['data']}/trimmed/{{sample}}_1P.fastq.gz",
        f"{config['data']}/trimmed/{{sample}}_2P.fastq.gz"
    output:
        f"{config['data']}/trimmed_fastqc/{{sample}}_1_fastqc.html",
        f"{config['data']}/trimmed_fastqc/{{sample}}_1_fastqc.zip",
        f"{config['data']}/trimmed_fastqc/{{sample}}_2_fastqc.html",
        f"{config['data']}/trimmed_fastqc/{{sample}}_2_fastqc.zip"
    params:
        threads = 4  # Adjust based on your system
    shell:
        fastqc -t {params.threads} {input}

configfile:

{
    "data": "/mnt/scratch2/users/fastq_files/fastq/snakemake",
    "genome": "/mnt/scratch2/users/fastq_files/fastq/snakemake/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz",
    "samples": ["SRR10203569", "SRR10203570", "SRR10203571", "SRR10203572", "SRR10203573", "SRR10203574", "SRR10203575"]
}

Error:

SyntaxError in line 38 of /mnt/scratch2/users/akgw/fastq_files/fastq/snakemake/Snakefile:
invalid syntax 

This is line 38 of the script:

fastqc -t {params.threads} {input}

Solution

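  • The shell command must be enclosed in quotation marks, because Snakemake expects the body of a shell directive to be a Python string. The same applies to the shell blocks of the trimmomatic and fastqc_trimmed rules (use triple quotes for the multi-line trimmomatic command). Additionally, Snakemake has a built-in threads directive, so it is better to use that instead of defining threads under params. Here’s the corrected fastqc rule: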

    rule fastqc:
        input:
            f"{config['data']}/{{sample}}_1.fastq.gz",
            f"{config['data']}/{{sample}}_2.fastq.gz"
        output:
            f"{config['data']}/fastqc/{{sample}}_1_fastqc.html",
            f"{config['data']}/fastqc/{{sample}}_1_fastqc.zip",
            f"{config['data']}/fastqc/{{sample}}_2_fastqc.html",
            f"{config['data']}/fastqc/{{sample}}_2_fastqc.zip"
        threads: 4
        shell:
            "fastqc -t {threads} {input}"