ubuntu bioinformatics snakemake

fastq-dump.3.1.1 err: item not found while constructing within virtual database module - the path 'SRRXXXXX.sra' cannot be opened as database or table


SRRs = [
      "SRR11111111",
      "SRR22222222",
      "SRR33333333",
      "SRR44444444",
      "SRR55555555",
      "SRR66666666",
      "SRR77777777"
      ]


rule all:
  input:
    expand("../atac_seq/{srr}/{srr}_1.fastq.gz", srr=SRRs),
    expand("../atac_seq/{srr}/{srr}_2.fastq.gz", srr=SRRs),
    expand("../atac_seq/{srr}/{srr}_3.fastq.gz", srr=SRRs),
    expand("../atac_seq/{srr}/{srr}_4.fastq.gz", srr=SRRs)

rule fastq_dump:
  output:        
    "../atac_seq/{n}/{n}_1.fastq.gz",
    "../atac_seq/{n}/{n}_2.fastq.gz",
    "../atac_seq/{n}/{n}_3.fastq.gz",
    "../atac_seq/{n}/{n}_4.fastq.gz"
  shell:
    """
    fastq-dump --split-files --gzip {wildcards.n}.sra 
    """

Hi all, right now in the directory /atac_seq/{srr}/ there is only a .sra file. I want to extract FASTQ files from the .sra file using fastq-dump and Snakemake, but I get the error: "fastq-dump.3.1.1 err: item not found while constructing within virtual database module - the path 'SRRXXXXX.sra' cannot be opened as database or table".

I've tried using fastq-dump in bash in the individual folders, and it works, but I'm not sure why my script doesn't work.


Solution

  • Generally, when debugging this kind of problem, try running the Snakemake workflow with the -p and --dry-run options. Then take the exact command that Snakemake prints and run it from the directory where you start Snakemake to see whether it works.

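    Specifically, by default the command runs from the directory where you start Snakemake, so the relative path {wildcards.n}.sra does not point at the file inside ../atac_seq/{wildcards.n}/. I think your shell command needs to change into that directory first: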

    shell:
        """
        cd ../atac_seq/{wildcards.n}
        fastq-dump --split-files --gzip {wildcards.n}.sra 
        """
    

    Snakemake runs all the shell commands in sub-shells, so there's no need to change the directory back. The cd command will only affect the commands within this shell block.
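
To illustrate the sub-shell behaviour described above outside of Snakemake, here is a minimal Python sketch. It is only an analogy: `subprocess.run(..., shell=True)` stands in for a Snakemake shell block, `ls` stands in for fastq-dump, and the `sample/sample.sra` layout, `run_in_subshell` and `demo` are hypothetical names invented for this example. It assumes a POSIX shell is available.

```python
import os
import subprocess
import tempfile


def run_in_subshell(command, cwd):
    """Run a command string in a fresh shell started from `cwd`."""
    return subprocess.run(command, shell=True, cwd=cwd,
                          capture_output=True, text=True)


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical layout: <workdir>/sample/sample.sra plays the role of
        # ../atac_seq/{srr}/{srr}.sra from the question.
        sample_dir = os.path.join(workdir, "sample")
        os.mkdir(sample_dir)
        open(os.path.join(sample_dir, "sample.sra"), "w").close()

        # The relative path is resolved from the working directory,
        # where the file does not exist, so the command fails.
        without_cd = run_in_subshell("ls sample.sra", cwd=workdir)

        # Changing directory first makes the same relative path resolve.
        with_cd = run_in_subshell("cd sample && ls sample.sra", cwd=workdir)

        # A later, separate shell starts from the original directory again:
        # the earlier cd did not leak out of its own shell.
        after = run_in_subshell("pwd", cwd=workdir)
        back_in_workdir = (os.path.realpath(after.stdout.strip())
                           == os.path.realpath(workdir))

        return without_cd.returncode, with_cd.returncode, back_in_workdir


if __name__ == "__main__":
    print(demo())
```

The first return code is non-zero (file not found from the parent directory), the second is zero (found after the cd), and the final check confirms that the next shell starts from the original directory, which is why no `cd` back is needed.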