python snakemake rna-seq

How to get Snakemake and CellRanger Count to work with multiple samples


I have a Snakemake rule that reads from a directory called merged, which contains two separate scRNA-seq datasets. I want to use Snakemake together with cellranger to run any number of samples. My cellranger_count rule fails with the error below, which I don't understand, and Google isn't helping. Could someone please make this a teachable moment?

merged
SC111111-TTATTCGAGG-AGCAGGACAG_merged_R1.fastq.gz  SC222222-TGCGCGGTTT-TTTATCCTTG_merged_R1.fastq.gz
SC111111-TTATTCGAGG-AGCAGGACAG_merged_R2.fastq.gz  SC222222-TGCGCGGTTT-TTTATCCTTG_merged_R2.fastq.gz


rule cellranger_count:
    input: rules.merge_fastqs.output
    output:
        maxtrix_h5 = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix.h5',
        metrics = '{sampleID}_TenXAnalysis/outs/metrics_summary.csv',
        barcodes = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix/barcodes.tsv.gz',
        features = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix/features.tsv.gz',
        matrix = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix/matrix.mtx.gz'
    threads: 16
    params:
        ref = 'PATH/yard/apps/refdata-gex-GRCh38-2020-A',
        #sample_id = '{sampleID}_merged'
    ## id = unique run ID string
    ## fastqs = Path to data
    ## sample = Sample names as specified in the sample sheet
    ## transcriptome = Path to Cell Ranger compatible transcriptome reference
    ## localcores = tells cellranger how many cores to use
    ## localmem = how much memory (in GB) to use
    shell: """
    module load cellranger/6.1.2

    rm -rf {wildcards.sampleID}_TenXAnalysis

    cellranger count --id={wildcards.sampleID}_TenXAnalysis \
        --fastqs={input} \
        --sample={wildcards.sampleID} \
        --transcriptome={params.ref} \
        --localcores={threads} \
        --localmem=128
    """

logfile error:

error: Found argument 'merged/SC111111-TTATTCGAGG-AGCAGGACAG_merged_R2.fastq.gz' which wasn't expected, or isn't valid in this context

If you tried to supply `merged/SC111111-TTATTCGAGG-AGCAGGACAG_merged_R2.fastq.gz` as a PATTERN use `-- merged/SC111111-TTATTCGAGG-AGCAGGACAG_merged_R2.fastq.gz`

Solution

  • Do you expect rules.merge_fastqs.output to be a directory or a list of fastq files?

    If I understand your post correctly, rules.merge_fastqs.output is a list of fastq files and this is passed to cellranger as a space-separated list, i.e.:

    --fastqs=merged/fastq1.fastq.gz merged/fastq2.fastq.gz merged/fastq3.fastq.gz etc
    

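    However, cellranger doesn't support passing multiple FASTQ files this way: --fastqs expects the path to a directory containing the FASTQ files (or a comma-separated list of such directories), not individual files. Only the first file is attached to --fastqs; the shell passes the second one as a separate argument, which is the one cellranger rejects in your log.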
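
    To see why the R2 file shows up as an unexpected argument, here is a minimal Python sketch of what happens to the command line. The list merged_fastqs is a hypothetical stand-in for rules.merge_fastqs.output (the file names are taken from the question), and the command is shortened to the relevant option:

```python
import shlex

# Hypothetical stand-in for rules.merge_fastqs.output (file names from the question).
merged_fastqs = [
    "merged/SC111111-TTATTCGAGG-AGCAGGACAG_merged_R1.fastq.gz",
    "merged/SC111111-TTATTCGAGG-AGCAGGACAG_merged_R2.fastq.gz",
]

# Snakemake renders a multi-file {input} as the file names joined by spaces.
rendered = "cellranger count --fastqs={input}".format(input=" ".join(merged_fastqs))

# Split the command line into arguments the way a POSIX shell would.
argv = shlex.split(rendered)
for arg in argv:
    print(arg)
```

    Only the R1 file stays attached to --fastqs; the R2 file becomes an argument of its own, which matches the file named in the error message.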

    So I think the issue is not so much with Snakemake as with the way you invoke cellranger. Try running snakemake with the -p (--printshellcmds) option to see which commands are actually executed, and check whether they are what you expect.

    If this doesn't help, please post the merge_fastqs rule.
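
    One possible way around this, sketched under the assumption that rules.merge_fastqs.output really is a list of FASTQ files, is to give cellranger the directory instead of the files. The helper name fastq_dirs is made up for illustration; it is plain Python, so it can sit at the top of the Snakefile:

```python
import os

def fastq_dirs(fastq_paths):
    """Return the unique parent directories of the FASTQ files, comma-separated.

    Hypothetical helper: the result is meant to be passed to --fastqs,
    which takes directories rather than individual files.
    """
    dirs = sorted({os.path.dirname(path) or "." for path in fastq_paths})
    return ",".join(dirs)

# Intended use inside the rule (assumption, untested against your workflow):
#     params:
#         fastq_dir = lambda wildcards, input: fastq_dirs(input)
# and in the shell command:
#     --fastqs={params.fastq_dir}

print(fastq_dirs([
    "merged/SC111111-TTATTCGAGG-AGCAGGACAG_merged_R1.fastq.gz",
    "merged/SC111111-TTATTCGAGG-AGCAGGACAG_merged_R2.fastq.gz",
]))
```

    Note that cellranger selects files inside that directory by name using --sample, so the naming of the merged files may also need attention once the --fastqs argument is fixed.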