nextflow

how to chain Channel.fromFilePairs from one process to another process


I have two nextflow scripts that work independently, how do I combine them into one such that output files from process TRIM can be used as input files for process BWA. The process BWA must not start until paired trimmed files have been created.

Both scripts use paired files (for each mate of paired end sequencing), so not sure how to use Channel.fromFilePairs for chaining the two scripts into one.

nextflow script 1 this script trims reads and stores the output in trimmed/

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process TRIM {
  debug true

  publishDir 'trimmed/', mode: 'copy', overwrite: false, pattern: "*"

  input:
  tuple val(sampleId), file(reads)
  
  output:
  path "*"

  script:
  """
  trim_galore --paired -q 30 --length 30 --fastqc --phred33 $reads
  """
}

workflow {
  Channel.fromFilePairs("$baseDir/subset_fq/*_{1,2}.fq.gz", checkIfExists:true) \
    | TRIM
}

nextflow script 2 this script uses trimmed reads from trimmed/ and outputs sam files in bwa_aligned/

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process BWA {
  debug true

  publishDir 'bwa_aligned/', mode: 'copy', overwrite: false, pattern: "*"

  input:
  tuple val(sampleId), file(reads)

  output:
  path "*"

  script:
  """
  bwa mem /exports/eddie/scratch/pdewari/hamour/genome/fEpiCoi_cnag1_curated_primary.no_mt.fa $reads -t 2 > ${sampleId}.sam
  """
}

workflow {
  Channel.fromFilePairs("$baseDir/trimmed/*_{1,2}_val_{1,2}.fq.gz", checkIfExists:true) | BWA
}


Solution

  • The nextflow documentation shows lots of good examples and is very thorough, and I highly recommend checking that out whenever you get stuck and for best practices.

    The workflow invocation is how you can send files from one process to another in DSL2 nextflow. But you need to remove the workflow invocations from script 1 and 2, and then create a workflow script (you can also put everything into one script, but I find it's easier to modify and add if its modular).

    Script 1 becomes this:

    #!/usr/bin/env nextflow
    
    process TRIM {
      debug true
      tag "$sampleId"
    
      publishDir 'trimmed/', mode: 'copy', overwrite: false
    
      input:
      tuple val(sampleId), path(reads)
      
      output:
      tuple val(sampleId), path(*_val_1.fq.gz), path(*_val_2.fq.gz), emit: trimmed_seqs
      // you can add an output channel for fastqc here, as it doesn't need to be in the same channel as the reads. 
    
      script:
      """
      trim_galore --paired -q 30 --length 30 --fastqc --phred33 $reads
      """
    }
    

    I've made some changes following the documentation and to make it easier to follow. Firstly, the file qualifier is deprecated and path is preferred. I've also added the tag so you can see which process is running on each sample. And finally, the output is a tuple, that mimics the input structure. Also, using * wildcard is dangerous as you're creating an output channel with each and every file in the working directory, not just the ones you want.

    Script 2 becomes:

    #!/usr/bin/env nextflow
    
    process BWA {
      tag "$sampleId"
      debug true
    
      publishDir 'bwa_aligned/', mode: 'copy', overwrite: false, pattern: "*"
    
      input:
      // your original input declaration didn't match the cardinality of your TRIM process output. 
      tuple val(sampleId), path(r1), path(r2)
    
      output:
      path "*" // I would update this to be less promiscuous
    
      script:
      """
      bwa mem /exports/eddie/scratch/pdewari/hamour/genome/fEpiCoi_cnag1_curated_primary.no_mt.fa $reads -t 2 > ${sampleId}.sam
      """
    }
    

    It's also not best practice to give nextflow absolute paths. The fEpiCoi_cnag1_curated_primary.no_mt.fa file should be staged in each processing environment as a value channel.

    And then finally, the workflow script.

    #!/usr/bin/env nextflow
    nextflow.enable.dsl=2
    
    // These paths are relative to the workflow .nf file
    include { TRIM } from './modules/trim.nf'
    include { BWA } from './modules/bwa.nf'
    
    Channel
      .fromFilePairs("$baseDir/subset_fq/*_{1,2}.fq.gz", checkIfExists:true)
      .set{ paired_reads }
    
    workflow Mapping {
      main:
        TRIM(paired_reads)
        BWA(TRIM.out.trimmed_seqs)
    }
    

    NB. None of these scripts have been tested.