
Nextflow/Bash scripting help! How to use a for loop in a Bash script inside a Nextflow process?


I am here because I have not come across a solution on the internet yet. I am trying to write a Nextflow workflow that splits a big table, computes statistics for each split table, and then merges the small stats tables.

I am having trouble with the table-splitting process. I want to split the table while keeping the header intact in each smaller table. The Bash code is something like this:

  head -n1 "${table2parse}" > header.tsv ## take the header line
  tail -n+2 "${table2parse}" | split -l 4 - chunk_ ## split the table w/o headers
  for f in chunk_*; do cat header.tsv "$f" > "split_table_$f.tsv"; done ## add the header to each chunk

So far this works. However, when I tried to incorporate it into a Nextflow pipeline:

process splitTable {
  input:
    path table2parse

  output:
    path 'split_table_*'

  """
  head -n1 '${table2parse}' > header.tsv ## take the header line
  tail -n+2 '${table2parse}' | split -l 4 - chunk_ ## split the table w/o headers
  for f in chunk_*; do cat header.tsv $f > 'split_table_$f.tsv'; done ## add the header to each chunk
  """
}

I get this error:

Caused by:
  No such variable: f -- Check script 'trial.nf' at line: 16

Apparently Nextflow confuses the Bash variable with its own variables. I tried using an escape character ('\f') and declaring it as a Nextflow variable, but to no avail.

I would be really grateful for any suggestions.

PS: I recently started learning the DSL2 syntax of Nextflow; if you have recommendations on that, I am all ears!


Solution

  • Reduce the problem to a Bash script that takes the required parameter. Test it independently of the Nextflow process, then call the script from the Nextflow process.
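
A minimal sketch of such a script, assuming it is saved under the hypothetical name split_table.sh. The function name split_table and the optional second argument (rows per chunk, defaulting to 4 as in the question) are illustrative choices, not part of the original answer. The output name is double-quoted so that Bash expands $f.

```shell
#!/usr/bin/env bash
# split_table.sh (hypothetical name): split a table into chunks and
# repeat the header line at the top of every chunk.
# Usage: split_table.sh <table.tsv> [rows_per_chunk]

split_table() {
  local table="$1"
  local rows="${2:-4}"   # rows per chunk; 4 as in the question

  head -n 1 "$table" > header.tsv                   # take the header line
  tail -n +2 "$table" | split -l "$rows" - chunk_   # split the body without the header

  for f in chunk_*; do
    [ -e "$f" ] || continue                         # no data rows: glob matched nothing
    cat header.tsv "$f" > "split_table_${f}.tsv"    # add the header to each chunk
  done
}

# Run only when the script is called with arguments.
if [ "$#" -gt 0 ]; then
  split_table "$@"
fi
```

Once this runs correctly on its own (for example, ./split_table.sh table.tsv), the process script can shrink to a single call such as split_table.sh '${table2parse}'. Nextflow adds executables in the pipeline's bin/ directory to the PATH of each process, so that is a convenient place for the script. If the loop stays inline in a """ block instead, the Bash variable has to be written as \$f (not \f), because Nextflow interpolates $f itself before Bash ever sees it.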