nextflow

Split the content of a CSV file containing a path and other fields in Nextflow


I have a file containing the following columns:

PathToFile, SomeNumber, SomeString

and I want to read it into a Channel.

How do I open such a file so that PathToFile is a file (or path) type and the other two are val types?

I can open the file like this:

    Channel.fromPath(params.list)
        .splitCsv() //{ it.trim() }
        .view(row -> file("${row[0]}"))

and it works great! But I don't want to just view it; I want to USE it in a process. Do I have to convert the path to a file INSIDE the first process?

Thanks! P.S. What if I want to open a TSV instead of a CSV?


Solution

  • Given an example test.csv file:

    path,number,string
    file1.txt,1,abc
    file2.txt,2,bcd
    

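    I'm not sure I understood "USE it in a process" correctly, but if so, you don't need to convert the path inside the process: do it with file() in a map on the channel and pass the resulting tuples to the process. Your main.nf could look like this: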

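    #!/usr/bin/env nextflow
    nextflow.enable.dsl=2
    
    // header: true turns each row into a map keyed by the header names,
    // so the columns can be accessed as row.path, row.number and row.string
    samples = Channel
            .fromPath("test.csv")
            .splitCsv(header: true)
            // convert the path column into a file object; keep the other two as values
            .map { row -> tuple(file(row.path), row.number, row.string) }
            .view()
    
    process echo_channel {
    
        // print the task's standard output to the console
        debug true
    
        input:
        // path() stages the file into the task work directory
        tuple path(input_file), val(number), val(string)
    
        script:
        """
        echo "File name: $input_file"
        echo "Number: $number"
        echo "String: $string"
        echo "File content:"
        cat $input_file
        """
    }
    
    workflow {
        echo_channel(samples)
    }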
    

    I included .view() to preview the channel content in the console. The debug true directive in the process definition is what makes the process output (the echo lines) visible.

    Running nextflow run main.nf then gives the following (the two tasks run in parallel, so the order of their output may vary):

    N E X T F L O W  ~  version 23.04.2
    Launching `main.nf` [dreamy_leibniz] DSL2 - revision: e6f30e68ff
    executor >  local (2)
    [a7/10df8f] process > echo_channel (1) [100%] 2 of 2 ✔
    [/home/art/test/nf/file1.txt, 1, abc]
    [/home/art/test/nf/file2.txt, 2, bcd]
    File name: file2.txt
    Number: 2
    String: bcd
    File content:
    This is file 2.
    File name: file1.txt
    Number: 1
    String: abc
    File content:
    This is file 1.
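    If the file has no header line (the question's own snippet reads columns by position, as row[0]), a minimal sketch under that assumption is to drop header: true and index the columns by position. The strip: true option trims the blanks left after each comma in a line such as file1.txt, 1, abc, and since splitCsv yields strings, the number column is converted explicitly. The file name test_noheader.csv is hypothetical.

    ```nextflow
    #!/usr/bin/env nextflow
    nextflow.enable.dsl=2

    // Assumed headerless input file (hypothetical name), e.g.:
    //   file1.txt, 1, abc
    //   file2.txt, 2, bcd
    workflow {
        Channel
            .fromPath("test_noheader.csv")
            // no header: each row arrives as a list of strings;
            // strip: true removes the blanks around values after each comma
            .splitCsv(strip: true)
            // pick columns by position and convert SomeNumber from String to Integer
            .map { row -> tuple(file(row[0]), row[1] as Integer, row[2]) }
            .view()
    }
    ```

    The resulting channel has the same shape as samples in main.nf, so it can be passed to echo_channel unchanged.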
    

    To read a TSV (or a file with any other separator) instead of a CSV, change the call to .splitCsv(header: true, sep: '\t').
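    For example, assuming a tab-separated test.tsv with the same path, number and string header as test.csv (the file name is hypothetical), only the sep argument differs from main.nf:

    ```nextflow
    #!/usr/bin/env nextflow
    nextflow.enable.dsl=2

    workflow {
        Channel
            .fromPath("test.tsv")
            // header: true -> each row is a map keyed by the column names;
            // sep: '\t' -> split on tab characters instead of commas
            .splitCsv(header: true, sep: '\t')
            .map { row -> tuple(file(row.path), row.number, row.string) }
            .view()
    }
    ```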