I am defining two Nextflow processes. The first one, scatter(), creates two files. Then parallel() is spawned twice, once for each file.
Here is my setup.
// bug.nf
nextflow.enable.dsl = 2

workflow {
    main:
    scatter(params.config)
    scatter.out.configs
        | flatten
        | parallel
}

process scatter {
    container "python:3.11.8"

    input:
    path "config.txt"

    output:
    path "config*.txt", emit: configs

    script:
    """
    echo $PWD
    ls -hal /home/alex/my_cool_repo
    touch config1.txt
    touch config2.txt
    """
}

process parallel {
    container "python:3.11.8"

    input:
    path "config.txt"

    script:
    """
    echo $PWD
    ls -hal /home/alex/my_cool_repo
    """
}
// run command
nextflow run nextflow/bug.nf --config /home/alex/my_cool_repo/my_cool_repo/config/bla.txt
The ls output from all processes should look the same, but it does not.
Output from scatter() (truncated):
/home/alex/my_cool_repo
total 656K
drwxrwxr-x 16 1035 1036 4.0K Feb 17 13:20 .
drwxr-xr-x 3 root root 4.0K Feb 17 13:20 ..
-rw-rw-r-- 1 1035 1036 3.3K Feb 17 11:09 .dockerignore
-rw-rw-r-- 1 1035 1036 3.2K Feb 6 15:33 .gitignore
drwxrwxr-x 4 1035 1036 4.0K Feb 17 13:20 .nextflow
-rw-rw-r-- 1 1035 1036 5.4K Feb 17 13:20 .nextflow.log
-rw-rw-r-- 1 1035 1036 5 Jan 26 18:18 .python-version
drwxrwxr-x 6 1035 1036 4.0K Feb 7 14:20 .venv
drwxrwxr-x 2 1035 1036 4.0K Feb 6 13:28 .vscode
-rw-rw-r-- 1 1035 1036 848 Feb 17 12:28 Dockerfile
-rw-rw-r-- 1 1035 1036 627 Feb 6 15:33 README.md
drwxrwxr-x 3 1035 1036 4.0K Feb 17 12:55 nextflow
-rw-rw-r-- 1 1035 1036 527K Feb 17 11:45 poetry.lock
-rw-rw-r-- 1 1035 1036 32 Jan 26 18:18 poetry.toml
-rw-rw-r-- 1 1035 1036 2.2K Feb 16 19:36 pyproject.toml
drwxrwxr-x 9 1035 1036 4.0K Feb 6 13:28 my_cool_repo
drwxrwxr-x 3 1035 1036 4.0K Feb 17 13:20 work
Output from the two parallel() processes:
/home/alex/my_cool_repo
total 12K
drwxr-xr-x 3 root root 4.0K Feb 17 13:20 .
drwxr-xr-x 3 root root 4.0K Feb 17 13:20 ..
drwxrwxr-x 5 1035 1036 4.0K Feb 17 13:20 work
Why are the outputs not the same?
Context: Instead of ls, I would actually like to run poetry run ..., but poetry gives the following error message for the parallel() processes: Poetry could not find a pyproject.toml file in /home/alex/my_cool_repo/work/f3/766313fbc5d6aeeb39f19193956ffd or its parents.
As user dbthorbur points out in his comment, the difference has to do with the directories mounted into your container.
For your first process, scatter, you are using an additional file input that is located somewhere else on your machine, so Nextflow needs to mount that location and your work directory into the container used for scatter. Apparently it takes a common root(?) directory of both, which is why you find some additional files there.
The second process, parallel, on the other hand only takes input from work, so only that directory gets mounted as a volume for your container.
Check out the .command.run scripts in the work directories to see what actually gets mounted by docker (or podman?).
There are two ways to overcome the difference.
Use stageInMode "copy" as a directive for scatter to get the behaviour of parallel in both processes,
or use a containerOptions "-v /home/alex/my_cool_repo:/home/alex/my_cool_repo" directive in parallel to get the current behaviour of scatter in both.
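As a rough sketch of where these directives would go, here are the two processes from the question with one directive added to each. The two options are alternatives, not meant to be combined, and the script bodies are shortened for illustration. Whether the first option really reduces the mounts to the work directory is an assumption that should be verified in .command.run.

```nextflow
// Option 1 (sketch): copy the input file into the task directory instead of
// linking it, so scatter should no longer need the repository location mounted.
process scatter {
    container "python:3.11.8"
    stageInMode "copy"

    input:
    path "config.txt"

    output:
    path "config*.txt", emit: configs

    script:
    """
    touch config1.txt
    touch config2.txt
    """
}
```

```nextflow
// Option 2 (sketch): explicitly mount the repository into the container used
// by parallel, so it sees the same files that scatter currently sees.
process parallel {
    container "python:3.11.8"
    containerOptions "-v /home/alex/my_cool_repo:/home/alex/my_cool_repo"

    input:
    path "config.txt"

    script:
    """
    ls -hal /home/alex/my_cool_repo
    """
}
```

For the poetry use case from the question, the second option is presumably the relevant one, since poetry needs pyproject.toml to be visible inside the container that runs parallel.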