Tags: slurm, nextflow, sacct

slurm + nextflow: invalid status line: `squeue: error: Invalid user: ?`


My colleague and I both use the same Slurm-based cluster. I use Nextflow daily on this cluster without any problem, and he uses Snakemake + Slurm daily on it too. Today, he tried to run a NF workflow for the first time, using my config and my main.nf file.

On his side, however, the jobs are marked as completed without an exit status and without a '.exit' file (the .exit file only appears later, once the job has actually ended; see below):

Feb-14 14:32:09.928 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 6208569; id: 49; name: MAKE_MINI_BAM (PCRFree); status: COMPLETED; exit: -; error: -; workDir:path/to/nf-workdir/9b/ad1fedb4a9b1f37e07629735f35987 started: 1739539509896; exited: -; ]

Furthermore, when we look at sacct, the job is still running (?):

$ sacct --cluster nautilus -j 6208569
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
6208569      nf-MAKE_M+   standard     thorax          2    RUNNING      0:0 
6208569.bat+      batch                thorax          2    RUNNING      0:0 
6208569.ext+     extern                thorax          2    RUNNING      0:0 

And in .nextflow.log there are these messages about an invalid user (`Invalid user: ?`):

Feb-17 14:50:22.215 [Task monitor] DEBUG nextflow.executor.SlurmExecutor - [SLURM] invalid status line: `squeue: error: Invalid user: ?`
Feb-17 14:50:22.215 [Task monitor] DEBUG nextflow.executor.SlurmExecutor - [SLURM] invalid status line: ``
Feb-17 14:51:22.275 [Task monitor] DEBUG nextflow.executor.SlurmExecutor - [SLURM] invalid status line: `squeue: error: Invalid user: ?`
Feb-17 14:51:22.276 [Task monitor] DEBUG nextflow.executor.SlurmExecutor - [SLURM] invalid status line: ``
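
To see how an squeue error could make a still-running job look finished, here is a minimal Python sketch of the polling step. It assumes that Nextflow's Slurm executor queries squeue with a `-u <user>` option and accepts only two-column `<job id> <state>` lines, logging everything else as `invalid status line`. The function names and the STATE_CODES table are hypothetical; this is an illustration, not Nextflow's actual code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical subset of squeue short state codes.
STATE_CODES = {"PD": "PENDING", "R": "RUNNING", "CG": "COMPLETING",
               "CD": "COMPLETED", "F": "FAILED"}


def queue_status_command(user: Optional[str], queue: Optional[str] = None) -> List[str]:
    """Build an squeue call in the spirit of Nextflow's Slurm executor (assumed form)."""
    cmd = ["squeue", "--noheader", "-o", "%i %t", "-t", "all"]
    if queue:
        cmd += ["-p", queue]
    if user:
        # If the user name was resolved as "?", this becomes `-u ?`,
        # which squeue rejects with "Invalid user: ?".
        cmd += ["-u", user]
    return cmd


def parse_queue_status(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Split squeue output into {job_id: state} plus the lines that were rejected."""
    status: Dict[str, str] = {}
    invalid: List[str] = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) == 2:
            status[cols[0]] = STATE_CODES.get(cols[1], "UNKNOWN")
        else:
            # Error messages and blank lines end up here ("invalid status line").
            invalid.append(line)
    return status, invalid


if __name__ == "__main__":
    print(" ".join(queue_status_command("?")))
    # squeue --noheader -o %i %t -t all -u ?
    ok, bad = parse_queue_status("6208569 R\n")
    print(ok, bad)  # {'6208569': 'RUNNING'} []
    ok, bad = parse_queue_status("squeue: error: Invalid user: ?\n\n")
    print(ok, bad)  # {} ['squeue: error: Invalid user: ?', '']
```

Under these assumptions, a failed query means job 6208569 never appears in the parsed map. The monitor may then treat the task as gone from the queue even though sacct still reports it as RUNNING.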

On my side, there is no problem. What could be the source of this problem? Thanks!

PS: I don't have any specific config hidden in my home directory.
PS2: I also asked on the NF Slack: https://nextflow.slack.com/archives/C02T98A23U7/p1739540301422199


Solution

  • In the end, it was 'just' a problem with the Java that conda/mamba had installed alongside Nextflow: Nextflow was using the wrong local version of Java. I asked my collaborator to install Java and Nextflow himself and to set up PATH and JAVA_HOME, and after that everything went fine.
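
A quick way to check for this kind of mix-up is to see which java binary is picked up and what user name that JVM reports. Below is a minimal Python sketch. It assumes a HotSpot-style JDK that supports `-XshowSettings:properties`, which lists system properties on stderr in an indented `key = value` form. The helper names are hypothetical. A JVM that cannot resolve the current user typically reports `user.name` as `?`, which would fit the `Invalid user: ?` error above. That link is an inference, not something confirmed in this thread.

```python
import os
import shutil
import subprocess
from typing import Optional


def parse_user_name(settings_output: str) -> Optional[str]:
    """Extract `user.name` from `java -XshowSettings:properties` output.

    Assumes the usual indented `key = value` layout of that listing.
    """
    for line in settings_output.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep and key == "user.name":
            return value.strip()
    return None


def java_matches_java_home(java_path: Optional[str], java_home: Optional[str]) -> bool:
    """True if `java_path` resolves to $JAVA_HOME/bin/java."""
    if not java_path or not java_home:
        return False
    expected = os.path.join(java_home, "bin", "java")
    return os.path.realpath(java_path) == os.path.realpath(expected)


def main() -> None:
    java = shutil.which("java")
    java_home = os.environ.get("JAVA_HOME")
    print("java on PATH :", java)
    print("JAVA_HOME    :", java_home)
    print("consistent   :", java_matches_java_home(java, java_home))
    if java:
        # The properties listing and the -version banner both go to stderr.
        proc = subprocess.run([java, "-XshowSettings:properties", "-version"],
                              capture_output=True, text=True)
        user = parse_user_name(proc.stderr)
        print("JVM user.name:", user)
        if user == "?":
            print("This JVM cannot resolve the current user; `squeue -u ?` would fail.")


if __name__ == "__main__":
    main()
```

Running this once in the colleague's environment and once in yours should make any difference in the java binary or the resolved user name visible.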