I have 100 files, and I want to parallelise my submissions to save time instead of running jobs one by one. How can I change this script into a job array in LSF using the bsub submission system, running 10 jobs at a time?
#BSUB -J ExampleJob1 #Set the job name to "ExampleJob1"
#BSUB -L /bin/bash #Uses the bash login shell to initialize the job's execution environment.
#BSUB -W 2:00 #Set the wall clock limit to 2hr
#BSUB -n 1 #Request 1 core
#BSUB -R "span[ptile=1]" #Request 1 core per node.
#BSUB -R "rusage[mem=5000]" #Request 5000MB per process (CPU) for the job
#BSUB -M 5000 #Set the per process enforceable memory limit to 5000MB.
#BSUB -o Example1Out.%J #Send stdout and stderr to "Example1Out.[jobID]"
path=./home/
for each in *.bam
do
samtools coverage "${each}" -o "${each}_coverage.txt"
done
Thank you for your time; any help is appreciated. I am new to LSF and quite confused.
You tagged your question with nextflow, so I will provide a minimal (untested) solution using Nextflow with the LSF executor enabled. By using Nextflow, we can abstract away the underlying job submission system and focus on writing the pipeline, however trivial. I think this approach is preferable, but it does add a dependency on Nextflow. It's a small one, and maybe it's overkill for your current requirements, but Nextflow comes with other benefits, like being able to modify the pipeline and resume when those requirements inevitably change.
Contents of main.nf:
params.bam_files = './path/to/bam_files/*.bam'
params.publish_dir = './results'

process samtools_coverage {

    tag { bam.baseName }

    publishDir "${params.publish_dir}/samtools/coverage", mode: 'copy'

    cpus 1
    memory 5.GB
    time 2.h

    input:
    path bam

    output:
    path "${bam.baseName}_coverage.txt"

    script:
    """
    samtools coverage \\
        -o "${bam.baseName}_coverage.txt" \\
        "${bam}"
    """
}

workflow {

    bam_files = Channel.fromPath( params.bam_files )

    samtools_coverage( bam_files )
}
Contents of nextflow.config:
process {
    executor = 'lsf'
}
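Since your question asks for 10 jobs at a time, you can also cap how many tasks Nextflow submits to LSF in parallel using the queueSize setting in the executor scope:

executor {
    queueSize = 10
}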
Run using:
nextflow run main.nf
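If you later modify the pipeline or a run is interrupted, you can re-run it and reuse the cached results of any tasks that completed successfully:

nextflow run main.nf -resume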
Note also:
LSF supports both per-core and per-job memory limits. Nextflow assumes that LSF works in per-core memory limit mode, so it divides the requested memory by the number of requested CPUs. This is not required when LSF is configured to work in per-job memory limit mode; in that case, you will need to say so by adding the perJobMemLimit option to the executor scope in the Nextflow configuration file.
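For example:

executor {
    perJobMemLimit = true
}

Finally, since your original question asked about plain bsub job arrays: for comparison, here is an untested sketch. It assumes your 100 BAM files sit in the submission directory and that ls lists them in a stable order; the %10 suffix caps the array at 10 concurrently running elements, and LSF sets LSB_JOBINDEX for each one.

#BSUB -J "coverage[1-100]%10"    # 100-element array, at most 10 running at once
#BSUB -L /bin/bash
#BSUB -W 2:00
#BSUB -n 1
#BSUB -R "span[ptile=1]"
#BSUB -R "rusage[mem=5000]"
#BSUB -M 5000
#BSUB -o coverageOut.%J.%I       # %I expands to the array index

# Pick this element's BAM file using the array index
bam=$(ls *.bam | sed -n "${LSB_JOBINDEX}p")
samtools coverage "${bam}" -o "${bam}_coverage.txt"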