NextFlow with SLURM and Singularity
1. Introduction
2. Prerequisites
3. Configuring NextFlow on LCC and MCC
4. Running a NextFlow Pipeline on LCC and MCC
5. Monitoring and Managing NextFlow Jobs
6. Troubleshooting
7. Conclusion
1. Introduction
NextFlow is a popular workflow manager designed to handle large-scale bioinformatics data processing. It provides a simple way to run and scale workflows, from a local machine to HPC clusters like LCC and MCC, which use the SLURM scheduler.
This guide explains how to configure and run NextFlow on the LCC and MCC clusters, focusing on using SLURM, setting up the environment, creating a NextFlow configuration file, and submitting jobs.
2. Prerequisites
2.1. Software Requirements
Before running NextFlow on LCC or MCC, ensure the following software is loaded:
NextFlow: Load NextFlow using the module system:
LCC:
module load ccs/nextflow
MCC:
module load ccs/conda/nextflow/24.10.4
Singularity: If your pipeline uses containers, load the Singularity module:
LCC:
module load ccs/singularity
MCC:
(Loaded by default.)
2.2. SLURM on LCC/MCC
SLURM is used as the job scheduler on LCC and MCC. It manages and allocates computational resources to jobs. You'll need basic SLURM commands such as sbatch, squeue, and scancel; these commands are available on all nodes without the need to load a module.
3. Configuring NextFlow on LCC and MCC
3.1. Creating a NextFlow Configuration File
Below is an example configuration file (nextflow.config) for SLURM, explicitly configured for LCC and MCC:
LCC:
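A minimal nextflow.config for LCC might look like the following sketch; the partition, account, and resource values are placeholders to replace with your own allocation details:

```groovy
// Sketch of a SLURM configuration for LCC -- the partition and account
// names below are placeholders, not actual LCC values.
process {
    executor       = 'slurm'
    queue          = 'normal'                      // SLURM partition (placeholder)
    clusterOptions = '--account=<your_account>'    // your project account
    cpus           = 4
    memory         = '16 GB'
    time           = '12h'
}
```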
MCC:
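On MCC the structure is the same; only the partition and account names change. Again, the values below are placeholders:

```groovy
// Sketch of a SLURM configuration for MCC -- substitute your own
// partition and account names.
process {
    executor       = 'slurm'
    queue          = 'normal'                      // or 'jumbo' for high-memory jobs
    clusterOptions = '--account=<your_account>'
    cpus           = 4
    memory         = '16 GB'
    time           = '12h'
}
```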
Modify the above fields such as queue, cpus, memory, and account, depending on your job type and available resources. For MCC, you might want to use the jumbo partition for high-memory (500 GB+) jobs.
3.2. NextFlow Profiles
NextFlow profiles allow you to specify different job configurations for various types of jobs (e.g., small, medium, large) while keeping global parameters consistent. The global process block defines the default executor, queue, and account, while the profiles specify resource parameters like CPUs, memory, and time. Each profile overrides the relevant parameters for its job type, making it easier to manage different workloads without changing the global configuration.
LCC:
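A configuration along these lines keeps the SLURM settings global and the resource requests per profile; the partition and account names are placeholders:

```groovy
// Sketch: global SLURM defaults plus small/medium/large profiles.
process {
    executor       = 'slurm'
    queue          = 'normal'                      // placeholder partition
    clusterOptions = '--account=<your_account>'
}

profiles {
    small {
        process {
            cpus   = 2
            memory = '8 GB'
            time   = '2h'
        }
    }
    medium {
        process {
            cpus   = 8
            memory = '32 GB'
            time   = '12h'
        }
    }
    large {
        process {
            cpus   = 16
            memory = '128 GB'
            time   = '48h'
        }
    }
}
```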
MCC:
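The MCC version follows the same layout, with different partition and account names (placeholders below). A large profile could point high-memory work at the jumbo partition:

```groovy
// Sketch for MCC: global SLURM defaults, with a profile that targets
// the jumbo partition for high-memory (500 GB+) jobs.
process {
    executor       = 'slurm'
    queue          = 'normal'                      // placeholder partition
    clusterOptions = '--account=<your_account>'
}

profiles {
    large {
        process {
            queue  = 'jumbo'        // high-memory partition
            cpus   = 16
            memory = '500 GB'
            time   = '48h'
        }
    }
}
```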
To run a pipeline with a specific profile, use the -profile flag:
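For example, assuming a pipeline script named main.nf and a profile named large (both placeholders):

```shell
nextflow run main.nf -profile large
```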
3.3. Using Singularity with NextFlow
Singularity is a containerization platform that allows you to package applications and their dependencies into portable containers. It is especially useful in bioinformatics workflows, where reproducibility and environment consistency are critical.
3.3.a Configuring Singularity for NextFlow
If your pipeline requires containers, you can configure NextFlow to use Singularity. This section explains how to set it up on LCC and MCC.
LCC: To use Singularity on LCC, you need to load the Singularity module:
module load ccs/singularity
MCC: On MCC, Singularity is already loaded by default, so you don't need to load it manually.
3.3.b Modifying the NextFlow Configuration for Singularity
You can enable the container engine in the nextflow.config file and specify the image with the process.container directive. Below is an example of configuring NextFlow to use Singularity containers:
LCC:
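A sketch for LCC; the image path and bind mounts are placeholders:

```groovy
// Sketch: enable Singularity and point processes at an image.
// The image path and bind mount below are placeholders.
singularity {
    enabled    = true
    autoMounts = true
}

process {
    container        = '/path/to/your_image.sif'
    containerOptions = '-B /scratch:/scratch'   // bind-mount /scratch inside the container
}
```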
MCC:
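The MCC configuration is the same (no module load is needed); adjust the bind paths to the directories your pipeline reads and writes. Paths below are placeholders:

```groovy
// Sketch for MCC -- image path and bind mount are placeholders.
singularity {
    enabled    = true
    autoMounts = true
}

process {
    container        = '/path/to/your_image.sif'
    containerOptions = '-B /project:/project'   // bind-mount project storage
}
```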
In this example:
The process.container directive tells NextFlow which container image to run the job in.
The containerOptions directive allows you to pass additional options to Singularity, such as mounting directories (the -B option) inside the container.
3.3.c Running a Pipeline with Singularity
To run a NextFlow pipeline that uses Singularity, you simply specify the profile as usual:
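For example, with a hypothetical main.nf and a profile named large:

```shell
nextflow run main.nf -profile large
```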
NextFlow will automatically use Singularity as the container engine for the job defined by the profile.
3.3.d Singularity Image
Ensure that the required Singularity image is available. If your workflow uses a specific container image, you can specify the image in your nextflow.config:
If you’re using a custom image, make sure it is accessible on the cluster, either by pulling it from a registry or by placing it in a shared directory.
4. Running a NextFlow Pipeline on LCC and MCC
4.1. Using Custom Configurations
To provide additional configuration options like reports:
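For instance, NextFlow's built-in reporting flags can be combined with a custom configuration file (custom.config here is a placeholder name):

```shell
nextflow run main.nf -c custom.config \
    -with-report report.html \
    -with-timeline timeline.html \
    -with-trace
```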
5. Monitoring and Managing NextFlow Jobs
Use SLURM commands to monitor jobs on LCC and MCC:
Check job status: squeue -u $USER
Cancel a job: scancel <job_id>
NextFlow also has its own logging features:
View logs: nextflow log (or inspect the .nextflow.log file in the directory where the pipeline was launched)
6. Troubleshooting
6.1. Memory/CPU Issues
If jobs fail due to resource limits, adjust your nextflow.config file. For instance, request more memory or CPUs if needed:
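For example, raising the defaults for every process (the values are illustrative):

```groovy
// Increase default resource requests; tune the values to your data.
process {
    cpus   = 8
    memory = '64 GB'
    time   = '24h'
    // Optionally retry failed tasks with more memory each attempt:
    // errorStrategy = 'retry'
    // maxRetries    = 2
    // memory        = { 8.GB * task.attempt }
}
```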
6.2. SLURM-Specific Errors
SLURM errors like timeouts can often be resolved by modifying job directives in your config file. Check the SLURM logs for details.
7. Conclusion
NextFlow provides efficient and scalable ways to run bioinformatics workflows on LCC and MCC with SLURM. Customize the configuration to suit your needs and consult official documentation for further details.
Center for Computational Sciences