NextFlow with SLURM and Singularity

1. Introduction

NextFlow is a popular workflow manager designed to handle large-scale bioinformatics data processing. It provides a simple way to run and scale workflows, from a local machine to HPC clusters like LCC and MCC, which use the SLURM scheduler.

This guide explains how to configure and run NextFlow on the LCC and MCC clusters, focusing on using SLURM, setting up the environment, creating a NextFlow configuration file, and submitting jobs.


2. Prerequisites

2.1. Software Requirements

Before running NextFlow on LCC or MCC, ensure the following software is loaded:

  • NextFlow: Load NextFlow using the module system:

    • LCC:

      module load ccs/nextflow
    • MCC:

      module load ccs/conda/nextflow/24.10.4
  • Singularity: (If using containers) Load the Singularity module:

    • LCC:

      module load ccs/singularity
    • MCC:
      (Loaded by default.)

2.2. SLURM on LCC/MCC

SLURM is used as the job scheduler on LCC and MCC. It helps manage and allocate computational resources to different tasks. You'll need basic SLURM commands such as sbatch, squeue, and scancel – these commands are available on all nodes without the need to load a module.


3. Configuring NextFlow on LCC and MCC

3.1. Creating a NextFlow Configuration File

Below is an example configuration file (nextflow.config) for SLURM, explicitly configured for LCC and MCC:

  • LCC:

  • MCC:
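As a minimal sketch, a SLURM-backed nextflow.config might look like the following; the partition name, account string, and resource values below are placeholders, not actual LCC or MCC settings:

```groovy
// Sketch of a SLURM executor configuration (all values are placeholders)
process {
    executor       = 'slurm'
    queue          = 'normal'                  // SLURM partition name (placeholder)
    clusterOptions = '--account=your_account'  // your allocation account (placeholder)
    cpus           = 4
    memory         = '16 GB'
    time           = '24h'
}
```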

Modify the above fields such as queue, cpus, memory, and account, depending on your job type and available resources. For MCC, you might want to use the jumbo partition for high-memory (500 GB+) jobs.

3.2. NextFlow Profiles

NextFlow profiles allow you to specify different job configurations for various types of jobs (e.g., small, medium, large) while keeping global parameters consistent. The global process block defines the default executor, queue, and account, while the profiles specify resource parameters such as CPU, memory, and time. Each profile overrides the relevant parameters for its job type, making it easier to manage different workloads without changing the global configuration.

  • LCC:

  • MCC:
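As a sketch of this pattern, the following configuration puts the shared SLURM settings in a global process block and the per-size resource requests in profiles; the partition, account, profile names, and resource values are all placeholders:

```groovy
// Global defaults shared by every profile (placeholders throughout)
process {
    executor       = 'slurm'
    queue          = 'normal'                  // placeholder partition
    clusterOptions = '--account=your_account'  // placeholder account
}

// Per-job-type resource overrides
profiles {
    small {
        process {
            cpus   = 2
            memory = '8 GB'
            time   = '4h'
        }
    }
    large {
        process {
            cpus   = 16
            memory = '128 GB'
            time   = '48h'
        }
    }
}
```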

To run a pipeline with a specific profile, use the -profile flag:
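For example, assuming a pipeline script named main.nf and a profile named small (both hypothetical names):

```shell
nextflow run main.nf -profile small
```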

 

3.3. Using Singularity with NextFlow

Singularity is a containerization platform that allows you to package applications and their dependencies into portable containers. It is especially useful in bioinformatics workflows, where reproducibility and environment consistency are critical.

 

3.3.a Configuring Singularity for NextFlow

If your pipeline requires containers, you can configure NextFlow to use Singularity. This section explains how to set it up on LCC and MCC.

LCC: To use Singularity on LCC, you need to load the Singularity module:

module load ccs/singularity

MCC: On MCC, Singularity is already loaded by default, so you don't need to load it manually.

 

3.3.b Modifying the NextFlow Configuration for Singularity

You can specify the container engine in the nextflow.config file under the process.container directive. Below is an example for configuring NextFlow to use Singularity containers:

LCC:

MCC:
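As a sketch (the image path and bind paths below are placeholders, not actual LCC or MCC settings):

```groovy
// Sketch: enabling Singularity as the container engine (placeholder values)
singularity {
    enabled    = true
    autoMounts = true   // automatically mount host paths referenced by the workflow
}

process {
    container        = '/path/to/image.sif'    // placeholder image path
    containerOptions = '-B /scratch:/scratch'  // bind a host directory into the container (placeholder)
}
```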

In this example:

  • The process.container directive tells NextFlow to use Singularity for the job.

  • The containerOptions directive allows you to pass additional options to Singularity, such as mounting directories (-B option) inside the container.

 

3.3.c Running a Pipeline with Singularity

To run a NextFlow pipeline that uses Singularity, you simply specify the profile as usual:
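For example, with a hypothetical script main.nf and profile small:

```shell
nextflow run main.nf -profile small
```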

NextFlow will automatically use Singularity as the container engine for the job defined by the profile.

 

3.3.d Singularity Image

Ensure that the required Singularity image is available. If your workflow uses a specific container image, you can specify the image in your nextflow.config:
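A sketch of the two common options; the local path below is a placeholder, and docker://ubuntu:22.04 is only an illustrative public image:

```groovy
process {
    // Either a local image file already present on the cluster...
    container = '/project/shared/images/tool.sif'  // placeholder path

    // ...or a remote image that Singularity pulls from a registry:
    // container = 'docker://ubuntu:22.04'
}
```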

If you’re using a custom image, make sure it is accessible on the cluster, either by pulling it from a registry or by placing it in a shared directory.


4. Running a NextFlow Pipeline on LCC and MCC

4.1. Using Custom Configurations

To provide additional configuration options like reports:
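For example, NextFlow's built-in -with-report, -with-timeline, and -with-trace options generate execution reports; the script and profile names here are hypothetical, and the output file names are arbitrary:

```shell
nextflow run main.nf -profile small \
    -with-report report.html \
    -with-timeline timeline.html \
    -with-trace
```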


5. Monitoring and Managing NextFlow Jobs

Use SLURM commands to monitor jobs on LCC and MCC:

  • Check job status:

      squeue -u $USER
  • Cancel a job:

      scancel <job_id>

NextFlow also has its own logging features:

  • View logs:

      nextflow log

6. Troubleshooting

6.1. Memory/CPU Issues

If jobs fail due to resource limits, adjust your nextflow.config file. For instance, request more memory or CPUs if needed:
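For example, in nextflow.config (the values are placeholders; choose requests your partition actually allows):

```groovy
process {
    cpus   = 16
    memory = '64 GB'
}
```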

6.2. SLURM-Specific Errors

SLURM errors like timeouts can often be resolved by modifying job directives in your config file. Check the SLURM logs for details.


7. Conclusion

NextFlow provides an efficient and scalable way to run bioinformatics workflows on LCC and MCC with SLURM. Customize the configuration to suit your needs, and consult the official NextFlow documentation for further details.

Center for Computational Sciences