Containerized Jobs

NOTE: This documentation is under heavy development and is subject to change daily.

Please note that there is no internet access from any node inside the DGX cluster for end users. As such, all work needs to be manually copied onto and off of the cluster by the user using scp, rsync, or sftp.
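For example, data could be copied from your local machine with a command along these lines (the login hostname shown is a placeholder; substitute the appropriate DGX login node):

scp -r ./my_dataset <linkblue>@<dgx-login-node>:/scratch/<linkblue>/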

For containers this means that binding the appropriate directories for accessing datasets and saving output are crucial to avoid losing time and data. As such, please be sure to review the Container Directory Mounts section before you start running large-scale jobs.

Additionally, the containers that run on the DGX are unprivileged, unlike normal Docker containers. For this reason, it is important to understand the file permissions inside of containers and where files can be written. Please refer to File Permissions in DGX Containers to better understand how to work within your permissions.

PREFACE: If you are not familiar with Slurm job submissions, please review a brief primer at Submitting jobs on the DGX cluster (for first-time users).

 

Containers

Background

In this guide, we’ll review how containerized jobs are run on the DGX HPC. Users are able to run unprivileged Docker containers on the DGX HPC by leveraging Nvidia’s ENROOT sandboxing tool in conjunction with their Pyxis Slurm SPANK plugin. These two tools convert srun parameters and sbatch script directives into container settings (e.g. image and registry import, directory binding, etc.), and the resulting container runs with the privileges of the submitting user.

Simple Example - Running an Ubuntu Container

A very simple sbatch script to run an Ubuntu container:

#!/bin/bash
#SBATCH -p defq
#SBATCH --time=00:15:00
#SBATCH -A <account>
#SBATCH --container-image ubuntu

grep PRETTY /etc/os-release

Note that docker is never called directly on the DGX; rather, the entire sbatch script is executed inside the container as if docker had been called.

The important directive is the line #SBATCH --container-image ubuntu. This tells Pyxis and ENROOT to build an unprivileged execution environment from the ubuntu Docker image (imported from DockerHub in this instance) and execute the grep PRETTY /etc/os-release command inside the container once it is running.

After submitting this script using sbatch, the slurm-####.out file will provide feedback from Pyxis regarding the successful loading of the Docker image as well as the results of the commands run:

pyxis: imported docker image: ubuntu
PRETTY_NAME="Ubuntu 22.04.4 LTS"

File Permissions in DGX Containers

For those familiar with running Docker containers regularly, you’ll likely be used to containers being run as privileged: inside the container you are the root user and can modify files as needed, and you can mount most directories on the host system as the root user as well.

Through ENROOT and Pyxis, containers on the DGX run as unprivileged sandboxes. This means the container runs as the submitting user, and the rights to mount directories and to read or write files match those of the submitting user. Inside the container, the account is that of the submitting user rather than root, so if that user doesn’t have write access to a particular directory, they will receive an error when trying to create or write a file in that directory inside the container. Similarly, if a directory is mounted that the user does not have the correct permissions for, they will receive an access error when attempting to interact with that directory inside the container.

Using Nvidia GPUs in Containers

There are two steps to enabling Nvidia GPUs in containers on the DGX: supplying GPUs via Slurm, and activating the GPUs in the container itself.

Supplying GPUs via Slurm

GPUs for a Slurm task are enabled by supplying one or more of the GPU-related #SBATCH directives in your Slurm script:

GPU scheduling options:
  --cpus-per-gpu=n      number of CPUs required per allocated GPU
  -G, --gpus=n          count of GPUs required for the job
  --gpu-bind=...        task to GPU binding options
  --gpu-freq=...        frequency and voltage of GPUs
  --gpus-per-node=n     number of GPUs required per allocated node
  --gpus-per-socket=n   number of GPUs required per allocated socket
  --gpus-per-task=n     number of GPUs required per spawned task
  --mem-per-gpu=n       real memory required per allocated GPU

We’ll use --gpus-per-task as an example:
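A minimal script along the lines of our Ubuntu example would be (the CUDA image tag and single-GPU request shown here are illustrative):

#!/bin/bash
#SBATCH -p defq
#SBATCH --time=00:15:00
#SBATCH -A <account>
#SBATCH --gpus-per-task=1
#SBATCH --container-image nvidia/cuda:12.2.2-base-ubuntu22.04

nvidia-smi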

When you launch a Slurm job using this script, the Slurm scheduler will schedule one GPU and attach it to the container. It is then up to the image to make use of the GPU, which brings us to the next “step”: activating the GPUs in the container itself.

Activating the GPUs in the container itself

This is a slightly complicated topic, but the good news is that for most Nvidia and Nvidia-derived containers this is done for you, so you won’t have to do anything. In a nutshell, in order for containers to see the Nvidia GPUs that have been allocated to them, they require certain environment variables to exist before the container is created. The two environment variables are:
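These are the standard variables used by Nvidia’s container tooling:

NVIDIA_VISIBLE_DEVICES – controls which GPUs are exposed inside the container (e.g. all)
NVIDIA_DRIVER_CAPABILITIES – controls which driver libraries are made available (e.g. compute,utility)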

There are two ways to achieve this: using ENV commands in your Dockerfile if you’re building your container or using export commands from your login node shell before submitting the job to Slurm.

Adding Nvidia GPU ENV lines to your Dockerfile

This is usually already done for Nvidia images (such as CUDA or Parabricks), but if the image you want to use doesn’t set these environment variables already, you can set them by adding the following two lines to your Dockerfile and rebuilding your image:
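For example (these values expose all allocated GPUs with the standard compute and utility capabilities):

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility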

Using export to Set Nvidia GPU Environment Variables

If you don’t have an easy way to rebuild or augment an existing image, you can still set these environment variables before creating the container. This can be done by running export commands in your session:
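For example (again exposing all allocated GPUs with the standard compute and utility capabilities):

export NVIDIA_VISIBLE_DEVICES=all
export NVIDIA_DRIVER_CAPABILITIES=compute,utility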

These lines add the required environment variables; you can then run sbatch <script> and the variables will be passed to ENROOT, Pyxis, and the container. You can also add these lines to your ~/.bashrc to save yourself from having to remember to run them each session.

Docker Registries and Images

Docker images can be pulled from many public registries (e.g. DockerHub, Nvidia NGC, AWS/Azure/GCP, etc.) as well as from private registries with user-supplied credentials. By default, DockerHub is searched for images when no registry is specified. To pull from a different registry, we must modify the --container-image line. As an example, we’ll use the Nvidia Parabricks image with the following sbatch script:
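A sketch of such a script is below (the pbrun version command in the body is just a placeholder to confirm the image runs):

#!/bin/bash
#SBATCH -p defq
#SBATCH --time=00:15:00
#SBATCH -A <account>
#SBATCH --container-image nvcr.io#nvidia/clara/clara-parabricks:4.2.1-1

pbrun version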

The important change in this script is that we’ve added nvcr.io# before the container name. This instructs Pyxis and ENROOT to pull the image from the Nvidia Container Catalog registry rather than DockerHub.

Private Docker Registries/Images

If a Docker image requires authorization to pull, ENROOT provides a mechanism for the user to supply credentials through a credentials file located at ${HOME}/.config/enroot/.credentials. If this directory does not exist, it can be created with:
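mkdir -p ${HOME}/.config/enroot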

The credentials file contains one or more entries that supply credentials for the various private Docker registries that ENROOT can pull from. An example for the previously used Nvidia Container Catalog registry would be:
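machine nvcr.io login $oauthtoken password <NGC_API_KEY>

Here <NGC_API_KEY> is a placeholder for an API key generated from your NGC account.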

After adding those lines to a credential file, private containers accessible to the user who generated the API key can be pulled using ENROOT and run on the DGX cluster. Further examples can be found on ENROOT's import documentation.

SquashFS Container Image Files

While it is convenient to pull images from public and private registries when working with containerized jobs, some images can be quite large, and pulling them repeatedly can slow down execution when large numbers of jobs use the same image. To alleviate this, ENROOT allows a locally stored SquashFS image file to be used in place of a Docker repository image. Taking the Parabricks example above, we can modify the script to load the Parabricks image from a SquashFS file:
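A sketch of the modified script (the body command is again just a placeholder):

#!/bin/bash
#SBATCH -p defq
#SBATCH --time=00:15:00
#SBATCH -A <account>
#SBATCH --container-image /cm/shared/images/parabricks/4.2.1/parabricks-4.2.1-1.sqfs

pbrun version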

As you can see, the only change we’ve made is to replace the repository path nvcr.io#nvidia/clara/clara-parabricks:4.2.1-1 with the local SquashFS path /cm/shared/images/parabricks/4.2.1/parabricks-4.2.1-1.sqfs. By doing this, ENROOT creates the container from the locally stored image, avoiding the need to repeatedly pull the image from the internet.

Creating a SquashFS image file

The easiest way to create a SquashFS file is to add the #SBATCH --container-save=/path/to/image.sqfs directive to your job script. By doing this, ENROOT will download the image from the network and save it to the supplied path to be used in future job scripts. A simple script to save our Parabricks image would be:
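For example (the output path shown is a placeholder; the body command simply exercises the image):

#!/bin/bash
#SBATCH -p defq
#SBATCH --time=00:15:00
#SBATCH -A <account>
#SBATCH --container-image nvcr.io#nvidia/clara/clara-parabricks:4.2.1-1
#SBATCH --container-save=/path/to/parabricks-4.2.1-1.sqfs

pbrun version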

Make sure the --container-save path points to a file and not a directory, as it will delete the entire contents of the directory!

After your Slurm job is submitted and run, the file will be created in the path and can be used in place of the registry-based --container-image for future Slurm jobs.

You can also create SquashFS image files locally and upload them to the DGX cluster through a more involved process. You’ll need to install Docker, squashfs-tools, and any compression schemes you plan to use for your SquashFS image file. Once you’ve created your image, you can use the following script to create a .sqfs file from a local Docker container (credit to a gist from hasbrowncipher on GitHub):
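A minimal sketch along those lines (not the original gist verbatim; it exports the image’s root filesystem with Docker and packs it with mksquashfs):

#!/bin/bash
# Usage: ./<script> docker_image_name:tag squashfs_file_name
set -euo pipefail

image="$1"
output="$2"

# Create (but do not start) a container so its filesystem can be exported
container=$(docker create "$image")
rootfs=$(mktemp -d)

# Export the container's root filesystem and unpack it into a temporary directory
docker export "$container" | tar -C "$rootfs" -xf -
docker rm "$container" > /dev/null

# Pack the root filesystem into a SquashFS image file
mksquashfs "$rootfs" "$output" -noappend

rm -rf "$rootfs"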

 

Call the script via ./<script> docker_image_name:tag squashfs_file_name and upload the created .sqfs file to the DGX to run as shown above.

Container Directory Mounts

By default, the submitting user’s ${HOME} directory is mounted into a Slurm-executed container, but any existing directory that the user has POSIX access rights to can also be mounted inside the container. This can be accomplished in two ways: using #SBATCH directives, and using ENROOT configuration files. The differences are described below.

Using #SBATCH directives

When using #SBATCH directives to define container mounts, the directive must be added to each script that requires the directory (or directories) to be mounted. This allows specific scripts to be customized for specific mounts if required.

Using our Ubuntu example script above, we can modify the script to also mount a user’s /scratch directory into the container using the --container-mounts directive. The updated script would look like:
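#!/bin/bash
#SBATCH -p defq
#SBATCH --time=00:15:00
#SBATCH -A <account>
#SBATCH --container-image ubuntu
#SBATCH --container-mounts=/scratch/<linkblue>:/scratch/<linkblue>

grep PRETTY /etc/os-release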

As you can see, we’ve added the directive #SBATCH --container-mounts=/scratch/<linkblue>:/scratch/<linkblue>, which mounts the filesystem path /scratch/<linkblue> to the container-internal path /scratch/<linkblue>, similar to how Docker container mount descriptors work. Multiple mount paths can be supplied as a comma-separated list.

A real-world example using our Parabricks container

Now let’s look at a more detailed example using our Parabricks container. Parabricks requires a reference genome, so you’ll need to mount a directory to allow the container to see those files. To do so, we’ll use the --container-mounts directive to mount the example dataset supplied by Nvidia for testing Parabricks and run the fastq.gz-to-bam conversion routine, saving the results to our home directory:
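A sketch of such a script is below (the container-internal mount point, sample file names, and pbrun options are illustrative; see the note below for the authoritative script on the cluster):

#!/bin/bash
#SBATCH -p defq
#SBATCH --time=01:00:00
#SBATCH -A <account>
#SBATCH --gpus-per-task=1
#SBATCH --container-image nvcr.io#nvidia/clara/clara-parabricks:4.2.1-1
#SBATCH --container-mounts=/cm/shared/examples/uky/genomics/parabricks/parabricks_sample:/workdir/parabricks_sample:ro

pbrun fq2bam \
    --ref /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --in-fq /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz \
    --out-bam /home/${USER}/fq2bam_output.bam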

Note: This Slurm script can be found in the UKY script examples directory on the DGX cluster at: /cm/shared/examples/uky/genomics/parabricks/4.2.1-1/run_fq2bam.sh

As you can see from the example above, we’ve mounted the example Parabricks dataset located at /cm/shared/examples/uky/genomics/parabricks/parabricks_sample inside the container so that it can be used in the processing of the fastq files. Note that we’ve used the ro flag at the end of the mount, which makes the directory read-only inside the container to prevent corrupting the data. Also note that we’re writing files to our /home/${USER} directory, which is automatically mounted for containers and thus ready to receive our results.

Using .config/enroot/mounts.d/ files

When using ENROOT configuration files to define mounts, the file is created once and applies to all containerized jobs submitted from that user’s account. As such, this approach should be reserved for very general mounts (e.g. the user’s /scratch directory) so the same mount doesn’t need to be defined for every (or nearly every) containerized job.

ENROOT mount configuration files live in the user’s ${HOME}/.config/enroot/mounts.d/ directory. If this directory does not exist, it can be created with:
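mkdir -p ${HOME}/.config/enroot/mounts.d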

Inside this directory, .fstab files are used to define mounts that should be applied to all containerized jobs. For example, if a user would like their /scratch directory to be mounted inside every container using the same filepath, the following scratch.fstab file can be added to their ~/.config/enroot/mounts.d directory:
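For example (a minimal sketch; the bind options shown are typical for ENROOT fstab entries):

/scratch/<linkblue> /scratch/<linkblue> none x-create=dir,bind,rw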

Notice that the mount files use fstab-style declarations. More information about ENROOT user configuration can be found on ENROOT’s configuration documentation page.
