Scheduling Policies

Scheduling Policy: OpenHPC CPUs/GPUs and OpenStack VMs

A Scheduling Policy determines the order in which jobs run on computational resources such as CPUs and GPUs. (Note: scheduling of Virtual Machines (VMs) is discussed separately below.)

Scheduling on OpenHPC CPUs and GPUs

We use SLURM (Simple Linux Utility for Resource Management) to determine the order in which jobs run on CPU and GPU resources. SLURM assigns each job a priority based on the Allocation (pool of compute time) it charges: the higher the priority, the sooner the job runs.

Priority Levels

Priority Level    Allocation Purposes with this Priority
1                 Condo / CCS Discretionary
2                 Meritorious Research / Educational / Startup
3                 Condo Incentive
4                 Open Access

Allocation Priorities (priority 1 is the highest, i.e., most important)

Priorities 1 and 2 are for jobs with guaranteed compute time. Priorities 3 and 4 are best-effort: these jobs run only on nodes that would otherwise sit idle, with Condo Incentive jobs taking precedence over Open Access jobs.

Ordering Within a Priority Level

Jobs at the same priority level are ordered using several factors, including:

  • How many nodes the job needs ("width")

  • How long the job has been waiting

  • Submission order

SLURM also uses backfilling to improve efficiency. If a large job is waiting for many nodes to free up, SLURM will slot in smaller jobs that can finish before those nodes are needed. For example, if Job A needs 32 nodes that won't be available for a week, SLURM may run Job B first, even though it was submitted later, if Job B needs only 4 nodes and will finish within that window.
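The Job A / Job B example above can be sketched as a small shell function. This is a deliberately simplified illustration, not SLURM's actual algorithm. The function name can_backfill, Job B's 48-hour runtime, and the figure of 8 idle nodes are assumptions made for the example; only the 4-node width and the one-week (168-hour) window come from the text.

```shell
# Simplified backfill test (illustrative only; real SLURM backfill also
# accounts for reservations, partitions, and other jobs' start times).
# Usage: can_backfill NODES_NEEDED RUNTIME_HOURS IDLE_NODES WINDOW_HOURS
can_backfill() {
    # The job fits if enough nodes are idle now AND its requested runtime
    # ends before the waiting large job needs those nodes.
    [ "$1" -le "$3" ] && [ "$2" -le "$4" ]
}

# Job B: 4 nodes, 48 h requested (assumed); 8 idle nodes (assumed);
# Job A's nodes free up in one week = 168 h.
if can_backfill 4 48 8 168; then
    echo "Job B can be backfilled"
else
    echo "Job B waits behind Job A"
fi
```

The second comparison relies on the job's requested runtime, which is the value supplied with --time.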

Important: Always specify an expected runtime using --time in your sbatch command. Without it, SLURM falls back to the partition's default time limit, so it cannot backfill your job effectively and the job may wait longer than necessary.
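A minimal batch script that sets an expected runtime might look like the sketch below. The job name, node count, two-hour limit, and account name are placeholders chosen for illustration, and using --account to select the Allocation to charge is an assumption about how this site maps Allocations to SLURM accounts. --time accepts formats such as HH:MM:SS or D-HH:MM:SS.

```shell
#!/bin/bash
#SBATCH --job-name=example_job     # placeholder job name
#SBATCH --nodes=4                  # job width (number of nodes)
#SBATCH --time=02:00:00            # expected runtime: 2 hours (HH:MM:SS)
#SBATCH --account=my_allocation    # placeholder; assumed to select the Allocation

# Placeholder workload; replace with your own program.
# SLURM_JOB_NUM_NODES is set by SLURM inside a running job.
NODES="${SLURM_JOB_NUM_NODES:-unknown}"
echo "Running on ${NODES} node(s)"
```

Submit the script with sbatch job.sh (job.sh being a hypothetical file name). The limit can also be given on the command line, for example sbatch --time=02:00:00 job.sh, which takes precedence over the value in the script.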

Scheduling on OpenStack VMs

Unlike HPC jobs, VMs are typically meant to run indefinitely. This means OpenStack works differently from a scheduler like SLURM: it either creates a VM immediately if resources are available, or rejects the request outright — there's no built-in queue.

We support two ways to create VMs to accommodate different use cases:

Option 1: OpenStack Web Interface

Best when resources are plentiful and demand is low.

  • OpenStack will create the VM immediately if resources are available, or fail if they aren't.

  • There is no queue — if creation fails, you simply try again later.

  • VMs created this way have no priority, so there is no guaranteed order of creation.

Option 2: Via SLURM

Best when resources are heavily used, or when you want batch-style job behavior.

  • You submit a batch job that is queued by SLURM and automatically creates the VM via the OpenStack API when resources become available.

  • Job ordering follows SLURM's standard scheduling policy (see above).

  • The job is charged against a SLURM Account that you specify.

Prerequisite: Your project must have a SLURM Account with a VM resource allocation before you can use this method. Contact your administrator if one hasn't been set up.

Center for Computational Sciences