Monitoring SLURM job resources
CCS collects performance metrics from compute nodes and aggregates them into a Grafana dashboard.
These dashboards allow you to monitor CPU, GPU, and memory usage for your SLURM jobs to help diagnose performance issues and improve efficiency.
Step 1 — Identify Your Job ID
To list your currently running jobs on LCC or MCC, run:
squeue -u $USER -t R -o "%.18A %.8j %.8u %.2t %.20P"

The first column shows the SLURM Job ID.
Important:
Using the -o flag ensures array jobs display the correct Job ID instead of the default array format.
Make a note of the Job ID.
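If you want to capture the Job ID in a script rather than copying it by hand, the first column of the squeue output can be extracted with awk. This is a minimal sketch: the sample line below is made-up output, not from a real job; on LCC or MCC you would pipe the squeue command from above instead.

```shell
# Sample squeue output line (illustrative only):
sample_output='           1234567   my_job    user1  R   normal'

# Extract the first column, which holds the SLURM Job ID.
job_id=$(echo "$sample_output" | awk '{print $1}')
echo "$job_id"

# On the cluster, the equivalent would be something like:
#   job_id=$(squeue -u $USER -t R -h -o "%A" | head -n 1)
# where -h suppresses the header and %A prints only the Job ID.
```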
Step 2 — Open the Grafana Monitoring Dashboard
Open a web browser and navigate to:
Step 3 — Sign in Using CILogon
Click Sign in with CILogon.
Select your identity provider
(most users choose University of Kentucky).
Click Log On.
Enter your Link Blue credentials
(do not include @uky.edu).
Step 4 — Select the SLURM Job Statistics Dashboard
After logging in:
Choose the appropriate dashboard:
Compute Jobs (CPU jobs)
GPU Jobs (GPU jobs)
Step 5 — Enter Your SLURM Job ID
Locate the field labeled Slurm_Job_ID
Enter your Job ID
Press Enter
The dashboard will update to display job statistics.
Step 6 — Interpret Job Efficiency Metrics
The Job Information panel displays:
Number of nodes used
CPU efficiency
Memory efficiency
Well-tuned jobs typically show:
High CPU efficiency
Memory usage close to requested allocation
Low efficiency may indicate:
over-requested resources
I/O bottlenecks
idle compute time
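To make the efficiency numbers concrete, CPU efficiency can be thought of as the CPU time actually consumed divided by the CPU time you reserved (allocated cores times wall-clock time). The values below are made-up example numbers for illustration, not output from a real job or from Grafana.

```shell
# Illustrative CPU-efficiency calculation (all values are invented):
elapsed_seconds=3600        # wall-clock runtime of the job
allocated_cores=16          # cores requested from SLURM
total_cpu_seconds=46080     # CPU time actually used, summed over cores

# Efficiency = used CPU time / (cores * wall-clock time), as a percentage.
cpu_efficiency=$(( 100 * total_cpu_seconds / (allocated_cores * elapsed_seconds) ))
echo "CPU efficiency: ${cpu_efficiency}%"
```

A job like this used only 80% of the CPU time it reserved; requesting fewer cores, or investigating I/O stalls, would raise the figure.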
Step 7 — Review Node-Level Metrics
For multi-node jobs, separate metric panels appear for each node.
These panels show:
CPU utilization per core
process-level CPU usage
memory usage over time
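The memory-over-time panel is essentially a sampled series whose peak tells you how much memory you actually needed. A rough sketch of that reading, using invented sample values in GB:

```shell
# Invented memory samples (GB) standing in for one node's time series:
samples="12.4 18.9 31.2 30.7 22.1"

# The peak of the series is what to compare against your memory request.
peak=$(echo "$samples" | tr ' ' '\n' | sort -g | tail -n 1)
echo "Peak memory: ${peak} GB"
```

If the peak sits far below the memory you requested, lowering the request lets your jobs schedule faster and frees resources for others.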
Step 8 — Adjust the Time Range
By default, Grafana displays recent data.
To view a longer time range:
Click the time range selector in the upper-right corner
Choose a new range (for example, Last 24 hours)
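Grafana also accepts the time range as URL query parameters (from and to, with relative values like now-24h), which is handy for bookmarking a view. The dashboard URL below is a hypothetical placeholder; substitute the real address you opened in Step 2.

```shell
# Hypothetical dashboard address; use the real URL from Step 2.
base_url="https://grafana.example.edu/d/abc123/slurm-job-statistics"

# Grafana's standard relative-time query parameters: last 24 hours.
time_params="from=now-24h&to=now"

url="${base_url}?${time_params}"
echo "$url"
```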