Monitoring a node's resources

The Center for Computational Sciences (CCS) collects performance metrics from compute nodes and aggregates them into a Grafana dashboard for visualization. If a job is experiencing performance issues, you can use this dashboard to check resource usage, such as CPU and memory, on the node where the job is running.



Step 1 — Identify the Node Running Your Job

To list your currently running jobs on LCC or MCC, run:

squeue -u $USER -t R

The final column of the output (NODELIST(REASON) in the default squeue format) shows the node or nodes where each job is running.

Make a note of the node name (for example, rome001).
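If the default listing is too wide to read comfortably, the node column can be requested on its own. The following is a minimal sketch, assuming a standard Slurm installation where squeue and scontrol are available on the login node. The function names list_job_nodes and expand_nodes are illustrative only and are not part of Slurm, and the node range in the usage note is a made-up example.

```shell
# Print the job ID and node list for each of your running jobs.
# -h drops the header row; in the -o format string, %i is the job ID
# and %N is the node list.
list_job_nodes() {
    squeue -u "$USER" -t R -h -o "%i %N"
}

# Multi-node jobs report a compressed list such as rome[001-003].
# "scontrol show hostnames" expands it to one hostname per line,
# which is usually the form you need when picking a single node.
expand_nodes() {
    scontrol show hostnames "$1"
}
```

For example, run list_job_nodes to see which nodes each job occupies, then expand_nodes "rome[001-003]" to turn a compressed range into individual node names.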


Step 2 — Open the Grafana Dashboard

Open a web browser and navigate to the CCS monitoring dashboard: https://monitor.ccs.uky.edu


Step 3 — Sign in Using CILogon

  1. Click Sign in with CILogon.

  2. Select your identity provider (most users choose University of Kentucky).

    Figure: CILogon page showing the Select an Identity Provider screen with University of Kentucky selected.



  3. Click Logon.

  4. Sign in using your Link Blue credentials (enter the username only, without @uky.edu).

 

Step 4 — Select the Cluster Dashboard

After logging in, you will see the Cluster Dashboards page.

Choose one of the following dashboards based on what you want to monitor:

  • Node Metrics — monitor CPU, memory, and load on a compute node

  • SLURM Job Stats — monitor metrics for a specific SLURM job

  1. Select the dashboard under the cluster where your job is running (LCC or MCC).

Figure: Grafana Cluster Dashboards page showing MCC and LCC dashboard options.



  2. The Node Metrics dashboard loads by default.

 

Step 5 — View Metrics

If you selected the Node Metrics dashboard

  1. Locate the Host selector.

  2. Search for or scroll to find your compute node (for example, rome001).

  3. Select the node.

The dashboard will update to display metrics for that node.

Figure: Grafana Node Metrics dashboard showing the Host dropdown selector with compute nodes listed and one node selected.

 

 

If you selected the SLURM Job Stats dashboard

Follow the "Monitoring SLURM job resources" documentation.

 

Step 6 — Adjust the Time Range

By default, Grafana displays data from the last 24 hours.

To change the time range:

  1. Click the Last 24 hours control in the upper-right corner.

  2. Select the desired time range (for example, Last 7 days).
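Grafana also records the selected time range in the page URL through the from and to query parameters, so a particular view can be bookmarked or shared. The following is a minimal sketch, assuming you have copied a dashboard URL from your browser's address bar and that it does not already contain from or to parameters. The helper name with_time_range and the dashboard path in the usage comment are placeholders, not part of Grafana or the CCS site.

```shell
# Append a relative time range to a Grafana dashboard URL.
# $1: dashboard URL copied from the browser
# $2: relative start time, for example now-7d or now-6h
with_time_range() {
    # Use "&" if the URL already has a query string, otherwise "?".
    case "$1" in
        *\?*) sep='&' ;;
        *)    sep='?' ;;
    esac
    printf '%s%sfrom=%s&to=now\n' "$1" "$sep" "$2"
}

# Usage (placeholder path; substitute the URL of the dashboard you opened):
#   with_time_range "https://monitor.ccs.uky.edu/d/EXAMPLE/node-metrics" "now-7d"
```

Opening the printed URL should show the same dashboard with the last 7 days selected, equivalent to choosing Last 7 days in the time range control.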

 

Step 7 — View Metrics and Investigate Issues

Metrics are organized into expandable panels.

  • Scroll through panels to find CPU, memory, or other metrics.

  • To zoom in on a specific region of a graph:

    1. Click and drag across the area of interest.

    2. Release the mouse button.

Grafana redraws the panels using the narrower time window you selected, which shows that period in more detail.

Center for Computational Sciences