Frequently Asked Questions



How do I check my storage usage and quotas?

LCC:

Enter the following command to see the storage usage and quotas:

lcc quotas

Example output: [screenshot]

MCC:

Enter the following command to see the storage usage and quotas:

projects quotas

Example output: [screenshot]

Note: ‘Data Used’ is the cumulative size of all files within the specified path, while ‘Max Data’ is the maximum storage capacity available for that path.
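If you want to cross-check the usage for a particular path yourself, the standard du utility reports the cumulative size of a directory tree (it walks every file, so it can be slow on large directories):

du -sh $PROJECT    # total size of everything under $PROJECT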



What are $HOME, $SCRATCH, $PROJECT, and $PSCRATCH directories, and what are they used for?

These directories are integral components of the file system on LCC and MCC, each serving a distinct purpose (a quick way to inspect them on your own account is shown after the list):

  • $HOME: This directory represents your home directory. It is a personal space allocated to each user for storing personal files, configuration settings, and executable scripts. Your home directory is accessible only to you and is often used for managing your user-specific data and settings. Allocated quota: 10GB.

  • $SCRATCH: The $SCRATCH directory is intended for the temporary storage of large data sets, intermediate files, and computational results. The $SCRATCH directory is only accessible to you. It offers ample storage space, with a quota much larger than that of the $HOME directory. Files in $SCRATCH should be short-lived and are automatically purged 90 days after their last access time. Allocated quota: 25TB.

  • $PROJECT: The $PROJECT directory is designated for storing project-related files and data. This directory is accessible to all members of your group. It provides a shared space for collaborative research projects, allowing multiple users to access and work on project data simultaneously. The $PROJECT directory usually has a larger quota than $HOME, facilitating the storage of large datasets and project-related files. Allocated quota: 1TB.

  • $PSCRATCH: Similar to $SCRATCH, the $PSCRATCH directory serves as temporary storage for large-scale parallel computing tasks. However, $PSCRATCH is specifically allocated for project-related computational workloads that should be accessible to all your group members. It is ideal for storing temporary files generated during parallel computations, such as MPI jobs or distributed data processing tasks. Allocated quota: 50TB.
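To see where each of these variables points for your account, print them from a login shell; the actual paths differ by cluster and user:

echo $HOME        # your personal home directory path
echo $SCRATCH     # your personal scratch path
echo $PROJECT     # your group's shared project path
echo $PSCRATCH    # your group's shared scratch path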



How do I manage directory permissions?

$PROJECT and $PSCRATCH are directories on LCC and MCC reserved for project-related files and temporary storage, respectively. They are shared amongst project group members and are essential storage locations for collaborative research. Sometimes, folders created within these project spaces do not inherit the correct permissions; you may need to fix these.

You can control directory permissions using Linux commands such as chmod and chown. These commands allow you to specify which users or groups can read, write, and execute files within these directories.

For example, say you create a directory at $PROJECT/shared-data where some shared dataset will be located:

mkdir $PROJECT/shared-data                      # make directory
cd $PROJECT/shared-data                         # change into directory
wget https://example.com/theinternet.tar.gz     # download "The Internet" into the folder

Now, let’s say you’d like to share this data with all user accounts associated with your project. This can be achieved in multiple ways, but one common approach is to set the appropriate permissions and ownership for the directory using chmod and chown commands. For instance:
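chmod 770 $PROJECT/shared-data
chown :PILinkBlue_uksr $PROJECT/shared-data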

In the example above:

chmod 770 $PROJECT/shared-data sets the permissions to 770, allowing the owner and members of the owner’s group to read, write, and execute within the directory while denying access to all other users. See more about Linux file permissions at How to Set File Permissions in Linux - GeeksforGeeks.

chown :PILinkBlue_uksr $PROJECT/shared-data changes the group ownership of the directory to “PILinkBlue_uksr”, ensuring that members of the project group have the appropriate access.

Remember to replace “PILinkBlue_uksr” with the actual group name associated with your project.

Additionally, if you encounter permission issues with subdirectories or files within the project spaces, you can recursively apply permissions using chmod with the -R option. For example:
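chmod -R 770 $PROJECT/shared-data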

This command will apply the 770 permissions to all files and subdirectories within $PROJECT/shared-data, ensuring consistent access permissions throughout the directory tree.
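To confirm that the permissions and group ownership took effect, list the directory itself with ls -ld; the first column shows the permission bits and the fourth shows the group (the output in the comment below is illustrative):

ls -ld $PROJECT/shared-data    # e.g. drwxrwx--- 2 user_id PILinkBlue_uksr ...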



Can you explain the role of the login nodes on LCC and MCC?

The LCC and MCC, or Lipscomb Computer Cluster and Morgan Compute Cluster, respectively, are High-Performance Computing (HPC) systems designed to provide computational power for scientific research and data analysis tasks. These clusters consist of interconnected nodes, including login nodes, data transfer nodes, and compute nodes. The login nodes are gateways for users to access the system, submit jobs, and manage tasks. Jobs submitted by users are processed by the SLURM job scheduler, which allocates resources and schedules the jobs for execution on the compute nodes. These compute nodes are equipped with high-performance processors and memory to handle computationally intensive tasks efficiently. Users interact with the system through the login nodes, utilizing tools and software available on the clusters for their research needs.

Login Node:
The login node is the entry point for users to access the HPC system. It provides a user-friendly interface for job submission, monitoring, and management. Users interact with the login node to submit jobs, manage files, and perform administrative tasks. However, the login node is not meant for executing computationally intensive tasks directly. Instead, it acts as a gateway to the compute nodes.
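For example, you reach a login node from your own machine with a standard ssh client (replace user_id with your account name):

ssh user_id@lcc.uky.edu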

Data Transfer Node:
Data transfer nodes are dedicated servers within the HPC system designed to facilitate the efficient transfer of data to and from the clusters. These nodes are optimized for high-speed data transfer and provide users with a means to upload and download large datasets for analysis.
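As an illustration, standard tools such as scp or rsync can move data. The command below copies a local archive (the filename is a placeholder) to your remote home directory via the general LCC hostname; the clusters may publish dedicated data-transfer hostnames that are preferable for large transfers:

scp my_dataset.tar.gz user_id@lcc.uky.edu:~/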

Compute Node:
Compute nodes are the workhorses of the HPC system, responsible for executing computational tasks or jobs submitted by users. These nodes are equipped with high-performance processors, large amounts of memory (RAM), and specialized accelerators (such as GPUs) to handle complex computations efficiently. Compute nodes operate independently and are interconnected via a high-speed network, allowing them to collaborate on parallel processing tasks.

SLURM:
When a user submits a job from the login node, the SLURM job scheduler assigns resources and schedules the job for execution. SLURM (Simple Linux Utility for Resource Management) is a widely used job scheduling and resource management system in HPC environments. The scheduler determines the availability of compute nodes and allocates the job to an appropriate node based on factors such as resource requirements, job priority, and system load. The job is then dispatched to the allocated compute node(s) for processing.
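As a minimal sketch of this workflow (the partition and account names below are placeholders; check the cluster documentation for the values that apply to your allocation), a batch script might look like this:

#!/bin/bash
# Example batch script (job.sh); partition and account are placeholders.
#SBATCH --job-name=example
#SBATCH --partition=normal
#SBATCH --account=pi_group_allocation
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# The commands below run on the compute node(s) SLURM assigns.
echo "Running on $(hostname)"

Submit it from a login node with sbatch job.sh and check its status with squeue -u $USER; SLURM then dispatches it to a compute node as described above.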



SSH error - REMOTE HOST IDENTIFICATION HAS CHANGED!

When connecting to LCC or MCC, you may encounter a warning stating that the host identification has changed. The message can be alarming, as it includes the statement that “Someone could be eavesdropping on you right now (man-in-the-middle attack)!”.

This message stems from how LCC and MCC direct user traffic to the login nodes. For example, when you ssh into lcc.uky.edu, your connection is routed to one of six separate login nodes [login001-login006]. That server's signature is saved on your PC as the identification for the hostname lcc.uky.edu. If a later connection is directed to a different login node, the new signature will not match what was saved, and you get this error.

To fix this issue, we need to remove the entries for LCC and MCC from the known_hosts file and add an option to disable “StrictHostKeyChecking”.

  1. Open the known_hosts file, located at C:\Users\user_id\.ssh\known_hosts (Windows) or /home/user_id/.ssh/known_hosts (Mac/Linux). The file may already contain several entries.

  2. Remove all entries that include lcc.uky.edu or mcc.uky.edu.

  3. Open the ssh config file, located at C:\Users\user_id\.ssh\config (this file may need to be created on Windows) or /home/user_id/.ssh/config.

  4. Add the following lines to the config file and save it.

Windows Users:
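A minimal sketch for Windows, assuming the goal stated above of disabling StrictHostKeyChecking for the two clusters; the optional UserKnownHostsFile line, which routes saved host keys to the Windows null device so stale entries never accumulate, is an assumption rather than a confirmed part of the original instructions:

Host lcc.uky.edu mcc.uky.edu
    # Do not abort when the server's key differs from a saved entry
    StrictHostKeyChecking no
    # Assumption: discard saved host keys (NUL is the Windows null device)
    UserKnownHostsFile NUL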

MAC/Linux Users:
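The equivalent sketch for Mac/Linux uses /dev/null as the null device:

Host lcc.uky.edu mcc.uky.edu
    StrictHostKeyChecking no
    # Assumption: discard saved host keys
    UserKnownHostsFile /dev/null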

You should no longer see the message when connecting to LCC or MCC.
