HPC

High Performance Computing Cluster documentation

GPUs

The HPC now supports GPU jobs. Several processing nodes include NVIDIA GeForce GTX 1080 Ti GPU cards, and each GPU node contains four GPUs.

GPU nodes are available on the backfill and backfill2 partitions. In addition, if your department, lab, or group has purchased GPU resources, they will be available in your owner-based partition.
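
As a rough sketch, a GPU job script requests GPUs with Slurm's --gres option and targets a partition that has GPU nodes. The job name, GPU count, time limit, and program below are placeholders to adapt for your own workload; the backfill partition is used here only because it is a General Access partition with GPU nodes.

    #!/bin/bash
    #SBATCH --job-name=gpu_example     # placeholder job name
    #SBATCH --partition=backfill       # General Access partition with GPU nodes
    #SBATCH --gres=gpu:2               # request two of the four GPUs on a node
    #SBATCH --time=04:00:00            # example wall-clock limit

    # Replace with your own GPU-enabled program
    ./my_gpu_program

Jobs submitted to owner-based partitions use the same --gres syntax; only the partition name changes.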

HPC Job Resource Planning

This page provides an overview of how to plan for and allocate resources for your Slurm jobs.

Overview

The Slurm resource scheduler is in charge of allocating resources on the HPC cluster for your job. It does not actually utilize the resources that it makes available; that is the responsibility of your program.
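
For example, a job can reserve several CPU cores and some memory, but unless your program is actually launched across that allocation (for instance with srun), the extra cores simply sit idle. The sketch below illustrates the distinction; the program name is a placeholder.

    #!/bin/bash
    #SBATCH --ntasks=4     # Slurm reserves four tasks (cores) for this job
    #SBATCH --mem=8G       # ...and 8 GB of memory

    # Slurm only sets aside the resources requested above; using them is up
    # to the job itself. Launching the program with srun runs one copy per
    # allocated task, whereas running it directly would use a single core.
    srun ./my_program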

HPC Partition List

When you submit a job to the HPC, you must submit it to a partition. It will then wait in the queue until resources are available on the system to run the job.
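
While a job waits, the standard Slurm squeue command shows its state, the partition it was submitted to, and the reason it has not yet started, for example:

    # Show your own pending and running jobs
    squeue -u $USER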

Several partitions are General Access, which means anybody with an RCC account can submit to them.  Other partitions are owner-based.  These are available only to account-holders who are in groups that have purchased dedicated resources on the system.

See the HPC partition list.
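
From a login node, the standard Slurm sinfo command also summarizes the partitions visible to you and their current state, for example:

    # List partitions along with their node counts and availability
    sinfo

    # Limit the output to a single partition, e.g. the backfill partition
    sinfo -p backfill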


Troubleshooting HPC Jobs

This page lists common questions and issues that users experience when using the HPC Cluster.

Network Limitations

Note that HPC compute nodes are unable to access data from outside of the private network.  Once your job begins, it can access only data stored on our GPFS or Archival systems.  This means that if you need to transfer data from an outside server, it needs to be done before your job is submitted to Slurm.

HPC Login nodes and the export server can access data from anywhere on the Internet.
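
A common pattern is therefore to stage data onto GPFS from a login node before submitting the job. A minimal sketch, where the URL, paths, and script name are placeholders:

    # On a login node, which can reach outside servers:
    wget https://example.org/dataset.tar.gz -P /gpfs/research/mygroup/data/   # placeholder URL and path

    # Then submit the job, which reads the data from GPFS:
    sbatch my_job.sh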

Submitting HPC Jobs

This page describes how to submit a job to the High Performance Computing Cluster.

Overview

Any time you wish to use the HPC, you must create a "job" and submit it to one of our processing partitions. A partition represents a subset of our overall compute cluster that can run jobs. We use advanced scheduling software called Slurm to manage jobs and partitions. This software enables us to provide limited compute resources to a large campus community.
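
As a minimal sketch, a job is a short shell script containing #SBATCH directives that describe the resources you need, submitted with the sbatch command from a login node. The job name, partition, time limit, and program below are placeholders:

    #!/bin/bash
    #SBATCH --job-name=example      # placeholder job name
    #SBATCH --partition=backfill    # a General Access partition; replace as appropriate
    #SBATCH --ntasks=1              # a single task (core)
    #SBATCH --time=01:00:00         # one-hour wall-clock limit

    # Replace with the program you actually want to run
    ./my_program

Save this as a file (for example my_job.sh) and submit it with sbatch my_job.sh; Slurm will queue it on the requested partition and run it when resources become available.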