Scientific simulations can often be significantly accelerated by hardware accelerators such as Graphics Processing Units (GPUs). GPUs are available on several HPC nodes. The GPUs currently available are NVIDIA GeForce GTX 1080 Ti cards, which belong to the Pascal micro-architecture and have compute capability 6.1. The installed CUDA driver version is 11.1. The following table shows the key parameters of the GPUs at the RCC:

Brand Name                            GeForce GTX 1080 Ti
Compute Capability                    6.1
Micro-Architecture                    Pascal
Number of Streaming Multiprocessors   28
Number of CUDA Cores                  3584
Boost Clock                           1600 MHz
Memory Capacity                       11 GB
Memory Bandwidth                      ~484 GB/s

Note about CUDA Availability

The CUDA module, CUDA libraries, and NVIDIA CUDA compilers are only available on the login nodes, the Spear nodes, and the GPU nodes, not on the regular compute nodes.

Compile CUDA Code

To compile CUDA/C/C++ code, first load the cuda module:

$ module load cuda/11.1

The CUDA compiler nvcc should then be available on your path:

$ which nvcc


and you can check the CUDA version via:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0


You can then compile your CUDA/C/C++ code with nvcc:

$ nvcc -O3 -arch=sm_61 -o a.out my_code.cu

In the above, the compiler option "-arch=sm_61" specifies compute capability 6.1 for the Pascal micro-architecture.
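If you do not yet have a CUDA source file to try, the following minimal vector-add kernel is a generic, self-contained example (not RCC-specific; the file name vecadd.cu is just an illustrative choice) that compiles with the nvcc command shown above:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Example kernel: one thread per element computes c[i] = a[i] + b[i].
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    size_t bytes = n * sizeof(float);

    /* host buffers */
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    /* device buffers */
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    /* launch with 256 threads per block, enough blocks to cover n */
    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[10] = %.1f (expect 30.0)\n", hc[10]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Compile and run it on a GPU node via:

$ nvcc -O3 -arch=sm_61 -o vecadd vecadd.cu
$ ./vecadd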

Submit a CUDA Job

To submit a GPU job to the HPC cluster, first create a SLURM submit script similar to the following:


#!/bin/bash
#SBATCH -n 1
#SBATCH -J "cuda-job"
#SBATCH -t 4:00:00
#SBATCH -p backfill
#SBATCH --gres=gpu:1
#SBATCH --mail-type=ALL

# load the cuda module to set up the environment
module load cuda/11.1

# the following line should provide the full path to the cuda compiler
which nvcc

# execute your cuda executable a.out
srun -n 1 ./a.out <input.dat >output.txt

Not all compute nodes have GPU cards, and a GPU node contains up to 4 GPU cards. To request a compute node with GPUs, add the following line to your submit script:

#SBATCH --gres=gpu:[1-4]    # <-- Choose between 1 and 4 GPU cards to use.

Then submit the job via sbatch; for example, if the script above is saved as submit.sh:

$ sbatch submit.sh

CUDA Sample Code

The following CUDA code example can help new users get familiar with the GPUs available on the HPC cluster:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {

    int dev = 0;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    printf("device id %d, name %s\n", dev, prop.name);
    printf("number of multi-processors = %d\n", prop.multiProcessorCount);
    printf("Total constant memory: %4.2f kb\n", prop.totalConstMem / 1024.0);
    printf("Shared memory per block: %4.2f kb\n", prop.sharedMemPerBlock / 1024.0);
    printf("Total registers per block: %d\n", prop.regsPerBlock);
    printf("Maximum threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Maximum threads per multi-processor: %d\n", prop.maxThreadsPerMultiProcessor);
    printf("Maximum number of warps per multi-processor %d\n", prop.maxThreadsPerMultiProcessor / 32);

    return 0;
}

Compile the code (saved, for example, as deviceQuery.cu) via:

$ module load cuda
$ nvcc -o deviceQuery deviceQuery.cu

Upon a successful run, the output will be similar to the following:

device id 0, name GeForce GTX 1080 Ti
number of multi-processors = 28
Total constant memory: 64.00 kb
Shared memory per block: 48.00 kb
Total registers per block: 65536
Maximum threads per block: 1024
Maximum threads per multi-processor: 2048
Maximum number of warps per multi-processor 64