HPC Job Resource Planning

This page provides an overview of how to plan for and allocate resources for your Slurm jobs.

Overview

The Slurm resource scheduler is in charge of allocating resources on the HPC cluster for your job.  It does not actually utilize the resources that it makes available; that is the responsibility of your program.

Things to consider

  1. How long your job needs to run - Choose this value carefully.  If you underestimate this value, the system will kill your job before it completes.  If you overestimate it, your job may wait in the queue longer than necessary.  Generally, it is better to overestimate than underestimate.
  2. How many compute cores and nodes your job will need - This is the level of parallelization.  Most jobs that run on the HPC take advantage of multiple processors (or cores).  You will need to tell the Slurm scheduler how many cores your job needs, and how those cores should be distributed across CPUs and nodes.
  3. How much memory your job will need - By default, Slurm allocates 3.9GB per CPU you request (except in backfill and backfill2, where the value is 1.9GB/CPU).  This is enough for most jobs, but some jobs are more memory-intensive than CPU-intensive (e.g. loading large datasets into memory).  In these cases, you need to explicitly instruct Slurm to allocate extra memory.
  4. If your job needs access to special features - Your job may need access to one or more GPU nodes or a specific CPU model.  You can specify these as constraints when you submit your job (see the sketch after this list).
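
As a rough sketch, the lines below show how a job might request a GPU using the --gres parameter and a particular processor feature using the -C (or --constraint) parameter.  The GPU count and the feature name are placeholders; the values actually available depend on how the cluster is configured.

# Request one GPU for this job (the available GPU types and counts depend on the cluster)
#SBATCH --gres=gpu:1

# Request nodes that advertise a particular feature tag (placeholder name)
#SBATCH -C YOUR_FEATURE_NAME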

Nodes, processors, and cores

Nodes are physical servers in the HPC.  Each node contains multiple CPUs, and each CPU contains multiple cores.  Your job can request resources by adding parameters to your submit script.  If your submit script does not contain any specific instruction for allocating CPU resources, Slurm will allocate a single CPU on a single node.

To request more than one CPU, use the -n (or --ntasks) parameter in your submit script:

# The -n parameter means "number of tasks".
# By default, Slurm allocates a single processor per task.

#SBATCH -n 8

The above example illustrates a submit script that requests eight CPUs.  There is no guarantee that these CPUs will be on a single node.  Instead, Slurm will find eight processors somewhere in the cluster as efficiently as possible and allocate them.

If you need all eight processors to be on a single node, you should use the -N (or --nodes) parameter in your submit script:

# Allocate 8 processors...
#SBATCH -n 8

# ...and make sure they are all on the same node.
#SBATCH -N 1

Note that this won't guarantee that your job has exclusive access to that node while it is running.  Other jobs may be running on that node at the same time.  Rather, it instructs Slurm to place all of the resources that you requested on a single node.  If you need to ensure that your job is the only job running on a physical node, you can use the --exclusive parameter:

# Allocate 8 processors...
#SBATCH -n 8

# ...and make sure they are all on the same node.
#SBATCH -N 1

# Also ensure that there are no other jobs running on the node
#SBATCH --exclusive

Note that requesting exclusive access to nodes will greatly increase the amount of time that your job waits in the queue, since it will take longer for entire nodes to become available.

Taking control over multi-core processors

All processors in the HPC have multiple cores.  The number of cores per processor varies across the cluster: the lowest (on older nodes) is 2, and some have as many as 16.  You can instruct Slurm to allocate processors with a minimum number of cores if your job would benefit:

# Allocate 8 processors...
#SBATCH -n 8

# ...and ensure each processor has at least eight cores
#SBATCH --cores-per-socket=8

Additionally, you can change the meaning of -n to mean cores (instead of entire processors) by using the --ntasks-per-core parameter.  For example, if you wish to have eight parallel processes run on a single CPU, you can do the following:

# Change the meaning of -n from 'processor' to 'core'
#SBATCH --ntasks-per-core=1

# Allocate only processors with at least 8 cores
#SBATCH --cores-per-socket=8

# Indicate that we are going to have eight parallel tasks, which now means "8 cores" not "8 processors"
#SBATCH -n 8

In the above example, Slurm will allocate a single 8-core processor for your job.

There are many other ways to fine-tune your job submission scripts; refer to the Slurm documentation for a complete reference.

Memory resource planning

By default, Slurm pre-allocates a fixed amount of memory (or RAM) per processor:

  • 3.9GB per processor on most partitions
  • 1.9GB per processor on backfill and backfill2

If your job needs more memory, one way to instruct Slurm is to simply request more than one processor:

# One task equals one processor
#SBATCH -n 16

# Since we asked for 16 processors, we get 16x3.9GB of RAM (62.4GB)

The above example illustrates a job that will get approximately 62GB of memory, because it reserves 16 processors.

Alternatively, if your job is memory-intensive but does not require heavy parallel processing (i.e. you don't need a lot of processors), you can use the --mem parameter to ask for a specific amount of memory per node.  For example:

# Allocate a single node
#SBATCH -N 1

# Allocate 64GB of RAM on that node
#SBATCH --mem=64G


# Note: Different units can be specified using the suffix [K|M|G|T]

The --mem parameter is per-node, so if you request multiple nodes, Slurm will allocate the amount of memory you request on each node:

# Allocate two nodes
#SBATCH -N 2

# Allocate 64GB of RAM on each node.
#SBATCH --mem=64G

# Your job will be allocated two nodes, each with 64GB of RAM

If you request more memory per node than is available in the partition you are submitting to, Slurm will inform you that your job cannot be scheduled.
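
If you are unsure how much memory the nodes in a partition have, one way to check (assuming standard Slurm tooling is available) is the sinfo command.  The partition name below is a placeholder:

# List node names and memory (in MB) for the nodes in a partition
sinfo -p YOUR_PARTITION -o "%n %m"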

Why 3.9GB per processor and not 4GB?

Each node in the cluster requires a small amount of memory for overhead operations (e.g. the operating system).  By allocating 3.9GB per processor instead of 4GB, we ensure that compute jobs have access to as much memory as possible while leaving a small margin for that overhead.  If compute jobs are given access to all of the RAM on a physical node, they can run the node out of memory and cause it to crash (this happens occasionally).
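
For example, on a hypothetical node with 16 processors, reserving 0.1GB per processor sets aside 16 x 0.1GB = 1.6GB for the operating system and other overhead.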

Time resource planning

Slurm will allocate resources to your job for a limited period of time.  Each partition in the cluster has a different default and maximum time limit (reference).  When you submit your job, you can specify exactly how long you expect your job to run using the -t parameter.  Jobs that require less time to complete tend to start sooner than jobs that request longer times.

# This job will require two days, 10 hours, and 30 minutes to complete:
#SBATCH -t 2-10:30:00

# This job will require 30 minutes to complete:
#SBATCH -t 30:00

# This job will require 32 hours to complete:
#SBATCH -t 32:00:00

# Time format: [days-]hours:minutes:seconds (shorter forms such as minutes:seconds also work)

Slurm will kill your process when your time limit is reached, regardless of whether the job has completed or not.
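
Putting these pieces together, a minimal complete submit script might look like the sketch below.  The job name, output file, and program are placeholders, not actual cluster values:

#!/bin/bash

# Placeholder job name and output file
#SBATCH -J my_job
#SBATCH -o my_job.out

# Eight tasks, all on a single node
#SBATCH -n 8
#SBATCH -N 1

# 16GB of memory on that node
#SBATCH --mem=16G

# Four hours of run time
#SBATCH -t 4:00:00

# Run your program (placeholder command)
srun ./my_program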

Choosing a partition

You can see which partitions you have access to by running rcctool my:partitions on the HPC or logging into the management section of this website.  Each partition specifies processor, memory, and time limits.  Owner-based partitions provide priority access to research groups that have purchased time on the cluster.  General access partitions provide access to all RCC users.
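
Once you know which partitions you have access to, you can direct your job to one with the -p (or --partition) parameter.  The partition name below is a placeholder; substitute one of your own partitions:

# Submit this job to a specific partition (placeholder name)
#SBATCH -p YOUR_PARTITION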

More information