Job Resource Planning

The Slurm resource scheduler is in charge of allocating resources on the HPC cluster for your jobs. It does not actually use the resources (that is the responsibility of your job); rather, it just reserves them.

Things to consider#

How long your job needs to run - Choose this value carefully. If you underestimate it, the scheduler will kill your job before it completes. If you overestimate it by too much, your job may wait in the queue longer than necessary. Generally, it is better to overestimate than to underestimate.
How many compute cores and nodes your job will need - This is the level of parallelization. Most jobs that run on the HPC take advantage of multiple processors/cores. You will need to tell the Slurm scheduler how many cores your job needs, and how those cores should be distributed over physical processors and compute nodes.
How much memory your job will need - By default, the scheduler allocates 3.9GB of RAM for each CPU you allocate. This is enough for most jobs, but some jobs are more memory-intensive than CPU-intensive (e.g., loading large datasets into memory). In these cases, you need to explicitly instruct Slurm to allocate extra memory.
If your job needs access to special hardware or features - Your job may need access to one or more GPU nodes or a specific CPU model. You can specify these as constraints when you submit your job.
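The four considerations above map onto a handful of lines in a submit script. The following is a minimal sketch rather than a recommended configuration: the values are arbitrary, and FEATURE_NAME is a hypothetical placeholder for a site-specific feature (that line is disabled by its extra leading '#').

```shell
#!/bin/bash
#SBATCH -t 02:00:00      # Time: two hours (HH:MM:SS)
#SBATCH -n 8             # Compute: eight tasks...
#SBATCH -N 1             # ...all on a single node
#SBATCH --mem=32G        # Memory: 32GB on that node instead of the per-CPU default
##SBATCH --constraint=FEATURE_NAME   # Special hardware (hypothetical name, disabled)

# Slurm sets SLURM_NTASKS inside a job; fall back to 1 when run outside Slurm.
ntasks="${SLURM_NTASKS:-1}"
echo "Job starting with ${ntasks} task(s)"
```

Each of these parameters is covered in more detail in the sections that follow.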

Nodes, processors, and cores#

Nodes are physical servers in the HPC cluster. Each node contains multiple CPUs, and each CPU contains multiple cores. Your job can request resources by adding parameters to your submit script. If your submit script does not contain any specific instructions for allocating CPU resources, Slurm will allocate a single CPU on a single node.

To request more than one CPU, use the -n (or --ntasks) parameter in your submit script:

# The -n parameter means "number of tasks"
# By default, Slurm allocates a single process per each task.

#SBATCH -n 8 # <-- Request 8 CPUs

The above example is a script that requests eight CPUs. There is no guarantee that these CPUs will be allocated on a single node. Instead, Slurm will find eight processors somewhere in the cluster as efficiently as possible and allocate them.

If you need all eight processors to be on a single node, you should use the -N (or --nodes) parameter in your submit script:

#SBATCH -n 8 # Allocate 8 processors...
#SBATCH -N 1 # ...and make sure they are all on the same node.

Note that this won't guarantee that your job has exclusive access to the node while it is running. Other jobs may be running on that node concurrently while your job is running. Rather, it instructs Slurm that all the resources you request should be allocated on a single node.

Reserving entire nodes for your job#

If you need to ensure that your job is the only job running on a node, Slurm provides the --exclusive parameter:

#SBATCH -n 8          # Allocate 8 processors...
#SBATCH -N 1          # ...and make sure they are all on the same node.
#SBATCH --exclusive   # Also ensure that there are no other jobs running on the node

Warning

Requesting exclusive access for your jobs will greatly increase the amount of time that your job waits in the queue, since it takes longer for entire nodes to become available.

Taking control over multi-core processors#

All nodes in the HPC have multicore processors. The number of cores per physical processor varies across the cluster: the lowest number (on older nodes) is 2, and some have as many as 40 cores. If your job will benefit, you can instruct Slurm to allocate processors with a minimum number of cores:

#SBATCH -n 8 # Allocate 8 processors...
#SBATCH --cores-per-socket=8 # ...and ensure each processor has at least eight cores

The above code will instruct Slurm to select only nodes with a minimum of 8 cores per processor for your job.

Additionally, you can change the meaning of -n to mean cores (instead of entire processors) by using the --ntasks-per-core parameter. For example, if you wish to have eight parallel processes run on a single CPU, you can do the following:

#SBATCH --ntasks-per-core=1   # Change the meaning of -n from 'processor' to 'core'
#SBATCH --cores-per-socket=8  # Allocate on nodes with at least 8 cores per processor
#SBATCH --ntasks=8            # Indicate that we are going to request eight tasks, which now means "8 cores" instead of "8 processors"

In the above example, Slurm allocates a single 8-core processor for your job.

There are many other ways to fine-tune your job submission scripts; refer to the Slurm documentation for a complete reference.

Memory resource planning#

By default, Slurm automatically allocates a fixed amount of memory (or RAM) for each processor:

3.9GB per processor in most Slurm Accounts
1.9GB per processor in the backfill and backfill2 Slurm Accounts

If your job needs more memory, one way to ensure this is to instruct Slurm to request more than one processor:

#SBATCH -n 16 # One task equals one processor

Since you asked for 16 processors in the above example, your job will be allocated 16 × 3.9GB = 62.4GB RAM.
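The same arithmetic can be run in reverse to estimate how many tasks to request for a given memory target. The helper below is an illustrative sketch, not part of Slurm; tasks_for_memory is a hypothetical name, and the 3.9GB default is taken from the text above (pass 1.9 as the second argument for the backfill accounts).

```shell
#!/bin/bash
# Print the smallest task count whose default memory allocation covers a target.
# Usage: tasks_for_memory TARGET_GB [GB_PER_TASK]
tasks_for_memory() {
    local target_gb="$1"
    local per_task_gb="${2:-3.9}"   # per-processor default from the text
    awk -v t="$target_gb" -v p="$per_task_gb" 'BEGIN {
        n = int(t / p)
        if (n * p < t) n++          # round up so the allocation is not short
        if (n < 1) n = 1
        print n
    }'
}

tasks_for_memory 64        # tasks needed for 64GB at 3.9GB each
tasks_for_memory 64 1.9    # the same target at 1.9GB each
```

Requesting processors only to obtain their memory leaves those processors idle, which is why the --mem approach is usually preferable for jobs that are memory-bound rather than CPU-bound.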

Alternatively, if your job is memory-intensive, but does not have heavy parallel processing (i.e., you do not need many CPU resources), you can use the --mem parameter to ask for a specific amount of memory per node. For example:

#SBATCH -N 1 # Ensure that the job is limited to a single node
#SBATCH --mem=64G # Ensure 64GB of RAM is allocated on that node

Tip

Different units can be specified by using the suffixes:

K = kilobytes
M = megabytes
G = gigabytes
T = terabytes

Warning

If you do not specify a suffix, Slurm defaults to using M (megabytes)
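To make the suffix rules concrete, the sketch below converts a --mem style value into megabytes. It is an illustration only: mem_to_mb is a hypothetical helper, it handles whole numbers only, and it assumes binary multiples (1G = 1024M), which is an assumption worth checking against the Slurm documentation for your version.

```shell
#!/bin/bash
# Convert a --mem style value (e.g. 64G, 512M, 2048) to whole megabytes.
# Assumes binary multiples; a bare number is treated as megabytes.
mem_to_mb() {
    local value="$1"
    local number="${value%[KMGT]}"       # strip one trailing unit letter, if any
    local suffix="${value#"$number"}"    # whatever was stripped (may be empty)
    case "$suffix" in
        K)    echo $(( number / 1024 )) ;;
        ""|M) echo "$number" ;;
        G)    echo $(( number * 1024 )) ;;
        T)    echo $(( number * 1024 * 1024 )) ;;
    esac
}

mem_to_mb 64G     # gigabytes to megabytes
mem_to_mb 2048    # no suffix: already megabytes
```

The second call shows the pitfall from the warning: --mem=64 without a suffix requests 64 megabytes, not 64 gigabytes.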

The --mem parameter specifies the amount of memory your job needs per node. So, if you request multiple nodes, Slurm will allocate the amount of memory you request per each node:

#SBATCH -N 2 # Allocate two nodes
#SBATCH --mem=64G # Allocate 64GB of RAM on each node

In the above example, your job will be allocated a total of 2 nodes × 64GB = 128GB of RAM. Note that because you did not specify the -n parameter, your job will be allocated a single processor on each node.

If you request more memory per node than is available in the Slurm Account you are submitting to, Slurm will inform you that your job cannot be scheduled.

Why 3.9GB per processor and not 4GB?#

Each node in the cluster requires a small amount of memory for operating system overhead. By allocating 3.9GB per processor instead of 4GB, we ensure that compute jobs have access to as much memory as possible with as little wasted as possible.

If compute jobs are given access to all the memory on a node, they can run the node out of memory and cause the node to crash. This happens occasionally.

Time resource planning#

The -t/--time parameter allows you to specify exactly how long Slurm allocates resources for your job.

Each Slurm Account (queue) in the cluster is configured with a default and a maximum time limit for jobs. When you submit your job, you can specify exactly how long you expect it to run. Jobs that request less time tend to wait in the queue for less time than jobs that request longer times.

# Time format is: [D-HH:MM:SS]

#SBATCH -t 2-10:30:00   # This job will require two days, 10 hours, and 30 minutes to complete
#SBATCH -t 30:00        # This job will require 30 minutes to complete
#SBATCH -t 32:00:00     # This job will require 32 hours to complete

Warning

Slurm will kill your job when your time limit is reached regardless of whether the job has completed or not.
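When comparing a requested limit with a measured run time, it can help to convert the time string into seconds. The function below is a sketch under stated assumptions: slurm_time_to_seconds is a hypothetical helper, and it handles only the three forms shown in the example above (D-HH:MM:SS, HH:MM:SS, and MM:SS), not every format Slurm accepts.

```shell
#!/bin/bash
# Convert a Slurm time limit (D-HH:MM:SS, HH:MM:SS, or MM:SS) to seconds.
slurm_time_to_seconds() {
    local spec="$1" has_days=0
    local days=0 hours=0 minutes=0 seconds=0
    local -a parts
    if [[ "$spec" == *-* ]]; then
        has_days=1
        days="${spec%%-*}"      # part before the dash
        spec="${spec#*-}"       # part after the dash
    fi
    IFS=':' read -r -a parts <<< "$spec"
    case "${has_days}:${#parts[@]}" in
        0:3|1:3) hours="${parts[0]}"; minutes="${parts[1]}"; seconds="${parts[2]}" ;;
        0:2)     minutes="${parts[0]}"; seconds="${parts[1]}" ;;
        *)       echo "unsupported time format: $1" >&2; return 1 ;;
    esac
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$hours * 3600 + 10#$minutes * 60 + 10#$seconds ))
}

slurm_time_to_seconds 2-10:30:00
slurm_time_to_seconds 30:00
slurm_time_to_seconds 32:00:00
```

A common approach is to measure a short test run, then request that duration plus a safety margin, since the job is killed at the limit regardless of progress.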

Choosing a Slurm account#

You can see which Slurm accounts you have access to by running the following command in the terminal:

$ rcctool my:slurm_accounts    # my:queues or my:partitions also works.

# To see which nodes are available in your Slurm account, add the --nodes parameter
$ rcctool my:slurm_accounts --nodes

Each Slurm Account specifies processor, memory, and time limits. Owner-based accounts get priority access for research groups that have purchased resources on the cluster. All users can use general access accounts.

After your job ends#

Details about completed jobs (sacct)#

If your job has completed within the last year regardless of whether it succeeded or failed, you can extract useful information about it using the sacct command:

$ sacct [OPTIONS...]

Option	Explanation
`-b`, `--brief`	Print only Job ID, status, and exit code
`-e`	List all available output fields to use with `--format` option
`-j [job_id][.step]`	Display information about a specific job, or job step
`-u [username]`	Display information about a specific user
`-A [slurm_account_name]`	Display information for a specific Slurm Account (queue)
`--format`	Customize the fields displayed (use `sacct -e` to list available fields)

Example:

# Show a summary of recently submitted jobs
$ sacct --brief
   JobID        State         ExitCode 
------------   ----------    -------- 
123456           COMPLETED      0:0 
123456.batch     COMPLETED      0:0 
123456.0         COMPLETED      0:0 
123457           COMPLETED      0:0 
123457.batch     COMPLETED      0:0 
123457.0         COMPLETED      0:0 
123458           COMPLETED      0:0 
123458.batch     COMPLETED      0:0 
123458.0         COMPLETED      0:0 

# Show more details about a specific job
$ sacct -j 123456

   JobID      JobName   Partition    Account   AllocCPUS   State    ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
123456         xlv      genacc_q   genacc_q        16    COMPLETED      0:0 
123456.batch   batch               genacc_q        16    COMPLETED      0:0 
123456.0       micro               genacc_q        16    COMPLETED      0:0

# Show specific fields through output customization
$ sacct -j 123456 --format=Account,Timelimit,NTasks,User,NNodes,NodeList,JobName 

Use sacct -e to show all possible display fields for the --format option.

Tip

You can also use the environment variable SACCT_FORMAT to set the default format for the sacct command. Example:

$ export SACCT_FORMAT="JobID%20,JobName,ExitCode,User,Account,NodeList,Start,End,Elapsed,AllocTRES"

If you want this default to apply to every new shell session, add it to your ~/.bashrc file:

$ echo -e "\n#Set default sacct output format\nexport SACCT_FORMAT=\"JobID%20,JobName,User,Account,NodeList\"" >> ~/.bashrc
$ source ~/.bashrc

Tip

Maximum memory used is represented by the MaxRSS format field; i.e.,

$ sacct -j 123456 --format=JobID%20,JobName,ExitCode,MaxRSS

For more information about sacct, refer to the official Slurm documentation.
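Because MaxRSS is reported per job step, finding the peak for a whole job means scanning several lines. The sketch below does that for pipe-delimited output such as sacct produces with its --parsable2 and --noheader options. peak_maxrss is a hypothetical helper, and it assumes values are reported in kilobytes with a trailing K; lines in any other form (including steps with an empty MaxRSS) are skipped.

```shell
#!/bin/bash
# Read "JobID|MaxRSS" lines on stdin and print the step with the largest MaxRSS.
# Assumes kilobyte values with a trailing K; other lines are ignored.
peak_maxrss() {
    awk -F'|' '
        $2 ~ /^[0-9]+K$/ {
            kb = substr($2, 1, length($2) - 1) + 0
            if (kb > max) { max = kb; step = $1 }
        }
        END { if (step != "") printf "%s %.1f MB\n", step, max / 1024 }
    '
}

# Made-up sample data for illustration:
printf '123456|\n123456.batch|1536K\n123456.0|10240K\n' | peak_maxrss

# On the cluster, the input would come from sacct instead (not run here):
#   sacct -j 123456 --format=JobID,MaxRSS --parsable2 --noheader | peak_maxrss
```

Comparing the peak against the memory you requested gives a starting point for adjusting --mem in the next submission.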

Efficiency statistics for completed jobs (seff)#

If your job has completed within the last year regardless of whether it succeeded or failed, you can use the seff command to see how efficient your resource usage was. This will better help you tune your submit script to maximize resources and minimize wait times.

$ seff [job_id]

Example:

$ seff 123456
Job ID: 123456
Cluster: production
User/Group: abc12a/abc12a
State: FAILED (exit code 1)
Nodes: 1
Cores per node: 8
CPU Utilized: 00:00:01
CPU Efficiency: 1.56% of 00:01:04 core-walltime
Job Wall-clock time: 00:00:08
Memory Utilized: 1.47 MB
Memory Efficiency: 0.04% of 4.00 GB
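The CPU Efficiency figure in this output can be reproduced by hand: core-walltime is the number of cores multiplied by the wall-clock time (8 cores × 8 seconds = 64 seconds here), and efficiency is the CPU time used divided by that product. The function below is a small sketch of that calculation, inferred from the sample output rather than from the seff source; cpu_efficiency is a hypothetical helper name.

```shell
#!/bin/bash
# CPU efficiency as a percentage: CPU seconds used / (cores * wall-clock seconds).
cpu_efficiency() {
    local cpu_seconds="$1" cores="$2" wall_seconds="$3"
    awk -v c="$cpu_seconds" -v n="$cores" -v w="$wall_seconds" \
        'BEGIN { printf "%.2f\n", 100 * c / (n * w) }'
}

cpu_efficiency 1 8 8     # values from the seff example above
cpu_efficiency 32 8 8    # a job that kept half of its cores busy
```

A very low percentage, as in the example, suggests the job requested far more cores than it used, so a smaller -n value would likely shorten its wait in the queue.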
