This page provides a general reference for submitting and managing jobs on the HPC using the Slurm scheduler. For an introductory guide to submitting jobs, refer to /docs/submitting-hpc-jobs-slurm.
Command Overview
The following are the most common commands used for job management:
# Submit a job
$ sbatch <PATH_TO_SCRIPT>
# View job info for queued or running job
$ squeue -j <JOB_ID>
# View info about all of your currently queued and running jobs
$ squeue -u `whoami`
# Cancel a job
$ scancel <JOB_ID>
# View information about your RCC account
$ rcctool my:account
# View which partitions your account has access to
$ rcctool my:partitions
View detailed information about Slurm commands:
# sbatch reference
$ man sbatch
# squeue reference
$ man squeue
# sinfo reference
$ man sinfo
Submitting Jobs
Use the sbatch command to submit jobs:
sbatch my_job_script.sbatch
Job Submission Parameters
Option | Explanation |
---|---|
-J jobname | Optional job name |
--mail-type=type | Mail type can be: BEGIN, END, FAIL, REQUEUE, or ALL |
--mail-user=your_email_address | An email address to send job notifications (defaults to the email registered with RCC) |
-n # | Total number of tasks to run (note: lowercase n) |
-p partition_name | Partition (queue) into which to submit the job, e.g., backfill, genacc_q |
-t HH:MM:SS | Wall clock limit, other formats: minutes, days-hours, days-hours:minutes, days-hours:minutes:seconds |
-D directory | Working directory for batch script (defaults to current working directory) |
-o out_file | File for standard output (default name is slurm-$jobid.out) |
-e error_file | File for the batch script's standard error stream (by default, error output is combined with standard output) |
-i in_file | File for the batch script's standard input (if any) |
--ntasks-per-node=## | Number of tasks to invoke on each node |
-N ## | Number of nodes on which to run (note: uppercase N) |
-c ## | Number of CPUs required per task |
-d type:jobid | Dependency; defer this job until the condition [type] on the job with the given ID is satisfied (see below) |
-C feature(s) | Run only on nodes with the specified features (e.g., intel, amd, YEAR2014; see below) |
--exclusive | Run the job exclusively, i.e., do not share nodes with other jobs |
--mem-per-cpu=## | Specify the memory per CPU in megabytes (default is 3.9GB / maximum is typically 4GB) |
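As an illustration of how these options fit together, below is a minimal sketch of a submission script; the job name, output file, and executable (my_program) are placeholders, and the partition and resource values are only examples:
#!/bin/bash
#SBATCH -J "example_job"           # optional job name
#SBATCH -p genacc_q                # partition (queue) to submit to
#SBATCH -n 16                      # total number of tasks
#SBATCH -t 04:00:00                # wall clock limit
#SBATCH --mail-type=END,FAIL       # email when the job ends or fails
#SBATCH -o example_job-%j.out      # standard output file (%j expands to the job ID)
srun ./my_program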
Example Types of Jobs
- MPI Jobs
- Interactive Jobs
- Array Jobs
- HADOOP Jobs
- gpGPU Jobs
- Intel Xeon Phi Jobs
View job status using squeue
Use the squeue command to view job status:
squeue -j <JOB_ID> # View status of job
squeue -j <JOB_ID>,<ANOTHER_ID>,etc # View status of multiple jobs
squeue -u <USER_NAME> # View status of jobs for user
squeue -p <PARTITION_NAME>,<ANOTHER>,etc # View status of jobs in partition(s)
squeue -t <STATE_NAME>,<ANOTHER>,etc # View jobs of a given state
Examples:
# View job statuses for jobs with IDs 400 and 401
squeue -j 400,401
# View all pending or suspended jobs for user 'john' in genacc_q
squeue -u john -t PENDING,SUSPENDED -p genacc_q
# View all running jobs for user 'john'
squeue -u john -t RUNNING
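If you need different columns than the defaults, squeue also accepts a custom output format; for example, using its standard format codes (%i job ID, %j name, %T state, %M elapsed time, %R reason or node list):
squeue -u `whoami` -o "%.10i %.20j %.8T %.10M %R"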
Jobs can be in any of the following states:
Job State | Description |
---|---|
PENDING | Job has been queued in a partition and is awaiting resources to start running. |
RUNNING | The job is currently running. |
SUSPENDED | The job is temporarily suspended. |
COMPLETED | The job has finished successfully. |
CANCELLED | The job was cancelled by the user, an administrator, or the system. |
FAILED | The job terminated with a non-zero exit code or failed for some other reason. |
TIMEOUT | The job reached its configured maximum runtime and was terminated. |
Estimating Job Starting Time
The scheduler can attempt to estimate when your job will start, but this is at best a rough guess based on the resources you requested in your submission script and the resources currently in use:
squeue --start (for all your jobs)
squeue --start -j <JOB_ID>
Cancelling Jobs
Use the scancel command to cancel a job:
scancel <job_id>
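scancel can also target several jobs at once using the same kinds of filters as squeue; for example:
# Cancel all of your jobs
scancel -u `whoami`
# Cancel only your pending jobs
scancel -u `whoami` -t PENDING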
Specify constraints to limit what nodes to run on
Our cluster consists of nodes with various models of AMD and Intel processors. If it is important that your job run only on a single type of processor, you can limit your job either by feature or by node list. To retrieve a list of available features, log in to the HPC and run sinfo -o %f:
$ sinfo -o %f
YEAR2012,intel
YEAR2013,intel
YEAR2010,amd
YEAR2014,intel
YEAR2015,intel
YEAR2008,amd
To limit by feature, use the -C parameter in your job parameters:
#SBATCH -C "YEAR2012,intel"
You can use AND ("&") and OR ("|") operators in your constraint parameter; e.g.:
# Either Year 2012 or Year 2013
#SBATCH -C "YEAR2012,intel|YEAR2013,intel"
If you need to ensure that your job runs only on specific nodes, you can use the --nodelist option. Use this option only if absolutely necessary. We periodically add, remove, re-allocate, and rename nodes, so node names should not be considered reliable. If you need to specify a constraint not already defined in the system, please contact us and request that it be added as a feature.
#SBATCH --nodelist=hpc-i36-1,hpc-i36-2,hpc-i36-3
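If you are deciding between a feature constraint and a node list, it can help to see which features each node advertises. One way to do this is with sinfo's per-node output (%N prints node names, %f prints features):
$ sinfo -N -o "%N %f"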
View job statistics for running jobs with sstat
The sstat command will provide CPU usage, current nodes, and other information for jobs that are currently running:
sstat [OPTIONS...]
Option | Explanation |
---|---|
-e | Print all possible output fields |
-j | Job ID(s) (required) (comma-separated) |
-o, --fields | Specify output fields (comma-separated), e.g., jobid,ntasks,avecpu,mincpu,mincputask,mincpunode |
-n | Omit headings |
-p, --parsable | Delimit fields with a pipe symbol ("|") |
-a, --allsteps | Print all steps for the given job(s) |
Example:
$ sstat -j 624 -p -o jobid,ntasks,avecpu,mincpu,mincputask,mincpunode
JobID|NTasks|AveCPU|MinCPU|MinCPUTask|MinCPUNode|
624.0|16|12:36.000|12:35.000|1|hpc-tc-3|
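sstat can also report memory usage for a running job; for example, using the standard MaxRSS and MaxVMSize fields (the job ID here is just illustrative):
$ sstat -j 624 -o jobid,maxrss,maxvmsize,avecpu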
View statistics for recently finished jobs with sacct
If your job has recently finished or failed, you can extract useful information about it using the sacct command:
$ sacct [OPTIONS...]
Option | Explanation |
---|---|
-b, --brief | Show brief output including job ID, status, and exit code |
-e | Print all possible output fields |
-j (job-id[.step]) | Display information about a specific job or job step |
-u (username) | Display information only for the specified user |
-p (partition_name) | Display information only for the specified partition |
Example:
$ sacct --brief
JobID State ExitCode
------------ ---------- --------
623 COMPLETED 0:0
623.batch COMPLETED 0:0
623.0 COMPLETED 0:0
624 COMPLETED 0:0
624.batch COMPLETED 0:0
624.0 COMPLETED 0:0
625 COMPLETED 0:0
625.batch COMPLETED 0:0
625.0 COMPLETED 0:0
$ sacct -j 625
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
625 xlv genacc_q genacc_q 16 COMPLETED 0:0
625.batch batch genacc_q 16 COMPLETED 0:0
625.0 micro genacc_q 16 COMPLETED 0:0
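sacct also accepts a custom field list and a start time, which is useful for reviewing all of your recent jobs; for example (replace the date placeholder with a real date):
$ sacct -u `whoami` -S <YYYY-MM-DD> -o jobid,jobname,partition,elapsed,maxrss,state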
Slurm Environment Variables (commonly used)
The following environment variables are available to the runtime environment for your program:
Variable | Description |
---|---|
SLURM_JOBID | Job ID |
SLURM_SUBMIT_DIR | Job submission directory |
SLURM_SUBMIT_HOST | Name of host from which job was submitted |
SLURM_JOB_NODELIST | Name of nodes allocated to job |
SLURM_ARRAY_TASK_ID | Task ID within job array (if applicable) |
SLURM_JOB_CPUS_PER_NODE | CPU cores per node allocated to job |
SLURM_NNODES | Number of nodes allocated to job |
A comprehensive list of environment variables is available at the official Slurm documentation.
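A quick way to see these variables in action is to echo them from a job script; a minimal sketch:
#!/bin/bash
#SBATCH -n 4
echo "Job $SLURM_JOBID submitted from $SLURM_SUBMIT_HOST"
echo "Running on node(s): $SLURM_JOB_NODELIST"
echo "Submission directory: $SLURM_SUBMIT_DIR"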
Interactive Jobs
The HPC is not primarily intended for interactive jobs; however, an interactive session can be useful for debugging or testing. To run an interactive job, use the srun command instead of the sbatch command. For example:
# Ask for 8 cores and 30 minutes for an interactive job from the login node
[bchen3@hpc-tc-login lensing]$ srun --pty -t30:00 -n8 /bin/bash
# You may need to wait a while for resources to become available....
# You are now on the compute node, and in the same directory as you were on the login node.
# srun will use 8 cores automatically for the executable "micro"
[bchen3@hpc-tc-1 lensing]$ srun ./micro &
[1] 11273
[bchen3@hpc-tc-1 lensing]$ 0.33 -1.98 -2.68
0.39 3.97 -2.57
0.70 3.04 1.61
0.12 -4.46 -1.29
# ...work interactively...
# exit when you are done
$ exit
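You can pass srun the same resource options you would pass to sbatch; for example, to request a specific partition and per-CPU memory for an interactive session (the partition and values shown are only examples):
srun --pty -p genacc_q -t 1:00:00 -n 4 --mem-per-cpu=2G /bin/bash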
Job Dependencies
Slurm supports job dependencies. You can submit jobs that will be deferred until other jobs have either completed or terminated in a failed state. This allows you to break your workflow down into smaller, atomic steps. Use the -d option to specify a dependency:
#SBATCH -d <dependency_list>
The dependency_list parameter takes the form of <dependency_type:job_id>. dependency_type can be any of the following:
- after:job_id - Job will start after the specified job(s) have begun execution
- afterany:job_id - Job will start after the specified job(s) have finished (any in this context means any exit state, failed or successful)
- afternotok:job_id - Job will start after the specified job(s) have terminated in some failed state
- afterok:job_id - Job will start after the specified job(s) have executed and terminated successfully
Examples:
# This job will start only after JOB #23 has started
#SBATCH -d after:23
# Multiple job dependencies; this job will start only after Jobs #23 and #25 have started
#SBATCH -d after:23:25
# Multiple job dependency types; this job will start only after jobs #23 and #25 have started
# and job #30 has ended successfully
#SBATCH -d after:23:25,afterok:30
# A simpler way to ensure jobs only run one at a time is to use the 'singleton' option.
# This ensures that this job will not start until all previously submitted jobs with the same
# name have terminated (afterany)
#SBATCH -J "My Job"
#SBATCH -d singleton
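Dependencies can also be set up from the command line by capturing the job ID of an earlier submission; for example, using sbatch's --parsable option (the script names are placeholders):
# Submit the first job and capture its job ID
jid=$(sbatch --parsable preprocess.sbatch)
# Submit a second job that runs only if the first completes successfully
sbatch -d afterok:$jid analyze.sbatch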
Job Arrays
Slurm allows you to submit a number of nearly identical jobs simultaneously in the form of a job array. Your workload may be a good candidate for this if you have a number of identical jobs whose input differs only by some sort of index.
For example, if your jobs use the same parameters and code, but each has a different input and output file.
#SBATCH -a ##-##[%##]
##-## refers to your index range, and the optional %## limits the number of array tasks allowed to run concurrently, if you wish to limit that for any reason.
Below is an example of running 64 jobs, indexed 0-63 on one node, allowing four tasks to run concurrently:
$ sbatch -a 0-63%4 -N1 ./test.sh
$ cat test.sh
#!/bin/bash
echo array_id = $SLURM_ARRAY_JOB_ID job_id = $SLURM_JOBID task_id = $SLURM_ARRAY_TASK_ID
sleep 30
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
167_[4-63%4] genacc_q test.sh bchen3 PD 0:00 1 (JobArrayTaskLimit)
167_0 genacc_q test.sh bchen3 R 0:03 1 hpc-tc-1
167_1 genacc_q test.sh bchen3 R 0:03 1 hpc-tc-1
167_2 genacc_q test.sh bchen3 R 0:03 1 hpc-tc-1
167_3 genacc_q test.sh bchen3 R 0:03 1 hpc-tc-1
Note: Tasks in a job array use the job ID format ###_##, where ### refers to the array job ID and ## refers to the index in the array. Each task in the array will generate its own output file unless you specify a custom output file in your submission script. The default name is slurm-[job-id]_[task-id].out.
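A common pattern is to use SLURM_ARRAY_TASK_ID to select a different input file for each task; a minimal sketch (the file names and program are placeholders):
#!/bin/bash
#SBATCH -a 1-10
# Each task reads its own input file and writes its own output file based on its array index
./my_program input_${SLURM_ARRAY_TASK_ID}.dat > output_${SLURM_ARRAY_TASK_ID}.txt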
Tips
If you use the -N option to specify the number of nodes, you will only get one core per node:
#SBATCH -N 4
To use more than one core on each node, add the --ntasks-per-node option, e.g.:
#SBATCH -N 4
#SBATCH --ntasks-per-node=8
If you wish to gain exclusive access to nodes (i.e., use the entire node for your job only, with no other jobs sharing it), use the --exclusive option. This will likely cause your job to wait in the queue for a much longer period of time:
#SBATCH -N 2
#SBATCH --exclusive
If your job hasn't started yet, it is usually either because it hasn't been given priority to start by the scheduler, or because it is waiting on resources. You can check the reason by running squeue:
$ squeue -j <job_id>
The last column in the output will show the reason that your job has not started.
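To see just the state and reason, you can also use a custom squeue output format (%i prints the job ID, %T the state, and %r the reason code, such as Priority, Resources, or Dependency):
$ squeue -j <job_id> -o "%i %T %r"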