HPC Job Reference

This page provides a general reference for submitting and managing jobs on the HPC using the Slurm scheduler. For an introductory guide to submitting jobs, refer to /doc/submitting-hpc-jobs-slurm.

Commands

Viewing detailed documentation for a command:

man <command>

General commands:

sbatch   # For submitting jobs
squeue   # For viewing job and partition info
scancel  # For cancelling jobs
rcctool  # For viewing information about your RCC account

Viewing Your Available Partitions

Use the rcctool command to view which partitions you have access to:

rcctool my:partitions

Example output:

---------------------------------------------
| Partition Name | Max Walltime | Max Procs |
---------------------------------------------
| genacc_q       | 00:45:00     | 2,344     |
| backfill       | 01:00:00     | 255       |
| xsede_q        | 10:50:00     | 4,000     |
---------------------------------------------

Submitting Jobs

Use the sbatch command to submit jobs:

sbatch my_job_script.sbatch

Job Submission Parameters

Option Explanation
-J jobname                       Optional job name
--mail-type=type                 Mail type can be: BEGIN, END, FAIL, REQUEUE, or ALL
--mail-user=your_email_address   Email address to which job notifications are sent (defaults to the email registered with RCC)
-n ##                            Total number of tasks to run (note: lowercase n)
-p partition_name                Partition (queue) into which to submit the job, e.g., backfill, genacc_q
-t HH:MM:SS                      Wall clock limit; other accepted formats: minutes, days-hours, days-hours:minutes, days-hours:minutes:seconds
-D directory                     Working directory for the batch script (defaults to the current working directory)
-o out_file                      File for the batch script's standard output (default name is slurm-<jobid>.out)
-e error_file                    File for the batch script's standard error stream (by default, error output is combined with standard output)
-i in_file                       File for the batch script's standard input (if any)
--ntasks-per-node=##             Number of tasks to invoke on each node
-N ##                            Number of nodes on which to run (note: uppercase N)
-c ##                            Number of CPUs required per task
-d type:jobid                    Dependency; defer the job until the condition [type] on the job with the given ID is satisfied (see below)
-C feature(s)                    Run only on nodes with the specified features (e.g., intel, amd, YEAR2014; see below)
--exclusive                      Run the job exclusively, i.e., do not share nodes with other jobs
--mem-per-cpu=##                 Memory per CPU in megabytes (default is 3.9GB per node)
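
For reference, below is a minimal example job script that combines several of these options. The job name, email address, task count, and program name are placeholders; substitute your own:

#!/bin/bash
#SBATCH -J "example_job"               # Job name (placeholder)
#SBATCH --mail-type=END,FAIL           # Send email when the job ends or fails
#SBATCH --mail-user=you@example.edu    # Placeholder email address
#SBATCH -p genacc_q                    # Partition; use one you have access to (see rcctool my:partitions)
#SBATCH -n 16                          # Total number of tasks
#SBATCH -t 04:00:00                    # Wall clock limit (4 hours)
#SBATCH -o slurm-%j.out                # Standard output file (%j expands to the job ID)

# "my_program" is a placeholder for your executable
srun ./my_program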

Example Types of Jobs

  1. MPI Jobs
  2. Interactive Jobs
  3. Array Jobs
  4. HADOOP Jobs
  5. gpGPU Jobs
  6. Intel Xeon Phi Jobs

Job Status using squeue

Use the squeue command to view job status:

squeue -j <JOB_ID>                       # View status of job
squeue -j <JOB_ID>,<ANOTHER_ID>,etc      # View status of multiple jobs
squeue -u <USER_NAME>                    # View status of jobs for user
squeue -p <PARTITION_NAME>,<ANOTHER>,etc # View status of jobs in partition(s)
squeue -t <STATE_NAME>,<ANOTHER>,etc     # View jobs of a given state

Examples:

# View job statuses for jobs with IDs 400 and 401
squeue -j 400,401

# View all pending or suspended jobs for user 'john' in genacc_q
squeue -u john -t PENDING,SUSPENDED -p genacc_q

# View all running jobs for user 'john'
squeue -u john -t RUNNING

Jobs can be in any of the following states:

Job State Description
PENDING    The job is queued in a partition and is awaiting resources to start running
RUNNING    The job is currently running
SUSPENDED  The job is temporarily suspended
COMPLETED  The job has completed
CANCELLED  The job was cancelled by the user, an administrator, or the system
FAILED     The job terminated with a non-zero exit code or other failure condition
TIMEOUT    The job reached its maximum allowed runtime and was terminated

Estimating Job Starting Time

Our system can attempt to estimate when your job will start, but this is at best a rough guess based on the currently running jobs and the execution time you configured when submitting your job:

squeue   --start   (for all your jobs)
squeue   --start -j <JOB_ID>

Cancelling Jobs

Use the scancel command to cancel a job:

scancel <job_id>
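
scancel can also filter by user or job state; for example (substitute your own username):

# Cancel all of your jobs
scancel -u <username>

# Cancel only your pending jobs
scancel -u <username> -t PENDING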

Limiting types of nodes your jobs run on using Constraints and Node Lists

Our cluster consists of nodes with various models of AMD and Intel processors. If it is important that your job run only on a single type of processor, you can limit your job either by feature or by node list. To retrieve a list of available features, log in to the HPC and run:

$ sinfo -o %f
YEAR2012,intel
YEAR2013,intel
YEAR2010,amd
YEAR2014,intel
YEAR2015,intel
YEAR2008,amd

To limit by feature, use the -C parameter in your job parameters:

#SBATCH -C "YEAR2012,intel"

You can use AND ("&") and OR ("|") operators in your constraint parameter; for example:

# Either Year 2012 or Year 2013
#SBATCH -C "YEAR2012,intel|YEAR2013,intel"

If you need to ensure that your job runs only on specific nodes, you can use the --nodelist option. This option should be used only if absolutely necessary. We periodically add, remove, re-allocate, and rename nodes, so node names should not be considered reliable. If you need to specify a constraint that is not already defined in the system, please contact us and request that it be added as a feature.
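
If you do use a node list, the option takes a comma-separated list of node names; the names below are placeholders for names reported by sinfo -N:

#SBATCH --nodelist=hpc-tc-1,hpc-tc-2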

Job statistics for running jobs using sstat

The sstat command will provide CPU usage, current nodes, and other information for jobs that are currently running:

sstat [OPTIONS...]
Option Explanation
-e                Print all possible output fields
-j                Job ID(s) (required; comma-separated)
-o, --fields      Specify output fields (comma-separated), e.g., jobid,ntasks,avecpu,mincpu,mincputask,mincpunode
-n                Omit headings
-p, --parsable    Delimit fields with the pipe symbol ("|")
-a, --allsteps    Print all steps for the given job(s)

Example:

$ sstat -j 624 -p -o jobid,ntasks,avecpu,mincpu,mincputask,mincpunode
JobID | NTasks|AveCPU|MinCPU|MinCPUTask|MinCPUNode|
624.0 |16  |12:36.000 | 12:35.000 | 1 | hpc-tc-3 |

Job statistics for finished/failed jobs using sacct

If your job has already finished or failed, you can extract useful information about it using the sacct command:

$ sacct [OPTIONS...]
Option Explanation
-b, --brief            Show brief output including job ID, status, and exit code
-e                     Print all possible output fields
-j (job-id[.step])     Display information about a specific job, or a specific job step
-u (username)          Display information only for the specified user
-p (partition_name)    Display information only for the specified partition

Example:

$ sacct --brief
   JobID        State         ExitCode 
------------   ----------    -------- 
623           COMPLETED      0:0 
623.batch     COMPLETED      0:0 
623.0         COMPLETED      0:0 
624           COMPLETED      0:0 
624.batch     COMPLETED      0:0 
624.0         COMPLETED      0:0 
625           COMPLETED      0:0 
625.batch     COMPLETED      0:0 
625.0         COMPLETED      0:0 
$ sacct -j 625
   JobID      JobName   Partition    Account   AllocCPUS   State    ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
625            xlv      genacc_q   genacc_q        16    COMPLETED      0:0 
625.batch      batch               genacc_q        16    COMPLETED      0:0 
625.0          micro               genacc_q        16    COMPLETED      0:0 

Slurm Environment Variables (commonly used)

Variable Description
SLURM_JOBID               Job ID
SLURM_SUBMIT_DIR          Job submission directory
SLURM_SUBMIT_HOST         Name of the host from which the job was submitted
SLURM_JOB_NODELIST        List of nodes allocated to the job
SLURM_ARRAY_TASK_ID       Task ID within a job array (if applicable)
SLURM_JOB_CPUS_PER_NODE   CPU cores per node allocated to the job
SLURM_NNODES              Number of nodes allocated to the job

A comprehensive list of environment variables is available at the official Slurm documentation.
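
These variables can be referenced directly inside a job script; for example, this sketch simply reports where and how the job is running:

#!/bin/bash
#SBATCH -n 8

echo "Job $SLURM_JOBID submitted from $SLURM_SUBMIT_HOST:$SLURM_SUBMIT_DIR"
echo "Running on $SLURM_NNODES node(s): $SLURM_JOB_NODELIST"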

Interactive Jobs

The HPC is not intended to run interactive jobs. However, sometimes this is necessary. For this use case, use the srun command instead of sbatch. Below is a simple example:

# Ask for 8 cores and 30 minutes for an interactive job from the login node
[bchen3@hpc-tc-login lensing]$ srun --pty -t30:00 -n8 /bin/bash 

# You are now on the compute node, in the same directory as you were on the login node.
# srun will use 8 cores automatically for the executable "micro"
[bchen3@hpc-tc-1 lensing]$ srun ./micro &
[1] 11273
[bchen3@hpc-tc-1 lensing]$     0.33      -1.98      -2.68 
  0.39       3.97      -2.57 
  0.70       3.04       1.61 
  0.12      -4.46      -1.29 

# ...work interactively...

# exit when you are done
$ exit

Job Dependencies

Slurm supports job dependencies. You can submit jobs that will be deferred until other jobs have either completed or terminated in a failed state. This allows you to break your workflow down into smaller atomic steps. Use the -d option to specify a dependency:

#SBATCH -d <dependency_list>

The dependency_list parameter takes the form of <dependency_type:job_id>.  dependency_type can be any of the following:

  • after:job_id - Job will start after the specified job(s) have begun execution
  • afterany:job_id - Job will start after the specified job(s) have finished (any in this context means any exit state, failed or successful)
  • afternotok:job_id - Job will start after the specified job(s) have terminated in some failed state
  • afterok:job_id - Job will start after the specified job(s) have executed and terminated successfully

Examples:

# This job will start only after JOB #23 has started
#SBATCH -d after:23

# Multiple job dependencies; this job will start only after Jobs #23 and #25 have started
#SBATCH -d after:23:25

# Multiple job dependency types; this job will start only after jobs #23 and #25 have started
# and job #30 has ended successfully
#SBATCH -d after:23:25,afterok:30

# A simpler way to ensure jobs run only one at a time is to use the 'singleton' option.
# This ensures that this job will not start until all previously submitted jobs with the same
# name have terminated (afterany)
#SBATCH -J "My Job"
#SBATCH -d singleton
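
Dependencies can also be set on the command line at submission time. sbatch prints the ID of the new job, which can be captured and passed to the next submission; in this sketch, step1.sbatch and step2.sbatch are placeholder script names:

# Submit the first job and capture its job ID
jobid=$(sbatch --parsable step1.sbatch)

# Submit the second job so that it starts only if the first ends successfully
sbatch -d afterok:$jobid step2.sbatch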

Job Arrays

Slurm allows you to submit a number of nearly identical jobs simultaneously in the form of a job array. Your workload may be a good candidate for this if your jobs differ only by an index of some sort; for example, if your jobs use the same parameters and code, but each with a different input and output file.

#SBATCH -a ##-##[%##]

##-## refers to your index range, and the optional %## limits the number of tasks that are allowed to run concurrently, if you wish to limit that for any reason.

Below is an example of running 64 jobs, indexed 0-63 on one node, allowing four tasks to run concurrently:

$ sbatch -a 0-63%4 -N1 ./test.sh
$ cat test.sh
#!/bin/bash
echo array_id = $SLURM_ARRAY_JOB_ID job_id = $SLURM_JOBID task_id = $SLURM_ARRAY_TASK_ID
sleep 30
$ squeue
        JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 167_[4-63%4]  genacc_q  test.sh   bchen3 PD       0:00      1 (JobArrayTaskLimit)
        167_0  genacc_q  test.sh   bchen3  R       0:03      1 hpc-tc-1
        167_1  genacc_q  test.sh   bchen3  R       0:03      1 hpc-tc-1
        167_2  genacc_q  test.sh   bchen3  R       0:03      1 hpc-tc-1
        167_3  genacc_q  test.sh   bchen3  R       0:03      1 hpc-tc-1

Note: Tasks in a job array use the job ID format ###_##, where ### refers to the array job ID and ## refers to the index in the array. Each task in the array generates its own output file unless you specify a custom output file in your submission script. The default name is slurm-[array-job-id]_[task-id].out.
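
A common pattern is to use $SLURM_ARRAY_TASK_ID to select a different input and output file for each task; in this sketch, the file naming scheme and program name are placeholders:

#!/bin/bash
#SBATCH -a 0-63

# Each task processes its own input file and writes its own output file
srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat > output_${SLURM_ARRAY_TASK_ID}.dat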

Tips

If you use the -N option to specify the number of nodes, you will only get one core per node by default:

#SBATCH -N 4

To use more than one core on each node, specify the --ntasks-per-node option, e.g.:

#SBATCH -N 4
#SBATCH --ntasks-per-node=8

If you wish to gain exclusive access to nodes (i.e. use the entire node for your job only; no other running jobs), use the --exclusive option:

#SBATCH -N 2
#SBATCH --exclusive

Note that this will likely cause your job to wait in queue for a much longer period of time.

If your job hasn't started yet, it is usually either because it hasn't been given priority to start by the scheduler, or because it is waiting on resources.  You can check the reason by running squeue:

$ squeue -j <job_id>

The last column in the output will show the reason that your job has not started.
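
To see only the columns of interest, you can give squeue a custom output format; %T is the job state and %R is the reason (or the node list for running jobs):

$ squeue -j <job_id> -o "%.10i %.9P %.8T %R"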