Condor Job Management

This page describes how to manage queued or running jobs in Condor.

Viewing Queued Jobs

The following commands allow you to view your queued jobs:

# View all of your queued jobs (the -global flag is necessary)
$ condor_q -global

# View jobs for a single user
$ condor_q -global [username]

# View status for a single job
$ condor_q -global [jobid]

Note: The -global flag is necessary because there are three submission nodes, and when you log in you may land on a different node than the one you originally submitted from.
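If you do remember which submission node you used, you can also query that node's scheduler directly with the -name flag instead of -global. The hostname below is taken from the example output further down this page; substitute the name of your own submission node:

$ condor_q -name condor-login.local [username]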

SSH to a Running Job

Sometimes you may need to access a Condor job while it is running. You can log in to the node(s) where your job is running by following these steps:

Before you submit your job...

To enable SSH access to running jobs, you must first add a single directive to your job submit file:

...
RunAsOwner = True
...

This will ensure that your job runs under your username on all compute nodes.
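For reference, a minimal vanilla-universe submit file containing this directive might look like the sketch below. The executable and file names are placeholders; adjust them for your own job:

Universe   = vanilla
Executable = a.out
Log        = 0.log
Output     = 0.out
Error      = 0.err
RunAsOwner = True
Queue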

Once your job is running...

Find your job ID by running condor_q [username]:

condor_q YOUR_FSUID

- Submitter: condor-login.local :  : condor-login.local
    ID      OWNER              SUBMITTED     RUN_TIME   ST PRI SIZE CMD              
    254.0   YOUR_FSUID  9/5    12:29         0+00:01:02 R  0   0.0  a.out   

In this example, the job ID is 254.0.

Then run condor_ssh_to_job [jobid]:

condor_ssh_to_job 254.0
  
Welcome to slot7@condor-25.local!
Your condor job is running with pid(s) 23388.
bash-4.1$

You can now interact with your Condor job.
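For example, you can confirm that the job's process is alive and look around its working directory. The PID below comes from the welcome message above; typing exit ends the SSH session without affecting the job:

bash-4.1$ ps -fp 23388
bash-4.1$ ls -lh
bash-4.1$ exit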

Retrieve Logfiles for a Running Condor Job

Sometimes you may need to retrieve logs for a Condor job while it is running. Typically, Condor writes logs directly to your Lustre home directory while the job runs, so you can usually access the data simply by opening the log file in your Lustre directory.
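For example, if your submit file names its log file as in the sketch earlier on this page, you can follow the log from a login node while the job runs. The path below is only a placeholder that mirrors the condor_gather_info output shown further down; substitute the actual location of your log file:

$ tail -f /lustre/scs/home/YOUR_FSUID/condor/vanilla/0.log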

If you need to retrieve the logs directly from the job, you can use the condor_gather_info command.

Before you submit your job...

As with SSH access, you must first add a single directive to your job submit file:

...
RunAsOwner = True
...

This will ensure that your job runs under your username on all compute nodes.
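After you submit, you can confirm that the directive was picked up by inspecting the job's ClassAd. The job ID below is the one used in the example that follows:

$ condor_q -long 296.0 | grep -i RunAsOwner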

Once your job is running...

Instruct Condor to create a dump of your logs:

$ condor_gather_info --jobid 296.0

Gathering Condor and machine information...
    acquire_job_q: Getting job q for job 296.0
    acquire_job_ad: Getting job ad for job 296.0
    acquire_job_analysis: Getting job analysis for job 296.0
    acquire_job_userlog_lines: Getting job log entries for job 296.0 ulog file is /lustre/scs/home/dcshrum/condor/vanilla/0.log

Extract the logfiles:

$ tar zxvf cgi-YOUR_FSUID-jid296.0-2013-09-07-12_16_55AM-EDT.tar.gz
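The archive name includes your username and a timestamp, so yours will differ from the example above. If you would like to inspect the contents before extracting, you can list the archive first:

$ tar ztf cgi-YOUR_FSUID-jid296.0-2013-09-07-12_16_55AM-EDT.tar.gz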

For more information about the condor_gather_info command, refer to the official documentation or the manpage.