Condor Job Management

This page describes how to manage queued or running jobs in Condor, along with other tips and tricks.

Viewing queued jobs

View all queued jobs using the condor_q -global command.

View the status of jobs for a single user, or of a single job: condor_q -global [username|jobid]

For more information on condor_q, refer to the official documentation.

Note: The -global flag is necessary because there are three submit nodes; when you log in, you may land on a different node than the one from which you originally submitted your job.
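For example (YOUR_FSUID and the job ID 254.0 below are placeholders; substitute your own username or job ID):

    # List all of your jobs across every submit node
    condor_q -global YOUR_FSUID

    # Check a single job by its cluster.process ID
    condor_q -global 254.0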

Run a whole machine job

Condor supports whole machine jobs. If you wish to allocate all of the cores on a node for your job, add the following directives to your job submit file:

+RequiresWholeMachine = True
Requirements = (Target.CAN_RUN_WHOLE_MACHINE =?= True)
Rank = memory

The last line (Rank = memory) is optional but recommended; it steers the job toward machines with the most memory.
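As a rough sketch, a complete whole machine submit file might look like the following (the executable, file names, and transfer settings are placeholders, not a tested configuration):

    universe                = vanilla
    executable              = a.out
    log                     = wholemachine.log
    output                  = wholemachine.out
    error                   = wholemachine.err
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT

    +RequiresWholeMachine   = True
    Requirements            = (Target.CAN_RUN_WHOLE_MACHINE =?= True)
    Rank                    = memory

    queue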

For more about whole machine jobs, refer to the official documentation.

SSH into a running job

Sometimes you need to access a job in Condor while it is running.

Before you submit your job...

To enable SSH access to running jobs, you must add a single directive to your job submit file:

...
RunAsOwner = True
...
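In context, a minimal vanilla-universe submit file with the directive might look like this sketch (the executable and file names are placeholders):

    universe              = vanilla
    executable            = a.out
    log                   = job.log
    output                = job.out
    error                 = job.err
    should_transfer_files = YES

    # Enables SSH access to this job while it runs
    RunAsOwner            = True

    queue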

Once your job is running...

  1. Find out what your job ID is by running condor_q [YOUR_USERNAME]:
    condor_q YOUR_FSUID
    -- Submitter: condor-login.local :  : condor-login.local
    ID      OWNER              SUBMITTED     RUN_TIME   ST PRI SIZE CMD              
    254.0   YOUR_FSUID  9/5    12:29         0+00:01:02 R  0   0.0  a.out   
  2. SSH into the job using the job ID shown in the ID column:
    condor_ssh_to_job 254.0

    Could not chdir to home directory /panfs/storage.local/scs/home/dcshrum: No such file or directory
    Welcome to slot7@condor-25.local!
    Your condor job is running with pid(s) 23388.
    bash-4.1$
  3. You can now interact with the execute node that is running your Condor job.

NOTE: You do not have access to your Lustre home directory on the node that is running your job. Only the files you specified for transfer in your submit file (along with logs and output) are available.
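condor_ssh_to_job can also be given a command to run after the job ID, which is handy for a quick, non-interactive look at the job's scratch directory (the job ID and the output file name below are only illustrative):

    # List the files in the job's scratch directory on the execute node
    condor_ssh_to_job 254.0 ls -l

    # Peek at an output file your job writes as it runs (file name is hypothetical)
    condor_ssh_to_job 254.0 tail -n 20 output.log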

Retrieve logs for a running job

Sometimes you may wish to retrieve logs for a Condor job while it is running.

Before you submit your job...

To enable log retrieval for running jobs, you must add a single directive to your job submit file:

...
RunAsOwner = True
...
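If you want to confirm that the directive took effect on a job that is already queued, you can inspect the job's ClassAd (the job ID 296.0 is just an example):

    # Print the job's full ClassAd and look for the RunAsOwner attribute
    condor_q -global -l 296.0 | grep -i RunAsOwner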

Once your job is running...

  1. Instruct Condor to create a dump of your logs using the condor_gather_info command:
    condor_gather_info --jobid 296.0

    Gathering Condor and machine information...
    acquire_job_q: Getting job q for job 296.0
    acquire_job_ad: Getting job ad for job 296.0
    acquire_job_analysis: Getting job analysis for job 296.0
    acquire_job_userlog_lines: Getting job log entries for job 296.0 ulog file is /lustre/scs/home/dcshrum/condor/vanilla/0.log
    cp: cannot stat `/var/log/condor/MasterLog.old': No such file or directory
    The execute machine for this job is slot3@condor-17.local
    Creating output file with all results in cgi-YOUR_FSUID-jid296.0-2013-09-07-12_16_55AM-EDT.tar.gz
  2. Extract the logfiles:
    tar zxvf cgi-YOUR_FSUID-jid296.0-2013-09-07-12_16_55AM-EDT.tar.gz
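If you want to see what the archive contains before unpacking it, or keep the extracted files out of your working directory, something like the following works (the archive name is taken from the example above; the directory name is arbitrary):

    # List the contents of the tarball without extracting
    tar ztf cgi-YOUR_FSUID-jid296.0-2013-09-07-12_16_55AM-EDT.tar.gz

    # Extract into a separate directory
    mkdir job296-logs
    tar zxvf cgi-YOUR_FSUID-jid296.0-2013-09-07-12_16_55AM-EDT.tar.gz -C job296-logs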

For more information about the condor_gather_info command, refer to the official documentation or the manpage.