This page describes how to submit jobs to Condor using ClassAds and provides an example job submission.
What is a ClassAd?
Condor uses ClassAds to decide which resources your job runs on. Each compute node in Condor publishes a ClassAd describing the resources available on that system (CPU architecture, available software, etc.). Likewise, each job includes a ClassAd describing the resources it needs to run. The Condor manager matches job ClassAds with system ClassAds as jobs in the queue become eligible to run.
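On a system where the Condor command-line tools are installed, machine ClassAds can be inspected with condor_status. The following is a minimal sketch rather than a required step: the constraint expression is an illustrative assumption built from the standard machine attributes Arch, OpSys, and Memory, and the script only prints a notice if condor_status is not available.

```shell
#!/bin/sh
# Sketch: list the machines whose ClassAd satisfies a given expression.
# The expression below is an illustrative assumption, not a site default.
CONSTRAINT='Arch == "X86_64" && OpSys == "LINUX" && Memory >= 512'

if command -v condor_status >/dev/null 2>&1; then
    # -constraint keeps only the slots whose ClassAd evaluates the expression to true
    condor_status -constraint "$CONSTRAINT"
else
    echo "condor_status not found; run this on a Condor login node" >&2
fi
```

Running condor_status with the -long option instead prints the full ClassAd of each slot, which is a convenient way to discover the attribute names you can match against.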
Overview of Submitting a Job
- Compile an executable (or use an existing one) for the architecture(s) you wish to run jobs on.
- The login nodes provide the GCC compilers, but you can also compile software on your own system before submitting to Condor.
- Create (or use an existing) Condor job Submit file that will control a "cluster" of jobs. Condor creates Job ClassAds from this file when you submit it.
- Periodically check your job status. Some jobs may have been kicked off by returning owners.
- Collect the resulting output files.
Create a Submit File
A Submit File is simply a text file that describes how to run the job. Condor creates a job ClassAd from this file when the job is submitted. Refer to the example below:
    # The HTCondor Vanilla Universe is the only Universe available on our installation
    Universe = vanilla

    # The executable to run, relative to the ClassAd file location
    Executable = ~/my_program.sh

    # Command line arguments to use with the program, separated by spaces
    Arguments = -v -i input_file.txt

    # Job Requirements (ex: this job needs 64-bit Linux with at least 512 MB of RAM)
    Requirements = arch == "X86_64" && OpSys == "LINUX" && Memory >= 512

    # Rank (ex: this job prefers higher memory machines)
    Rank = Memory >= 1000

    # Output files ($(Process) is the job's number within the cluster: 0, 1, 2, ...)
    Output = $(Process).out
    Error = logs/$(Process).err
    Log = logs/$(Process).log

    # Email notification (can be NEVER, ALWAYS, COMPLETE, or ERROR)
    Notification = NEVER

    # Transfer any files from login node when the job starts? See explanation below
    should_transfer_files = YES

    # Which files to transfer from the login node when the job starts (comma-separated)
    transfer_input_files = input/$(Process).in, someotherfile.txt, athirdfile.csv

    # Transfer files back to login node when job completes. See explanation below
    when_to_transfer_output = ON_EXIT

    # Queue three jobs in this 'cluster'
    queue 3
This Submit file contains a few bits of information that are worth discussing further:
- Universe - FSU offers only the Vanilla universe on our Condor implementation, so this setting will always be the same.
- Executable - This is the actual program to run, relative to the ClassAd file location.
- Arguments - Any command line flags (arguments) you wish to add to the executable command.
- Requirements - Here, you can specify exactly which characteristics a node must have in order to run your job.
- Rank - Optionally specify rules for prioritizing which nodes should run your job. This uses Condor Rank syntax, and there are many options.
- Output, Error, Log - Files to send output to, relative to the ClassAd file location. The $(Process) variable is useful here to give each job run its own output, error, and log files. If you do not give each file its own name, future job runs will clobber (overwrite) older job run files.
- Notification - Optionally send an email to your RCC or FSU email address upon job completion. Set this to NEVER if you do not wish to receive emails.
- should_transfer_files - If you need any input files for your job in addition to the executable itself, you should set this to YES. This will cause Condor to copy those files to the nodes that the job is running on. Since the FSU Condor implementation does not use a shared file system, this is necessary for input files.
- transfer_input_files - List the files that you wish to transfer here (only used if should_transfer_files is set to YES).
- when_to_transfer_output - This specifies when to copy output, error, and log files from the script execution back to your Lustre directory. Set this to ON_EXIT if you want files to be transferred when the job completes. Set it to ON_EXIT_OR_EVICT if you want to transfer the files even if the job started but did not get a chance to complete.
- queue - The number of jobs to queue based on these parameters. Typically this will be 1, but you can set it to any positive integer.
A much more thorough guide for creating a Job Submit file is available in the Condor Documentation.
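Before submitting, it can help to check that the directories and per-job input files named in the Submit file exist. The sketch below assumes the layout used in the example above (a logs/ directory, input/$(Process).in files, and three queued jobs, so $(Process) takes the values 0, 1, and 2). It also assumes that Condor expects the logs/ directory to exist already rather than creating it for you.

```shell
#!/bin/sh
# Sketch: prepare the working directory for the example Submit file.
# Assumes "queue 3", so $(Process) expands to 0, 1 and 2.
NUM_JOBS=3

# Create the directories referenced by the Error, Log and input paths
# (assumption: Condor will not create them on its own).
mkdir -p logs input

# Report any per-job input file that is not in place yet.
missing=0
i=0
while [ "$i" -lt "$NUM_JOBS" ]; do
    if [ ! -f "input/$i.in" ]; then
        echo "missing input file: input/$i.in" >&2
        missing=$((missing + 1))
    fi
    i=$((i + 1))
done

echo "$missing of $NUM_JOBS input files missing"
```

Run it from the directory that holds the Submit file, and adjust NUM_JOBS to match the number given on the queue line.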
Submit your Job
Once you have created a Job Submit file, you can submit your job using the condor_submit command.
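A minimal sketch of a submission, assuming a Submit file named my_job.submit (a hypothetical name; substitute your own) in the current directory. The guards are only there so the script fails with a readable message when run outside a Condor login node.

```shell
#!/bin/sh
# Sketch: submit a job description file and show the queue afterwards.
# "my_job.submit" is a hypothetical file name used for illustration.
SUBMIT_FILE="my_job.submit"

if ! command -v condor_submit >/dev/null 2>&1; then
    echo "condor_submit not found; run this on a Condor login node" >&2
elif [ ! -f "$SUBMIT_FILE" ]; then
    echo "submit file not found: $SUBMIT_FILE" >&2
else
    # Condor creates one job ClassAd per queued job from this file
    condor_submit "$SUBMIT_FILE"
    # List your jobs in the queue to check their status
    condor_q
fi
```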
Refer to the Condor Documentation or the condor_submit manpage for more information about this command.
To manage jobs once submitted, refer to our Condor Job Management reference.