Clustal W

Version: 2.1

Clustal W is a command-line program for multiple sequence alignment of nucleic acid (DNA or RNA) or protein sequences. Clustal X performs the same alignments; the only difference is that it wraps Clustal W in a graphical user interface.

Using Clustal W on RCC Resources

Running Clustal W on the HPC

Load the gnu module before running Clustal W. To work interactively, run the command clustalw; this opens a command-line interface with prompts, from which you can set up and run your Clustal W job. Clustal W can also run in non-interactive mode when you pass options and input files on the command line, in the form clustalw -[OPTIONS] FILES. For detailed documentation of the available options and inputs, run clustalw -help. For more information, please refer to the official website.

The following is a basic example run. Replace TEST with the name of your sequence file. If you do not specify an output format, Clustal W writes its default output: a file with the same base name as the input file and the .aln extension.

$ module load gnu
$ clustalw TEST.fa
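
As a hedged sketch of non-interactive mode, the script below names the input file, output format, and output file explicitly instead of relying on the defaults. The -INFILE, -ALIGN, -OUTPUT, and -OUTFILE options are those listed by clustalw -help in Clustal W 2.x; confirm them against the installed version. The file name TEST.fa, the PHYLIP format, and the helper function phylip_name are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: TEST.fa is a placeholder input; PHYLIP is an arbitrary format choice.
infile="TEST.fa"

# Hypothetical helper: swap the input's extension for .phy.
phylip_name() {
    printf '%s.phy\n' "${1%.*}"
}
outfile="$(phylip_name "${infile}")"
echo "Alignment will be written to ${outfile}"

# Run only if clustalw is on PATH (after 'module load gnu') and the input exists.
if command -v clustalw >/dev/null 2>&1 && [ -f "${infile}" ]; then
    clustalw -INFILE="${infile}" -ALIGN -OUTPUT=PHYLIP -OUTFILE="${outfile}"
else
    echo "Skipping run: clustalw or ${infile} not found" >&2
fi
```

Naming the output file explicitly avoids surprises when several runs share a directory, since each run would otherwise write to the default name derived from the input file.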

Running Clustal W in Parallel

It is also possible to run Clustal W in parallel with OpenMPI. To do so, load the GNU OpenMPI module first.

Below is an example Slurm script to run Clustal W, again using the default output parameters. The script must be saved with the .sh extension.

#!/bin/bash
#SBATCH -J clustalTest # Job name; rename to better describe your specific job
#SBATCH -n 4           # Number of tasks
#SBATCH -t 1:00:00     # Wall-clock time limit (1 hour)
#SBATCH -p genacc_q    # Partition (queue) to submit to

module load gnu openmpi

srun clustalw TEST.fa

Then submit your script using the following command, replacing YOURSCRIPT with the name of your script file:

$ sbatch YOURSCRIPT.sh
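
The submit-and-check cycle can be sketched as follows. This is a minimal example, assuming the script is saved as YOURSCRIPT.sh and that no #SBATCH -o option overrides Slurm's default log file name, slurm-<jobid>.out. The helper function default_log is hypothetical and exists only for illustration.

```shell
#!/bin/bash
# Sketch only: YOURSCRIPT.sh is the placeholder script name used above.
script="YOURSCRIPT.sh"

# Hypothetical helper: Slurm's default log file name for a given job ID.
default_log() {
    printf 'slurm-%s.out\n' "$1"
}

# Submit only where Slurm is available and the script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f "${script}" ]; then
    jobid="$(sbatch --parsable "${script}")"   # prints the job ID, optionally followed by ;cluster
    jobid="${jobid%%;*}"                       # keep only the numeric job ID
    squeue -j "${jobid}"                       # show the job's current queue state
    echo "Job output will appear in $(default_log "${jobid}")"
else
    echo "Skipping submission: sbatch or ${script} not found" >&2
fi
```

Once the job finishes, the alignment file and the Slurm log should both be in the directory from which the job was submitted, unless the script changes directory or sets an output path.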