Clustal W is a program for aligning multiple nucleic acid or protein sequences. Clustal X is essentially the same program; the only difference is that Clustal X provides a graphical interface (GUI) to Clustal W.
Using Clustal W on RCC Resources
Running Clustal W on the HPC
The gnu module must be loaded before running Clustal W. To start working with Clustal W interactively, run the command clustalw; this opens a command-line interface with prompts from which you can run your Clustal W job. You can also run Clustal W in non-interactive mode by passing a list of options and files on the command line: clustalw -[OPTIONS] FILES. Detailed documentation on the available options and inputs can be displayed by typing clustalw -help. For more information, please refer to the official website.
The following is a basic example run of the program. Replace TEST with the name of your sequence file. Note that if you do not specify an output format, you will get a default output file with the same name as the input file but a different file extension (.aln).
$ module load gnu
$ clustalw TEST.fa
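To choose the output format and file name yourself rather than accept the default .aln file, a non-interactive run might look like the minimal sketch below. The file names demo.fa and demo.phy and the three short DNA sequences are made up for illustration. The options shown (-INFILE, -ALIGN, -TYPE, -OUTPUT, -OUTFILE) are among those listed by clustalw -help, but the exact set may vary with the installed version, so check the help output first.

```shell
#!/bin/bash
# Sketch only: demo.fa / demo.phy are hypothetical file names,
# and the sequences below are invented purely for demonstration.
cat > demo.fa <<'EOF'
>seq1
ATGCTAGCTAGCTACGATCGATCG
>seq2
ATGCTAGCTTGCTACGATCGTTCG
>seq3
ATGCAAGCTAGCTACGAACGATCG
EOF

INPUT=demo.fa
OUTPUT=demo.phy

# Run the alignment only if clustalw is on the PATH
# (on the HPC, that means after `module load gnu`).
if command -v clustalw >/dev/null 2>&1; then
    # -ALIGN requests a full multiple alignment; -OUTPUT selects the format.
    clustalw -INFILE="$INPUT" -ALIGN -TYPE=DNA -OUTPUT=PHYLIP -OUTFILE="$OUTPUT"
else
    echo "clustalw not found; load the gnu module first" >&2
fi
```

With your own data, replace demo.fa with your sequence file and use -TYPE=PROTEIN for protein sequences.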
Running Clustal W in Parallel
It is also possible to run Clustal W in parallel with OpenMPI. In order to do this, the GNU OpenMPI module is required and must be loaded first.
Below is an example Slurm script to run Clustal W, again using the default output parameters. The script must be saved with the .sh extension.
#!/bin/bash
#SBATCH -J clustalTest # Rename to better describe your specific job
#SBATCH -n 4
#SBATCH -t 1:00:00
#SBATCH -p genacc_q

module load gnu openmpi
srun clustalw TEST.fa
Then submit your script using the following command, replacing YOURSCRIPT with the name of your script file:
$ sbatch YOURSCRIPT.sh
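After submitting, you will usually want to confirm that the job was accepted and see where it is in the queue. The sketch below is one way to do that, assuming standard Slurm tools (sbatch --parsable and squeue) are available on the login node. The helper parse_job_id is a hypothetical function introduced here only to strip the optional cluster name that --parsable can append to the job ID.

```shell
#!/bin/bash
# Sketch only: YOURSCRIPT.sh is the placeholder script name used above.
SCRIPT=YOURSCRIPT.sh

# Hypothetical helper: `sbatch --parsable` prints "jobid" or "jobid;cluster";
# keep only the part before the first semicolon.
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
    JOBID=$(parse_job_id "$(sbatch --parsable "$SCRIPT")")
    echo "Submitted job $JOBID"
    # Show the job's current state in the queue.
    squeue -j "$JOBID"
    # Unless the script sets its own output file, Slurm writes the job's
    # stdout/stderr to slurm-<jobid>.out in the submission directory.
else
    echo "sbatch not found; run this on a Slurm login node" >&2
fi
```

Once the job has finished, the alignment should appear next to the input file (TEST.aln with the default settings).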