Clustal W is a program that takes nucleic acid (genetic) or protein sequence data as input and produces a multiple sequence alignment. Clustal W is essentially the same program as Clustal X; the only difference is that Clustal X provides a GUI for Clustal W.
Using Clustal W on RCC Resources
Running Clustal W on HPC Login Nodes
Clustal W requires the gnu module to run on HPC login nodes.
To begin working with Clustal W on a login node, run the command clustalw. This starts a command-line interface with prompts, from which you can run your Clustal W job. You can also run Clustal W in non-interactive mode by specifying a list of options and files on the command line: clustalw -[OPTIONS] FILES. Detailed documentation on the options and inputs available for use with Clustal W can be found by typing clustalw -help. For more information, please refer to the official website.
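As an illustration of non-interactive mode, the sketch below runs an alignment with explicit options. The input name TEST.fa and the choice of PHYLIP output are assumptions made for the example. The option names (-INFILE, -ALIGN, -OUTPUT, -OUTFILE) are taken from Clustal W's own help text, so confirm them with clustalw -help for the version installed on the cluster.

```shell
#!/bin/bash
# Sketch: run Clustal W non-interactively with explicit options.
# INPUT and the PHYLIP output format are illustrative assumptions.
INPUT="TEST.fa"
OUTPUT="${INPUT%.*}.phy"   # strip the last extension and append .phy

# Only call clustalw if it is on the PATH (after 'module load gnu')
# and the input file exists; otherwise report and do nothing.
if command -v clustalw >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    clustalw -INFILE="$INPUT" -ALIGN -OUTPUT=PHYLIP -OUTFILE="$OUTPUT"
else
    echo "clustalw or $INPUT not available; nothing was run" >&2
fi
```

Naming the output file explicitly avoids relying on the default naming rule and keeps results from different runs apart.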
The following is a basic example run of the program. Note that without specifying an output format, you will get a default output file which has the same name as the input file with a different file extension (.aln).
module load gnu
clustalw TEST.fa
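The default naming rule described above can be sketched as a small shell helper, which is handy when a later step in a script needs to find the alignment. The function name default_aln_name is hypothetical, and the helper assumes a simple file name with a single extension, such as TEST.fa.

```shell
#!/bin/bash
# Sketch: derive the default alignment file name for a given input.
# default_aln_name is a hypothetical helper, not part of Clustal W.
default_aln_name() {
    # Remove the last extension (if any) and append .aln
    printf '%s.aln\n' "${1%.*}"
}

aln_file=$(default_aln_name "TEST.fa")
echo "Expecting alignment in: $aln_file"

# After running 'clustalw TEST.fa', check that the file was written.
if [ -f "$aln_file" ]; then
    echo "Found $aln_file"
else
    echo "$aln_file not found (has clustalw been run yet?)" >&2
fi
```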
Running Clustal W in Parallel
It is also possible to run Clustal W in parallel with OpenMPI. In order to do this, the GNU OpenMPI modules are required and must be loaded first.
An example run for Clustal W is below, again using the default output parameters. The first set of code is the Slurm submit script, and the second set is the commands to run the submit script. The submit script must be saved with the .sh extension. We use the name CLUSTALTEST.sh as an example.
#!/bin/bash
# Rename to better describe your specific job
#SBATCH -J clustalTest
#SBATCH -n 4
#SBATCH -t 1:00:00
#SBATCH -p genacc_q
module load gnu openmpi
srun clustalw TEST.fa