Clustal W is a program that takes nucleic acid (DNA or RNA) or protein sequences and computes a multiple sequence alignment of them. Clustal X is essentially the same program. The only difference is that Clustal X provides a graphical user interface (GUI) for Clustal W.
Using Clustal W on RCC Resources
Running Clustal W on HPC Login Nodes
Clustal W requires the gnu module to run on the HPC login nodes. By default, its interface is highly interactive. To begin working with Clustal W on a login node, load the module and run the command
clustalw
A command-line interface with prompts will start, and you can run your Clustal W job from it. You can also run Clustal W non-interactively by passing a list of options and files on the command line:
clustalw -[OPTIONS] FILES
Detailed documentation on the options and inputs available for Clustal W can be found by typing
clustalw -help
For more information, please refer to the main website. An example run is shown below. Note that if you do not specify an output format, Clustal W writes a default output file with the same name as the input file but with the .aln extension (for example, TEST.fa produces TEST.aln).
module load gnu
clustalw TEST.fa
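The default run above writes CLUSTAL-format output to TEST.aln. The following is a minimal sketch of non-interactive mode with explicit options, requesting FASTA output in a named file instead. It assumes that module load gnu has already been run and that TEST.fa is in the current directory. The output name TEST.fasta is a hypothetical choice. The -OUTPUT=FASTA value assumes a Clustal W 2.x build, so check clustalw -help for the formats your installed version supports.

```shell
#!/bin/sh
# Sketch: run Clustal W non-interactively with an explicit output format.
# Assumes "module load gnu" was already run and TEST.fa is in this directory.
infile="TEST.fa"
# Hypothetical output name: same base name as the input, .fasta extension
outfile="${infile%.*}.fasta"

if [ -f "$infile" ] && command -v clustalw >/dev/null 2>&1; then
  # -ALIGN performs a full multiple alignment; -OUTPUT and -OUTFILE replace
  # the default CLUSTAL-format .aln file (FASTA output assumes Clustal W 2.x)
  clustalw -INFILE="$infile" -ALIGN -OUTPUT=FASTA -OUTFILE="$outfile"
else
  echo "clustalw or $infile not found; load the gnu module and check the input" >&2
fi
```

Because Clustal W accepts all of its settings on the command line, the same line can be placed unchanged in a batch script.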
Running Clustal W in Parallel
It is also possible to run Clustal W in parallel with OpenMPI. To do this, the gnu and openmpi modules are required, and they must be loaded before Clustal W is run. Below is an example SLURM submit script, again using the default output parameters. Save the submit script under a name with a .sh suffix. This example uses the name CLUSTALTEST.sh.
#!/bin/bash
# Rename the job (-J) to better describe your specific job
#SBATCH -J clustalTest
#SBATCH -n 4
#SBATCH -t 1:00:00
#SBATCH -p genacc_q
module load gnu openmpi
srun clustalw TEST.fa

Then submit the script from a login node with:
sbatch CLUSTALTEST.sh
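When the script is saved as CLUSTALTEST.sh, sbatch submits it and prints a line of the form "Submitted batch job <id>". The sketch below captures that job ID, checks the job's status with squeue and reports where its log will appear. It assumes SLURM's default log name, slurm-<jobid>.out, in the directory the job was submitted from. The helper name job_id_from is hypothetical.

```shell
#!/bin/sh
# Sketch: submit CLUSTALTEST.sh and report its job ID and log file.
# Assumes SLURM's default log name (slurm-<jobid>.out) is not overridden.

# Hypothetical helper: take the last field of sbatch's
# "Submitted batch job <id>" message as the job ID.
job_id_from() {
  printf '%s\n' "$1" | awk '{ print $NF }'
}

if command -v sbatch >/dev/null 2>&1 && [ -f CLUSTALTEST.sh ]; then
  msg=$(sbatch CLUSTALTEST.sh)
  jobid=$(job_id_from "$msg")
  squeue -j "$jobid"    # lists the job while it is pending or running
  echo "Job $jobid log: slurm-${jobid}.out"
else
  echo "sbatch or CLUSTALTEST.sh not found" >&2
fi
```

As an alternative to parsing the message, sbatch --parsable prints the job ID directly (followed by the cluster name if one is set).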