MAFFT

Version: 7.407

MAFFT is a multiple sequence alignment program: it takes a set of nucleotide or amino acid sequences and aligns them. The program provides several alignment strategies. Some, such as L-INS-i, favor accuracy and are better suited to smaller sets of sequences; others, such as FFT-NS-2, favor speed and are better suited to larger sets.

Using MAFFT on RCC Resources

Serially Running MAFFT on HPC Login Nodes and Spear

MAFFT requires the gnu module to run on HPC login nodes and Spear. To load it, run the command module load gnu.

To run MAFFT, type mafft -[OPTS] INPUT > OUTPUT, where -[OPTS] is the list of command-line options for the job, INPUT is the input sequence file, and OUTPUT is the file that receives the alignment. MAFFT writes the alignment to standard output, so the > redirect is what saves it to a file. For detailed usage information, see the official website. MAFFT also includes a number of related programs, such as linsi, ginsi, and mafft-profile. Detailed information on these can be found in the official MAFFT manual.

As a short example, if you have a FASTA formatted file of genetic sequence data, you could align it and output it using the commands:

module load gnu
mafft TEST.fa > OUTPUT
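
The choice between the alignment strategies mentioned earlier can be driven by the number of sequences in the input. The following is a minimal sketch, not a definitive recipe: count_seqs and choose_mafft_cmd are hypothetical helper names, and the 200-sequence cutoff is an assumed rule of thumb rather than a hard limit. The flags shown are the long-form equivalents of L-INS-i (--localpair --maxiterate 1000) and FFT-NS-2 (--retree 2) as given in the MAFFT manual. The helper only prints the command, so it can be reviewed before anything is run.

```shell
#!/bin/bash
# Hypothetical helpers (not part of MAFFT): pick a strategy from the input size.

# Count sequences in a FASTA file: one header line starting with '>' per sequence.
count_seqs() {
    grep -c '^>' "$1"
}

# Print (do not run) a MAFFT command suited to the number of sequences.
# Assumption: 200 sequences is used as the cutoff between the two strategies.
choose_mafft_cmd() {
    cmc_input="$1"
    cmc_output="$2"
    cmc_n=$(count_seqs "$cmc_input")
    if [ "$cmc_n" -lt 200 ]; then
        # L-INS-i: iterative refinement using local pairwise alignments (slower)
        echo "mafft --localpair --maxiterate 1000 $cmc_input > $cmc_output"
    else
        # FFT-NS-2: fast progressive method
        echo "mafft --retree 2 $cmc_input > $cmc_output"
    fi
}
```

For example, choose_mafft_cmd TEST.fa OUTPUT prints the suggested command for TEST.fa. MAFFT's own --auto option performs a similar selection internally and may be the simpler choice in practice.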

Running MAFFT in Parallel on RCC Resources

To run MAFFT in parallel on RCC machines, load the GNU OpenMPI module using the command module load gnu-openmpi. This gives you access to the mpirun command. You can then run MAFFT by writing a Slurm script, which must be saved as a file with the .sh suffix and submitted with sbatch. Below is an example script that uses TEST.fa as the FASTA data file and writes the alignment to the file OUTPUT.

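#!/bin/bash

# Job name, partition, number of tasks, and wall-clock limit (HH:MM:SS)
#SBATCH -J MAFFT_Test
#SBATCH -p genacc_q
#SBATCH -n 4
#SBATCH -t 00:10:00
# Send email on all job state changes
#SBATCH --mail-type=ALL

# Load the GNU OpenMPI module, then launch MAFFT through Slurm
module load gnu-openmpi
srun mafft TEST.fa > OUTPUT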
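
MAFFT also has a built-in --thread option for multithreaded alignment on a single node. The script below is a minimal sketch of that approach, assuming the installed MAFFT build has multithreading enabled and that the gnu module is still the prerequisite. The job name MAFFT_Threads and the helper mafft_threads are hypothetical, and the guards around module and mafft exist only so the script fails gracefully where they are unavailable.

```shell
#!/bin/bash
#SBATCH -J MAFFT_Threads
#SBATCH -p genacc_q
#SBATCH -n 1             # a single task...
#SBATCH -c 4             # ...with four CPU cores for MAFFT's threads
#SBATCH -t 00:10:00

# Hypothetical helper: use the core count Slurm allocated (set when -c is given),
# falling back to 1 when run outside a Slurm job.
mafft_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Load the compiler module only where the module command exists.
if command -v module >/dev/null 2>&1; then
    module load gnu
fi

# Run the alignment only if MAFFT and the input file are available.
if command -v mafft >/dev/null 2>&1 && [ -f TEST.fa ]; then
    mafft --thread "$(mafft_threads)" TEST.fa > OUTPUT
else
    echo "Skipping: mafft or TEST.fa not found" >&2
fi
```

Tying the thread count to SLURM_CPUS_PER_TASK keeps the MAFFT command in step with the #SBATCH -c request, so changing the core count only requires editing one line.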

For additional usage information and for example files to test MAFFT with, please refer to the official website.