FASTA
FASTA is a set of bioinformatics programs that take DNA or protein sequences as input and search them for regions of similarity. The programs can find both locally and globally similar regions. RCC also provides a parallel version that uses MPI.
Using FASTA on RCC Resources
The FASTA software package includes a number of programs. To run them serially, load the gnu module. To run them in parallel, load one of the available MPI implementations, such as GNU OpenMPI.
Refer to the official documentation for a complete list of the programs included in the FASTA package. These programs include fasta36, which does sequence comparison. The following is an example run of fasta36 on HPC:
$ module load gnu
$ fasta36 -OPTIONS QUERY.fa LIBRARY.fa
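As a concrete illustration of the placeholder command above, here is a minimal sketch that writes a tiny query file, counts its records, and runs fasta36 only when the program and a library file are available. The sequences, the results.tsv output name, and the LIBRARY.fa path are hypothetical. The -E (E-value cutoff) and -m 8 (tabular output) options are assumed from the FASTA36 documentation and should be checked against the version installed on RCC.

```shell
#!/bin/bash
# Work in a temporary directory to avoid overwriting existing files.
workdir=$(mktemp -d)

# Hypothetical two-record query file in FASTA format:
# each record is a '>' header line followed by sequence lines.
cat > "$workdir/QUERY.fa" <<'EOF'
>query1 hypothetical DNA sequence
ACGTACGTACGTTAGC
>query2 hypothetical DNA sequence
TTGACCGTAAGCTAGG
EOF

# Count records by counting header lines.
n_query=$(grep -c '^>' "$workdir/QUERY.fa")
echo "QUERY.fa contains $n_query sequence(s)"

# Run the search only when fasta36 is on the PATH (after 'module load gnu')
# and a library file exists. LIBRARY.fa is a placeholder path.
if command -v fasta36 >/dev/null 2>&1 && [ -f LIBRARY.fa ]; then
    # Assumed options: -E sets the E-value cutoff, -m 8 selects tabular output.
    fasta36 -E 0.001 -m 8 "$workdir/QUERY.fa" LIBRARY.fa > results.tsv
else
    echo "fasta36 or LIBRARY.fa not found; skipping the search" >&2
fi
```

Counting header lines first is a cheap way to confirm that the query file is valid FASTA before spending cluster time on a search.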
To run the program in parallel on RCC systems, you can either call mpirun directly or submit a Slurm job script. A sample job script for the fasta36 program is below:
#!/bin/bash
#SBATCH --job-name=FASTA_Test
#SBATCH --mail-type=ALL
#SBATCH -n 4
#SBATCH -p genacc_q
#SBATCH -t 00-04:00:00
#SBATCH --mem-per-cpu=3900M
module load gnu openmpi
srun fasta36 -OPTIONS QUERY.fa LIBRARY.fa
Then submit your script using the following command, replacing YOURSCRIPT with the name of your script file:
$ sbatch YOURSCRIPT.sh
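Before submitting, it can help to check the script for shell syntax errors, and afterwards to confirm that the job is queued. The following is a minimal sketch, assuming the script is named YOURSCRIPT.sh as above. The check_job_script function is a hypothetical helper written for this example, while bash -n, sbatch --parsable, and squeue -j are standard Bash and Slurm options.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the file exists, parses as valid
# Bash, and contains at least one #SBATCH directive.
check_job_script() {
    local script=$1
    [ -f "$script" ] || { echo "missing: $script" >&2; return 1; }
    bash -n "$script" || return 1
    grep -q '^#SBATCH' "$script" || { echo "no #SBATCH lines in $script" >&2; return 1; }
}

script=YOURSCRIPT.sh   # placeholder name from the text

if check_job_script "$script" && command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print just the job ID (plus cluster name, if any).
    jobid=$(sbatch --parsable "$script" | cut -d';' -f1)
    squeue -j "$jobid"   # show the job's current state in the queue
else
    echo "not submitting: script check failed or sbatch unavailable" >&2
fi
```

The check is deliberately conservative: it validates syntax only and does not verify that the requested partition or resources exist.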
Note that the above examples can be applied to any of the other FASTA programs. For detailed usage information on each program and a fuller description of what it is designed to do, refer to the official website.