ELPH is a bioinformatics program that uses Gibbs sampling to find patterns and motifs in DNA and protein sequence data. The program can handle thousands of sequences at a time.
Using ELPH on RCC Resources
Running ELPH Serially
The gnu module must be loaded before running ELPH. To load the module, use the command:
$ module load gnu
To run ELPH from the command line, use the following format:
$ elph [FILES] -[OPTIONS]
Replace the [FILES] argument with your sequence file(s) and the -[OPTIONS] argument with the list of options required for your job. A complete list of available options can be found either by running elph with no arguments on the command line or in the official user's manual.
For example, a simple run on a test file in FASTA format uses the following command:
$ elph TEST.fa LEN=10 -o OUTFILE.txt
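If you do not yet have a FASTA file to try, the sketch below shows one possible way to create a small one and then run the same command. The three sequences are invented purely for illustration and are not real data; the file names TEST.fa and OUTFILE.txt are the ones used in the example above. The script runs ELPH only if the elph command is available, so it is safe to try before the gnu module is loaded.

```shell
#!/bin/bash
# Write a tiny multi-FASTA file. These sequences are made up for illustration.
cat > TEST.fa <<'EOF'
>seq1
ACGTTGACCTGATCGATCGGATCCATGCAAGT
>seq2
TTGACCTGATCGGGCATGCAAGTACGATCGAT
>seq3
GGATCCTTGACCTGATCGATTACGCATGCAAG
EOF

# Each sequence has exactly one header line starting with '>'.
num_seqs=$(grep -c '^>' TEST.fa)
echo "TEST.fa contains ${num_seqs} sequences"

# Run ELPH only if it is on the PATH (for example, after 'module load gnu').
if command -v elph >/dev/null 2>&1; then
    elph TEST.fa LEN=10 -o OUTFILE.txt
else
    echo "elph not found; load the gnu module first" >&2
fi
```

Counting the header lines first is a quick sanity check that the input file is in FASTA format before any compute time is spent on it.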
Running ELPH in Parallel
ELPH can also be run in parallel, which requires the GNU OpenMPI module. To load the module, use the command:
$ module load gnu-openmpi
You can then run the program by submitting a Slurm script to the HPC cluster. Your script file must use the .sh extension. Below is an example script using the same run as the serial example:
#!/bin/bash
# Job name, number of tasks, partition, and time limit (10 minutes)
#SBATCH -J ELPH_TEST
#SBATCH -n 4
#SBATCH -p genacc_q
#SBATCH -t 00:10:00
# Send email notifications for all job events
#SBATCH --mail-type=ALL
module load gnu-openmpi
elph TEST.fa LEN=10 -o OUTFILE.txt
Then submit your script using the following command, replacing
YOURSCRIPT with the name of your script file:
$ sbatch YOURSCRIPT.sh
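After submission, you may want to check on the job. The sketch below is one possible way to do that: it assumes sbatch prints its usual confirmation line of the form "Submitted batch job 123456", takes the last word as the job ID, and passes it to squeue. The helper name job_id_from_sbatch_output is hypothetical and exists only for this illustration; on clusters where sbatch prints a different message, the --parsable option of sbatch may be a more robust choice.

```shell
#!/bin/bash
# Hypothetical helper: return the last word of sbatch's confirmation line,
# which is the job ID when the line reads "Submitted batch job <ID>".
job_id_from_sbatch_output() {
    printf '%s\n' "${1##* }"
}

# Only attempt a submission where Slurm is actually available.
if command -v sbatch >/dev/null 2>&1; then
    if submit_output=$(sbatch YOURSCRIPT.sh); then
        job_id=$(job_id_from_sbatch_output "$submit_output")
        # Show the current state of the submitted job in the queue.
        squeue -j "$job_id"
    else
        echo "sbatch failed; check the script name and options" >&2
    fi
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi
```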
For detailed usage documentation for ELPH, please refer to the official website.