ABySS is a bioinformatics program designed to assemble genomes from short paired-end sequence reads. It can be run either in serial or in parallel; the parallel version can efficiently assemble larger genomes than the serial one.
Running ABySS on RCC Resources
ABySS can be run on the HPC serially as well as in parallel. The gnu and openmpi modules need to be loaded before running ABySS.
Running ABySS in Serial
Download and assemble a small synthetic data set.
module load gnu openmpi
abyss-pe k=25 name=test se=https://raw.github.com/dzerbino/velvet/master/data/test_reads.fa
Calculate assembly contiguity statistics:
abyss-fac test-unitigs.fa
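Contiguity statistics summarise how fragmented an assembly is, and N50 is the most commonly quoted one. ABySS provides its own tool (abyss-fac) for this step; the sketch below is not that tool, only a minimal illustration of how the number of sequences, total length, and N50 could be computed from a FASTA file. The function name fasta_n50 is hypothetical, and abyss-fac may apply its own filters, so its figures can differ.

```shell
# Hypothetical helper (not part of ABySS): print the number of sequences,
# their total length, and the N50 for the FASTA file given as $1.
fasta_n50() {
    awk '
        # Header line: emit the length of the previous record, then reset.
        /^>/ { if (len > 0) print len; len = 0; next }
        # Sequence line: strip whitespace and accumulate the length.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (len > 0) print len }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            if (NR == 0) { print "n=0 sum=0 N50=0"; exit }
            run = 0
            # Walk from longest to shortest until half the total is covered.
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) {
                    printf "n=%d sum=%d N50=%d\n", NR, total, lens[i]
                    exit
                }
            }
        }
    '
}
```

For example, fasta_n50 test-unitigs.fa would report the statistics for the unitigs written by the serial run above, assuming that file exists in the working directory.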
To assemble paired reads in two files named test-1.fa and test-3.fa into contigs in a file named test-contigs.fa, run the command:
abyss-pe name=test k=64 in='test-1.fa test-3.fa'
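A mistyped file name or k value is easier to catch before the assembly starts than after. As a minimal sketch, the hypothetical helper below (abyss_pe_cmd is not part of ABySS) checks that k is a positive integer and that every input file is readable, then prints the abyss-pe command in the form shown above rather than running it.

```shell
# Hypothetical helper (not part of ABySS): validate the arguments and print
# the abyss-pe command line. Usage: abyss_pe_cmd NAME K FILE [FILE...]
abyss_pe_cmd() {
    if [ "$#" -lt 3 ]; then
        echo "usage: abyss_pe_cmd NAME K FILE [FILE...]" >&2
        return 1
    fi
    _name=$1
    _k=$2
    shift 2
    # k must consist of digits only and be greater than zero.
    case $_k in
        ''|*[!0-9]*) echo "error: k must be a positive integer" >&2; return 1 ;;
    esac
    [ "$_k" -gt 0 ] || { echo "error: k must be a positive integer" >&2; return 1; }
    # Every input file must exist and be readable.
    for _f in "$@"; do
        [ -r "$_f" ] || { echo "error: cannot read $_f" >&2; return 1; }
    done
    # Print the command; the caller decides whether to execute it.
    printf "abyss-pe name=%s k=%s in='%s'\n" "$_name" "$_k" "$*"
}
```

Calling abyss_pe_cmd test 64 test-1.fa test-3.fa in a directory containing both files prints the command from this section, which can then be reviewed and run by hand.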
Further details about the commands can be found in the ABySS documentation.
Running ABySS in Parallel on HPC
The following SLURM submit script can be used as a template to submit a parallel ABySS job on the HPC.
#!/bin/bash
#
# Name your job
#SBATCH -J abyss
#
# Change the queue
#SBATCH -p genacc_q
#
# Change the number of nodes and processes per node as necessary
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#
# Change the wall time
#SBATCH -t 00:30:00
#
module load gnu openmpi
#
# Run your ABySS commands.
# Note: in abyss-pe, np= sets the number of MPI processes, while n= sets the
# minimum number of pairs required for building contigs.
srun abyss-pe name=test k=48 n=8 in='test-1.fa test-3.fa'
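The template requests 2 nodes with 4 tasks each, that is, 8 processes in total. When the node or task counts are changed, the process count has to change with them. As a minimal sketch, the hypothetical helper below (slurm_np is not part of ABySS or SLURM) derives that number from the environment variables SLURM sets inside a job, falling back to 1 outside a job. How the value is then passed to ABySS (for example through the np parameter of abyss-pe) and whether srun is needed depend on the site setup, so treat this as an assumption to verify on your cluster.

```shell
# Hypothetical helper: print the total number of tasks allocated to the job.
slurm_np() {
    if [ -n "${SLURM_NTASKS:-}" ]; then
        # SLURM reports the total task count directly.
        echo "$SLURM_NTASKS"
    elif [ -n "${SLURM_JOB_NUM_NODES:-}" ] && [ -n "${SLURM_NTASKS_PER_NODE:-}" ]; then
        # Otherwise multiply nodes by tasks per node.
        echo $(( ${SLURM_JOB_NUM_NODES} * ${SLURM_NTASKS_PER_NODE} ))
    else
        # Not running under SLURM: assume a single process.
        echo 1
    fi
}
```

Inside the submit script, the value could be captured with np=$(slurm_np) and reused wherever the process count is needed, so that the #SBATCH lines remain the only place where it is set.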