Biopython

Software Category
Version: 1.79

Biopython is a set of tools written in Python for analyzing and processing bioinformatics data, including genomes, genetic sequence reads, and proteins. The package provides a wide range of analysis tools and algorithms for these types of biological data.

Using Biopython on RCC Resources

To use Biopython on HPC resources, you need to install it in either a conda environment or a Python venv virtual environment. The basic process for creating a Python venv virtual environment and installing Biopython in it is as follows:

$ python3 -m venv ~/MYENV       # You can rename MYENV to any name you want
$ source ~/MYENV/bin/activate   # Activates the virtual environment
$ pip install biopython         # Installs the biopython package
$ python                        # Starts the Python interpreter

After Python starts, you will see the ">>>" prompt, where you can type your commands. To exit Python, type exit(). To leave the virtual environment, type deactivate at the shell prompt.
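To confirm that the virtual environment is active and that the installation succeeded, a short check along the following lines can help. This is a minimal sketch using only the Python standard library (Python 3.8 or later); the function names in_virtual_environment and installed_version are illustrative and not part of Biopython. Note that the environment check only detects venv-style environments, not conda environments.

```python
import sys
from importlib import metadata


def in_virtual_environment() -> bool:
    """Return True when Python is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_version(package: str = "biopython"):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Virtual environment active:", in_virtual_environment())
print("Biopython version:", installed_version() or "not installed")
```

If the second line reports "not installed", the pip install step was most likely run outside the activated environment.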

While using Python in the virtual environment, import the Biopython package with the command import Bio. A tutorial for using Biopython is available on the official Biopython website. Below are some example commands and their results, taken from section 2.2 of the tutorial:

>>> from Bio.Seq import Seq
>>> my_seq = Seq("AGTACACTGGT")
>>> my_seq
Seq('AGTACACTGGT')
>>> print(my_seq)
AGTACACTGGT
>>> my_seq.complement()
Seq('TCATGTGACCA')
>>> my_seq.reverse_complement()
Seq('ACCAGTGTACT')
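The interactive commands can also be collected into a script, which is often more convenient when running non-interactive jobs. The following is a minimal sketch under that assumption; the function name describe is hypothetical, and the script prints a message instead of failing if Biopython is not installed in the active environment.

```python
import importlib.util


def describe(dna: str) -> dict:
    """Return a DNA sequence with its complement and reverse complement."""
    # Imported inside the function so that this file still loads when
    # Biopython is missing from the active environment.
    from Bio.Seq import Seq

    seq = Seq(dna)
    return {
        "sequence": str(seq),
        "complement": str(seq.complement()),
        "reverse_complement": str(seq.reverse_complement()),
    }


if importlib.util.find_spec("Bio") is not None:
    for name, value in describe("AGTACACTGGT").items():
        print(f"{name}: {value}")
else:
    print("Biopython is not installed in this environment")
```

Saved under a name of your choice, the script can be run with python inside the activated virtual environment.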

For detailed usage documentation, refer to the official Biopython website.