Biopython
Biopython is a collection of Python tools for analyzing and processing bioinformatics data, including genomes, genetic sequence reads, and proteins. The library provides a wide range of analysis tools and algorithms for these types of biological data.
Using Biopython on RCC Resources
To use Biopython on HPC resources, you will need to install it in either a conda environment or a Python venv virtual environment. The basic process for creating a Python venv virtual environment is as follows:
$ python3 -m venv ~/MYENV # You can rename MYENV to any name you want
$ source ~/MYENV/bin/activate # Activates the virtual environment
$ pip install biopython # Installs the biopython package
$ python # Starts the Python interpreter
After Python starts, you will see the ">>>" prompt, where you can type your commands. To exit Python, type the command exit(), and to leave the virtual environment, type the command deactivate.
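Before starting any analysis, it can help to confirm that the package was installed into the environment that is currently active. The following is a minimal sketch of such a check, not an official RCC tool: the file name check_biopython.py and the function name biopython_status are illustrative assumptions, and the script relies only on the Python standard library plus the Bio package itself.

```python
# check_biopython.py (hypothetical file name)
# Reports whether Biopython can be imported by the Python interpreter in use.
import importlib.util
import sys


def biopython_status():
    """Return a one-line description of the Biopython installation, if any."""
    # find_spec returns None when the top-level "Bio" package is not installed.
    if importlib.util.find_spec("Bio") is None:
        return "Biopython not found; activate the environment and run 'pip install biopython'"
    import Bio  # imported only after confirming the package exists
    return f"Biopython {Bio.__version__} found"


if __name__ == "__main__":
    # The executable path shows which environment the interpreter belongs to.
    print(f"Python executable: {sys.executable}")
    print(biopython_status())
```

Run it with python check_biopython.py while the virtual environment is active. If the executable path does not point inside ~/MYENV, the environment has probably not been activated.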
While using Python in the virtual environment, import the Biopython package using the command import Bio. A tutorial for using Biopython is available on the official website. Below are some example commands and their results, taken from section 2.2 of the tutorial:
>>> from Bio.Seq import Seq
>>> my_seq = Seq("AGTACACTGGT")
>>> my_seq
Seq('AGTACACTGGT')
>>> print(my_seq)
AGTACACTGGT
>>> my_seq.complement()
Seq('TCATGTGACCA')
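The same kind of commands can also be placed in a script rather than typed at the prompt, which is usually more convenient for non-interactive work. The sketch below assumes Biopython is installed in the active environment. The function name describe is a hypothetical helper introduced here for illustration, and the example sequence is the coding DNA string used in the Biopython tutorial.

```python
# A small script that summarizes a DNA sequence with Bio.Seq.
from Bio.Seq import Seq


def describe(dna_text):
    """Return basic derived sequences for a DNA string (hypothetical helper)."""
    seq = Seq(dna_text)
    return {
        "length": len(seq),
        "reverse_complement": str(seq.reverse_complement()),
        "rna": str(seq.transcribe()),      # replaces T with U
        "protein": str(seq.translate()),   # standard genetic code, '*' marks stop codons
    }


if __name__ == "__main__":
    # Example coding sequence; its length is a multiple of three.
    coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    for key, value in describe(coding_dna).items():
        print(f"{key}: {value}")
```

Sequences whose length is not a multiple of three still translate, but Biopython emits a warning about the partial codon, so the example uses a sequence of complete codons.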
For detailed usage documentation for Biopython, refer to the official website.