Biopython for Computer-Aided Drug Design (CADD) I
This tutorial is written for someone who:
Has never used Biopython before
May not know what a FASTA file is
Wants to do CADD (docking, structure modeling, MD) and needs the right protein sequence
What is Biopython?
Biopython is an open-source Python library for working with biological sequences and common bioinformatics file formats such as FASTA, GenBank, and PDB.
Structure-based drug design workflows in CADD typically need both a protein sequence (FASTA) and a protein structure (PDB/mmCIF).
If the sequence or structure you start from is wrong (wrong isoform, truncated, wrong chain), the problems carry through to every later step:
The structure model can be wrong.
The binding pocket can shift.
Docking and MD results become unreliable.
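As a rough illustration of catching such a problem early, the sketch below compares a candidate sequence against a reference and reports whether it is identical, a truncated fragment, or something else entirely. Both sequences are made up, and the helper name compare_to_reference is hypothetical (it is not part of Biopython). A real check would use a curated reference sequence and usually a proper alignment.

```python
from Bio.Seq import Seq

# Both sequences are invented for illustration only.
reference = Seq("MKTAYIAKQRQISFVKSHFSRQ")  # stands in for the full-length protein
candidate = Seq("AYIAKQRQISFVK")           # stands in for what you actually have


def compare_to_reference(reference, candidate):
    """Hypothetical helper: classify a candidate sequence against a reference."""
    ref, cand = str(reference), str(candidate)
    if ref == cand:
        return "identical"
    start = ref.find(cand)  # -1 if the candidate is not an exact fragment
    if start != -1:
        first, last = start + 1, start + len(cand)  # 1-based residue numbers
        return f"truncated: covers reference residues {first}-{last} of {len(ref)}"
    return "different sequence (possible wrong isoform or chain)"


print(compare_to_reference(reference, candidate))
```

An exact substring test only detects clean truncations. Point mutations or isoform differences need a sequence alignment instead.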
Three core Biopython building blocks for sequence work:
Seq: a biological sequence object (DNA/RNA/protein)
SeqRecord: a sequence plus metadata (ID, description)
SeqIO: reading and writing sequence files (FASTA, GenBank, etc.)
A simple way to think about them:
Seq = the sequence itself
SeqRecord = the sequence with a label
SeqIO = the file reader/writer
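A minimal sketch of how the three pieces fit together is shown below. The peptide, the ID "demo_protein", and the description are invented for illustration. An in-memory text buffer stands in for a real FASTA file so the example runs without any files on disk.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Seq: the sequence itself (a short made-up peptide)
peptide = Seq("MKTAYIAKQR")

# SeqRecord: the same sequence with a label attached
record = SeqRecord(
    peptide,
    id="demo_protein",                       # hypothetical identifier
    description="hypothetical example peptide",
)

# SeqIO: write the record out as FASTA text...
buffer = StringIO()
SeqIO.write(record, buffer, "fasta")
fasta_text = buffer.getvalue()
print(fasta_text)

# ...and read it back in as a SeqRecord
parsed = SeqIO.read(StringIO(fasta_text), "fasta")
print(parsed.id, len(parsed.seq))
```

With a real file, the same calls take a filename instead of the buffer, for example SeqIO.read("my_protein.fasta", "fasta"), where the filename is a placeholder. SeqIO.read expects exactly one record, while SeqIO.parse iterates over files that contain several.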