Your First Biopython Objects: Seq
Let's make use of it.
Create a protein sequence (Figure 1).
What you’re doing: You’re storing a protein sequence in a Biopython Seq object.
Why not use a normal Python string?
Strings don’t “know biology.”
Biopython objects come with biology-aware behaviors and safer handling.
Copy/paste into a code cell
Code:
from Bio.Seq import Seq
protein_seq = Seq("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAN")
print(protein_seq)
Figure 1: Creation of protein sequence
Basic sanity check: check the length of your protein (Figure 2).
Copy/paste into a code cell
Code:
print("Protein length:", len(protein_seq))
Figure 2: Check the length of your protein
Proteins are written using one-letter amino acid codes:
A = Alanine, L = Leucine, K = Lysine, D = Aspartic acid, etc.
In CADD, amino acids matter because:
Binding pockets often contain hydrophobic residues.
Selectivity often depends on charged/polar residues.
Mutations change amino acids and can cause drug resistance.