2. Topics to be covered
• Introduction.
• History of Genome Sequencing.
• How genomes are sequenced.
• Packaging
• Transfection
• Recovery of clones
• Strategies of genome sequencing
• Application of genome sequencing.
3. Period of time between the first powered flight and landing on the moon (1903-1969):
66 years
Period of time between the discovery of the structure of DNA and determination of the sequence of the
entire human genome (1953-2010?):
57 years (?)
4. What is a Genome?
• Gene + Chromosome -> Genome
DNA bases: A/T/G/C
RNA bases: A/U/G/C
5. Why determine the order of nucleotides?
• Sequencing means determining the order of the billions of chemical units that build the genetic material.
– The secrets of life are locked up in the order of the 4 letters!
5-100 million living species?
6. Genome Sequencing History
Organism                   Year   Institute                           Genome Size
Bacteriophage MS2          1976   Walter Fiers, University of Ghent   3,569 bp
Phage Φ-X174               1977   Fred Sanger, Cambridge              5,386 bp
Haemophilus influenzae     1995   TIGR                                1,830,138 bp
Saccharomyces cerevisiae   1996   European effort                     12,495,682 bp (16 chromosomes)
Human Genome Project       2000   Multiple organizations              3.3 x 10^9 bp (3 billion letters)
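10. Restriction Enzymes
4-cutters, 6-cutters, 8-cutters
Probability of a given recognition site at any position (equal base frequencies assumed):
(1/4)^4 = 1/256; (1/4)^6 = 1/4096; (1/4)^8 = 1/65536
Small problem: Human genome size is 3 billion base pairs.
How many fragments can be generated using a 4-cutter, a 6-cutter and an 8-cutter?
About 11.7 million for 4-cutters (3 x 10^9 / 256)
About 730,000 for 6-cutters (3 x 10^9 / 4096)
About 46,000 for 8-cutters (3 x 10^9 / 65536)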
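The fragment counts for this problem can be checked with a short calculation. The sketch below assumes the four bases are equally frequent and independent, which is an idealisation of a real genome; the function names are illustrative only.

```python
# Expected number of fragments from digesting a genome with an n-cutter.
# Assumption: each base is A, T, G or C with probability 1/4, independently.

def site_probability(site_length: int) -> float:
    """Chance that a given position starts one specific recognition site."""
    return 0.25 ** site_length


def expected_fragments(genome_size_bp: float, site_length: int) -> float:
    """Genome size divided by the average spacing between sites (4**n bp)."""
    return genome_size_bp * site_probability(site_length)


HUMAN_GENOME_BP = 3e9  # round figure used in the problem

for n in (4, 6, 8):
    fragments = expected_fragments(HUMAN_GENOME_BP, n)
    print(f"{n}-cutter: one site per {4 ** n} bp, about {fragments:,.0f} fragments")
```

Under these assumptions the counts come out near 11.7 million, 730,000 and 46,000 fragments; real digests deviate because base composition and site distribution are not uniform.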
11. Genomic Libraries
[Figure: cloning into a plasmid vector. Labels: antibiotic resistance genes; β-glucuronidase (blue with glucuronides); enzymes; DNA to be cloned; one in a thousand plasmids will get foreign DNA; electroporate]
12. The probability of having any given DNA sequence in the library can be calculated
from the equation
N = ln(1 - P) / ln(1 - f)
P is the desired probability
f is the fraction of the genome carried by a single recombinant
[Ex. for a 4-cutter and the human genome, f = 256 / (3 x 10^9)]
N is the necessary number of recombinants
For example, how large a library (i.e. how many clones) would you need in order to have
a 99% probability of finding a desired sequence represented in a library created by
digestion with a 6-cutter?
N = ln(1 - 0.99) / ln(1 - (4096 / (3 x 10^9)))
N = 3.37 x 10^6 clones
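The worked example can be reproduced numerically. This is a minimal sketch of the formula above; the function name `clones_needed` is hypothetical, and the insert size of 4096 bp is the average 6-cutter fragment under the equal-base-frequency assumption.

```python
import math


def clones_needed(p: float, insert_bp: float, genome_bp: float) -> int:
    """Library size N = ln(1 - P) / ln(1 - f), rounded up to a whole clone.

    p         : desired probability of finding a given sequence (0 < p < 1)
    insert_bp : average insert size carried by one recombinant
    genome_bp : genome size
    """
    f = insert_bp / genome_bp          # fraction of the genome per clone
    # log1p(-x) computes ln(1 - x) accurately when x is very small.
    return math.ceil(math.log1p(-p) / math.log1p(-f))


n = clones_needed(p=0.99, insert_bp=4096, genome_bp=3e9)
print(f"Clones needed: {n:,} (about {n / 1e6:.2f} x 10^6)")
```

Raising P towards 1 increases N sharply, because ln(1 - P) grows without bound as P approaches 1.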
13. Bacteriophage libraries
Insert size is larger -> Number of clones needed is smaller
Lytic and Lysogenic
Head, tail
Recombinant DNA
Assembly Protein
Cos site (200 bp long; nicked by terminase to give a 12 bp overhang)
Phage genome size is about 50 kb
A critical size (in kb) is required for packaging
Vectors are about 25 kb in size
Up to 25 kb of external DNA can be added
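To illustrate why a larger insert size reduces the number of clones needed, the sketch below compares a 4096 bp insert (the average 6-cutter fragment) with the 25 kb maximum phage insert quoted on this slide. It assumes the library-size formula N = ln(1 - P) / ln(1 - f), a 3 x 10^9 bp genome and P = 0.99; the helper name is illustrative.

```python
import math

GENOME_BP = 3e9      # human genome, round figure
PROBABILITY = 0.99   # desired chance of finding a given sequence


def library_size(insert_bp: float) -> int:
    """Clones needed for the chosen probability at a given insert size."""
    f = insert_bp / GENOME_BP
    return math.ceil(math.log1p(-PROBABILITY) / math.log1p(-f))


small_insert = library_size(4096)     # average 6-cutter fragment
phage_insert = library_size(25_000)   # maximum phage insert from the slide

print(f"4,096 bp inserts:  {small_insert:,} clones")
print(f"25,000 bp inserts: {phage_insert:,} clones")
print(f"Reduction factor:  {small_insert / phage_insert:.1f}x")
```

Since f is tiny in both cases, N scales almost exactly with 1/insert size, so the reduction factor is close to 25000 / 4096.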
14. Step - 1
Infect bacteria with mutant phage
• Lacking critical size
• Lacking assembly protein
Large number of empty heads and tails
Extract empty heads and tails
16. Step - 3
Mix empty heads + tails + recombinant DNA
Add packaging enzyme
Packaged viral particles
Made to infect bacterial cells (transfection)
Grow infected and non-infected cells
Transparent plaques: each one contains a multiplied fragment
17. Cosmid Libraries
Takes larger insert sizes
Can grow in bacteria or another host
Needs an origin of replication suited to the host
SV40 ori: replicates in mammalian cells
ColE1 ori: replicates in E. coli
18. BAC Libraries
Can take even larger insert sizes
Has origin of replication
Must have a low copy number per cell.
• Partially digest the chromosome
• Size-fractionate the fragments
• Clone them into a specialized plasmid
19. Various uses of BAC libraries
Physical mapping of genes
Cloning of valuable genes
Chromosome walking
BAC end sequencing
For gap filling in genome sequencing projects.
Powerful tools when used with genome sequencing data.
21. How are genomes sequenced?
• Sanger dideoxy sequencing method (1977)
• Maxam-Gilbert chemical degradation method (1977)
• Two labs with automated sequencers:
1. Leroy Hood at Caltech, 1986 (commercialized by Applied Biosystems)
2. Wilhelm Ansorge at EMBL, 1986 (commercialized by
Pharmacia-Amersham and GE Healthcare)
3. Hypoxanthine-guanine phosphoribosyltransferase
(HGPRT), Alu sequences
4. Hitachi laboratory developed a high-throughput
capillary array sequencer, 1996. In 1991, a patent was filed by
EMBL on media-less, solid-support-based sequencing.
23. Sanger Di-deoxy method
Figures taken from
http://www.bio.davidson.edu/courses/bio111/seq.html
25. Application of Genome Sequencing
Prediction of novel genes/transcripts
Study of genome organization
Study of genome evolution
Relationship between organisms
Genetic basis of complex disease
Linkage analysis
Evolution of genes