2. GENOMICS
Field of biology that attempts to understand the content,
organization, function and evolution of genetic information
contained in the whole genome
Three Levels of Genome Research
3. Structural Genomics
Figure: gene structure showing EXON1, EXON2, EXON3
Structural genomics seeks to describe the structural features of
genes and the 3-dimensional structure of every protein encoded by a
given genome
•Sequence and size in bp
•Map/ position
•Repeats
•Motifs/ domains etc
4. Functional genomics
Use of the vast wealth of data produced by genomic projects
(such as genome sequencing projects) to
describe gene (and protein) functions and interactions.
Study of the functional transcripts and protein products encoded by
the genes of a genome, and their functional characterization
Role of gene
Time- and tissue-specific expression
E.g. gene encoding the florigen hormone in plants
5. What is Comparative Genomics?
Analyzing & comparing different genomes to
study the gene content, function, organization &
evolution of different organisms
Not a core technique in itself but the application of various techniques
(genome sequencing, mapping, bioinformatics etc.) with the sole
objective of comparing genomes
6. What to compare?
• Size of the genome: total number of base pairs
• Genome organization: circular, linear, ss/ ds genome, extra
chromosomal elements etc.
• Percentage of the genome that is coding
• Total number of predicted ORFs
• Average length of ORF
• Repetitive DNA/ junk DNA
• Functional assignment
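Several of the quantities listed above can be computed directly from a sequence. The following is a minimal sketch, not a production gene finder: it assumes a forward-strand-only scan, ORFs defined as ATG to the next in-frame stop codon, and a made-up toy sequence. The function names `find_orfs` and `genome_summary` are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) of ATG...stop ORFs on the forward strand, all 3 frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end is exclusive, stop included
                start = None
    return orfs

def genome_summary(seq):
    """Genome size, predicted ORF count, average ORF length and coding percentage."""
    orfs = find_orfs(seq)
    covered = set()
    for s, e in orfs:
        covered.update(range(s, e))            # positions inside any ORF
    lengths = [e - s for s, e in orfs]
    return {
        "size_bp": len(seq),
        "n_orfs": len(orfs),
        "avg_orf_len_bp": sum(lengths) / len(lengths) if lengths else 0.0,
        "coding_pct": 100.0 * len(covered) / len(seq) if seq else 0.0,
    }

# Toy sequence (invented for illustration) containing two short ORFs.
toy = "CCATGAAATTTTAGGGATGCCCGGGAAATGACC"
summary = genome_summary(toy)
print(summary)
```

Real comparisons would also scan the reverse strand and use a proper gene predictor; the point here is only that each item in the list reduces to a number that can be tabulated per genome.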
8. Basic points of consideration in comparative
genomics
• All existing genomes derive from a common ancestor, and each
organism is a combination of that ancestry and the action of
evolution.
• New genes are derived from existing sequences
• Information gained in one organism can have application in
other even distantly related organisms.
• The existing variation among current forms of living organisms
is due to
– Selection
– Speciation
– Divergence
– Gene mutations/ duplications etc
10. Purpose or Goals of Comparative Genomics
• Understanding the similarities & differences between
genomes that lead to special phenotypes or diseases.
• Identifying genes and discovering their functions by studying
their counterparts in other organisms.
• Revealing the evolutionary relationships between different
organisms.
11. “Nothing in biology makes sense except in the light of evolution”
Theodosius Dobzhansky (1900 – 1975)
All modern biological processes evolved from related
processes.
Every modern gene evolved from other genes
Most genes have orthologs in related species
Many genes have paralogs in the same species.
12. Patterns of Gene Evolution
CASE I: Gene orthologs
CASE II: Gene paralogs
Both are homologous genes
13. CASE III
Homologous genes arise from a common ancestor
and show structural (sequence) similarity
G1A & G2A are paralogs
G1A & G1B, or G2A & G2B, are orthologs
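The naming in this example can be turned into a small rule. This sketch assumes the labels encode a gene copy followed by a one-letter species (so G1A is copy 1 in species A); that reading of the labels, and the function name `classify_pair`, are assumptions made for illustration.

```python
def classify_pair(gene_a, gene_b):
    """Classify two homologous genes named like 'G1A' (copy G1, species A)."""
    copy_a, species_a = gene_a[:-1], gene_a[-1]
    copy_b, species_b = gene_b[:-1], gene_b[-1]
    if gene_a == gene_b:
        return "same gene"
    if species_a == species_b:
        return "paralogs"      # different copies within one species
    if copy_a == copy_b:
        return "orthologs"     # same copy, separated by speciation
    return "homologs (different copy and different species)"

for pair in [("G1A", "G2A"), ("G1A", "G1B"), ("G2A", "G2B"), ("G1A", "G2B")]:
    print(pair, "->", classify_pair(*pair))
```

In real data the labels are not known in advance; orthology and paralogy have to be inferred from sequence similarity and gene trees.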
14. The Concept of Model organisms OR Model genomes
How did it all start?
A great deal of information about an organism's genes can be extracted by
examining their counterparts in simpler model organisms
Comparative genomics is conducted using model organisms
Model organisms offer a cost-effective way to follow the inheritance
of genes through many generations in a relatively short time.
Mammals: Homo sapiens, mouse
Insects: Drosophila melanogaster
Roundworms: C. elegans
Fungi: Saccharomyces cerevisiae
Bacteria: Escherichia coli
Fish: Zebrafish
Arabidopsis thaliana: Model plant genome
Rice: Model cereal genome
Medicago truncatula: Model legume genome
15. Genomes sequenced
1995: First bacterial genomes sequenced (H. influenzae and M. genitalium)
1996: The yeast genome
1997: E. coli K12
1998: C. elegans
1999: Full sequence of chr. 22
2000: D. melanogaster genome & chr. 21; A. thaliana
2001: Human draft
2002: Mouse, Ciona, Rice, Fugu, Anopheles
2003-2005: Human finished, Rat, Chicken, Chimpanzee
Also shown: Xenopus, Zebrafish
16. Techniques of Comparative Genomics
• Use of molecular markers in plant genome analysis
• Comparative genome maps
• Studying Synteny
• Whole genome sequencing
• Bioinformatics tools:
– Homology search in public databases (BLASTn, BLASTp etc)
– Sequence alignment tools
• ClustalW
• ClustalX etc
17. Molecular markers in plant genome analysis
Comparative analysis of Aromatic rice varieties
18. Comparative maps for genome comparison
• Involves the use of molecular markers to map the genomes of
two species for a common set of markers (loci)
• To study genome evolution–how the genome has been
rearranged through time–and to make inferences about gene
organization, repeated sequences, etc
Overview of steps
• Construct a map for a species using a set of markers
• Align the map to a reference map/ published map of a related
species
• Work out common loci/ regions
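The last step, working out common loci, can be sketched as a set intersection over marker names. Everything in this example is hypothetical: the marker names, linkage groups and positions are invented, and `common_loci` and `linkage_group_pairs` are illustrative helper names.

```python
from collections import Counter

def common_loci(map_a, map_b):
    """Markers mapped in both species, ordered by their position in species A."""
    shared = sorted(set(map_a) & set(map_b), key=lambda m: map_a[m])
    return [(m, map_a[m], map_b[m]) for m in shared]

def linkage_group_pairs(loci):
    """Count how many shared markers connect each pair of linkage groups."""
    return Counter((pos_a[0], pos_b[0]) for _, pos_a, pos_b in loci)

# Hypothetical maps: marker -> (linkage group, position in cM)
species1 = {"m1": ("1", 5.0), "m2": ("1", 20.0), "m3": ("2", 10.0), "m4": ("2", 33.0)}
species2 = {"m1": ("A", 12.0), "m2": ("A", 30.5), "m4": ("C", 8.0), "m9": ("B", 2.0)}

shared = common_loci(species1, species2)
pairs = linkage_group_pairs(shared)
for marker, pos1, pos2 in shared:
    print(marker, pos1, pos2)
print(pairs)
```

Linkage-group pairs that share many markers are candidates for conserved regions between the two maps.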
19. Comparative Genome Mapping of Sorghum and Maize
Comparative map: Maize vs Sorghum
Sorghum linkage map produced from maize RFLP genomic probes
Map location of RFLP loci in maize
21. Synteny Maps for Comparative Genomics
Synteny: defined as the preservation of the order of genes on a
chromosome between species
Synteny map: a map showing syntenic regions and homologous loci from
another species aligned against a map of a target organism.
Can give us information about shared ancestry, evolutionary history, or
a key to functional relationships between genes.
Genes present in one species are likely to be present in closely related
species.
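Conserved gene order can be detected by walking along one chromosome and checking whether the orthologs of neighbouring genes are also neighbours in the other species. This is a simplified sketch under stated assumptions: orthologs are already known, a block is broken by any gene without an ortholog, and the gene names, chromosome names and ranks are invented. `syntenic_blocks` is an illustrative name, not a standard tool.

```python
def syntenic_blocks(order_a, position_b, min_genes=2):
    """Group genes of species A into runs whose orthologs are adjacent in species B.

    order_a:    gene names in order along one chromosome of species A
    position_b: gene -> (chromosome, rank) of its ortholog in species B
    """
    blocks, current = [], []
    for gene in order_a:
        pos = position_b.get(gene)
        extends = False
        if pos is not None and current:
            prev_chrom, prev_rank = position_b[current[-1]]
            # same chromosome and neighbouring rank (forward or inverted order)
            extends = pos[0] == prev_chrom and abs(pos[1] - prev_rank) == 1
        if not extends:
            if len(current) >= min_genes:
                blocks.append(current)
            current = []
        if pos is not None:
            current.append(gene)
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks

# Hypothetical data: g6 has no ortholog, g7 maps far away from g5.
order_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
position_b = {
    "g1": ("chr3", 10), "g2": ("chr3", 11), "g3": ("chr3", 12),
    "g4": ("chr8", 40), "g5": ("chr8", 39), "g7": ("chr8", 5),
}
blocks = syntenic_blocks(order_a, position_b)
print(blocks)
```

The boundary between two consecutive blocks is a breakpoint, so the same output also gives a rough breakpoint count between the two genomes.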
22. Rice-Maize comparative synteny map
Goff et al (2002 Science 296: 92-100)
Rice shows great synteny with other cereals, i.e. genes present in one
cereal will almost certainly be present in the same order in another
Regions of homology between rice and maize of greater than 80%.
Virtually every part of the maize genome finds a homologue in rice
23. Conservation of synteny between mouse and human genomes
Approximately 99% of mouse genes have homologs in the human
genome.
For 96%, the homolog lies within a similar conserved syntenic interval
in the human genome.
Mural et al., Science, 2002, 296:1661
Mouse chromosome 16 is syntenic with:
Chr. 3, 8, 12, 16, 21, 22 of human
Chr. 10, 11 of rat
24. Comparison of mouse chromosome 16
and the human genome
Q: Why more breakpoints in mouse-human than in mouse-rat?
Q: Why more conserved genes in human than in rat?
• The longer the divergence time between 2 species, the more
recombination (chromosomal rearrangement) has occurred
• 100 million years since human-mouse divergence
• 40 million years since rat-mouse divergence
25. Bioinformatics Approaches:
Homology search in public databases
• Case: We have a gene/ protein sequence with no idea of its
function
• Subject sequence to homology/ sequence similarity search
across database (BLAST search)
• Look for the genes showing significant similarity
• Putative function can be inferred
Homology in bioinformatics is inferred from sequence similarity
(expressed as % similarity/ identity)
BLAST: Basic Local Alignment Search Tool
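The % identity reported for a hit can be illustrated on two sequences that are already aligned. This is a sketch of the arithmetic only, not of BLAST itself. It assumes one common convention (identical columns divided by columns without a gap); other tools use other denominators. The sequences are invented.

```python
def percent_identity(aln_a, aln_b):
    """% identity over aligned columns that contain no gap ('-') in either sequence."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

# Toy alignment: 8 gap-free columns, 7 of them identical.
identity = percent_identity("ATGCT-AGC", "ATGATTAGC")
print(identity)
```

A high identity to a database entry of known function is what supports a putative function for the query; it is a hypothesis to be checked, not proof.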
26. An Example
• 5 rice varieties grown under heat stress
• Isolate protein from individual plants
• Analyze on SDS PAGE (gel lanes: M, 1, 2, 3, 4, 5)
• Isolate the fragment
• Get it sequenced
• Subject the sequence to BLAST
• Get an idea of the role/ function of the novel protein
• Correlate with phenotype and confirm the findings
29. Use of Sequence Alignment tools in comparative
genomics
Sequence alignment is a way of arranging the sequences to identify
regions of similarity that may be a consequence of functional, structural,
or evolutionary relationships between the sequences
Figure: pairwise alignment (SEQ1, SEQ2) and multiple sequence alignment
Progressive multiple alignment techniques build a guide tree that can be
used to work out evolutionary relationships
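Pairwise alignment can be illustrated with the classic Needleman-Wunsch global alignment algorithm. The slides do not name a specific algorithm, so this choice is an assumption, as are the simple scoring values (match +1, mismatch -1, gap -1); real tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

aligned_a, aligned_b, best_score = needleman_wunsch("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(best_score)
```

Progressive multiple alignment methods such as ClustalW start from pairwise comparisons like this one, then add sequences to the alignment in an order given by a guide tree.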
30. What if the target sequence shows no similarity with
existing sequences?
Dead end?
No! This is the beginning of other advanced
comparative genomics approaches
31. Advanced Techniques of Comparative Genomics
Prerequisites:
Enough data & tools for processing large amounts of data
Development of new computational methods
Advanced statistical tools
Knowledge of algorithms
“Informatics” techniques from applied maths, computer science and
statistics adapted to biological sequences
32. Prediction of functions via the ‘guilt by association’
principle
Proverbial principle: ‘Show me your friends and I’ll tell you
who you are’
• Genes of related function are associated in various ways
E.g. enzymes in a pathway, proteins in a complex
• Groups of genes having similar biochemical function tend to
remain localized
E.g. genes required for synthesis of tryptophan (trp genes)
in E. coli and other prokaryotes
• Whatever a gene’s associates do, the gene probably does
something similar
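The guilt-by-association idea can be sketched as a majority vote: an uncharacterized gene receives the most common function among its annotated associates. The association list below is invented toy data, `orfX` is a hypothetical uncharacterized gene, and `predict_function` is an illustrative name. How associations are found (clustering, co-expression, interactions) is outside this sketch.

```python
from collections import Counter

def predict_function(gene, associations, known_functions):
    """Return the most common known function among a gene's associates, or None."""
    votes = Counter(
        known_functions[neighbor]
        for neighbor in associations.get(gene, ())
        if neighbor in known_functions
    )
    if not votes:
        return None            # no annotated associates, no prediction
    return votes.most_common(1)[0][0]

# Hypothetical associations for an uncharacterized ORF.
associations = {"orfX": ["trpA", "trpB", "trpC", "lacZ"]}
known_functions = {
    "trpA": "tryptophan biosynthesis",
    "trpB": "tryptophan biosynthesis",
    "trpC": "tryptophan biosynthesis",
    "lacZ": "lactose metabolism",
}
prediction = predict_function("orfX", associations, known_functions)
print(prediction)
```

The result is only a putative function; as with similarity searches, it needs experimental confirmation.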
33. Plant association studies
Evidence is mainly from post-genomic studies
• Some plant post-genomic resources:
– Microarray analysis
– Organellar targeting prediction
– Proteomics & phenomics databases
Figure: types of evidence for functional association between genes
Genomic evidence: gene clustering, gene fusion, shared regulatory sites,
phylogenetic occurrence
Post-genomic evidence: protein-protein interactions, organelle proteomes,
co-expression, structures, essentiality & other phenome data
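Phylogenetic occurrence, one of the kinds of genomic evidence, can be illustrated by comparing presence/absence profiles: genes that are present and absent in the same set of genomes are candidates for a shared function. The profiles below are invented, and the simple matching score is one assumed choice among several possible similarity measures.

```python
def profile_similarity(p, q):
    """Fraction of genomes in which two genes are both present or both absent."""
    return sum(x == y for x, y in zip(p, q)) / len(p)

def most_similar(query, profiles):
    """Return (gene, similarity) for the gene whose profile best matches the query."""
    scores = [
        (name, profile_similarity(profiles[query], profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return max(scores, key=lambda item: item[1])

# Hypothetical presence (1) / absence (0) of four genes across six genomes.
profiles = {
    "geneW": (1, 1, 0, 0, 1, 1),
    "geneX": (1, 1, 0, 0, 1, 1),
    "geneY": (1, 0, 0, 1, 1, 0),
    "geneZ": (0, 0, 1, 1, 0, 0),
}
best = most_similar("geneW", profiles)
print(best)
```

Here geneX shares its whole profile with geneW, so it would be the first candidate associate to examine.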
38. General Databases Useful for Comparative Genomics
• LocusLink/RefSeq
http://www.ncbi.nih.gov/LocusLink/
• PEDANT - Protein Extraction, Description and ANalysis Tool
http://pedant.gsf.de/
• MIPS - Mammalian Protein Interaction Database
http://mips.gsf.de/
• COGs - Clusters of Orthologous Groups (of proteins)
http://www.ncbi.nih.gov/COG/
• KEGG - Kyoto Encyclopedia of Genes and Genomes
http://www.genome.ad.jp/kegg/
• MBGD - Microbial Genome Database
http://mbgd.genome.ad.jp/
• GOLD - Genomes OnLine Database
http://www.genomesonline.org/
• TOGA - TIGR Orthologous Gene Alignment
http://www.tigr.org/tdb/toga/toga.shtml
39. What we have learned by comparing genomes
It will help us to understand the genetic basis of diversity in
organisms, both speciation & variation & important aspects of
evolutionary biology.
Provides “first pass” information on the function of the
putative gene based on the existence of conserved protein
sequences.
Comparative genomics provides a powerful way in which to
analyze sequence data.