3/24/2023
©Bud Mishra, 2001
L2-1
Computational Biology
Lecture #2: Genome Organization
Bud Mishra
Professor of Computer Science and Mathematics
9/24/2001

Active Areas of Research (1)
• Human Genome Project: (Completed?)
– Read the roughly 3 billion base pairs of the human genome
(46 chromosomes per diploid cell)
– Deemed “substantially completed on June 27,
2000.”
• Polymorphisms and Haplotyping
– SNPs (Single Nucleotide Polymorphisms): Catalog
the single-base-pair variations, which occur about once in
every 800 base pairs of the human genome across the entire
population
– RFLP-Map: Restriction Fragment Length
Polymorphisms
Active Areas of Research (2)
• Transcription Maps:
– Identify all the genes (about 30,000 (?)) in the
human genome.
– Particularly interesting are the ones involved in
cancer: about 100 oncogenes and 1000 tumor
suppressor genes
• Linkage Analysis:
– Relate genes (or polymorphic markers) to
phenotypes (externally observable traits) by
analyzing the genomes of a family (kinship) or of a
population.
Active Areas of Research (3)
• Functional Genomics:
– Understand how an interacting network of
genes affects a chain of metabolic pathways
to ultimately determine the phenotypes
• Comparative Genomics:
– Relate genes within and across species to
understand their evolutionary
relationships (phylogeny).
Active Areas of Research (4)
• Cell Informatics:
– Interaction between proteins (membrane
and soluble ones) to determine the
dynamics of a cell.
– Interaction among a heterogeneous
population of cells.
• Rational Drug Design:
– Design of drugs and delivery systems to
modify the dynamics of the cells.
Introduction to Biology
• Genome:
– Hereditary information of an organism is encoded
in its DNA and enclosed in a cell (unless it is a
virus). All the information contained in the DNA of
a single organism is its genome.
• A DNA molecule can be thought of as a very long
sequence of nucleotides or bases over the alphabet
Σ = {A, T, C, G}
Complementarity
• DNA is a double-stranded polymer and should be thought of as
a pair of sequences over the alphabet {A, T, C, G}. However, there is a
relation of complementarity between the two sequences:
– A ⇔ T, C ⇔ G
– That is, if there is an A (respectively, T, C, G) on one
sequence at a particular position, then the other sequence
must have a T (respectively, A, G, C) at the same position.
• We will measure the sequence length (or the DNA length) in
terms of base pairs (bp): for instance, human (H. sapiens) DNA
is 3.3 × 10^9 bp, measuring about 6 ft of DNA polymer when completely
stretched out!
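The complementarity rule can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the lecture: the names `COMPLEMENT`, `complement` and `reverse_complement` are chosen here, and `reverse_complement` assumes the usual convention that the two strands are read in opposite directions.

```python
# Watson-Crick pairing over the alphabet {A, T, C, G}.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base paired with each position of `strand`, position for position."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand written in its own 5'-to-3' direction (assumed convention)."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("GAATTC"))         # CTTAAG
    print(reverse_complement("AAGAG"))  # CTCTT
```

Applying `complement` twice returns the original strand, which is one quick way to check that the pairing table is symmetric.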
The Central Dogma
• The intermediate molecule carrying the information out of the nucleus
of a eukaryotic cell is RNA, a single-stranded polymer.
• RNA also controls the translation process, in which amino acids are
assembled into proteins.
• The central dogma (due to Francis Crick in 1958) states that these
information flows are all unidirectional:
“The central dogma states that once ‘information’ has passed into
protein it cannot get out again. The transfer of information from
nucleic acid to nucleic acid, or from nucleic acid to protein, may be
possible, but transfer from protein to protein, or from protein to
nucleic acid is impossible. Information means here the precise
determination of sequence, either of bases in the nucleic acid or of
amino acid residues in the protein.”
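The two permitted flows, nucleic acid to nucleic acid and nucleic acid to protein, can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not the lecture's material: the standard genetic code is brought in from outside the slides, the function names are hypothetical, and translation simply starts at the first base and stops at the first stop codon. Note that there is deliberately no function going from protein back to nucleic acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read consecutive codons until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna)             # AUGGCCUAA
    print(translate(mrna))  # MA
```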
Interrupted Genes:
• An open reading frame (containing a
gene) consists of
– INTRONS: Intervening sequences (noncoding regions)
– EXONS: Protein-coding regions
• Introns are abundant in eukaryotes and
certain animal viruses.
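Removing introns from a transcript can be sketched as joining the exon intervals in order. The sketch below is illustrative only: the function name `splice`, the 0-based end-exclusive coordinates, and the toy sequence (introns written in lowercase) are all assumptions made here, and real splicing is driven by sequence signals rather than by given coordinates.

```python
def splice(primary_transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) in order, dropping the introns between them."""
    previous_end = 0
    pieces = []
    for start, end in exons:
        # Exons must be sorted, non-overlapping, non-empty and inside the transcript.
        if not (previous_end <= start < end <= len(primary_transcript)):
            raise ValueError("exons must be sorted, non-overlapping and in range")
        pieces.append(primary_transcript[start:end])
        previous_end = end
    return "".join(pieces)


if __name__ == "__main__":
    # Toy transcript: exon "AUGGC", intron "guaag", exon "CUAA".
    transcript = "AUGGC" + "guaag" + "CUAA"
    print(splice(transcript, [(0, 5), (10, 14)]))  # AUGGCCUAA
```

In this toy example the intron falls inside the codon GCC, so the codon is only reassembled after splicing.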
Interrupted Genes:
[Figure: a DNA segment with exons (Exon1, Exon2) and introns (Intron1, Intron2, Intron3); transcription yields the primary RNA transcript, and splicing yields the mRNA.]
Interrupted Genes:
• Introns can occur between individual
codons or within a single codon
[Figure: within the nucleus of the cell, hnRNA (heterogeneous nuclear RNA) is a mixture of primary transcripts with varying numbers of introns spliced; the end product is mRNA.]
Some Genes…
Gene Product          Organism      Exon Length   #Introns   Intron Length
Adenosine deaminase   Human         1500          11         30,000
Apolipoprotein B      Human         14,000        28         29,000
Erythropoietin        Human         582           4          1562
Thyroglobulin         Human         8500          = 40       100,000
α-interferon          Human         600           0          0
Fibroin               Silk worm     18,000        1          970
Phaseolin             French bean   1263          5          515
Regulation of Gene Expression
• Motifs (short DNA sequences) that regulate transcription
– Promoter
– Terminator
• Motifs that modulate transcription
– Repressor
– Activator
– Antiterminator
[Figure: a gene flanked by a promoter (transcriptional initiation) and a terminator (transcriptional termination); label: 10-35 bp.]
Promoters
• pol I (RNA polymerase I)
– Transcribes ribosomal RNA genes; promoter 100 to 1000 bp
in front of the gene
• pol II (RNA polymerase II)
– Transcribes genes encoding polypeptides
– Complex and variable regulatory regions
• pol III (RNA polymerase III)
– Transcribes transfer RNA and other small RNAs
– Promoter elements both upstream and downstream
Motifs
• Each motif is a binding site for a specific protein
• Transcription Factor:
– Transcription factors (specific to a cell/environmental
conditions) bind to regulatory regions and facilitate
• Assembly of RNA polymerase into a transcriptional complex
• Activation of a transcriptional complex.
• Termination Factor:
– Assembly of proteins for termination and modification of the
end of the RNA
• Epigenetic Changes
– Methylation of the cytosine in the 5’ region
– Structural changes in chromatin
Organization of Genetic Information
• Bacterial Genome:
– Genes are closely spaced along the DNA.
– The sequences of genes may overlap.
– Related genes (encoding enzymes whose
functions are part of the same pathway or
whose activities are related) are linked as a
single transcription unit.
Organization of Genetic Information
• Eukaryotic Genome:
– Genes are separated by long stretches of
noncoding DNA sequences.
– Multiple genes in a single transcription unit
are extremely rare.
– Multiple chromosomes – Linear
– Chloroplasts and mitochondria – Circular
– Genes appearing on the same
chromosome are syntenic.
Location of Some Genes on Human Chromosomes
Genes                   Chromosome
α-globin cluster        16
β-globin cluster        11
Immunoglobulin
  κ (light chain)       2
  λ (light chain)       22
  Heavy Chain           14
Pseudogenes             9,32,15,18
Growth Hormone          17
Thymidine kinase        17
Insulin                 11
Galactokinase           11
Viral oncogene
  C-sis                 22
  C-mos                 8
  C-Ha-Ras-1            11
  C-myb                 6
Interferons
  α & β cluster         9
  γ                     12
Eukaryotic Genome
• Multiple copies of the same gene
– Solve “supply problem”
– There are several hundred ribosomal RNA genes
in mammals
• Pseudogenes
– Nonfunctional copies of genes (deletions or
alterations in the DNA sequence)
– The number of pseudogenes for a particular gene
varies greatly and differs from one organism to
another.
Genes in Eukaryotes
• A gene may appear exactly once
• It may be part of a family of repeated sequences.
Members of a family may be clustered or dispersed.
• Members of a gene family may be related and
functional (expressed at different times in
development, or in different cells) or may be
pseudogenes.
• Chromosomal Morphology:
– Nucleolar organizers (genes for ribosomal RNA)
– Telomeric and Centromeric regions (Tandemly repeated
sequences)
The Rearrangement of DNA Sequences
• Reshuffling of genes between homologous
chromosomes via reciprocal crossing-over
during both meiosis and mitosis.
• Gene synteny and linkages are usually
preserved.
• Most rearrangements are random.
• Some rearrangements are normal processes
that alter gene expression in an orderly and
programmed manner.
Chromosomal Aberrations
• Breakage
• Translocation (Among non-homologous
chromosomes.)
• Formation of acentric and dicentric chromosomes.
• Gene Conversions
• Amplifications and deletions
• Point mutations
• Jumping genes: transposition of DNA segments
• Programmed rearrangements, e.g., antibody
responses.
Repeat Structure
• Copy Number: 2 ∼ 10^6
• Direct Repeats “head-to-tail”
– Tandem repeats or separated by other sequences
• Inverted Repeats “head-to-head”
– Stem-and-loop structure
– Hairpin structure
• Reverse Palindrome
• True Palindrome
Repeat Structure
• Tandem Direct Repeats
    5’-AAGAG AAGAG AAGAG-3’
• Inverted Repeats
    5’-GTCCAG N…N CTGGAC-3’
    3’-CAGGTC N…N GACCTG-5’
  Stem-and-loop structure associated with inverted repeats (stem base pairs):
    G-C
    A-T
    C-G
    C-G
    T-A
    G-C
• Reverse Palindrome
    5’-GAATTC-3’
    3’-CTTAAG-5’
• True Palindrome
    5’-GTCAATGA AGTAACTG-3’
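The repeat types illustrated here can be told apart with a few string checks. The Python sketch below is a hedged illustration: the function names are hypothetical, and the definitions are inferred from the slide's examples (a reverse palindrome such as GAATTC equals its own reverse complement; a true palindrome reads the same in both directions on a single strand; the two arms of an inverted repeat are reverse complements of each other).

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Complement every base and reverse the order."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_true_palindrome(seq: str) -> bool:
    """Reads the same forwards and backwards on one strand."""
    return seq == seq[::-1]


def is_reverse_palindrome(seq: str) -> bool:
    """Equals its own reverse complement, so both strands read alike 5' to 3'."""
    return seq == reverse_complement(seq)


def is_inverted_repeat(left_arm: str, right_arm: str) -> bool:
    """The right arm is the reverse complement of the left arm, so the arms can pair into a stem."""
    return right_arm == reverse_complement(left_arm)


def tandem_copies(seq: str, unit: str) -> int:
    """Number of back-to-back copies of `unit` making up `seq`, or 0 if it is not a pure tandem repeat."""
    if not unit or len(seq) % len(unit) != 0:
        return 0
    copies = len(seq) // len(unit)
    return copies if unit * copies == seq else 0


if __name__ == "__main__":
    print(tandem_copies("AAGAGAAGAGAAGAG", "AAGAG"))  # 3
    print(is_inverted_repeat("GTCCAG", "CTGGAC"))     # True
    print(is_reverse_palindrome("GAATTC"))            # True
    print(is_true_palindrome("GTCAATGAAGTAACTG"))     # True
```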
Repeats within the Genome
• Gene Family
– A gene and its cognate pseudogenes
• Satellite: Repeats made of noncoding
units
– Minisatellites: Tandem repeats, mostly in
centromeric regions
– Satellite repeat units vary in length from 2
base pairs to several thousand.
Interspersed Repeats
• SINEs: Short Interspersed Repeats
– Each repeat unit is 100 to 500 bp long
– Processed pseudogenes derived from
class III genes
– Example: Alu repeats, dimeric head-to-tail
repeats of 130 bp
• LINEs: Long Interspersed Repeats
– Each unit is longer than 6 kb.
A Genome Grammar
• Consists of
– A stochastic grammar specifying target DNA
sequence together with
– A description of polymorphisms and
– A description of the sampling strategy for
experiments
• ⟨specification⟩ → ⟨DNA-Seg⟩ ⟨Poly-Seg⟩* ⟨Sample-Seg⟩+
Stochastic Grammar
• ⟨DNA-Seg⟩ → “.dna” ⟨DNA-Spec⟩
• ⟨Poly-Seg⟩ → “.poly” ⟨Weight⟩+ ⟨Poly-Spec⟩
• ⟨Sample-Seg⟩ → “.sample” ⟨Sample-Spec⟩
DNA Sequence
• .dna
    A = 150                  ← sequence of length 150; Pr(A) = Pr(T) = Pr(C) = Pr(G) = ¼
    B = A A m(.30)           ← A followed by a mutated copy of A; Pr(Mutation) = .30
    C ∼ 3-7 p(.2, .3, .3)    ← a string of length 3 to 7, Pr(A) = .2, Pr(T) = .3, Pr(C) = .3, Pr(G) = .2; C = constant string
    D = C m(0.03) n(10,30)   ← m = mutation rate, n = copy number
• S = 30,000,000
    B m(.05, .10) p(.1,.1,.01) n(10)
    D !(500)
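A rough Python sketch of how the first few `.dna` definitions might be realized follows. It is not the lecture's tool: the function names are hypothetical, `m(rate)` is assumed to mean an independent per-base substitution with that probability, `p(...)` is assumed to give Pr(A), Pr(T), Pr(C) with Pr(G) as the remainder (as the slide's own annotation suggests), and `n(10,30)` is assumed to mean a copy number drawn between 10 and 30. The definition of S is left out because the slide does not explain all of its parameters.

```python
import random

BASES = "ATCG"


def random_sequence(length, p=(0.25, 0.25, 0.25), rng=random):
    """Draw `length` bases with Pr(A), Pr(T), Pr(C) = p and Pr(G) the remainder."""
    weights = list(p) + [1.0 - sum(p)]
    return "".join(rng.choices(BASES, weights=weights, k=length))


def mutate(seq, rate, rng=random):
    """Substitute each base independently with probability `rate` (assumed reading of m(rate))."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    A = random_sequence(150, rng=rng)                                   # A = 150
    B = A + mutate(A, 0.30, rng=rng)                                    # B = A A m(.30)
    C = random_sequence(rng.randint(3, 7), p=(0.2, 0.3, 0.3), rng=rng)  # C ~ 3-7 p(.2, .3, .3)
    copies = rng.randint(10, 30)                                        # n(10,30), assumed to be a range
    D = "".join(mutate(C, 0.03, rng=rng) for _ in range(copies))        # D = C m(0.03) n(10,30)
    print(len(A), len(B), len(C), len(D))
```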
Polymorphisms
• Modify the ancestral sequence by a series of
– S: Point mutations (SNPs)
– D: Deletions
– X: Translocations
• .poly .8 .8
    S 0.00012T
    D 1-1 .00012
    D 2-2 .00006
    D 3-3 .00002
    D 500-1000 .00005
    X 1000-2000 .0005
  .poly .4
    S .001
    D 1-2 .0005
  Two haplotypes of weight .8 each and one haplotype of weight .4
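How a haplotype might be derived from the ancestral sequence can be sketched as follows. The semantics are assumptions made for illustration: each rate is read as a per-position probability, `D a-b rate` is read as a deletion of between `a` and `b` bases, translocations (X) are not modelled, and all function names are hypothetical.

```python
import random

BASES = "ATCG"


def apply_snps(seq, rate, rng):
    """S rate: substitute each base independently with probability `rate`."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in seq
    )


def apply_deletions(seq, min_len, max_len, rate, rng):
    """D min-max rate: at each position, with probability `rate`, delete min..max bases."""
    out = []
    i = 0
    while i < len(seq):
        if rng.random() < rate:
            i += rng.randint(min_len, max_len)  # skip over the deleted bases
        else:
            out.append(seq[i])
            i += 1
    return "".join(out)


def make_haplotype(ancestor, snp_rate, deletions, rng):
    """Apply point mutations, then each (min_len, max_len, rate) deletion class in turn."""
    seq = apply_snps(ancestor, snp_rate, rng)
    for min_len, max_len, rate in deletions:
        seq = apply_deletions(seq, min_len, max_len, rate, rng)
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    ancestor = "".join(rng.choices(BASES, k=10_000))
    # Second .poly block from the slide: S .001 and D 1-2 .0005
    h3 = make_haplotype(ancestor, 0.001, [(1, 2, 0.0005)], rng)
    print(len(ancestor), len(h3))
```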
Sampling
• .sample
    48,000               ← number of samples
    400 600 .5           ← read lengths
    .01 .02              ← sequence read errors
    .33 .33              ← failure of read
    .3 1800 2200 .005    ← clone size
• .sample
    12,000
    400 600 .5
    .01 .03
    .33 .33
    .4 9000 11000 .015
Experiment
• The first sample generates 48,000 end reads from
inserts of average length 2 Kbp.
– Sample proportions: 40% from haplotype H1, 40%
from H2 and 20% from H3
• The second sample generates 12,000 end reads
from inserts of average length 10 Kbp.
– Sample proportions: 40% from haplotype H1, 40%
from H2 and 20% from H3
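The sample proportions follow from normalizing the haplotype weights: .8, .8 and .4 sum to 2.0, giving 40%, 40% and 20%. The Python sketch below shows that arithmetic together with a simplified end-read sampler. It is an illustration under assumptions: insert and read lengths are drawn uniformly from the stated ranges, read errors and read failures are not modelled, the second read is returned in genome orientation rather than reverse-complemented, and the function names are hypothetical.

```python
import random


def proportions(weights):
    """Normalize haplotype weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def end_reads(genome, insert_range, read_range, rng):
    """Cut one insert from `genome` and return a read from each of its two ends."""
    insert_len = rng.randint(*insert_range)
    start = rng.randint(0, len(genome) - insert_len)
    insert = genome[start:start + insert_len]
    left_len = rng.randint(*read_range)
    right_len = rng.randint(*read_range)
    return insert[:left_len], insert[-right_len:]


if __name__ == "__main__":
    print(proportions([0.8, 0.8, 0.4]))  # approximately 0.4, 0.4, 0.2

    rng = random.Random(0)
    genome = "".join(rng.choices("ATCG", k=5_000))
    # Pick the source haplotype by weight, then sample one 2 Kbp-class insert.
    source = rng.choices(["H1", "H2", "H3"], weights=[0.8, 0.8, 0.4])[0]
    left, right = end_reads(genome, (1800, 2200), (400, 600), rng)
    print(source, len(left), len(right))
```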
