C value

C-value
 The C-value of an organism is the amount of DNA in the
organism’s haploid genome. Genome size (C-value) varies
widely among organisms.
 Thomas coined the term C-value paradox to denote the
unexpected lack of relationship between the presumed
complexity of an organism and its C-value.

Range of C-values in various eukaryotic taxa
_______________________________________________________________
Taxon           Genome size range (kb)        Ratio
                                              (highest/lowest)
_______________________________________________________________
Eukaryotes      2,300 - 686,000,000           298,261
Amoebae         35,300 - 686,000,000          19,433
Fungi           8,800 - 1,470,000             167
Animals         49,000 - 139,000,000          2,837
Sponges         49,000 - 53,900               1
Molluscs        421,000 - 5,290,000           13
Crustaceans     686,000 - 22,100,000          32
Insects         98,000 - 7,350,000            75
Bony fishes     340,000 - 139,000,000         409
Amphibians      931,000 - 84,300,000          91
Reptiles        1,230,000 - 5,340,000         4
Birds           1,670,000 - 2,250,000         1
Mammals         1,700,000 - 6,700,000         4
Plants          50,000 - 307,000,000          6,140
_______________________________________________________________
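
The Ratio column is the largest genome size in a taxon divided by the smallest, rounded to the nearest integer. A minimal Python sketch that recomputes it for a few rows of the table; the names GENOME_RANGES_KB and size_ratio are illustrative and not from the source:

```python
# Genome size ranges in kb, copied from selected rows of the table above.
GENOME_RANGES_KB = {
    "Eukaryotes": (2_300, 686_000_000),
    "Fungi": (8_800, 1_470_000),
    "Birds": (1_670_000, 2_250_000),
    "Mammals": (1_700_000, 6_700_000),
    "Plants": (50_000, 307_000_000),
}


def size_ratio(lowest_kb, highest_kb):
    """Return highest/lowest genome size, rounded to the nearest integer."""
    return round(highest_kb / lowest_kb)


for taxon, (low, high) in GENOME_RANGES_KB.items():
    print(f"{taxon:<12}{size_ratio(low, high):>10,}")
```

Birds and mammals span only a few-fold range, while eukaryotes as a whole span about 300,000-fold, which is the observation behind the C-value paradox.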

If the variation in C-values is attributed to genes, it can be
due to interspecific differences in
(1) the number of protein-coding genes
(2) the size of proteins
(3) the size of protein-coding genes
(4) the number and sizes of genes other than protein-coding ones.

K-value paradox: Complexity does not correlate with
chromosome number.
Homo sapiens: 46
Lysandra atlantica: 250
Ophioglossum reticulatum: ~1260

C-value paradox: Complexity does not correlate with
genome size.
Homo sapiens: 3.4 × 10^9 bp
Allium cepa: 1.5 × 10^10 bp
Amoeba dubia: 6.8 × 10^11 bp

N-value paradox: Complexity does not correlate with
gene number.
~21,000 genes ~25,000 genes ~60,000 genes

Figure 21.7
 Exons (1.5%)
 Introns (5%)
 Regulatory sequences (20%)
 Unique noncoding DNA (15%)
 Repetitive DNA that includes transposable elements and
related sequences (44%), including L1 sequences (17%) and
Alu elements (10%)
 Repetitive DNA unrelated to transposable elements (14%),
including simple sequence DNA (3%) and large-segment
duplications (5-6%)

Sequence complexity: Introns and Exons

 The DNA sequences that make up an interrupted gene are
divided into two categories:
1. Exons
2. Introns
 Exons are the sequences represented in the mature RNA.
By definition, a gene starts and ends with exons that
correspond to the 5’ and 3’ ends of the RNA.
 Introns are the intervening sequences that are removed
when the primary transcript is processed to give the
mature RNA.

 The exon sequences are in the same order in the gene and in the
RNA, but an interrupted gene is longer than its final RNA product
because of the presence of the introns.
 Introns are removed by the process of RNA splicing, which occurs
only in cis on an individual RNA molecule.
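
To illustrate why an interrupted gene is longer than its RNA product, the sketch below removes the introns from a toy transcript and joins the exons in their original order. The sequences and the function name splice are invented for illustration; the coordinates stand in for what the splicing machinery recognizes on the RNA itself.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as (start, end) half-open index pairs,
    and join the remaining exons in their original order."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


# Toy transcript (made-up sequences): three exons separated by two introns.
exon1, intron1, exon2, intron2, exon3 = (
    "AUGGCC", "GUAAGUACUAACUUUUCAG", "GAUUAC", "GUACGUUACUAACCUUUAG", "UGA",
)
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

# Intron coordinates are derived from the piece lengths.
i1_start = len(exon1)
i1_end = i1_start + len(intron1)
i2_start = i1_end + len(exon2)
i2_end = i2_start + len(intron2)

mature = splice(pre_mrna, [(i1_start, i1_end), (i2_start, i2_end)])
print(mature)                      # exons joined in the same order as in the gene
print(len(pre_mrna), len(mature))  # the primary transcript is the longer one
```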

Sequences within the RNA Determine Where Splicing Occurs
The borders between introns and exons are marked by
specific nucleotide sequences within the pre-mRNAs.

● 5’ splice site: the exon-intron boundary at the 5’
end of the intron
● 3’ splice site: the exon-intron boundary at the 3’
end of the intron
● Branch point site: an A close to the 3’ end of the
intron, which is followed by a polypyrimidine tract
(Py tract).
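
The three signals above can be sketched as a simple check on an intron sequence. This is a heuristic illustration only. It assumes the intron begins with GU and ends with AG (a common pattern for nuclear pre-mRNA introns that the slides do not state), and the window and tract thresholds are arbitrary choices. The function name describe_intron is hypothetical.

```python
def describe_intron(intron, window=40, tract_len=8, min_py=6):
    """Report simple splice signals in an intron given as RNA, 5' to 3'.

    Assumptions of this sketch (not from the source): the intron starts
    with GU and ends with AG; the branch point A lies within `window` nt
    of the 3' end; a polypyrimidine tract is any `tract_len`-nt stretch
    between that A and the terminal AG with at least `min_py` C/U bases.
    """
    intron = intron.upper()
    result = {
        "five_prime_GU": intron.startswith("GU"),
        "three_prime_AG": intron.endswith("AG"),
        "branch_A_index": None,
    }
    body_end = len(intron) - 2  # exclude the terminal AG
    search_start = max(2, len(intron) - window)
    # Scan from the 3' end backwards so the A closest to the 3' end wins.
    for i in reversed(range(search_start, body_end - tract_len)):
        if intron[i] != "A":
            continue
        downstream = intron[i + 1:body_end]
        for j in range(len(downstream) - tract_len + 1):
            chunk = downstream[j:j + tract_len]
            if sum(base in "CU" for base in chunk) >= min_py:
                result["branch_A_index"] = i
                return result
    return result


# Made-up intron: 5' splice site, branch region, Py tract, 3' splice site.
example = "GUAAGU" + "ACUAAC" + "UUUUCUUUCC" + "AG"
print(describe_intron(example))
```

Real splice-site prediction relies on position weight matrices or trained models; the point here is only to show where the three signals sit relative to one another.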

The Intron Is Removed in a Form Called a Lariat as the
Flanking Exons Are Joined
Two successive transesterifications:
Step 1: The 2’-OH of the conserved A at the branch site attacks
the phosphoryl group of the conserved G at the 5’ splice site.
As a result, the 5’ exon is released and the 5’ end of the
intron forms a three-way junction structure.
Step 2: The 3’-OH of the 5’ exon attacks the phosphoryl group at
the 3’ splice site. As a consequence, the 5’ and 3’ exons are
joined and the intron is liberated in the shape of a lariat.

The structure of the three-way junction

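Three classes of RNA splicing
______________________________________________________________________________________________
Class             Abundance                     Mechanism                  Catalytic machinery
______________________________________________________________________________________________
Nuclear pre-mRNA  Very common; used for most    Two transesterification    Major spliceosome
                  eukaryotic genes              reactions; branch site A
Group II introns  Rare; some eukaryotic genes   Same as pre-mRNA           RNA enzyme encoded by
                  from organelles and                                      intron (ribozyme)
                  prokaryotes
Group I introns   Rare; nuclear rRNA in some    Two transesterification    Same as group II
                  eukaryotes, organelle genes,  reactions; exogenous G     introns
                  and a few prokaryotic genes
______________________________________________________________________________________________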

Group I introns: G instead of A; released as a linear intron
Group II and nuclear pre-mRNA introns: released as a lariat intron

 In yeast most genes are uninterrupted.
 In higher eukaryotes most genes are interrupted and the
introns are usually much longer than exons.
 When a gene is uninterrupted, the restriction map of its
DNA corresponds exactly with the map of its mRNA.
 When a gene possesses an intron, the map at each end of
the gene corresponds with the map at each end of the
message sequence, but the maps diverge within the gene
because the intron is not represented in the message.

 Mutations that affect the splicing are usually
deleterious.
 The majority are single base substitutions at the
junctions between introns and exons.
 They may cause an exon to be left out of the product,
cause an intron to be included, or make splicing occur at
an aberrant site.
 The most common result is to introduce a termination
codon that results in truncation of the protein
sequence.
 About 15% of the point mutations that cause human
diseases are caused by disruption of splicing.
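
A minimal sketch of how a splicing error can truncate a protein: when an intron is retained, translation may reach an in-frame termination codon inside it. The sequences are invented, and codons_before_stop is a hypothetical helper that assumes the reading frame starts at the first base.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # the standard termination codons


def codons_before_stop(mrna):
    """Count in-frame codons from the first base up to, but not
    including, the first termination codon."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count


# Toy gene (made-up sequences): the intron carries an in-frame UGA.
exon1 = "AUGGCU"
intron = "GUAAGUUGACUAACUUUUCAG"
exon2 = "GAAGGUCCUAAAGCUGAUUGA"

correctly_spliced = exon1 + exon2
intron_retained = exon1 + intron + exon2

print(codons_before_stop(correctly_spliced))  # full-length reading frame
print(codons_before_stop(intron_retained))    # cut short inside the intron
```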

 Introns can be detected by the presence of additional
regions when genes are compared with their RNA
products by restriction mapping or electron microscopy.
 The positions of introns are usually conserved when
homologous genes are compared between different
organisms.
 The lengths of the corresponding introns may vary
greatly.
 Introns usually do not code for proteins.

 Comparisons of related genes in different species show
that the sequences of the corresponding exons are usually
conserved but the sequences of the introns are much less
well related.
 Introns evolve much more rapidly than exons because of
the lack of selective pressure to produce a protein with a
useful sequence.
 Exons are usually short, typically coding for <100 amino
acids.
 Introns are short in lower eukaryotes, but range up to
several tens of kb in length in higher eukaryotes.
 The overall length of a gene is determined largely by its
introns.

Some DNA Sequences Code for More Than One Protein
 The use of alternative initiation or termination codons allows
two proteins to be generated where one is equivalent to a
fragment of the other.
 Nonhomologous protein sequences can be produced from the
same sequence of DNA when it is read in different reading
frames by two (overlapping) genes.
 Homologous proteins that differ by the presence or absence of
certain regions can be generated by differential (alternative)
splicing when certain exons are included or excluded.
 This may take the form of including or excluding individual
exons or of choosing between alternative exons.
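
Including or excluding individual exons can be sketched by enumerating every combination of optional exons while keeping exon order fixed. The exon names and sequences below are invented, and isoforms is a hypothetical helper; real genes use only some of the combinations that are possible in principle.

```python
from itertools import product


def isoforms(exons, optional):
    """Yield (label, sequence) for every transcript obtained by including
    or skipping the optional exons. Exon order is always preserved.

    `exons` is an ordered list of (name, sequence) pairs and `optional`
    is the set of exon names that may be skipped.
    """
    optional_names = [name for name, _ in exons if name in optional]
    for choices in product([True, False], repeat=len(optional_names)):
        included = dict(zip(optional_names, choices))
        kept = [(n, s) for n, s in exons if included.get(n, True)]
        yield "-".join(n for n, _ in kept), "".join(s for _, s in kept)


# Toy gene with one exon (E2) that may be included or skipped.
gene = [("E1", "AUGGCU"), ("E2", "GAAGAA"), ("E3", "CCUUAA")]
for label, sequence in isoforms(gene, {"E2"}):
    print(label, sequence)
```

With k optional exons this yields 2^k transcripts, which shows how a single gene can encode several related proteins that differ by the presence or absence of certain regions.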

There are five different ways to
alternatively splice a pre-mRNA

The outcome of alternative splicing
1. Producing multiple protein products,
called isoforms.
2. Switching on and off the expression of a
given gene. In this case, one splicing pattern
produces a functional protein, and the other
splicing patterns produce non-functional
proteins.

Exons are shuffled by recombination to produce genes
encoding new proteins
All eukaryotes have introns, and yet these
elements are rare in bacteria. Two likely
explanations for this situation:
1. Introns-early model: introns existed in all
organisms but have been lost from bacteria.
2. Introns-late model: introns never existed in
bacteria but rather arose later in evolution.

Why have the introns been retained in eukaryotes?

1. The need to remove introns allows for
alternative splicing, which can generate
multiple proteins from a single gene.
2. Having the coding sequence of genes
divided into several exons allows new
genes to be created by reshuffling exons.

Three observations suggest that exon shuffling actually
occurs:
1. The borders between exons and introns within a
gene often coincide with the boundaries between
domains within the protein encoded by that gene.
2. Many genes, and proteins they encode, have
apparently arisen during evolution in part via exon
duplication and divergence.
3. Related exons are sometimes found in unrelated
genes.

Repeated sequences
 Repeated sequences (repetitive elements or repeats) are
patterns of nucleic acids (DNA or RNA) that occur in
multiple copies throughout the genome.
 Prokaryotes contain little or no repetitive sequence.
 Noncoding repetitive DNA varies from one group of
organisms to another and from individual to individual,
and is therefore used as a DNA fingerprinting tool.

 Three major categories of repeated sequence based on
position:
1. Terminal repeats
2. Tandem repeats
3. Interspersed repeats

 Tandem repeats: copies which lie adjacent to each
other, either directly or inverted
 Satellite DNA - typically found in centromeres
and heterochromatin
 Minisatellite - repeat units from about 10 to 60 base pairs,
found in many places in the genome, including
the centromeres
 Microsatellite - repeat units of less than 10 base pairs; this
includes telomeres, which typically have 6 to 8 base pair
repeat units

Interspersed repeats (interspersed nuclear
elements)
 Transposable elements (transposons or retroelements)
 SINEs (Short Interspersed Nuclear Elements)
 LINEs (Long Interspersed Nuclear Elements)
 In primates, the majority of LINEs are LINE-1 elements and
the majority of SINEs are Alu elements.
 In prokaryotes, CRISPR are arrays of alternating
repeats and spacers.

a) Satellite DNA – first identified as distinct bands of DNA that
are heavier or lighter than the majority of genomic DNA by
density centrifugation.
 These are repeated sequences that have either high GC
(heavy) or high AT (light) content.
 They are fairly short sequences (2-2000 bp) repeated thousands
of times in a row. They are found in heterochromatic regions
and around centromeres.

b) Minisatellites
 Sequences of 9-100 bp repeated 10-100 times.
 Found in subtelomeric regions and (rarely) dispersed
throughout chromosomes.

c) Microsatellites (SRS “short repetitive sequences”, STR
“short tandem repeats”, SSR “simple sequence repeats”)
 Very short sequences of 1-5 bp repeated 10-100 times.
 Found dispersed throughout chromosomes, often in and
around genes.
 For example, the dinucleotide repeat CA is very common
in the human genome (≈50,000 copies).
Example of a simple sequence repeat
(CCCA or GGGT) in human genomic DNA

 Microsatellites have very high mutation rates (where a
“mutation” means a change in repeat number).
 Thus they are often variable within a population and
useful for population genetics.
 This property also makes them useful for “DNA
fingerprinting”.


Retroposons
 Retroposons resemble processed RNAs and transpose
passively via an RNA intermediate.
 Each element is composed of an A-rich tail at the 3' end
and short target site duplications (direct repeats of 5-21
bp) flanking the repeat.
 Two main subclasses dominate this class:
 Short Interspersed Elements (SINEs)
 Long Interspersed Elements (LINEs)

Short Interspersed Elements (SINEs)
 These are distributed throughout the non-centromeric
regions of the genome (over 100,000 copies per genome)
(Weiner, 1986).
 They contain one or more RNA polymerase III promoter
sites and an A-rich region.
 Ex: the primate-specific Alu sequence (about 300 bp), a
dimer with two promoter sites.

Long Interspersed Elements (LINEs)
 LINEs are composed of open reading frames (ORFs) followed
by a 3' A-rich region, with 20,000 to 50,000 copies per
genome (Hutchison et al., 1989; Weiner, 1986).
 Direct repeats of 6-15 bp flank the element.
 Ex: the L1 family (the primary LINE family) is 6 to 7 kbp long.

Single-copy genes                             Satellite DNA (highly repetitive sequences)
_________________________________________________________________________________________
A single-copy gene has one locatable          Satellite DNA consists of highly repetitive
region on a DNA molecule.                     sequences that can repeat up to 100,000
                                              times in various places on a DNA molecule.
Single-copy genes make up 1–2% of the         Satellite DNA constitutes more than
human genome.                                 5% of the human genome.
A single-copy gene corresponds to a unit      Satellite DNA is not involved with
of inheritance (i.e., a protein).             inheritance.
Single-copy genes are transcribed to make     Satellite DNA is not transcribed.
RNA, which in turn is translated to make
a protein.
Single-copy genes are usually thousands       Satellite DNA is typically between 5 and
of base pairs in length.                      300 base pairs per repeat.
Single-copy genes are less useful for         Satellite DNA has a high rate of mutation,
DNA profiling.                                making it useful for DNA profiling.
_________________________________________________________________________________________

Role of repetitive sequences:
 Tandem repeat hypervariability enables identification of
genes, e.g. the antifreeze gene and genes for several
degenerative diseases.
 Repeats may help stabilize transcripts or proteins, but
repeat expansions and instability (particularly of
trinucleotide repeats) lead to neurological disorders and
cancer (Ashley and Warren, 1995; Mitas, 1997).
 Long stretches of CAG repeats translated into polyglutamine
tracts result in a gain of function, possibly a toxin (Perutz
et al., 1994; Baldi et al., 1999).
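
Because CAG encodes glutamine (Q), a run of in-frame CAG codons is translated into a polyglutamine tract of the same length. A minimal sketch that measures the longest such run; the sequence is invented, the helper names are hypothetical, and the reading frame is assumed to start at the first base. No disease thresholds are implied.

```python
def longest_cag_tract(coding_dna):
    """Return the length, in codons, of the longest run of consecutive
    in-frame CAG codons. The frame is assumed to start at the first base."""
    best = current = 0
    for i in range(0, len(coding_dna) - 2, 3):
        codon = coding_dna[i:i + 3].upper()
        current = current + 1 if codon == "CAG" else 0
        best = max(best, current)
    return best


def polyglutamine(coding_dna):
    """Return the polyglutamine tract (one Q per CAG codon) encoded by
    the longest in-frame CAG run."""
    return "Q" * longest_cag_tract(coding_dna)


# Made-up coding sequence with 20 consecutive CAG codons.
toy_gene = "ATG" + "CAG" * 20 + "CCGTAA"
print(longest_cag_tract(toy_gene))
print(polyglutamine(toy_gene))
```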

 CGG, AGG and TGG repeats form quadruplex structures and GAA
repeats form triplex structures that can block or reduce
transcription and DNA replication (Sinden, 1999).
 CGG repeats also destabilize nucleosomes (Sinden, 1999)
owing to CpG hypermethylation, leading to promoter
repression and lack of gene expression (Nelson, 1995;
Baldi et al., 1999). On the other hand, CTG repeats
stabilize nucleosomes and block replication forks in E. coli
(Sinden, 1999).

Functions of highly repetitive DNA sequences:
 Structural and organizational roles in chromosomes
 Involvement in chromosome pairing during meiosis
 Involvement in crossing over and recombination
 Protection of important structural genes like histone,
rRNA or ribosomal protein genes
 A repository of unessential DNA sequences for use in the
future evolution of the species
 No function at all: just junk DNA that is carried along by
the processes of replication and segregation of
chromosomes.

References:
 Brown T.A. 2002. Genomes. Wiley-Liss.
 Snustad and Simmons. Genetics.
 Tamarin. Principles of Genetics.
 Lewin B. Genes IX.
