By A.Arputha Selvaraj
What we are going to talk about
 Why we are doing all this DNA
sequencing
 What genes look like and where they are
found
 How we can compare sequences
between different species
 How genes move between species
DNA Sequencing
 Bioinformatics is based on the fact that
DNA sequencing is cheap, and
becoming easier and cheaper very
quickly.
 the Human Genome Project cost roughly
$3 billion and took 13 years (1990-2003).
 Sequencing James Watson’s genome in
2007 cost $2 million and took 2 months
 Today, you could get your genome
sequenced for about $100,000 and it
would take a month.
 The Archon X prize: you win $10 million if
you can sequence 100 human genomes in
10 days, at a cost of $10,000 per genome.
 It is realistic to envision $100 per genome
within 10 years: everyone’s genome could
be sequenced if they wanted or needed
it.
Why it’s useful
 All of the information needed to build an
organism is contained in its DNA. If we
could understand it, we would know how
life works.
 Preventing and curing diseases like cancer (which
is caused by mutations in DNA) and inherited
diseases.
 Curing infectious diseases (everything from AIDS
and malaria to the common cold). If we
understand how a microorganism works, we can
figure out how to block it.
 Understanding genetic and evolutionary
relationships between species
 Understanding genetic relationships between
humans. Projects exist to understand human
genetic diversity. Also, sequencing the
Neanderthal genome.
 Ancient DNA: currently it is thought that under ideal
conditions (continuously kept frozen), there is a limit of
about 1 million years for DNA survival. So, Jurassic Park
will probably remain fiction.
From DNA to Gene
 But: extracting that information is difficult. Converting a
string of A's, C's, G's, and T's into knowledge of how the
organism works is hard.
 Most of the work is on the computer, with key confirming
experiments done in the “wet lab”.
 The sequence below contains a gene critical for life: the
gene that initiates replication of the DNA. Can you spot it?
 We are now going to spend some time on what genes look
like and how we can find them.
TTGGAAAACATTCATGATTTATGGGATAGAGCTTTAGATCAAATTGAAAAAAAATTAAGCAAACCTAGTTTTGAAACCTG
GCTCAAATCGACAAAAGCTCATGCTTTACAAGGAGACACGCTCATTATTACTGCACCTAATGATTTTGCACGGGACTGGT
TAGAATCTAGGTATTCTAATTTAATTGCTGAAACACTTTATGATCTTACGGGGGAAGAGTTAGATGTAAAATTTATTATT
CCTCCTAACCAGGCCGAGGAAGAATTCGATATTCAAACTCCTAAAAAGAAAGTCAATAAAGACGAAGGAGCAGAATTTCC
TCAAAGCATGCTAAATTCGAAGTATACCTTTGATACATTTGTTATCGGATCTGGAAATCGGTTTGCGCATGCAGCTTCTT
TAGCAGTAGCAGAAGCGCCGGCTAAAGCGTATAATCCGCTTTTTATTTACGGGGGAGTAGGATTAGGCAAAACACACTTA
ATGCACGCCATAGGCCACTATGTGTTAGATCATAATCCTGCCGCGAAAGTCGTGTACTTATCATCTGAAAAATTCACAAA
CGAGTTTATTAACTCTATTCGTGACAATAAAGCAGTAGAATTCCGCAACAAATACCGTAATGTAGATGTTTTACTGATTG
ATGATATTCAATTCTTAGCAGGTAAAGAGCAGACACAAGAAGAATTTTTCCATACGTTTAATACGCTTCACGAAGAAAGC
AAGCAGATTGTCATCTCAAGTGATCGACCGCCGAAAGAAATTCCTACACTTGAAGATCGACTTCGCTCTCGCTTTGAATG
GGGCCTTATTACAGACATCACACCACCAGATTTGGAAACACGAATTGCTATTTTGCGTAAAAAAGCCAAAGCGGACGGCT
TAGTTATTCCAAATGAAGTTATGCTTTATATCGCCAATCAGATTGATTCAAATATTAGAGAATTAGAAGGCGCACTTATT
DNA
 DNA is just a long string of 4
letters (nucleotides, or bases):
Adenine, Guanine, Cytosine,
and Thymine.
 Which we will just refer to as A,
C, G, and T
 and we are skipping lots of
details
 Each DNA molecule has 2
strands, with the bases paired
in the center
 A on one strand always pairs
with T on the other strand
 G pairs with C.
 the strands run in opposite
directions (like roads)
 Since the two DNA strands are
complementary, there is no
need to write down both
strands
Chromosomes and Genes
 each chromosome is a long piece of DNA
 B. megaterium genome is a circle (like most
bacteria) of about 5 million bases.
 Human chromosomes are roughly 50-250 million bases
long. We have 46 chromosomes (2 sets of 23, one
set from each parent).
 genes are just regions on that DNA. It is not
obvious where genes are if you look at a DNA
sequence.
 there is a lot of DNA that is not part of genes: in
humans only about 2% of the DNA codes for
protein.
 Bacteria use more of their DNA: 80% of the B. meg
chromosome is genes.
 B. meg has about 1 gene per 1000 base pairs
(bp) of DNA. About 5000 genes
 Humans have about 25,000 genes.
 We are far more complicated than bacteria:
regulation of the genes is very complicated in
humans
 We use the same gene in different ways in different
tissues
Genes and Proteins
 Most genes code for proteins: each gene
contains the information necessary to
make one protein.
 Proteins are the most important type of
macromolecule.
 Structure: collagen in skin, keratin in hair,
crystallin in eye.
 Enzymes: all metabolic transformations,
building up, rearranging, and breaking
down of organic compounds, are done by
enzymes, which are proteins.
 Transport: oxygen in the blood is carried by
hemoglobin, everything that goes in or out
of a cell (except water and a few gases) is
carried by proteins.
 Also: nutrition (egg yolk), hormones,
defense, movement
The Genetic Code
 Proteins are long chains of amino acids.
 There are 20 different amino acids coded in
DNA
 There are only 4 DNA bases, so you need 3
DNA bases to code for the 20 amino acids
 4 x 4 x 4 = 64 possible 3 base combinations
(codons)
 Each codon codes for one amino acid
 Most amino acids have more than one possible
codon
 Genes start at a start codon and end at a
stop codon.
 3 codons are stop codons: all genes end at a
stop codon.
 Start codons are a bit trickier, since they are
used in the middle of genes as well as at the
beginning
 in eukaryotes, ATG is always the start codon,
making Methionine (Met) the first amino acid in
all proteins (but in many proteins it is immediately
removed).
 In prokaryotes, ATG, GTG, or TTG can be used as
a start codon. B. meg prefers ATG, but about
30% of the genes start with GTG or TTG.
In bioinformatics, we generally
ignore the fact that RNA uses the
base uracil (U) in place of T.
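The codon arithmetic on this slide can be checked with a short Python sketch. It assumes the standard genetic code, written in the usual compact form with the bases ordered T, C, A, G; the names CODON_TABLE and transcribe are illustrative and not from the original slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ...
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every 3-base codon to its one-letter amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}


def transcribe(dna):
    """Return the messenger RNA spelling of a gene: same letters, U in place of T."""
    return dna.replace("T", "U")


print(len(CODON_TABLE))   # 4 x 4 x 4 = 64 codons
print(stop_codons)        # the 3 stop codons
print(len(amino_acids))   # 20 amino acids
print(transcribe("ATGCCATC"))
```

Because 61 codons are shared among 20 amino acids, most amino acids necessarily have more than one codon.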
Gene Expression
 How do you get a protein from a gene?
 A two-step process (called the Central
Dogma of Molecular Biology).
 First, the gene has to be copied (transcribed)
into an RNA form.
 The RNA copy (messenger RNA) is exactly
like the gene itself, except RNA replaces T
with U.
 Most gene regulation: whether the gene is
“on” or “off” happens here
 Second, the RNA is translated into protein by
ribosomes, which are complex RNA/protein
hybrid machines.
 With the help of transfer RNA molecules, which
have one end that matches the 3 base codon
and the other end that is attached to the proper
amino acid.
 The ribosome starts at the start codon and moves
down the messenger RNA, adding one amino
acid at a time to the growing chain. When the
ribosome reaches a stop codon, it falls off,
releasing the new protein.
Reading Frames
 Here we get a bit subtle.
 Since codons consist of 3 bases,
there are 3 “reading frames”
possible on an RNA (or DNA),
depending on whether you start
reading from the first base, the
second base, or the third base.
 The different reading frames give
entirely different proteins.
 Consider ATGCCATC, and refer to
the genetic code. (X is junk)
 Reading frame 1 divides this into ATG-
CCA-TC, which translates to Met-Pro-X
 Reading frame 2 divides this into A-
TGC-CAT-C, which translates to X-Cys-
His-X
 Reading frame 3 divides this into AT-
GCC-ATC, which translates to X-Ala-Ile
 Each gene uses a single reading
frame, so once the ribosome gets
started, it just has to count off
groups of 3 bases to produce the
proper protein.
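The ATGCCATC example can be reproduced with a small Python sketch. It assumes the standard genetic code and simply drops the incomplete "junk" bases (the X's above) at either end; translate_frame is an illustrative name, not something from the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in compact form; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(dna, frame):
    """Translate the complete codons of one reading frame (frame = 0, 1, or 2)."""
    codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


for frame in range(3):
    # The slides number the frames 1 to 3, so add 1 for display.
    print(f"Reading frame {frame + 1}: {translate_frame('ATGCCATC', frame)}")
```

In one-letter code the three frames read MP (Met-Pro), CH (Cys-His), and AI (Ala-Ile), matching the breakdown above.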
Open Reading Frames
 Ribosomes are very obedient to stop codons:
when a stop codon is reached, the protein is
finished. Thus, all genes end at the first stop
codon in their reading frame.
 Since 3 out of the 64 codons are stop codons,
random DNA has stop codons very frequently.
 However, genes do something necessary for survival,
so natural selection keeps stop codons out of the
middle of genes.
 That is, if a mutation arises that creates a stop codon in the
middle of a gene, the organism dies and leaves no
descendants.
 Open reading frames (ORFs) are regions with no stop
codons. All genes reside in long open reading frames
 Note that stop codons in other reading frames have
no effect on the gene.
 The start codon must occur “upstream” in the
same reading frame as the stop codon. It is
usually near the beginning of the ORF, but not
necessarily the first possible start codon.
 Determining the exact start codon is not easy or
obvious.
 But, the first start codon in an open reading frame is
always a reasonable guess
This is a map of the stop
codons in all 3 reading
frames in a stretch of DNA.
The long ORF in reading frame
1 is highlighted in black.
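A map like this can be computed with a few lines of Python. The sketch below scans one strand only and reports stop-free stretches in each reading frame; the function names and the toy sequence are made up for illustration, and a real gene finder would also look for start codons and check the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(dna, frame):
    """Return (start, end) index pairs of stop-free stretches in one reading frame."""
    orfs = []
    run_start = frame
    for i in range(frame, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            if i > run_start:
                orfs.append((run_start, i))
            run_start = i + 3  # the next stretch begins after the stop codon
    # Close the final stretch at the end of the last complete codon.
    last_codon_end = frame + 3 * ((len(dna) - frame) // 3)
    if last_codon_end > run_start:
        orfs.append((run_start, last_codon_end))
    return orfs


def longest_orf(dna):
    """Return (frame, start, end) of the longest stop-free stretch on this strand."""
    candidates = [
        (frame, start, end)
        for frame in range(3)
        for start, end in open_reading_frames(dna, frame)
    ]
    return max(candidates, key=lambda c: c[2] - c[1])


toy = "ATGAAATAGCCCGGGTTTTGA"  # toy sequence, not from the B. meg genome
for frame in range(3):
    print(frame + 1, open_reading_frames(toy, frame))
print(longest_orf(toy))
```

Note that the stop codons in frame 1 of the toy sequence have no effect on the stretches found in the other two frames.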
Gene Placement
 Genes can occur on either DNA strand.
 If they are on the reverse strand, the DNA sequence needs to be
reversed and complemented
 In bacteria, most of the DNA is part of a gene. Most long open
reading frames (say 100 bp or longer) that don’t overlap other
long ORFs contain genes
 Most genes do not overlap each other.
 Sometimes there are very short overlaps (50 bp or less), especially if
the two genes are functionally related.
 In bacteria, genes that affect the same biochemical pathway
or function are sometimes adjacent to each other on the same
DNA strand (not necessarily the same reading frame), allowing
them to be co-regulated
 This group of genes is called an “operon”
 Operons are characteristic of bacteria and archaea; they are rare in
eukaryotes.
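Reading a gene on the reverse strand means reversing and complementing the sequence, which can be sketched in Python as follows. The function name is illustrative and the sketch assumes the sequence contains only the uppercase letters A, C, G, and T.

```python
# Translation table for base pairing: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the other strand, written in its own reading direction."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCCATC"))
```

Applying the function twice returns the original sequence, which is a handy sanity check.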
Finding Genes
 First job is to find long ORFs, examining the longest ORFs first and
putting together a set with minimal overlaps.
 It is also necessary to identify potential start codons, with the furthest
upstream start codon as the easiest choice.
 Then, how do we know that the ORF contains a real gene? The
most definitive way is to match it with a gene known from other
species
 conservation of a sequence between species strongly suggests that
the sequence has a function that is being conserved by natural
selection
 We compare protein sequences, not DNA, because protein is
more conserved in evolution than DNA
 The organism’s survival depends on the protein being functional,
which means having the proper amino acid sequence
 Since the genetic code is degenerate, many different DNA sequences
will give identical proteins.
 The protein 3-dimensional structure is even more conserved, because
it is more closely related to enzyme activity than the amino acid
sequence is.
 However, we don’t have good ways of determining 3-D structure
from a DNA sequence
Sequence Comparison
 So, we compare our ORF sequence to a database of
known protein sequences from many species.
 BLAST is the standard sequence alignment tool (BLAST = Basic
Local Alignment Search Tool)
 BLAST is based on the concept that if you compare the
same (that is, homologous) protein from many different
species, you can see that some amino acids readily
substitute for each other and others almost never do.
 This is captured in a substitution matrix, which gives a score for
each pair of amino acids aligned at a position in the proteins being
compared.
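How a substitution matrix turns an alignment into a score can be sketched in Python. The numbers in TOY_MATRIX below are made up for illustration and cover only four amino acids; they are not the matrix BLAST actually uses, and the sketch ignores gaps.

```python
# Hypothetical scores: identical pairs score high, chemically similar pairs
# (L/I, D/E) score mildly positive, dissimilar pairs score negative.
TOY_MATRIX = {
    frozenset("L"): 4, frozenset("I"): 4, frozenset("D"): 6, frozenset("E"): 5,
    frozenset("LI"): 2, frozenset("DE"): 2,
    frozenset("LD"): -4, frozenset("LE"): -3,
    frozenset("ID"): -3, frozenset("IE"): -3,
}


def score_alignment(query, subject):
    """Sum the substitution scores over each aligned pair of amino acids."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return sum(TOY_MATRIX[frozenset((q, s))] for q, s in zip(query, subject))


print(score_alignment("LIDE", "LIDE"))  # identical sequences
print(score_alignment("LIDE", "ILED"))  # only conservative substitutions
print(score_alignment("LIDE", "DELI"))  # only unfavorable substitutions
```

Using a frozenset as the key makes the lookup symmetric, so the score for L aligned with I is the same as for I aligned with L.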
Practical BLAST
 BLAST itself is a bit of software that can be run on almost
any computer, but the database needed for a good cross-
species comparison is quite large
 the database is called “nr” for “non-redundant”, and it contains
at least 20 Gb of sequence data
 We are going to use the BLAST service at UniProt, a
European consortium that contains a comprehensive
collection of protein sequences
 http://www.uniprot.org/
 Nearly all derived from DNA sequences: direct sequencing of
proteins is difficult
 Terminology: your sequence, which you paste into the box
on the web site, is the query sequence. Sequences in the
database that match yours are called subject sequences.
A Sequence to BLAST
 This is a more-or-less
randomly chosen gene
from B. meg.
 It is 174 amino acids long
 It is written in “fasta”
format: the first line
starts with > and is
immediately followed
by an identifier
(ORF00135), and then
some miscellaneous
comments.
 After that the sequence
is written without spaces
or other marks.
>ORF00135 |chromosome 538197-538721 revcomp
MKAKLIQYVYDAECRLFKSVN
QHFDRKHLNRFLRLLTHAGGA
TFTIVIACLLLFLYPSSVAYA
CAFSLAVSHIPVAIAKKLYPR
KRPYIQLKHTKVLENPLKDHS
FPSGHTTAIFSLVTPLMIVYP
AFAAVLLPLAVMVGISRIYLG
LHYPTDVMVGLILGIFSGAVA
LNIFLT
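Reading a record like this takes only a few lines of Python. The sketch below handles a single record with the header on one line; parse_fasta_record is an illustrative name, and real FASTA files often hold many records.

```python
FASTA_RECORD = """>ORF00135 |chromosome 538197-538721 revcomp
MKAKLIQYVYDAECRLFKSVN
QHFDRKHLNRFLRLLTHAGGA
TFTIVIACLLLFLYPSSVAYA
CAFSLAVSHIPVAIAKKLYPR
KRPYIQLKHTKVLENPLKDHS
FPSGHTTAIFSLVTPLMIVYP
AFAAVLLPLAVMVGISRIYLG
LHYPTDVMVGLILGIFSGAVA
LNIFLT
"""


def parse_fasta_record(text):
    """Split one FASTA record into (identifier, comments, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    # The identifier runs from '>' to the first space; the rest is comments.
    identifier, _, comments = lines[0][1:].partition(" ")
    sequence = "".join(lines[1:])
    return identifier, comments.strip(), sequence


identifier, comments, sequence = parse_fasta_record(FASTA_RECORD)
print(identifier, len(sequence))
```

The coordinates in the comment span 525 bases, which is 175 codons: consistent with 174 amino acids plus a stop codon, assuming the stop codon is included in the range.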
Results
BLAST Scores
 Results are arranged with the best ones on top
 The most important score is the Expect value, or E-value, which can be
defined as the number of hits a random sequence (with the same length
as yours) would be expected to have in the database by chance.
 E-values for good hits are usually written something like: 3e-42, which is
the same as 3 x 10^-42, a very small number
 Bad hits are very common, and they have e-values in a more familiar
form: for example, 0.004 or 1.2
 A really good e-value is less than 1e-180, which underflows the
computer’s processing capabilities, so it is written as 0.0
 E-values are affected by the length of the query sequence as well as
the size of the database, so even perfect matches with short
sequences give poor e-values
 In this case we see many hits with good e-values, and the top e-values all
are quite similar.
 Before we can conclude that our protein is a homologue of the proteins
BLAST matches it with, we would like them to have roughly the same
length and have a high percentage of identical amino acids.
 the lengths of the query and subject sequences should be within 20%
of each other
 There should be at least 30% identical amino acids
 In this case we can be quite sure we have a good match
 BLAST also returns a fourth value, the bit score, which we are going to
ignore.
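The rules of thumb on this slide can be collected into one Python check. The 20% length rule and the 30% identity rule come from the slide; the E-value cutoff of 1e-5 is an assumption added for illustration, as are the function and parameter names.

```python
def looks_like_homolog(query_len, subject_len, percent_identity, evalue,
                       max_evalue=1e-5):
    """Apply the slide's rules of thumb to one BLAST hit.

    max_evalue is a hypothetical cutoff; the slides give no specific value.
    """
    # "Within 20% of each other": the shorter is at least 80% of the longer.
    length_ratio = min(query_len, subject_len) / max(query_len, subject_len)
    return (
        evalue <= max_evalue
        and length_ratio >= 0.8
        and percent_identity >= 30.0
    )


print(looks_like_homolog(174, 180, 55.0, 3e-42))  # similar length, good scores
print(looks_like_homolog(174, 400, 55.0, 3e-42))  # subject far too long
print(looks_like_homolog(174, 170, 40.0, 1.2))    # poor e-value
```

Python reads 3e-42 directly as a floating-point number, so e-values copied from a BLAST report can be compared without any conversion.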
Gene Names
 Genes are mostly named for the function of their protein.
 at some point, some related genes had their function determined through
lab work: by examining the effects of mutations in the gene, by isolating
and studying the protein produced by the gene, etc.
 Enzymes (end in –ase), transport across the cell membrane, genetic
information processing (DNA->RNA->protein), structural proteins, sporulation
and germination, and more!
 Many genes (maybe 1/4 of them in a typical genome) have no known
function, although they are found in several different species: conserved
hypothetical genes
 Every new genome has some genes that are unique: no matching BLAST hits in
the database.
 Are they real genes? Sometimes there is evidence in the form of messenger
RNA, but usually we don’t know
 call them hypothetical genes
 “putative” means that we think we know the gene’s function but we aren’t
sure. Putative should be followed by the function name.
More Gene Names
 One question of interest: do the names of the top
BLAST hits agree with each other? They should, but
there are always annotation errors, and our
knowledge of gene function increases over time.
 With some sloppiness due to different naming conventions
practiced by different scientists
 Here we have a classic case of mis-naming. Why is
the top hit ribosomal protein S2, with no other hit
having this name?
 Ribosomal proteins are highly conserved in evolution
 Some checking on my part showed that no homology
exists between this gene and the ribosomal protein S2
found in any other Bacillus species
 The other names are similar, although not identical.
 What is “PAP2”? A quick Google search shows that it
stands for “phosphatidic acid phosphatase”, which fits the
other names well.
 There is probably some uncertainty about its exact
function, given the variety of names and the “family
protein” designation in several of them.
Horizontal and Vertical Gene Transfer
 We are accustomed to thinking of genes
being passed from parent to offspring,
always staying within the species, with very
occasional splitting of one species into two.
 This is called vertical gene transfer.
 But, we know that some genes are
transferred across species lines, not by the
standard genetic mechanisms.
 This is called horizontal gene transfer
 It is rare in humans and other higher organisms
 In bacteria 10% or more of genes have been
transferred in horizontally.
 B meg genes that come from vertical
descent have other Bacillus species (or
another closely related species) as the
closest BLAST hit
 Horizontally transferred genes can come
from almost anywhere: other bacteria,
Archaea, eukaryotes: plants, animals, fungi
 The general mechanisms are well known,
including conjugation (direct transfer of DNA
between two bacteria), transduction (transfer
of DNA using a virus as a carrier), and
transformation (the bacteria pick up DNA
molecules from their environment).
Bacillus
Phylogeny
 “Kings Play Chess On
Fine Ground Sand”
 Bacteria is the domain
 Firmicutes is the
phylum
 Bacilli is the class
 Bacillales is the order
 Bacillaceae is the
family
 Bacillus is the genus.
Our Example
 Most of the top hits are from various Bacillus species: there is
little doubt that this gene is the result of normal, vertical
gene flow.
 What about “Anoxybacillus flavithermus”?
 Click on the accession number to get more information,
including its phylogeny.
 Taxonomic lineage = Bacteria > Firmicutes > Bacillales >
Bacillaceae > Anoxybacillus.
 Same family as B meg.
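Comparing two lineages rank by rank can be sketched in Python. The lineages below follow the Bacillus Phylogeny slide; the class Bacilli is filled in for Anoxybacillus even though the lineage shown above skips it, and the function name is illustrative.

```python
RANKS = ["domain", "phylum", "class", "order", "family", "genus"]

B_MEG = ["Bacteria", "Firmicutes", "Bacilli", "Bacillales",
         "Bacillaceae", "Bacillus"]
ANOXYBACILLUS = ["Bacteria", "Firmicutes", "Bacilli", "Bacillales",
                 "Bacillaceae", "Anoxybacillus"]


def deepest_shared_rank(lineage_a, lineage_b):
    """Return the most specific rank at which two lineages still agree."""
    shared = None
    for rank, a, b in zip(RANKS, lineage_a, lineage_b):
        if a != b:
            break
        shared = rank
    return shared


print(deepest_shared_rank(B_MEG, ANOXYBACILLUS))
```

A top hit that shares only the domain or phylum with the query would be a candidate for horizontal transfer; one that shares the family or genus points to ordinary vertical descent.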
Aligned Sequences
 You can see the aligned sequences by
clicking on the “Local alignment” diagrams
 Query sequence on top, subject below
 Identical amino acids are in the middle of the
alignment, and similar ones have a + sign.
 Gaps: regions where one sequence has amino acids
not found in the other sequence, are indicated with
---.
 This protein is very typical in that the best
matches are in the middle of the protein, with
fewer identical amino acids near the ends.
 Also, the match doesn’t quite make it to the very
beginning of the proteins, although they are almost
identical in length.
 The active site of most enzymes is in the middle
 The ends of proteins are often not well conserved
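The middle line of an alignment and the percent identity can be computed with a short Python sketch. The two aligned strings are a made-up toy example, not real BLAST output, and the sketch marks only identical amino acids: the + signs for similar ones would need a substitution matrix.

```python
def identity_midline(query, subject):
    """Return the middle line: the shared letter where both agree, else a space."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        q if q == s and q != "-" else " "
        for q, s in zip(query, subject)
    )


def percent_identity(query, subject):
    """Percent of alignment columns (gaps included) with identical amino acids."""
    midline = identity_midline(query, subject)
    identical = sum(1 for c in midline if c != " ")
    return 100.0 * identical / len(query)


query = "MKAKL-IQYV"    # toy alignment; "-" marks a gap
subject = "MRAKLSIQ-V"
print(query)
print(identity_midline(query, subject))
print(subject)
print(percent_identity(query, subject))
```

Counting gap columns in the denominator is one common convention; dividing by the length of the shorter sequence instead would give a slightly different number.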
Local Alignment Result
Graphical
Overview
 Click on Graphical Overview (just
under the BLAST box on the left) to
get an overview of all the aligned
sequences
 The extent of the matching region is
shown with the colored boxes, with
non-matching regions drawn as a
line.
 Color indicates percent of identical
amino acids
 You can see that mostly our query
and the various subjects (matches)
line up along almost all of their
lengths.
 This is a good way to check whether
our start site is reasonable.
 A few odd ones lower down.
 Genes, and pieces of genes, can
move to new locations in the
genome, fuse with other genes,
break apart, etc. Always subject to
natural selection: if the altered gene
doesn’t work, the organism will die
and we won’t see it.
 And of course, sequencing and
annotation errors occur.
The Basic Points
1. DNA can be read in 3 different reading frames,
a consequence of the genetic code (3 bases
= 1 amino acid)
2. Genes are found in long open reading frames,
areas where there are no stop codons.
3. BLAST is the tool we use to compare sequences
between species
• BLAST scores (e-values) estimate how many equally good matches
a random sequence would be expected to have in the database
4. Gene sequences are conserved between
species by natural selection
• DNA sequences outside of genes are much less
conserved
5. Most genes are transferred vertically, from
parent to offspring, but a significant number
are transferred horizontally, from unrelated
species.
Thank You
 Email me: arputhaselvaraj@gmail.com
 
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipurparulsinha
 
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safenarwatsonia7
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiNehru place Escorts
 

Último (20)

Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
Housewife Call Girls Hsr Layout - Call 7001305949 Rs-3500 with A/C Room Cash ...
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls ServiceCall Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
 
See the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformSee the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy Platform
 
97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAA97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAA
 
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
 
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
Call Girl Lucknow Mallika 7001305949 Independent Escort Service LucknowCall Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
Call Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original PhotosCall Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
 
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
 
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
Dwarka Sector 6 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few Cl...
 
Glomerular Filtration and determinants of glomerular filtration .pptx
Glomerular Filtration and  determinants of glomerular filtration .pptxGlomerular Filtration and  determinants of glomerular filtration .pptx
Glomerular Filtration and determinants of glomerular filtration .pptx
 
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
 
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
 
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
 

Bioinformatics

  • 2. What we are going to talk about  Why we are doing all this DNA sequencing  What genes look like and where they are found  How we can compare sequences between different species  How genes move between species
  • 3. DNA Sequencing  Bioinformatics is based on the fact that DNA sequencing is cheap, and becoming easier and cheaper very quickly.  the Human Genome Project cost roughly $3 billion and took 12 years (1991-2003).  Sequencing James Watson’s genome in 2007 cost $2 million and took 2 months  Today, you could get your genome sequenced for about $100,000 and it would take a month.  The Archon X prize: you win $10 million if you can sequence 100 human genomes in 10 days, at a cost of $10,000 per genome.  It is realistic to envision $100 per genome within 10 years: everyone’s genome could be sequenced if they wanted or needed it.
  • 4. Why it’s useful  All of the information needed to build an organism is contained in its DNA. If we could understand it, we would know how life works.  Preventing and curing diseases like cancer (which is caused by mutations in DNA) and inherited diseases.  Curing infectious diseases (everything from AIDS and malaria to the common cold). If we understand how a microorganism works, we can figure out how to block it.  Understanding genetic and evolutionary relationships between species  Understanding genetic relationships between humans. Projects exist to understand human genetic diversity. Also, sequencing the Neanderthal genome.  Ancient DNA: currently it is thought that under ideal conditions (continuously kept frozen), there is a limit of about 1 million years for DNA survival. So, Jurassic Park will probably remain fiction.
  • 5. From DNA to Gene  But: extracting that information is difficult. How to convert a string of ACGT’s into knowledge of how the organism works is hard.  Most of the work is on the computer, with key confirming experiments done in the “wet lab”.  The sequence below contains a gene critical for life: the gene that initiates replication of the DNA. Can you spot it?  We are now going to spend some time on what genes look like and how we can find them. TTGGAAAACATTCATGATTTATGGGATAGAGCTTTAGATCAAATTGAAAAAAAATTAAGCAAACCTAGTTTTGAAACCTG GCTCAAATCGACAAAAGCTCATGCTTTACAAGGAGACACGCTCATTATTACTGCACCTAATGATTTTGCACGGGACTGGT TAGAATCTAGGTATTCTAATTTAATTGCTGAAACACTTTATGATCTTACGGGGGAAGAGTTAGATGTAAAATTTATTATT CCTCCTAACCAGGCCGAGGAAGAATTCGATATTCAAACTCCTAAAAAGAAAGTCAATAAAGACGAAGGAGCAGAATTTCC TCAAAGCATGCTAAATTCGAAGTATACCTTTGATACATTTGTTATCGGATCTGGAAATCGGTTTGCGCATGCAGCTTCTT TAGCAGTAGCAGAAGCGCCGGCTAAAGCGTATAATCCGCTTTTTATTTACGGGGGAGTAGGATTAGGCAAAACACACTTA ATGCACGCCATAGGCCACTATGTGTTAGATCATAATCCTGCCGCGAAAGTCGTGTACTTATCATCTGAAAAATTCACAAA CGAGTTTATTAACTCTATTCGTGACAATAAAGCAGTAGAATTCCGCAACAAATACCGTAATGTAGATGTTTTACTGATTG ATGATATTCAATTCTTAGCAGGTAAAGAGCAGACACAAGAAGAATTTTTCCATACGTTTAATACGCTTCACGAAGAAAGC AAGCAGATTGTCATCTCAAGTGATCGACCGCCGAAAGAAATTCCTACACTTGAAGATCGACTTCGCTCTCGCTTTGAATG GGGCCTTATTACAGACATCACACCACCAGATTTGGAAACACGAATTGCTATTTTGCGTAAAAAAGCCAAAGCGGACGGCT TAGTTATTCCAAATGAAGTTATGCTTTATATCGCCAATCAGATTGATTCAAATATTAGAGAATTAGAAGGCGCACTTATT
  • 6. DNA  DNA is just a long string of 4 letters (nucleotides, or bases): Adenine, Cytosine, Guanine, and Thymine, which we will just refer to as A, C, G, and T (we are skipping lots of details).  Each DNA molecule has 2 strands, with the bases paired in the center  A on one strand always pairs with T on the other strand  G pairs with C.  The strands run in opposite directions (like roads)  Since the two DNA strands are complementary, there is no need to write down both strands
  • 7. Chromosomes and Genes  each chromosome is a long piece of DNA  B. megaterium genome is a circle (like most bacteria) of about 5 million bases.  Human chromosomes are 100-200 million bases long. We have 46 chromosomes (2 sets of 23, one set from each parent).  genes are just regions on that DNA. It is not obvious where genes are if you look at a DNA sequence.  there is a lot of DNA that is not part of genes: in humans only 2% at most of the DNA is part of any gene.  Bacteria use more of their DNA: 80% of the B. meg chromosome is genes.  B. meg has about 1 gene per 1000 base pairs (bp) of DNA. About 5000 genes  Humans have about 25,000 genes.  We are far more complicated than bacteria: regulation of the genes is very complicated in humans  We use the same gene in different ways in different tissues
  • 8. Genes and Proteins  Most genes code for proteins: each gene contains the information necessary to make one protein.  Proteins are the most important type of macromolecule.  Structure: collagen in skin, keratin in hair, crystallin in eye.  Enzymes: all metabolic transformations, building up, rearranging, and breaking down of organic compounds, are done by enzymes, which are proteins.  Transport: oxygen in the blood is carried by hemoglobin, everything that goes in or out of a cell (except water and a few gasses) is carried by proteins.  Also: nutrition (egg yolk), hormones, defense, movement
  • 9. The Genetic Code  Proteins are long chains of amino acids.  There are 20 different amino acids coded in DNA  There are only 4 DNA bases, so you need 3 DNA bases to code for the 20 amino acids (2 bases would give only 4 x 4 = 16 combinations)  4 x 4 x 4 = 64 possible 3 base combinations (codons)  Each codon (other than the stop codons) codes for one amino acid  Most amino acids have more than one possible codon  Genes start at a start codon and end at a stop codon.  3 codons are stop codons: all genes end at a stop codon.  Start codons are a bit trickier, since they are used in the middle of genes as well as at the beginning  In eukaryotes, ATG is almost always the start codon, making Methionine (Met) the first amino acid in nearly all proteins (but in many proteins it is immediately removed).  In prokaryotes, ATG, GTG, or TTG can be used as a start codon. B. meg prefers ATG, but about 30% of the genes start with GTG or TTG.  In bioinformatics, we generally ignore the fact that RNA uses the base uracil (U) in place of T.
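The codon arithmetic on this slide can be checked directly. The following is a minimal sketch that enumerates every 3-base combination and sets aside the three stop codons of the standard genetic code (TAA, TAG, TGA); the variable names are illustrative and not from the slides.

```python
from itertools import product

BASES = "ACGT"
# The three stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Two bases are not enough: 4 x 4 = 16 combinations for 20 amino acids.
two_base_combinations = len(BASES) ** 2

# Every ordered triple of bases is a codon: 4 x 4 x 4 = 64.
all_codons = ["".join(triple) for triple in product(BASES, repeat=3)]

# The remaining codons are shared among the 20 amino acids,
# which is why most amino acids have more than one codon.
coding_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(two_base_combinations, len(all_codons), len(coding_codons))
```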
  • 10. Gene Expression  How do you get a protein from a gene?  A two-step process (called the Central Dogma of Molecular Biology).  First, the gene has to be copied (transcribed) into an RNA form.  The RNA copy (messenger RNA) is exactly like the gene itself, except RNA replaces T with U.  Most gene regulation: whether the gene is “on” or “off” happens here  Second, the RNA is translated into protein by ribosomes, which are complex RNA/protein hybrid machines.  With the help of transfer RNA molecules, which have one end that matches the 3 base codon and the other end that is attached to the proper amino acid.  The ribosome starts at the start codon and moves down the messenger RNA, adding one amino acid at a time to the growing chain. When the ribosome reaches a stop codon, it falls off, releasing the new protein.
  • 11. Reading Frames  Here we get a bit subtle.  Since codons consist of 3 bases, there are 3 “reading frames” possible on an RNA (or DNA), depending on whether you start reading from the first base, the second base, or the third base.  The different reading frames give entirely different proteins.  Consider ATGCCATC, and refer to the genetic code. (X is junk)  Reading frame 1 divides this into ATG-CCA-TC, which translates to Met-Pro-X  Reading frame 2 divides this into A-TGC-CAT-C, which translates to X-Cys-His-X  Reading frame 3 divides this into AT-GCC-ATC, which translates to X-Ala-Ile  Each gene uses a single reading frame, so once the ribosome gets started, it just has to count off groups of 3 bases to produce the proper protein.
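The ATGCCATC example can be reproduced with a few lines of code. This is a sketch only: the codon table below holds just the six codons the example needs (a real table has 64 entries), and `translate_frame` is a hypothetical helper name.

```python
# Partial codon table: only the codons that occur in the slide's example.
CODON_TABLE = {
    "ATG": "Met", "CCA": "Pro",
    "TGC": "Cys", "CAT": "His",
    "GCC": "Ala", "ATC": "Ile",
}

def translate_frame(dna, frame):
    """Translate dna in reading frame 1, 2 or 3.

    Leftover bases that do not form a full codon are reported as 'X' (junk),
    as are full codons missing from the partial table above.
    """
    offset = frame - 1
    parts = []
    if offset:
        parts.append("X")  # the 1 or 2 bases before the first full codon
    for i in range(offset, len(dna), 3):
        codon = dna[i:i + 3]
        parts.append(CODON_TABLE.get(codon, "X") if len(codon) == 3 else "X")
    return "-".join(parts)

for frame in (1, 2, 3):
    print(frame, translate_frame("ATGCCATC", frame))
```

The three frames of the same eight bases share no amino acids, which is the point of the slide: the frame has to be known before the protein can be read off.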
  • 12. Open Reading Frames  Ribosomes are very obedient to stop codons: when a stop codon is reached, the protein is finished. Thus, all genes end at the first stop codon in their reading frame.  Since 3 out of the 64 codons are stop codons, random DNA has stop codons very frequently.  However, genes do something necessary for survival, so natural selection keeps stop codons out of the middle of genes.  That is, if a mutation arises that creates a stop codon in the middle of a gene, the organism dies and leaves no descendants.  Open reading frames (ORFs) are regions with no stop codons. All genes reside in long open reading frames  Note that stop codons in other reading frames have no effect on the gene.  The start codon must occur “upstream” in the same reading frame as the stop codon. It is usually near the beginning of the ORF, but not necessarily the first possible start codon.  Determining the exact start codon is not easy or obvious.  But, the first start codon in an open reading frame is always a reasonable guess. This is a map of the stop codons in all 3 reading frames in a stretch of DNA. The long ORF in reading frame 1 is highlighted in black.
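A stop-codon map like the one described on this slide can be sketched as follows. The code assumes the standard stop codons (TAA, TAG, TGA), looks at one strand only, and ignores start codons; `stop_positions` and `longest_orf` are hypothetical names, not part of any tool mentioned in the slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_positions(dna, frame):
    """Return the 0-based start index of every stop codon in frame 1, 2 or 3."""
    return [i for i in range(frame - 1, len(dna) - 2, 3)
            if dna[i:i + 3] in STOP_CODONS]

def longest_orf(dna):
    """Return (frame, start, end) of the longest stop-free stretch.

    start is inclusive and end is exclusive, both 0-based.
    Only full codons are counted, and only this strand is examined.
    """
    best = (0, 0, 0)
    for frame in (1, 2, 3):
        start = frame - 1
        # End of the last complete codon in this frame.
        last_codon_end = start + ((len(dna) - start) // 3) * 3
        for stop in stop_positions(dna, frame) + [last_codon_end]:
            if stop - start > best[2] - best[1]:
                best = (frame, start, stop)
            start = stop + 3  # the next open stretch begins after this stop
    return best

example = "ATGAAATAGCCC"
for frame in (1, 2, 3):
    print(frame, stop_positions(example, frame))
print(longest_orf(example))
```

In a real genome the same scan would also be run on the reverse-complemented strand, and a candidate gene would begin at a start codon inside the open stretch rather than at its first base.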
  • 13. Gene Placement  Genes can occur on either DNA strand.  If they are on the reverse strand, the DNA sequence needs to be reversed and complemented  In bacteria, most of the DNA is part of a gene. Most long open reading frames (say 100 bp or longer) that don’t overlap other long ORFs contain genes  Most genes do not overlap each other.  Sometimes there are very short overlaps (50 bp or less), especially if the two genes are functionally related.  In bacteria, genes that affect the same biochemical pathway or function are sometimes adjacent to each other on the same DNA strand (not necessarily the same reading frame), allowing them to be co-regulated  This group of genes is called an “operon”  Operons are found mainly in prokaryotes; they are rare in eukaryotes.
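Reading a gene from the reverse strand means complementing each base (A with T, G with C) and reversing the order, because the strands run in opposite directions. A minimal sketch, with `reverse_complement` as an illustrative function name:

```python
# Translation table for base pairing: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the sequence of the opposite strand, read in its own direction."""
    return dna.translate(COMPLEMENT)[::-1]

forward = "ATGCCATC"
reverse = reverse_complement(forward)
print(reverse)
# Applying the operation twice gives back the original strand.
print(reverse_complement(reverse) == forward)
```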
  • 14. Finding Genes  First job is to find long ORFs, examining the longest ORFs first and putting together a set with minimal overlaps.  It is also necessary to identify potential start codons, with the furthest upstream start codon as the easiest choice.  Then, how do we know that the ORF contains a real gene? The most definitive way is to match it with a gene known from other species  Conservation of a sequence between species strongly suggests that the sequence has a function that is being conserved by natural selection  We compare protein sequences, not DNA, because protein is more conserved in evolution than DNA  The organism’s survival depends on the protein being functional, which means having the proper amino acid sequence  Since the genetic code is degenerate, many different DNA sequences will give identical proteins.  The protein 3-dimensional structure is even more conserved, because it is more closely related to enzyme activity than the amino acid sequence is.  However, we don’t have good ways of determining 3-D structure from a DNA sequence
  • 15. Sequence Comparison  So, we compare our ORF sequence to a database of known protein sequences from many species.  BLAST is the standard sequence alignment tool (BLAST = Basic Local Alignment Search Tool)  BLAST is based on the concept that if you compare the same (that is, homologous) protein from many different species, you can see that some amino acids readily substitute for each other and others almost never do.  These preferences are summarized in a substitution matrix, which gives a score for each pair of amino acids aligned at a position in the proteins being compared.
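To make the idea of a substitution matrix concrete, here is a small sketch that scores two already-aligned protein fragments. The scores are invented for illustration and are NOT taken from a real matrix such as the ones BLAST uses; the function names are hypothetical as well.

```python
# Toy scores (assumed values, not a real substitution matrix):
# identical residues score highest, chemically similar pairs score
# a little, and everything else is penalized.
IDENTITY_SCORE = 4
MISMATCH_SCORE = -2
SIMILAR_PAIRS = {
    frozenset("IL"): 2,  # isoleucine / leucine
    frozenset("IV"): 2,  # isoleucine / valine
    frozenset("KR"): 2,  # lysine / arginine
    frozenset("DE"): 2,  # aspartate / glutamate
}

def pair_score(a, b):
    """Score one aligned pair of amino acids."""
    if a == b:
        return IDENTITY_SCORE
    return SIMILAR_PAIRS.get(frozenset((a, b)), MISMATCH_SCORE)

def alignment_score(query, subject):
    """Sum the pair scores over an ungapped alignment of equal length."""
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(a, b) for a, b in zip(query, subject))

print(alignment_score("MKIL", "MKIL"))  # identical fragments
print(alignment_score("MKIL", "MRLL"))  # two conservative substitutions
print(alignment_score("MKIL", "MWIL"))  # one unfavorable substitution
```

Conservative substitutions lower the score only slightly, while an unlikely substitution costs more, which is how an alignment tool can rank distant but genuine relatives above chance matches.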
  • 16. Practical BLAST  BLAST itself is a bit of software that can be run on almost any computer, but the database needed for a good cross-species comparison is quite large  The database is called “nr” for “non-redundant”, and it contains at least 20 Gb of sequence data  We are going to use the BLAST service at UniProt, a European consortium that contains a comprehensive collection of protein sequences  http://www.uniprot.org/  Nearly all derived from DNA sequences: direct sequencing of proteins is difficult  Terminology: your sequence, which you paste into the box on the web site, is the query sequence. Sequences in the database that match yours are called subject sequences.
  • 17. A Sequence to BLAST  This is a more-or-less randomly chosen gene from B. meg.  It is 174 amino acids long  It is written in “fasta” format: the first line starts with > and is immediately followed by an identifier (ORF00135), and then some miscellaneous comments.  After that the sequence is written without spaces or other marks. >ORF00135 |chromosome 538197-538721 revcomp MKAKLIQYVYDAECRLFKSVN QHFDRKHLNRFLRLLTHAGGA TFTIVIACLLLFLYPSSVAYA CAFSLAVSHIPVAIAKKLYPR KRPYIQLKHTKVLENPLKDHS FPSGHTTAIFSLVTPLMIVYP AFAAVLLPLAVMVGISRIYLG LHYPTDVMVGLILGIFSGAVA LNIFLT
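The fasta layout described above is simple enough to parse by hand. The sketch below reads the slide's record, splitting the header into identifier and comment and joining the sequence lines; `parse_fasta` is a hypothetical helper that handles a single record only.

```python
def parse_fasta(text):
    """Return (identifier, comment, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    # The identifier runs from '>' to the first space; the rest is comment.
    identifier, _, comment = lines[0][1:].partition(" ")
    # Sequence lines are joined, dropping any spaces inside them.
    sequence = "".join("".join(line.split()) for line in lines[1:])
    return identifier, comment.strip(), sequence

record = """>ORF00135 |chromosome 538197-538721 revcomp
MKAKLIQYVYDAECRLFKSVN
QHFDRKHLNRFLRLLTHAGGA
TFTIVIACLLLFLYPSSVAYA
CAFSLAVSHIPVAIAKKLYPR
KRPYIQLKHTKVLENPLKDHS
FPSGHTTAIFSLVTPLMIVYP
AFAAVLLPLAVMVGISRIYLG
LHYPTDVMVGLILGIFSGAVA
LNIFLT
"""

identifier, comment, sequence = parse_fasta(record)
print(identifier, len(sequence))

# The coordinates in the comment span 525 bases, which is consistent
# with 174 codons plus one stop codon.
span = 538721 - 538197 + 1
print(span, 3 * (len(sequence) + 1))
```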
  • 19. BLAST Scores  Results are arranged with the best ones on top  The most important score is the Expect value, or E-value, which can be defined as the number of hits that a random sequence (with the same length as yours) would be expected to have in the database.  E-values for good hits are usually written something like: 3e-42, which is the same as 3 x 10^-42, a very small number  Bad hits are very common, and they have e-values in a more familiar form: for example, 0.004 or 1.2  A really good e-value is less than 1e-180, which underflows the computer’s processing capabilities, so it is written as 0.0  E-values are affected by the length of the query sequence as well as the size of the database, so even perfect matches with short sequences give poor e-values  In this case we see many hits with good e-values, and the top e-values all are quite similar.  Before we can conclude that our protein is a homologue of the proteins BLAST matches it with, we would like them to have roughly the same length and have a high percentage of identical amino acids.  The lengths of the query and subject sequences should be within 20% of each other  There should be at least 30% identical amino acids  In this case we can be quite sure we have a good match  BLAST also returns a fourth value, the bit score, which we are going to ignore.
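The rules of thumb on this slide (similar length, at least 30% identity, a small E-value) can be written as a simple filter. This is a sketch under stated assumptions: the slide does not say how "within 20%" is measured, so the code measures the difference relative to the longer sequence, and the E-value cutoff of 1e-10 is an arbitrary illustrative choice, not a value from the slides.

```python
def looks_homologous(query_len, subject_len, percent_identity, e_value,
                     max_e_value=1e-10):
    """Apply the slide's rules of thumb to one BLAST hit.

    Assumptions: the length difference is taken relative to the longer
    sequence, and max_e_value is an illustrative cutoff only.
    """
    longer = max(query_len, subject_len)
    shorter = min(query_len, subject_len)
    lengths_ok = (longer - shorter) / longer <= 0.20
    identity_ok = percent_identity >= 30.0
    e_value_ok = e_value <= max_e_value
    return lengths_ok and identity_ok and e_value_ok

# E-values in BLAST reports use exponent notation that float() reads directly.
good_e_value = float("3e-42")

print(looks_homologous(174, 180, 65.0, good_e_value))  # similar length, high identity
print(looks_homologous(174, 400, 65.0, good_e_value))  # subject far too long
print(looks_homologous(174, 180, 25.0, good_e_value))  # identity below 30%
print(looks_homologous(174, 180, 65.0, 0.004))         # weak E-value
```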
  • 20. Gene Names  Mostly genes are named with the function of their protein.  at some point, some related genes had their function determined through lab work: by examining the effects of mutations in the gene, by isolating and studying the protein produced by the gene, etc.  Enzymes (end in –ase), transport across the cell membrane, genetic information processing (DNA->RNA->protein), structural proteins, sporulation and germination, and more!  Many genes (maybe 1/4 of them in a typical genome) have no known function, although they are found in several different species: conserved hypothetical genes  Every new genome has some genes that are unique: no matching BLAST hits in the database.  Are they real genes? Sometimes there is evidence in the form of messenger RNA, but usually we don’t know  call them hypothetical genes  “putative” means that we think we know the gene’s function but we aren’t sure. Putative should be followed by the function name.
  • 21. More Gene Names  One question of interest: do the names of the top BLAST hits agree with each other? They should, but there are always annotation errors, and our knowledge of gene function increases over time.  With some sloppiness due to different naming conventions practiced by different scientists  Here we have a classic case of mis-naming. Why is the top hit ribosomal protein S2, with no other hit having this name?  Ribosomal proteins are highly conserved in evolution  Some checking on my part showed that no homology exists between this gene and the ribosomal protein S2 found in any other Bacillus species  The other names are similar, although not identical.  What is “PAP2”? A quick Google search shows that it stands for “phosphatidic acid phosphatase”, which fits the other names well.  There is probably some uncertainty about its exact function, given the variety of names and the “family protein” designation in several of them.
  • 22. Horizontal and Vertical Gene Transfer  We are accustomed to thinking of genes being passed from parent to offspring, always staying within the species, with very occasional splitting of one species into two.  This is called vertical gene transfer.  But, we know that some genes are transferred across species lines, not by the standard genetic mechanisms.  This is called horizontal gene transfer  It is rare in humans and other higher organisms  In bacteria 10% or more of genes have been transferred in horizontally.  B. meg genes that come from vertical descent have other Bacillus species (or another closely related species) as the closest BLAST hit  Horizontally transferred genes can come from almost anywhere: other bacteria, Archaea, eukaryotes: plants, animals, fungi  The general mechanisms are well known, including conjugation (direct transfer of DNA between two bacteria), transduction (transfer of DNA using a virus as a carrier), and transformation (the bacteria pick up DNA molecules from their environment).
  • 23. Bacillus Phylogeny  “Kings Play Chess On Fine Ground Sand”  Bacteria is the domain  Firmicutes is the phylum  Bacilli is the class  Bacillales is the order  Bacillaceae is the family  Bacillus is the genus.
  • 24. Our Example  Most of the top hits are from various Bacillus species: there is little doubt that this gene is the result of normal, vertical gene flow.  What about “Anoxybacillus flavithermus”?  Click on the accession number to get more information, including its phylogeny.  Taxonomic lineage = Bacteria > Firmicutes > Bacillales > Bacillaceae > Anoxybacillus.  Same family as B. meg.
  • 25. Aligned Sequences  You can see the aligned sequences by clicking on the “Local alignment” diagrams  Query sequence on top, subject below  Identical amino acids are in the middle of the alignment, and similar ones have a + sign.  Gaps: regions where one sequence has amino acids not found in the other sequence, are indicated with ---.  This protein is very typical in that the best matches are in the middle of the protein, with fewer identical amino acids near the ends.  Also, the match doesn’t quite make it to the very beginning of the proteins, although they are almost identical in length.  The active site of most enzymes is in the middle  The ends of proteins are often not well conserved
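The middle line of an alignment display can be rebuilt from the two aligned sequences. The sketch below uses short made-up fragments, not the actual BLAST output, and a toy set of "similar" pairs; taking percent identity over the full alignment length (gaps included) is also an assumption, since tools differ on the denominator.

```python
# Toy set of amino acid pairs treated as similar (assumed, for illustration).
SIMILAR = {frozenset("IL"), frozenset("KR"), frozenset("DE")}

def midline(query, subject):
    """Build the middle line: letter if identical, '+' if similar, else blank."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    chars = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            chars.append(" ")  # gap in one of the sequences
        elif q == s:
            chars.append(q)
        elif frozenset((q, s)) in SIMILAR:
            chars.append("+")
        else:
            chars.append(" ")
    return "".join(chars)

def percent_identity(query, subject):
    """Identical positions as a percentage of the alignment length."""
    matches = sum(q == s and q != "-" for q, s in zip(query, subject))
    return 100.0 * matches / len(query)

query = "MKA-LIQ"    # made-up fragment with one gap
subject = "MRAKLLQ"  # made-up fragment
print(query)
print(midline(query, subject))
print(subject)
print(round(percent_identity(query, subject), 1))
```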
  • 27. Graphical Overview  Click on Graphical Overview (just under the BLAST box on the left) to get an overview of all the aligned sequences  The extent of the matching region is shown with the colored boxes, with non-matching regions drawn as a line.  Color indicates percent of identical amino acids  You can see that mostly our query and the various subjects (matches) line up along almost all of their lengths.  This is a good way to check whether our start site is reasonable.  A few odd ones lower down.  Genes, and pieces of genes, can move to new locations in the genome, fuse with other genes, break apart, etc. Always subject to natural selection: if the altered gene doesn’t work, the organism will die and we won’t see it.  And of course, sequencing and annotation errors occur.
  • 28. The Basic Points 1. DNA can be read in 3 different reading frames, a consequence of the genetic code (3 bases = 1 amino acid) 2. Genes are found in long open reading frames, areas where there are no stop codons. 3. BLAST is the tool we use to compare sequences between species • BLAST scores (e-values) describe the number of hits a random sequence would be expected to have in the database 4. Gene sequences are conserved between species by natural selection • DNA sequences outside of genes are much less conserved 5. Most genes are transferred vertically, from parent to offspring, but a significant number are transferred horizontally, from unrelated species.
  • 29. Thank You  Email me : arputhaselvaraj@gmail.com