www.   .uni-rostock.de




       Bioinformatics
     Introduction to genomics and proteomics I


                      Ulf Schmitz
         ulf.schmitz@informatik.uni-rostock.de

Bioinformatics and Systems Biology Group
            www.sbi.informatik.uni-rostock.de




      Ulf Schmitz, Introduction to genomics and proteomics I                      1
Outline

Genomics/Genetics
   1. The tree of life
      •   Prokaryotic Genomes
          – Bacteria
          – Archaea
      •   Eukaryotic Genomes
          – Homo sapiens
   2. Genes
      •   Expression Data




Genomics - Definitions

  Genetics:   the science of genes, heredity, and the variation of organisms.
                Humans began applying knowledge of genetics in prehistory with
                the domestication and breeding of plants and animals.
                In modern research, genetics provides tools for investigating
                the function of a particular gene, e.g. analysis of genetic
                interactions.


  Genomics:   the study of large-scale genetic patterns across the
              genome of a given species. It deals with the systematic use of
              genome information to provide answers in biology, medicine, and
              industry.
                Genomics has the potential to offer new therapeutic methods
                for the treatment of some diseases, as well as new diagnostic
                methods.
                Major tools and methods related to genomics are bioinformatics,
                genetic analysis, measurement of gene expression, and
                determination of gene function.

Genes



 •   a gene coding for a protein corresponds to a sequence of
     nucleotides along one or more regions of a molecule of DNA
 •   in species with double stranded DNA (dsDNA), genes may appear
     on either strand
 •   bacterial genes are continuous regions of DNA



 bacterium:
     • a string of 3N nucleotides encodes a string of N amino acids
     • or a string of N nucleotides encodes a structural RNA molecule of N
       residues
eukaryote:
   • a gene may appear split into separated segments in the DNA
   • an exon is a stretch of DNA retained in mRNA that the ribosomes translate
     into protein
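The 3N-to-N relationship for a contiguous bacterial coding region can be sketched in Python. This is a minimal illustration, assuming the standard genetic code and a sequence that is already in the correct reading frame; the function name `translate` and the example sequence are hypothetical and not taken from the lecture.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate an in-frame DNA coding sequence up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical example: 3 coding codons (9 nucleotides) plus a stop codon
print(translate("ATGGCCAAATAA"))  # MAK
```

Nine coding nucleotides give three amino acids; the stop codon itself encodes no residue.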




Genomics
   Genome size comparison
   Species                                  Chromosomes      Genes            Base pairs
   Human (Homo sapiens)                     46 (23 pairs)    28,000-35,000    3.1 billion
   Mouse (Mus musculus)                     40               22,500-30,000    2.7 billion
   Puffer fish (Fugu rubripes)              44               31,000           365 million
   Malaria mosquito (Anopheles gambiae)     6                14,000           289 million
   Fruit fly (Drosophila melanogaster)      8                14,000           137 million
   Roundworm (C. elegans)                   12               19,000           97 million
   Bacterium (E. coli)                      1                5,000            4.1 million




Genes
 exon:
     A section of DNA which carries the coding
     sequence for a protein or part of it. Exons
     are separated by intervening, non-coding
     sequences (called introns). In eukaryotes
     most genes consist of a number of exons.

intron:
   An intervening section of DNA which occurs
   almost exclusively within a eukaryotic gene, but
   which is not translated to amino-acid sequences in
   the gene product.
   The introns are removed from the precursor
   mRNA (pre-mRNA) through a process called splicing, which
   leaves the exons untouched, to form an active
   mRNA.
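
Splicing can be pictured as joining the exon segments and discarding everything in between. The following is a minimal sketch, assuming the exon coordinates are already known; the function name `splice`, the toy sequence and the coordinates are hypothetical.

```python
def splice(pre_mrna, exons):
    """Join exon segments, given as (start, end) half-open intervals,
    and drop the introns that lie between them."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

# Toy pre-mRNA: exon 1, an intron beginning with GU and ending with AG, exon 2
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "AAAUAA"
mature_mrna = splice(pre_mrna, [(0, 6), (18, 24)])
print(mature_mrna)  # AUGGCCAAAUAA
```

In a real genome the hard part is finding the exon boundaries, not joining the pieces.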




Genes
        Examples of the exon:intron mosaic of genes

                          exon               intron



           Globin gene – 1525 bp: 622 in exons, 893 in introns




           Ovalbumin gene - ~ 7500 bp: 8 short exons comprising 1859 bp




           Conalbumin gene - ~ 10,000 bp: 17 short exons comprising ~ 2,200 bp




Picking out genes in genomes


 • Computer programs for genome analysis identify ORFs
   (open reading frames)

 • An ORF begins with an initiation codon ATG (AUG)

 • An ORF is a potential protein-coding region

 • There are two approaches to identify protein coding
   regions…




Picking out genes in genomes

1.   Detection of regions similar to known coding regions from other organisms

•    Regions may encode amino acid sequences similar to known proteins
•    Or may be similar to ESTs (expressed sequence tags, which correspond to
     genes known to be expressed)
•    For an EST, a few hundred initial bases of a cDNA are sequenced to
     identify a gene



2.   Ab initio methods seek to identify genes from the properties of the
     DNA sequence itself
 •   Bacterial genes are easy to identify, because they are contiguous
 •   They have no introns and the space between genes is small
 •   Identification of exons in higher organisms is a problem, assembling
     them another…
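
For contiguous, intron-free genes, the simplest ab initio step is a scan for ORFs: an ATG followed in the same frame by a stop codon. The sketch below is a simplified illustration, not a real gene finder; the function names, the `min_codons` cutoff and the example sequence are hypothetical. It reports one ORF per stop codon, starting from the first ATG seen in that frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def find_orfs(dna, min_codons=3):
    """Return (start, end) half-open intervals of ORFs in the three
    forward reading frames; the stop codon is included in the interval."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF
    return sorted(orfs)

dna = "CCATGAAATTTTAGCC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Because genes may lie on either strand, the same scan would also be run on `reverse_complement(dna)`.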




Picking out genes in genomes
Ab initio gene identification in eukaryotic genomes

    • The initial (5´) exon starts with a transcription start
      point, preceded by a core promoter site such as the
      TATA box (~30 bp upstream)
             – Free of stop codons
             – Ends immediately before a GT splice signal

    • Core promoter (e.g. TATA box): binds and directs RNA polymerase
      to the correct transcriptional start site

Picking out genes in genomes

[Figure: 5' splice signal and 3' splice signal]




Picking out genes in genomes
Ab initio gene identification in eukaryotic genomes

• Internal exons are free of stop codons too
    – Begin after an AG splice signal
    – End before a GT splice signal




Picking out genes in genomes
 Ab initio gene identification in eukaryotic genomes

• The final (3´) exon starts after an AG splice signal
   – Ends with a stop codon (TAA, TAG, TGA)
   – Followed by a polyadenylation signal sequence
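
The rules for internal exons (begin after AG, end before GT, free of stop codons) can be turned into a naive candidate search. This is only a sketch under strong simplifying assumptions: it treats every AG and GT dinucleotide as a possible splice signal, whereas real gene finders score much longer signal patterns. The function names, the `min_len` cutoff and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_open_frame(segment):
    """True if at least one of the three reading frames of the segment
    contains no stop codon."""
    for frame in range(3):
        codons = (segment[i:i + 3] for i in range(frame, len(segment) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False

def candidate_internal_exons(seq, min_len=6):
    """Yield (start, end) half-open intervals that directly follow an AG,
    directly precede a GT, and are open in at least one frame."""
    seq = seq.upper()
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    for start in acceptors:
        for end in donors:
            if end - start >= min_len and has_open_frame(seq[start:end]):
                yield start, end

seq = "TTAGCCCGGGCCCGTTT"
for start, end in candidate_internal_exons(seq):
    print(start, end, seq[start:end])
```

On real sequence this produces far too many candidates, which is one reason exon identification in higher organisms is hard.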




Humans have
spliced genes…




DNA makes RNA makes Protein




Tree of life
  Prokaryotes




Genomics – Prokaryotes




•   the genome of a prokaryote typically comes
    as a single double-stranded DNA
    molecule in ring form
     –   on average about 2 mm long
     –   whereas the cell's diameter is only
         about 0.001 mm
     –   typically < 5 Mb
•   prokaryotic cells can have plasmids
    as well (see next slide)
•   protein coding regions have no
    introns
•   little non-coding DNA compared to
    eukaryotes
     –   in E. coli only 11%
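
The millimetre-scale length can be checked with a short calculation. This sketch assumes the usual B-DNA spacing of about 0.34 nm per base pair, which is not stated in the lecture, and uses the 4.1 million bp given for E. coli in the genome size comparison; the function name `contour_length_mm` is hypothetical.

```python
RISE_PER_BP_NM = 0.34  # assumed B-DNA rise per base pair, in nanometres
NM_PER_MM = 1_000_000

def contour_length_mm(base_pairs):
    """Stretched-out length of a double-stranded DNA molecule in millimetres."""
    return base_pairs * RISE_PER_BP_NM / NM_PER_MM

e_coli_bp = 4_100_000
print(f"{contour_length_mm(e_coli_bp):.2f} mm")  # 1.39 mm
```

That is more than a thousand times the cell's diameter of about 0.001 mm, so the chromosome must be tightly packed.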




Genomics - Plasmids
• Plasmids are circular double stranded DNA molecules that are separate
  from the chromosomal DNA.

• They usually occur in bacteria, sometimes in eukaryotic organisms

• Their size varies from 1 to 250 kilobase pairs (kbp). A single cell can
  contain anywhere from one copy (for large plasmids) to hundreds of copies
  of the same plasmid.




Prokaryotic model organisms

 • E. coli (Escherichia coli)

 • Methanococcus jannaschii (archaeon)

 • Mycoplasma genitalium (simplest organism known)




Genomics


 • DNA of higher organisms is organized into chromosomes
   (human – 23 chromosome pairs)

 • not all DNA codes for proteins

 • on the other hand some genes exist in multiple copies

 • that’s why from the genome size you can’t easily estimate
   the amount of protein sequence information
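
One way to see why genome size is a poor proxy for protein-coding content is to compare gene density. A minimal sketch, using gene counts and genome sizes from the genome size comparison earlier in this lecture (for human, the upper estimate of 35,000 genes is assumed); the function name `genes_per_mb` is hypothetical.

```python
def genes_per_mb(genes, base_pairs):
    """Number of genes per million base pairs of genome."""
    return genes / (base_pairs / 1_000_000)

# (gene count, genome size in bp), taken from the comparison table
genomes = {
    "E. coli": (5_000, 4_100_000),
    "C. elegans": (19_000, 97_000_000),
    "Human": (35_000, 3_100_000_000),  # upper estimate of the gene count
}

for name, (genes, base_pairs) in genomes.items():
    print(f"{name}: {genes_per_mb(genes, base_pairs):.0f} genes per Mb")
```

E. coli comes out at roughly 1,200 genes per Mb and human at roughly 11, a difference of about two orders of magnitude.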




Genomes of eukaryotes


 • majority of the DNA is in the nucleus, separated into
   bundles (chromosomes)
    – small amounts of DNA appear in organelles (mitochondria and
      chloroplasts)
 • within single chromosomes gene families are common
    – some family members are paralogues (related)
       • they have duplicated within the same genome
       • often diverged to provide separate functions in descendants
       • e.g. human α and β globin
    – orthologous genes
       • are homologues in different species
       • often perform the same function
       • e.g. human and horse myoglobin
    – pseudogenes
       • have lost their function
       • e.g. in the human globin gene cluster
Eukaryotic model organisms


• Saccharomyces cerevisiae (baker’s yeast)

• Caenorhabditis elegans (C.elegans)

• Drosophila melanogaster (fruit fly)

• Arabidopsis thaliana (thale cress, a flowering plant)

• Homo sapiens (human)




The human genome
 •   ~3.2 × 10^9 bp (about thirty times larger than C. elegans or D. melanogaster)
 •   Coding sequences form only 5% of the human genome
 •   Repeat sequences make up over 50%
 •   Only ~32,000 genes
 •   The human genome is distributed over 22 chromosome pairs plus the X and
     Y chromosomes
 •   Exons of protein-coding genes are relatively small compared to
     those in other known eukaryotic genomes
 •   Introns are relatively long
 •   Protein-coding genes span long stretches of DNA (dystrophin,
     coding for a 3,685 amino acid protein, is >2.4 Mbp long)

 •   Average gene length: ~8,000 bp
 •   Average of 5-6 exons/gene
 •   Average exon length: ~200 bp
 •   Average intron length: ~2,000 bp
 •   ~8% of genes have a single exon
 •   Some exons can be as small as 1 or 3 bp.

The human genome
Top categories in a function classification:
   Function                                  Number       %
   Nucleic acid binding                        2207    14.0
     DNA binding                               1656    10.5
       DNA repair protein                        45     0.2
       DNA replication factor                     7     0.0
       Transcription factor                     986     6.2
     RNA binding                                380     2.4
       Structural protein of ribosome           137     0.8
       Translation factor                        44     0.2
   Transcription factor binding                   6     0.0
   Cell cycle regulator                          75     0.4
   Chaperone                                    154     0.9
   Motor                                         85     0.5
   Actin binding                                129     0.8
   Defense/immunity protein                     603     3.8
   Enzyme                                      3242    20.6
     Peptidase                                  457     2.9
       Endopeptidase                            403     2.5
     Protein kinase                             839     5.3
     Protein phosphatase                        295     1.8
   Enzyme activator                               3     0.0
   Apoptosis inhibitor                          132     0.8
   Signal transduction                         1790    11.4
     Receptor                                  1318     8.4
     Transmembrane receptor                    1202     7.6
     G-protein link receptor                    489     3.1
     Olfactory receptor                          71     0.0
   Storage protein                                7     0.0
   Cell adhesion                                189     1.2
   Structural protein                           714     4.5
     Cytoskeletal structural protein            145     0.9
   Transporter                                  682     4.3
     Ion channel                                269     1.7
     Neurotransmitter transporter                19     0.1
   Ligand binding or carrier                   1536     9.7
     Electron transfer                           33     0.2
       Cytochrome P450                           50     0.3
   Tumor suppressor                               5     0.0
   Unclassified                                4813    30.6
   Total                                      15683   100.0

The human genome
 •   Repeated sequences comprise over 50% of the genome:
      – Transposable elements, or interspersed repeats include LINEs and
        SINEs (almost 50%)
      – Retroposed pseudogenes
      – Simple ‘stutters’ - repeats of short oligomers (minisatellites and
        microsatellites)
      – Segment duplication, of blocks of ~10 - 300kb
      – Blocks of tandem repeats, including gene families


     Element                                        Size (bp)         Copy number    Fraction of genome (%)
     Short Interspersed Nuclear Elements (SINEs)    100-300           1,500,000      13
     Long Interspersed Nuclear Elements (LINEs)     6,000-8,000       850,000        21
     Long Terminal Repeats                          15,000-110,000    450,000        8
     DNA transposon fossils                         80-3,000          300,000        3


The human genome

• All people are different, but the DNA of different
  people varies by only 0.2% or less.
• So, only up to 2 letters in 1000 are expected to be
  different.
• Evidence from current genomics studies (Single
  Nucleotide Polymorphisms or SNPs) implies that on
  average only 1 letter out of 1400 is different
  between individuals.
• This means that 2 to 3 million letters would differ
  between individuals.
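
The 2 to 3 million figure follows directly from the genome size. A small worked check, assuming the ~3.2 × 10^9 bp genome size quoted earlier in the lecture; the variable names are illustrative only.

```python
GENOME_SIZE_BP = 3.2e9   # approximate human genome size from the lecture
SNP_SPACING_BP = 1400    # about 1 differing letter per 1400
MAX_FRACTION = 0.002     # the 0.2% upper limit

expected_differences = GENOME_SIZE_BP / SNP_SPACING_BP
upper_bound = GENOME_SIZE_BP * MAX_FRACTION

print(f"expected: {expected_differences:,.0f} letters")  # expected: 2,285,714 letters
print(f"upper bound: {upper_bound:,.0f} letters")        # upper bound: 6,400,000 letters
```

So 1 difference per 1400 letters gives about 2.3 million differing letters, well below the 6.4 million allowed by the 0.2% limit.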


Functional Genomics
From gene to function



[Figure labels: Genome, Expressome, Proteome, Metabolome; tertiary structure (fold)]


DNA makes RNA makes Protein:
Expression data


• More copies of mRNA for a gene generally lead to more
  protein
• mRNA can now be measured for all the genes in a
  cell at once through microarray technology
• A single gene chip can have 60,000 spots (genes)
• Color change gives the intensity of gene expression
  (over- or under-expression)
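
Over- and under-expression are commonly expressed as a log2 ratio of the signal in a sample against a reference. That convention is an assumption here and is not stated on the slide; the function names, the intensities and the twofold threshold below are hypothetical.

```python
import math

def log2_ratio(sample_intensity, reference_intensity):
    """log2 of sample over reference; 0 means no change."""
    return math.log2(sample_intensity / reference_intensity)

def classify_expression(sample_intensity, reference_intensity, threshold=1.0):
    """Label a spot using a symmetric log2 threshold (1.0 = twofold change)."""
    ratio = log2_ratio(sample_intensity, reference_intensity)
    if ratio >= threshold:
        return "over-expressed"
    if ratio <= -threshold:
        return "under-expressed"
    return "unchanged"

# Hypothetical spot intensities (sample, reference)
for spot in [(800, 200), (100, 400), (210, 200)]:
    print(spot, classify_expression(*spot))
```

The log scale makes a fourfold increase (+2) and a fourfold decrease (-2) symmetric around zero.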




Genes and regulatory regions

regulatory mechanisms organize the
expression of genes
   – genes may be turned on or off in response to
     concentrations of nutrients or to stress
   – control regions often lie near the segments
     coding for proteins
   – they can serve as binding sites for molecules
     that transcribe the DNA
   – or they bind regulatory molecules that can
     block transcription


Expression data




Outlook – coming lecture


Proteomics
   – Proteins
   – post-translational modification
   – Key technologies
• Maps of hereditary information
• SNPs (Single nucleotide polymorphisms)
• Genetic diseases






Thanks for your
  attention!


Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 

Recently uploaded (20)

Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 

Genomics and proteomics I

  • 1. Bioinformatics: Introduction to genomics and proteomics I. Ulf Schmitz, ulf.schmitz@informatik.uni-rostock.de, Bioinformatics and Systems Biology Group, www.sbi.informatik.uni-rostock.de
  • 2. Outline. Genomics/Genetics: 1. The tree of life: prokaryotic genomes (Bacteria, Archaea) and eukaryotic genomes (Homo sapiens). 2. Genes: expression data.
  • 3. Genomics - Definitions. Genetics is the science of genes, heredity, and the variation of organisms. Humans began applying knowledge of genetics in prehistory with the domestication and breeding of plants and animals. In modern research, genetics provides tools for investigating the function of a particular gene, e.g. analysis of genetic interactions. Genomics is the study of large-scale genetic patterns across the genome of a given species. It deals with the systematic use of genome information to provide answers in biology, medicine, and industry. Genomics has the potential to offer new therapeutic methods for the treatment of some diseases, as well as new diagnostic methods. Major tools and methods related to genomics are bioinformatics, genetic analysis, measurement of gene expression, and determination of gene function.
  • 4. Genes. • A gene coding for a protein corresponds to a sequence of nucleotides along one or more regions of a molecule of DNA. • In species with double-stranded DNA (dsDNA), genes may appear on either strand. • Bacterial genes are continuous regions of DNA. Bacterium: • a string of 3N nucleotides encodes a string of N amino acids • or a string of N nucleotides encodes a structural RNA molecule of N residues. Eukaryote: • a gene may appear split into separated segments in the DNA • an exon is a stretch of DNA retained in mRNA that the ribosomes translate into protein.
  • 5. Genomics: genome size comparison
      Species                                 Chrom.          Genes            Base pairs
      Human (Homo sapiens)                    46 (23 pairs)   28,000-35,000    3.1 billion
      Mouse (Mus musculus)                    40              22,500-30,000    2.7 billion
      Puffer fish (Fugu rubripes)             44              31,000           365 million
      Malaria mosquito (Anopheles gambiae)    6               14,000           289 million
      Fruit fly (Drosophila melanogaster)     8               14,000           137 million
      Roundworm (C. elegans)                  12              19,000           97 million
      Bacterium (E. coli)                     1               5,000            4.1 million
  • 6. Genes. Exon: a section of DNA which carries the coding sequence for a protein or part of it. Exons are separated by intervening, non-coding sequences (called introns). In eukaryotes most genes consist of a number of exons. Intron: an intervening section of DNA which occurs almost exclusively within a eukaryotic gene, but which is not translated into amino-acid sequence in the gene product. The introns are removed from the precursor mRNA (pre-mRNA) through a process called splicing, which leaves the exons untouched, to form the mature, active mRNA.
  • 7. Genes: examples of the exon:intron mosaic of genes. [Figure legend: exon, intron] Globin gene, 1525 bp: 622 bp in exons, 893 bp in introns. Ovalbumin gene, ~7,500 bp: 8 short exons comprising 1,859 bp. Conalbumin gene, ~10,000 bp: 17 short exons comprising ~2,200 bp.
  • 8. Picking out genes in genomes. • Computer programs for genome analysis identify ORFs (open reading frames). • An ORF begins with an initiation codon, ATG (AUG in mRNA), and runs to the next in-frame stop codon. • An ORF is a potential protein-coding region. • There are two approaches to identifying protein-coding regions…
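The ORF definition on this slide can be illustrated with a minimal Python sketch. It is not the method of any particular gene-finding program: the function names `find_orfs` and `reverse_complement` and the `min_codons` cutoff are assumptions made for illustration, and the scan simply reports every stretch that starts at ATG and ends at the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(dna):
    """Return the reverse complement, so the opposite strand can be scanned too."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch on the given strand.

    start/end are 0-based, end-exclusive; the stop codon is included.
    min_codons is a hypothetical cutoff (counting the stop codon).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first initiation codon in this frame
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, frame))
                start = None                   # look for the next ORF in this frame
    return orfs

# Toy sequence (made up for illustration): one ORF, ATG AAA TAG, in frame 2.
print(find_orfs("CCATGAAATAGGG"))
```

Because genes may lie on either strand, a fuller scan would also call `find_orfs(reverse_complement(dna))`. Real gene finders add further evidence (codon usage, similarity to known proteins) on top of this.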
  • 9. Picking out genes in genomes. 1. Detection of regions similar to known coding regions from other organisms: • regions may encode amino acid sequences similar to known proteins • or may be similar to ESTs (expressed sequence tags, which correspond to genes known to be expressed) • the first few hundred bases of a cDNA are sequenced to identify a gene. 2. Ab initio methods seek to identify genes from the properties of the DNA sequence itself: • bacterial genes are easy to identify because they are contiguous • they have no introns and the space between genes is small • identifying exons in higher organisms is one problem, assembling them another…
  • 10. Picking out genes in genomes: ab initio gene identification in eukaryotic genomes. • The initial (5′) exon starts with a transcription start point, preceded by a core promoter site such as the TATA box (~30 bp upstream), which binds and directs RNA polymerase to the correct transcriptional start site – it is free of stop codons – it ends immediately before a GT splice signal.
  • 11. Picking out genes in genomes. [Figure: 5' splice signal and 3' splice signal]
  • 12. Picking out genes in genomes: ab initio gene identification in eukaryotic genomes. • Internal exons are free of stop codons too – they begin after an AG splice signal – they end before a GT splice signal.
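The three criteria for internal exons (starts after AG, ends before GT, free of stop codons) can be sketched as a naive scan. This is only an illustration under simplifying assumptions: the names `has_open_frame` and `candidate_internal_exons` and the `min_len` cutoff are hypothetical, "free of stop codons" is interpreted as "at least one reading frame without a stop", and every AG/GT dinucleotide is treated as a possible splice signal, which greatly over-predicts compared with real gene finders.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_open_frame(segment):
    """True if at least one of the three reading frames contains no stop codon."""
    for frame in range(3):
        codons = (segment[i:i + 3] for i in range(frame, len(segment) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False

def candidate_internal_exons(dna, min_len=6):
    """Return (start, end) pairs, 0-based and end-exclusive, for segments that
    begin right after an AG, end right before a GT, and have an open frame."""
    dna = dna.upper()
    # Lookahead patterns so that overlapping dinucleotides are all found.
    acceptors = [m.start() + 2 for m in re.finditer("(?=AG)", dna)]
    donors = [m.start() for m in re.finditer("(?=GT)", dna)]
    candidates = []
    for a in acceptors:
        for d in donors:
            if d - a >= min_len and has_open_frame(dna[a:d]):
                candidates.append((a, d))
    return candidates

# Toy sequence (made up): ...AG | CCCAAACCC | GT...
print(candidate_internal_exons("TTAGCCCAAACCCGTTT"))
```

The output is a list of candidates only; choosing among them and assembling them into a gene is the harder problem the slides refer to.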
  • 13. Picking out genes in genomes: ab initio gene identification in eukaryotic genomes. • The final (3′) exon starts after an AG splice signal – it ends with a stop codon (TAA, TAG, TGA) – it is followed by a polyadenylation signal sequence.
  • 14. Humans have spliced genes…
  • 15. DNA makes RNA makes Protein
  • 16. Tree of life: Prokaryotes
  • 17. Genomics – Prokaryotes. • The genome of a prokaryote comes as a single double-stranded DNA molecule in ring form – on average 2 mm long – whereas the cell's diameter is only 0.001 mm – < 5 Mb. • Prokaryotic cells can have plasmids as well (see next slide). • Protein-coding regions have no introns. • Little non-coding DNA compared to eukaryotes – in E. coli only 11%.
  • 18. Genomics - Plasmids. • Plasmids are circular double-stranded DNA molecules that are separate from the chromosomal DNA. • They usually occur in bacteria, sometimes in eukaryotic organisms. • Their size varies from 1 to 250 kilobase pairs (kbp). A single cell may contain from one copy (for large plasmids) to hundreds of copies of the same plasmid.
  • 19. Prokaryotic model organisms: E. coli (Escherichia coli); Methanococcus jannaschii (archaeon); Mycoplasma genitalium (simplest organism known).
  • 20. Genomics. • DNA of higher organisms is organized into chromosomes (human: 23 chromosome pairs). • Not all DNA codes for proteins. • On the other hand, some genes exist in multiple copies. • This is why the amount of protein sequence information cannot easily be estimated from the genome size.
  • 21. Genomes of eukaryotes. • The majority of the DNA is in the nucleus, separated into bundles (chromosomes) – small amounts of DNA appear in organelles (mitochondria and chloroplasts). • Within single chromosomes gene families are common – some family members are paralogues (related): they have duplicated within the same genome and have often diverged to provide separate functions in descendants, e.g. human α and β globin – orthologous genes are homologues in different species that often perform the same function, e.g. human and horse myoglobin – pseudogenes have lost their function, e.g. the pseudogene in the human globin gene cluster.
  • 22. Eukaryotic model organisms. • Saccharomyces cerevisiae (baker’s yeast) • Caenorhabditis elegans (C. elegans, roundworm) • Drosophila melanogaster (fruit fly) • Arabidopsis thaliana (thale cress, a flowering plant) • Homo sapiens (human)
  • 23. The human genome. • ~3.2 × 10^9 bp (thirty times larger than C. elegans or D. melanogaster) • Coding sequences form only 5% of the human genome • Repeat sequences make up over 50% • Only ~32,000 genes • The human genome is distributed over 22 chromosome pairs plus the X and Y chromosomes • Exons of protein-coding genes are relatively small compared to other known eukaryotic genomes • Introns are relatively long • Protein-coding genes span long stretches of DNA (dystrophin, coding for a 3,685 amino acid protein, is >2.4 Mbp long) • Average gene length: ~8,000 bp • Average of 5-6 exons/gene • Average exon length: ~200 bp • Average intron length: ~2,000 bp • ~8% of genes have a single exon • Some exons can be as small as 1 or 3 bp.
  • 24. The human genome: top categories in a function classification
      Function                           Number      %
      Nucleic acid binding                 2207   14.0
      Apoptosis inhibitor                   132    0.8
      DNA binding                          1656   10.5
      Signal transduction                  1790   11.4
      DNA repair protein                     45    0.2
      Receptor                             1318    8.4
      DNA replication factor                  7    0.0
      Transmembrane receptor               1202    7.6
      Transcription factor                  986    6.2
      G-protein linked receptor             489    3.1
      RNA binding                           380    2.4
      Olfactory receptor                     71    0.0
      Structural protein of ribosome        137    0.8
      Storage protein                         7    0.0
      Translation factor                     44    0.2
      Cell adhesion                         189    1.2
      Transcription factor binding            6    0.0
      Cell cycle regulator                   75    0.4
      Structural protein                    714    4.5
      Cytoskeletal structural protein       145    0.9
      Chaperone                             154    0.9
      Transporter                           682    4.3
      Motor                                  85    0.5
      Ion channel                           269    1.7
      Actin binding                         129    0.8
      Neurotransmitter transporter           19    0.1
      Defense/immunity protein              603    3.8
      Ligand binding or carrier            1536    9.7
      Electron transfer                      33    0.2
      Enzyme                               3242   20.6
      Cytochrome P450                        50    0.3
      Peptidase                             457    2.9
      Endopeptidase                         403    2.5
      Tumor suppressor                        5    0.0
      Protein kinase                        839    5.3
      Unclassified                         4813   30.6
      Protein phosphatase                   295    1.8
      Enzyme activator                        3    0.0
      Total                               15683  100.0
  • 25. The human genome. • Repeated sequences comprise over 50% of the genome: – Transposable elements, or interspersed repeats, include LINEs and SINEs (almost 50%) – Retroposed pseudogenes – Simple ‘stutters’: repeats of short oligomers (minisatellites and microsatellites) – Segment duplication, of blocks of ~10-300 kb – Blocks of tandem repeats, including gene families
      Element                                       Size (bp)         Copy number   Fraction of genome (%)
      Short Interspersed Nuclear Elements (SINEs)   100-300           1,500,000     13
      Long Interspersed Nuclear Elements (LINEs)    6,000-8,000       850,000       21
      Long Terminal Repeats                         15,000-110,000    450,000       8
      DNA transposon fossils                        80-3,000          300,000       3
  • 26. The human genome. • All people are different, but the DNA of different people varies by only 0.2% or less. • So only up to 2 letters in 1,000 are expected to be different. • Evidence from current genomics studies (single nucleotide polymorphisms, or SNPs) implies that on average only 1 letter in 1,400 differs between individuals. • This means that 2 to 3 million letters would differ between individuals.
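The "2 to 3 million letters" figure follows from the numbers on the slides. A short worked check in Python, assuming the genome size of ~3.2 × 10^9 bp quoted on slide 23 (the function name `expected_differences` is made up for illustration):

```python
GENOME_SIZE_BP = 3.2e9   # human genome size as quoted on slide 23

def expected_differences(genome_size_bp, one_in_n):
    """Expected number of differing letters if 1 letter in `one_in_n` differs."""
    return genome_size_bp / one_in_n

# SNP-based estimate: 1 letter in 1,400 differs between individuals.
snp_estimate = expected_differences(GENOME_SIZE_BP, 1400)

# Upper bound from the 0.2% figure: 2 letters in 1,000.
upper_bound = GENOME_SIZE_BP * 0.002

print(f"SNP-based estimate: {snp_estimate / 1e6:.1f} million letters")
print(f"Upper bound (0.2%): {upper_bound / 1e6:.1f} million letters")
```

The SNP-based estimate lands between 2 and 3 million, consistent with the slide, while the 0.2% bound is a few times larger.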
  • 27. Functional genomics: from gene to function. [Figure labels: Genome, Expressome, Proteome, tertiary structure (fold), Metabolome]
  • 28. DNA makes RNA makes Protein: expression data. • More copies of mRNA for a gene lead to more protein. • mRNA can now be measured for all the genes in a cell at once through microarray technology. • A single gene chip can have 60,000 spots (genes). • Color change gives the intensity of gene expression (over- or under-expression).
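One common way to turn spot intensities into "over-" or "under-expression" is a log2 ratio between a sample channel and a reference channel. The slides do not specify the array design or the analysis, so the following Python sketch is an assumption throughout: a two-colour array is assumed, and the function names, the intensity values, and the two-fold threshold are all hypothetical.

```python
import math

def log2_ratio(sample_intensity, reference_intensity):
    """log2 of sample over reference; 0 means no change, +1 means two-fold up."""
    return math.log2(sample_intensity / reference_intensity)

def classify(sample_intensity, reference_intensity, threshold=1.0):
    """Label a spot using a hypothetical two-fold (|log2 ratio| >= 1) cutoff."""
    ratio = log2_ratio(sample_intensity, reference_intensity)
    if ratio >= threshold:
        return "over-expressed"
    if ratio <= -threshold:
        return "under-expressed"
    return "unchanged"

# Made-up intensities for three spots: (sample, reference)
spots = {"gene_a": (800, 200), "gene_b": (100, 400), "gene_c": (300, 300)}
for name, (sample, reference) in spots.items():
    print(name, round(log2_ratio(sample, reference), 2), classify(sample, reference))
```

The log scale makes a four-fold increase (+2) and a four-fold decrease (-2) symmetric around zero, which is why ratios are usually reported this way.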
  • 30. Genes and regulatory regions. Regulatory mechanisms organize the expression of genes – genes may be turned on or off in response to concentrations of nutrients or to stress – control regions often lie near the segments coding for proteins – they can serve as binding sites for molecules that transcribe the DNA – or they bind regulatory molecules that can block transcription.
  • 31. Expression data
  • 32. Outlook – coming lecture. Proteomics – Proteins – post-translational modification – Key technologies. • Maps of hereditary information • SNPs (single nucleotide polymorphisms) • Genetic diseases
  • 33. Thanks for your attention!