Introduction to Bioinformatics and
Computational Biology
Lab for Bioinformatics and
          computational genomics
     10 “genome hackers”
   mostly engineers (statistics)




           42 scientists
 technicians, geneticists, clinicians




           >100 people
      hardware engineers,
mathematicians, molecular biologists
What is Bioinformatics ?

                 • Application of information technology to
                   the storage, management and analysis of
                   biological information (Facilitated by the
                   use of computers)
                     –     Sequence analysis?
                     –     Molecular modeling (HTX) ?
                     –     Phylogeny/evolution?
                     –     Ecology and population studies?
                     –     Medical informatics?
                     –     Image Analysis ?
                     –     Statistics ? AI ?
                     –     Heavy-current or light-current engineering ?
Promises of genomics and bioinformatics

• Medicine (Pharma)
   – Genome analysis allows the targeting of genetic
     diseases
   – The effect of a disease or of a therapeutic on
     RNA and protein levels can be elucidated
   – Knowledge of protein structure facilitates drug
     design
   – Understanding of genomic variation allows the
     tailoring of medical treatment to the individual’s
     genetic make-up
• The same techniques can be applied to crop (Agro)
  and livestock improvement (Animal Health)
Bioinformatics, a life science discipline …


                                              Math




                                                     (Molecular)
     Informatics
                                                       Biology
Bioinformatics, a life science discipline …


                                              Math




      Computer Science                                Theoretical Biology




                                                            (Molecular)
     Informatics
                                                              Biology
                                  Computational Biology
Bioinformatics, a life science discipline …


                                              Math




      Computer Science                                  Theoretical Biology



                                       Bioinformatics



                                                              (Molecular)
     Informatics
                                                                Biology
                                  Computational Biology
Bioinformatics, a life science discipline … management of expectations


                                     Math




 Computer Science                                         Theoretical Biology
                  NP                            AI, Image Analysis
                  Datamining                    structure prediction (HTX)
                                Bioinformatics

       Interface Design           Expert Annotation
                   Sequence Analysis          (Molecular)
Informatics
                                                 Biology
                   Computational Biology
Bioinformatics, a life science discipline … management of expectations


                                     Math




 Computer Science                                         Theoretical Biology
                  NP                            AI, Image Analysis
                  Datamining                    structure prediction (HTX)
                       Bioinformatics
      Discovery Informatics – Computational Genomics
       Interface Design           Expert Annotation
                   Sequence Analysis          (Molecular)
Informatics
                                                 Biology
                   Computational Biology
Time (years)
• Timeline: Margaret
  Dayhoff …
Happy Birthday …
PCR + dye termination

  Suddenly, a flash of insight caused him to pull the
    car off the road and stop. He awakened his
    friend dozing in the passenger seat and
    excitedly explained to her that he had hit upon
    a solution - not to his original problem, but to
    one of even greater significance. Kary Mullis
    had just conceived of a simple method for
    producing virtually unlimited copies of a
    specific DNA sequence in a test tube - the
    polymerase chain reaction (PCR)
Setting the stage …



                      nature
                          the
                          Human
                          genome
Biological Research




                      Adapted from John McPherson, OICR
And this is just the beginning ….

Next Generation Sequencing is
             here
One additional insight ...
Read Length is Not As Important For Resequencing


[Figure: percentage of paired k-mers with a uniquely assignable location (0% to 100%) versus length of k-mer reads (8 to 20 bp), plotted for E. coli and human]
Jay Shendure
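
The idea behind this plot can be explored on a small scale by counting how many k-mers of a sequence occur exactly once. The following is a minimal Python sketch under stated assumptions: the function name `unique_kmer_fraction` and the toy sequence are illustrative, and it works on single k-mers within one short string, not on paired k-mers across a whole genome as in the figure.

```python
from collections import Counter


def unique_kmer_fraction(sequence: str, k: int) -> float:
    """Fraction of distinct-position k-mers whose k-mer occurs exactly once."""
    n_positions = len(sequence) - k + 1
    if n_positions <= 0:
        raise ValueError("k must not exceed the sequence length")
    # Count every k-mer obtained by sliding a window of width k.
    counts = Counter(sequence[i:i + k] for i in range(n_positions))
    # A k-mer seen once can be assigned to a single location.
    unique_positions = sum(1 for c in counts.values() if c == 1)
    return unique_positions / n_positions


# Hypothetical toy sequence, used only to show the trend with growing k.
toy = "ACGTACGTTTGACCA"
for k in (2, 4, 6):
    print(k, round(unique_kmer_fraction(toy, k), 2))
```

Longer k-mers are repeated less often, so the fraction tends towards 1 as k grows; larger and more repetitive genomes need a larger k to get there.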
ABI SOLID
Paired End Reads are Important!

                                Known Distance

                         Read 1          Read 2

               Repetitive DNA
                            Unique DNA

                                             Paired read maps uniquely




          Single read maps to
          multiple positions
Adapted from: Barak Cohen, Washington University, Bio5488   http://tinyurl.com/6zttuq http://tinyurl.com/6k26nh




          Single Molecule Sequencing

Microscope slide
                                       *                        *                           *


Single DNA
molecule
                                                                            Super-cooled
 primer                                                                     TIRF microscope

 dNTP-Cy3 *

                                                                 Helicos Biosciences Corp.
Complete genomics
Next next generation sequencing
 Third generation sequencing
       Now sequencing
Pacific Biosciences: A Third Generation Sequencing Technology




                                                     Eid et al 2008
Nanopore Sequencing
Ultra-low-cost SINGLE molecule sequencing
Genome Size

              E. coli = 4.2 x 10^6
              Yeast = 18 x 10^6
              Arabidopsis = 80 x 10^6
              C. elegans = 100 x 10^6
              Drosophila = 180 x 10^6
              Human/Rat/Mouse = 3000 x 10^6
              Lily = 300 000 x 10^6

                                       With ... : 99.9 %
                                       To primates: 99%

               DOGS: Database Of Genome Sizes
Anno 2012
Definitions
       Identity
       The extent to which two (nucleotide or amino acid)
       sequences are invariant.


       Homology
       Similarity attributed to descent from a common ancestor.

RBP:           26   RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84
                    + K ++ + + GTW++MA+      L   +   A   V T     +       +L+ W+
glycodelin:    23   QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81
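
Identity can be computed directly from a pairwise alignment such as the one above. This is a minimal Python sketch, assuming identity is reported as the percentage of identical residues over the aligned (non-gap) columns; other conventions for the denominator exist, and the function name `percent_identity` is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gap columns are skipped under this convention
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# The two aligned rows from the slide (RBP residues 26-84, glycodelin 23-81).
rbp = "RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD-"
glycodelin = "QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN"
print(f"{percent_identity(rbp, glycodelin):.1f}% identity")
```

Identity is a measured quantity; homology is an inference about ancestry, so a low identity value does not by itself rule out homology.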
Definitions

Orthologous
Homologous sequences in different species
that arose from a common ancestral gene
during speciation; may or may not be responsible
for a similar function.

Paralogous
Homologous sequences within a single species
that arose by gene duplication.
speciation


duplication
Overview
           • Simple identity, which scores only identical amino
             acids as a match.
           • Genetic code changes, which scores the
             minimum number of nucleotide changes to change
             a codon for one amino acid into a codon for the
             other.
           • Chemical similarity of amino acid side chains,
             which scores as a match two amino acids which
             have a similar side chain, such as hydrophobic,
             charged and polar amino acid groups.
           • The Dayhoff percent accepted mutation (PAM)
             family of matrices, which scores amino acid pairs
             on the basis of the expected frequency of
             substitution of one amino acid for the other during
             protein evolution.
           • The blocks substitution matrix (BLOSUM) amino
             acid substitution tables, which scores amino acid
             pairs based on the frequency of amino acid
             substitutions in aligned sequence motifs called
             blocks which are found in protein families
BLOSUM (BLOck – SUM) scoring

 Block = ungapped alignment
 Eg. Amino Acids D N V A
 S = 3 sequences
 W = 6 aa
 N= (W*S*(S-1))/2 = 18 pairs

                               a b c d e f
                         1     DDNAAV
                         2     DNAVDD
                         3     NNVAVV
A. Observed pairs

         a b c d e f
    1   DDNAAV
    2   DNAVDD
    3   NNVAVV

 Observed pair counts                      fij    D    N    A    V
                                           D      1
                                           N      4    1
                                           A      1    1    1
                                           V      3    1    4    1

 Relative frequency table (fij / 18)       gij    D     N     A     V
                                           D     .056
                                           N     .222  .056
                                           A     .056  .056  .056
                                           V     .167  .056  .222  .056

 gij = probability of obtaining a pair if randomly choosing pairs from block
B. Expected pairs
                                                      Pi
      DDNAAV                 DDDDD                  5/18
      DNAVDD                 NNNN                   4/18
      NNVAVV                 AAAA                   4/18
                             VVVVV                  5/18

   P{Draw DN pair}= P{Draw D, then N or Draw N, then D}
   P{Draw DN pair}= PDPN + PNPD = 2 * (5/18)*(4/18) = .123

    Random rel. frequency table          eij    D     N     A     V
                                         D     .077
    Probability of obtaining a pair of   N     .123  .049
    each amino acid drawn                A     .123  .099  .049
    independently from block             V     .154  .123  .123  .077
C. Summary (A/B)



                             sij = log2 gij/eij

                   (sij) is basic BLOSUM score matrix


      Notes:
      • Observed pairs in blocks contain information about
      relationships at all levels of evolutionary distance
      simultaneously (cf. Dayhoff's close relationships)
      • The actual algorithm generates observed + expected pair
      distributions by accumulation over a set of approx. 2000
      ungapped blocks of varying width (w) + depth (s)
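
Steps A to C can be traced in code for the three-sequence example block. This is a minimal Python sketch that follows the slides' simplified recipe (background frequencies Pi taken from residue counts in the block, scores left unscaled and unrounded); the function name `blosum_from_block` is illustrative and this is not the full BLOSUM procedure with sequence clustering.

```python
from collections import Counter
from itertools import combinations
from math import log2

block = ["DDNAAV", "DNAVDD", "NNVAVV"]  # S = 3 sequences, W = 6 columns


def blosum_from_block(block):
    """Return (f, g, e, s): pair counts, observed and expected frequencies, log-odds."""
    depth, width = len(block), len(block[0])
    # A. Observed pairs: every unordered pair of residues within each column.
    f = Counter()
    for col in range(width):
        column = [seq[col] for seq in block]
        for x, y in combinations(column, 2):
            f[tuple(sorted((x, y)))] += 1
    n_pairs = width * depth * (depth - 1) // 2  # N = W*S*(S-1)/2
    g = {pair: count / n_pairs for pair, count in f.items()}
    # B. Expected pairs: residues drawn independently with frequency Pi.
    residue_counts = Counter("".join(block))
    total = sum(residue_counts.values())
    p = {aa: count / total for aa, count in residue_counts.items()}
    e = {(x, y): p[x] * p[y] if x == y else 2 * p[x] * p[y] for (x, y) in g}
    # C. Log-odds score sij = log2(gij / eij), for pairs observed in the block.
    s = {pair: log2(g[pair] / e[pair]) for pair in g}
    return f, g, e, s


f, g, e, s = blosum_from_block(block)
for pair in sorted(s):
    print(pair, f[pair], round(g[pair], 3), round(e[pair], 3), round(s[pair], 2))
```

A positive score marks a pair seen more often than chance would predict, a negative score a pair seen less often.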
The BLOSUM Series

                    • blosum30,35,40,45,50,55,60,62,65,70,75,80,85,90
                    • transition frequencies observed directly by identifying
                      blocks that are at least
                        – 45% identical (BLOSUM-45)
                        – 50% identical (BLOSUM-50)
                        – 62% identical (BLOSUM-62) etc.
                    • No extrapolation made

                    • High blosum - closely related sequences
                    • Low blosum - distant sequences

                    • blosum45 ≈ pam250
                    • blosum62 ≈ pam160

                    • blosum62 is the most popular matrix
Overview
• Church of the Flying Spaghetti Monster




• http://www.venganza.org/about/open-letter
Overview
           – Henikoff and Henikoff have compared the
             BLOSUM matrices to PAM by evaluating how
             effectively the matrices can detect known members
             of a protein family from a database when searching
             with the ungapped local alignment program
             BLAST. They conclude that overall the BLOSUM
             62 matrix is the most effective.
               • However, all the substitution matrices investigated
                 perform better than BLOSUM 62 for a proportion of
                 the families. This suggests that no single matrix is
                 the complete answer for all sequence comparisons.
               • It is probably best to complement the BLOSUM 62
                 matrix with comparisons using PAM 250 and
                 Overington structurally derived matrices.
           – It seems likely that as more protein three
             dimensional structures are determined, substitution
             tables derived from structure comparison will give
             the most reliable data.
Rat versus   Rat versus
mouse RBP    bacterial
             lipocalin
Alignments


             • Exhaustive …
               – All combinations:
             • Algorithm
               – Dynamic programming (much faster)
             • Heuristics
               – Needleman-Wunsch for global
                 alignments
                 (Journal of Molecular Biology, 1970)
               – Later adapted by Smith and Waterman
                 for local alignment
A metric …



             GACGGATTAG, GATCGGAATAG

                      GA-CGGATTAG
                      GATCGGAATAG

             +1 (a match), -1 (a mismatch),-2 (gap)

                    9*1 + 1*(-1)+1*(-2) = 6
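
The metric can be written as a small function that walks over the alignment columns. This is a minimal Python sketch using the slide's values (+1 match, -1 mismatch, -2 gap); the function name `score_alignment` is illustrative.

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum column scores of a pairwise alignment given as two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap        # one sequence has a gap in this column
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total


print(score_alignment("GA-CGGATTAG", "GATCGGAATAG"))  # 6
```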
Needleman-Wunsch-edu.pl

 The Score Matrix
 ----------------
         Seq1(j)1         2    3    4    5    6    7
 Seq2      *    C         K    H    V    F    C    R
 (i) *     0    -1        -2   -3   -4   -5   -6   -7
 1    C    -1   1         0    -1   -2   -3   -4   -5
 2    K    -2   0         2    1    0    -1   -2   -3
 3    K    -3   -1        1    1    0    -1   -2   -3
 4    C    -4   -2        0    0    0    -1   0    -1
 5    F    -5   -3        -1   -1   -1   1    0    -1
 6    C    -6   -4        -2   -2   -2   0    2    1
 7    K    -7   -5        -3   -3   -3   -1   1    1
 8    C    -8   -6        -4   -4   -4   -2   0    0
 9    V    -9   -7        -5   -5   -3   -3   -1   -1
Needleman-Wunsch-edu.pl

 The Score Matrix
 ----------------
         Seq1(j)1         2    3    4    5    6    7
 Seq2      *    C         K    H    V    F    C    R
 (i) *     0    -1        -2   -3   -4   -5   -6   -7
 1    C    -1   1         0    -1   -2   -3   -4   -5
 2    K    -2   0         2    1    0    -1   -2   -3
 3    K    -3   -1        1    1    0    -1   -2   -3
 4    C    -4   -2        0    0    0    -1   0    -1
 5    F    -5   -3        -1   -1   -1   1    0    -1
 6    C    -6   -4        -2   -2   -2   0    2    1
 7    K    -7   -5        -3   -3   -3   -1   1    1
 8    C    -8   -6        -4   -4   -4   -2   0    0
 9    V    -9   -7        -5   -5   -3   -3   -1   -1

 Each cell takes the best of three candidate scores:
 A: matrix(i,j) = matrix(i-1,j-1) + (MIS)MATCH
    MATCH if (substr(seq1,j-1,1) eq substr(seq2,i-1,1)), MISMATCH otherwise
 B: up_score = matrix(i-1,j) + GAP
 C: left_score = matrix(i,j-1) + GAP
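
The three candidate scores (diagonal, up, left) are enough to fill the whole matrix. This is a minimal Python sketch of that fill step, not the Perl script itself; the values match = +1, mismatch = -1 and gap = -1 are an assumption inferred from the first row and column of the matrix shown, and the function name `nw_matrix` is illustrative.

```python
def nw_matrix(seq1: str, seq2: str,
              match: int = 1, mismatch: int = -1, gap: int = -1):
    """Fill the Needleman-Wunsch score matrix; rows follow seq2 (i), columns seq1 (j)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    matrix = [[0] * cols for _ in range(rows)]
    # Initialization: leading gaps along the first row and first column.
    for j in range(1, cols):
        matrix[0][j] = j * gap
    for i in range(1, rows):
        matrix[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq1[j - 1] == seq2[i - 1]
            diagonal = matrix[i - 1][j - 1] + (match if same else mismatch)  # A
            up_score = matrix[i - 1][j] + gap                                # B
            left_score = matrix[i][j - 1] + gap                              # C
            matrix[i][j] = max(diagonal, up_score, left_score)
    return matrix


m = nw_matrix("CKHVFCR", "CKKCFCKCV")
for row in m:
    print(" ".join(f"{value:3d}" for value in row))
```

The bottom-right cell holds the score of the best global alignment; recovering the alignment itself requires a traceback from that cell, which is omitted here.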
Needleman-Wunsch-edu.pl




         Seq1:CKHVFCRVCI
         Seq2:CKKCFC-KCV
              ++--++--+- score = 0
• Practicum: use similarity function in
  initialization step -> scoring tables

• Time Complexity

• Use random proteins to generate
  histogram of scores from aligned
  random sequences
Time complexity with needleman-wunsch.pl

              Sequence Length (aa)         Execution Time (s)
              10                           0
              25                           0
              50                           0
              100                          1
              500                          5
              1000                         19
              2500                         559
              5000                         Memory could not be
                                           written
Average around -64 !
                  -80
                  -78
                  -76
                  -74
                  -72   **
                  -70   *******
                  -68   ***************
                  -66   *************************
                  -64   ************************************************************
                  -60   ***********************
                  -58   ***************
                  -56   ********
                  -54   ****
                  -52   *
                  -50
                  -48
                  -46
                  -44
                  -42
                  -40
                  -38
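
A histogram of this kind can be generated by aligning many pairs of random sequences and tallying the scores. This is a minimal Python sketch under stated assumptions: uniform amino acid frequencies, simple +1/-1/-1 scoring, and hypothetical defaults for sequence length and number of pairs, so its scores will not centre on the same value as the histogram above, which came from a different scoring table.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nw_score(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment score only, keeping two matrix rows in memory."""
    previous = [j * gap for j in range(len(seq1) + 1)]
    for i, b in enumerate(seq2, start=1):
        current = [i * gap]
        for j, a in enumerate(seq1, start=1):
            diagonal = previous[j - 1] + (match if a == b else mismatch)
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    return previous[-1]


def random_score_histogram(n_pairs=200, length=50, seed=0):
    """Tally alignment scores for pairs of uniformly random protein sequences."""
    rng = random.Random(seed)
    histogram = Counter()
    for _ in range(n_pairs):
        a = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        b = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        histogram[nw_score(a, b)] += 1
    return histogram


hist = random_score_histogram()
for score in sorted(hist):
    print(f"{score:5d} {'*' * hist[score]}")
```

The score of a real alignment can then be compared with this background distribution: a score far to the right of the bulk is unlikely to arise by chance.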
If the sequences are similar, the path
of the best alignment should be very
close to the main diagonal.

Therefore, we may not need to fill the
entire matrix, rather, we fill a narrow
band of entries around the main
diagonal.

An algorithm that fills in a band of
width 2k+1 around the main
diagonal.
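
A banded fill can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `banded_nw_score` is illustrative, simple match/mismatch/gap scoring is used, and for readability the full matrix is still allocated, so only computation time is saved, not memory.

```python
NEG_INF = float("-inf")


def banded_nw_score(seq1, seq2, k, match=1, mismatch=-1, gap=-1):
    """Global alignment score using only cells with |i - j| <= k (band width 2k+1)."""
    n, m = len(seq1), len(seq2)
    if abs(n - m) > k:
        raise ValueError("band too narrow to reach the final cell")
    # Cells outside the band stay at -infinity and can never be chosen.
    matrix = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    matrix[0][0] = 0
    for i in range(m + 1):
        for j in range(max(0, i - k), min(n, i + k) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                same = seq1[j - 1] == seq2[i - 1]
                best = matrix[i - 1][j - 1] + (match if same else mismatch)
            if i > 0:
                best = max(best, matrix[i - 1][j] + gap)
            if j > 0:
                best = max(best, matrix[i][j - 1] + gap)
            matrix[i][j] = best
    return matrix[m][n]


print(banded_nw_score("GACGGATTAG", "GATCGGAATAG", k=1, gap=-2))  # 6
```

The result equals the unrestricted optimum only when the best alignment path stays inside the band; if k is too small for the sequences at hand, the banded score is a lower bound.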
Multiple Alignment Method
Examples


   Phylogenetic methods may be used to
   solve crimes, test purity of products, and
   determine whether endangered species
   have been smuggled or mislabeled:
   – Vogel, G. 1998. HIV strain analysis debuts in
     murder trial. Science 282(5390): 851-853.
   – Lau, D. T.-W., et al. 2001. Authentication of
     medicinal Dendrobium species by the internal
     transcribed spacer of ribosomal DNA. Planta
     Med 67:456-460.
Examples


  – Epidemiologists use phylogenetic methods to
    understand the development of
    pandemics, patterns of disease transmission, and
    development of antimicrobial resistance or
    pathogenicity:
     • Basler, C.F., et al. 2001. Sequence of the 1918
       pandemic influenza virus nonstructural gene (NS)
       segment and characterization of recombinant viruses
       bearing the 1918 NS genes. PNAS, 98(5):2746-2751.
     • Ou, C.-Y., et al. 1992. Molecular epidemiology of HIV
       transmission in a dental practice. Science
       256(5060):1165-1171.
     • Bacillus anthracis:
Tree Of Life
Modeling
Ramachandran / Phi-Psi Plot
Protein Architecture
Modeling

  • Finding a structural homologue
  • Blast
     – versus the PDB database, or PSI-BLAST
       (E < 0.005)
     – Domain coverage at least 60%
  • Avoid gaps
     – Prefer few gaps and reasonable
       similarity scores over many gaps
       and high similarity scores
Bootstrapping - an example

                        Ciliate SSUrDNA - parsimony bootstrap
                                                   Ochromonas (1)

                                                   Symbiodinium (2)
                                  100
                                                   Prorocentrum (3)

                                                   Euplotes (8)
                                        84
                                                   Tetrahymena (9)

                             96                    Loxodes (4)
                                             100
                                                   Tracheloraphis (5)
                                  100
                                                   Spirostomum (6)
                                             100
                                                   Gruberia (7)
                      Majority-rule consensus
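
The resampling step behind bootstrap values like these can be sketched in a few lines. This is a minimal Python illustration, assuming the alignment is held as a dictionary of equal-length rows; the function name `bootstrap_alignment` and the toy sequences are hypothetical, and tree inference on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original width."""
    width = len(next(iter(alignment.values())))
    # The same column indices are applied to every row so columns stay intact.
    columns = [rng.randrange(width) for _ in range(width)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Hypothetical toy alignment, not the ciliate SSU rDNA data from the slide.
toy_alignment = {
    "taxon_1": "ACGTACGTAC",
    "taxon_2": "ACGTTCGTAC",
    "taxon_3": "ACGATCGAAC",
}
rng = random.Random(42)
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(replicates[0])
```

A tree is then inferred from each replicate, and the number on a branch of the majority-rule consensus is the percentage of replicate trees that contain that grouping.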
Overview


           Personalized Medicine,
              Biomarkers …
              … Molecular Profiling

           First Generation Molecular Profiling
           Next Generation Molecular Profiling
           Next Generation Epigenetic Profiling

           Concluding Remarks
Personalized Medicine
• The use of diagnostic tests (aka biomarkers) to identify in advance
  which patients are likely to respond well to a therapy
• The benefits of this approach are to
   – avoid adverse drug reactions
   – improve efficacy
   – adjust the dose to suit the patient
   – differentiate a product in a competitive market
   – meet future legal or regulatory requirements
• Potential uses of biomarkers
   – Risk assessment
   – Initial/early detection
   – Prognosis
   – Prediction/therapy selection
   – Response assessment
   – Monitoring for recurrence
Biomarker

First used in 1971 … An objective and
  « predictive » measure … at the molecular
  level … of normal and pathogenic processes
  and responses to therapeutic interventions
Characteristic that is objectively measured and
  evaluated as an indicator of normal biologic
  or pathogenic processes or pharmacologic
  response to a drug
A biomarker is valid if:
   – It can be measured in a test system with well
     established performance characteristics
   – Evidence for its clinical significance has been
     established
Rationale 1:
Why now ? Regulatory path becoming clearer


                                                There is more at stake than
                                                  efficient drug
                                                  development. FDA
                                                  « critical path initiative »
                                                  Pharmacogenomics
                                                  guideline

                                                Biomarkers are the
                                                   foundation of « evidence
                                                   based medicine » - who
                                                   should be treated, how
                                                   and with what.

                                                Without Biomarkers
                                                   advances in targeted
                                                   therapy will be limited and
                                                   treatment will remain largely
                                                   empirical. It is imperative
                                                   that Biomarker
                                                   development be
                                                   accelerated along with
                                                   therapeutics
Why now ?

First and maturing second generation molecular
  profiling methodologies make it possible to stratify
  clinical trial participants on a pharmacogenomic
  basis: include those most likely to benefit from
  the drug candidate and exclude those who likely
  will not
Clinical trials should attain more specific results
  with smaller numbers of patients. Smaller
  numbers mean lower costs (factor 2-10)
An additional benefit for trial participants and
  internal review boards (IRBs) is that
  stratification, given the correct biomarker, may
  reduce or eliminate adverse events.
Molecular Profiling

The study of specific patterns (fingerprints) of proteins,
DNA, and/or mRNA and how these patterns correlate
with an individual's physical characteristics or
symptoms of disease.
Generic Health advice




• Exercise (Hypertrophic Cardiomyopathy)
• Drink your milk (MCM6 Lactose intolerance)
• Eat your green beans (glucose-6-phosphate
  dehydrogenase Deficiency)
• & your grains (HLA-DQ2 – Celiac disease)
• & your iron (HFE - Hemochromatosis)
• Get more rest (HLA-DR2 - Narcolepsy)
Generic Health advice (UNLESS)




• Exercise (Hypertrophic Cardiomyopathy)
• Drink your milk (MCM6 Lactose intolerance)
• Eat your green beans (glucose-6-phosphate
  dehydrogenase Deficiency)
• & your grains (HLA-DQ2 – Celiac disease)
• & your iron (HFE - Hemochromatosis)
• Get more rest (HLA-DR2 - Narcolepsy)
EGFR based therapy in mCRC
Before molecular profiling …
First Generation Molecular Profiling


• Flow cytometry correlates surface markers,
  cell size and other parameters
• Circulating tumor cell (CTC) assays
  quantify the number of tumor cells in the
  peripheral blood.
• Exosomes are 30-90 nm vesicles secreted by
  a wide range of mammalian cell types.
• Immunohistochemistry (IHC) measures
  protein expression, usually on the cell
  surface.
First Generation Molecular Profiling




• Gene sequencing for mutation detection

• Microarray for m-RNA message detection
• RT-PCR for gene expression

• FISH analysis for gene copy number
• Comparative Genome Hybridization (CGH) for
  gene copy number
Basics of the “old” technology


• Clone the DNA.
• Generate a ladder of labeled (colored)
  molecules that are different by 1 nucleotide.
• Separate mixture on some matrix.
• Detect fluorochrome by laser.
• Interpret peaks as string of DNA.
• Strings are 500 to 1,000 letters long
• 1 machine generates 57,000 nucleotides/run
• Assemble all strings into a genome.
Genetic Variation
 Among People
Single nucleotide polymorphisms
            (SNPs)
  GATTTAGATCGCGATAGAG
  GATTTAGATCTCGATAGAG
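
Locating the variant position between two equal-length sequences like these is a simple column-by-column comparison. This is a minimal Python sketch; the function name `find_snps` is illustrative, and real variant calling works on aligned sequencing reads, not on two clean strings.

```python
def find_snps(seq_a: str, seq_b: str):
    """Return (1-based position, base in seq_a, base in seq_b) for each difference."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(position, x, y)
            for position, (x, y) in enumerate(zip(seq_a, seq_b), start=1)
            if x != y]


print(find_snps("GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG"))  # [(11, 'G', 'T')]
```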



 0.1% difference among
         people
The genome fits as an e-mail attachment
mRNA Expression Microarray
Second Generation DNA profiling


• Exome sequencing (also known as
  targeted exome capture) is an
  efficient strategy to selectively
  sequence the coding regions of the
  genome to identify novel genes
  associated with rare and common
  disorders.
• 160K exons
Second Generation RNA profiling

                      Besides the 6000 protein-coding genes …

                      140 ribosomal RNA genes
                      275 transfer RNA genes
                      40 small nuclear RNA genes
                      >100 small nucleolar RNA genes

                      Function of RNA genes

                      pRNA in the φ29 rotary packaging motor (Simpson
                      et al. Nature 408:745-750, 2000)
                      Cartilage-hair hypoplasia mapped to an RNA
                      (Ridanpää et al. Cell 104:195-203, 2001)
                      The human Prader-Willi critical region (Cavaillé
                      et al. PNAS 97:14035-7, 2000)
Second Generation RNA profiling

                       RNA genes can be hard to detect

                       UGAGGUAGUAGGUUGUAUAGU

                       C. elegans let-7; 21 nt
                       (Pasquinelli et al. Nature 408:86-89, 2000)


                       Often small
                       Sometimes multicopy and redundant
                       Often not polyadenylated
                       (not represented in ESTs)
                       Immune to frameshift and nonsense
                       mutations
                       No open reading frame, no codon bias
                       Often evolving rapidly in primary sequence
ncRNAs in human genome

 tRNA                    600   SRP RNA             1
 18S rRNA                200   RNase P RNA         1
 5.8S rRNA               200   Telomerase RNA      1
 28S rRNA                200   RNase MRP           1
 5S rRNA                 200   Y RNA               5
 snoRNA                  300   Vault               4
 miRNA                   250   7SK RNA             1
 U1                       40   Xist                1
 U2                       30   H19                 1
 U4                       30   BIC                 1
 U5                       30   Antisense RNAs      1000s?
 U6                       20   Cis reg regions     100s?
 U4atac                    5   Others              ?
 U6atac                    5
 U11                       5
 U12                       5
Mapping Structural Variation in Humans
           CNVs: >1 kb segments
                   - Thought to be common: 12% of the genome (Redon et al. 2006)
                   - Likely involved in phenotype variation and disease
                   - Until recently most methods for detection were low resolution (>50 kb)
Size Distribution of CNV in a Human Genome
Overview


           Personalized Medicine,
              Biomarkers …
              … Molecular Profiling

           First Generation Molecular Profiling
           Next Generation Molecular Profiling
           Next Generation Epigenetic Profiling

           Concluding Remarks
Defining Epigenetics
     Diagram: Genome (DNA), Epigenome (Chromatin), Gene Expression, Phenotype

     • Reversible changes in gene expression/function
     • Without changes in DNA sequence
     • Can be inherited from precursor cells
     • Allows intrinsic signals to be integrated with environmental
       signals (including diet)

                                                                                        CONFIDENTIAL

             Methylation   I   Epigenetics         |        Oncology    |   Biomarker
                           I   NEXT-GEN            |       PharmacoDX   |     CRC
Epigenetic Regulation:
Post Translational Modifications to Histones and Base Changes in DNA

    Epigenetic modifications of histones and DNA include:
      – Histone acetylation and methylation, and DNA methylation

    Diagram labels: Histone Acetylation (Ac), Histone Methylation (Me), DNA Methylation (Me Me)

MGMT Biology
O6-Methylguanine Methyltransferase (MGMT)
Essential DNA Repair Enzyme

Removes alkyl groups from damaged guanine bases

Healthy individual:
     - MGMT is an essential DNA repair enzyme
     - Loss of MGMT activity makes individuals susceptible
       to DNA damage and prone to tumor development

Glioblastoma patient on alkylator chemotherapy:
     - Patients with MGMT promoter methylation
       have longer PFS and OS with the use of alkylating
       agents as chemotherapy



MGMT Promoter Methylation Predicts
Benefit from DNA-Alkylating Chemotherapy
  Post-hoc subgroup analysis of a temozolomide clinical trial in primary glioblastoma
  patients shows benefit for patients with MGMT promoter methylation

  Bar chart: Median Overall Survival (months, axis 0 to 25)
      Non-Methylated MGMT Gene: 12.7 months (bar labelled: radiotherapy)
      Methylated MGMT Gene:     21.7 months (bar labelled: radiotherapy plus temozolomide)
  Adapted from Hegi et al. NEJM 2005 352(10):1036-8. Study with 207 patients.
Genome-wide methylation
by methylation sensitive restriction enzymes




Genome-wide methylation
by probes




Genome-wide methylation
…. by next generation sequencing

  Figure: # markers (y-axis) against # samples (x-axis) across the Discovery, Verification and Validation stages
MBD_Seq

Condensed Chromatin                        DNA Sheared



                                                                 Immobilized
                                                                 Methyl Binding Domain
           DNA Sheared




MBD_Seq

                                                 Immobilized
                                                 Methyl binding domain




                                      MgCl2




                                                 Next Gen Sequencing
                                                 GA Illumina: 100 million reads

MBD_Seq
MGMT = dual core




Genome-wide methylation
…. by next generation sequencing
  Figure: # markers (y-axis, 1-2 million methylation cores) against # samples (x-axis); Discovery stage: MBD_Seq
Data integration
Correlation tracks

  Figure: two plots of expression (y-axis) against methylation (x-axis), one with Corr = -1 and one with Corr = 1
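
As a rough illustration of what a single point on such a correlation track could represent, the sketch below computes a Pearson correlation between methylation and expression values measured across samples. The slides do not say which correlation measure was used, so Pearson is an assumption here, and the function name `pearson` and the toy values are invented for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical): one locus measured in five samples.
methylation = [0.1, 0.3, 0.5, 0.7, 0.9]
silenced_expression = [9.0, 7.0, 5.0, 3.0, 1.0]   # falls as methylation rises
coupled_expression = [1.0, 3.0, 5.0, 7.0, 9.0]    # rises with methylation

print(pearson(methylation, silenced_expression))  # close to -1
print(pearson(methylation, coupled_expression))   # close to +1
```

Repeating this per locus along the genome would give one value between -1 and +1 at each position, which is the kind of quantity a correlation track displays.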


Correlation track
in GBM @ MGMT




  Figure: correlation track with scale from +1 to -1
Genome-wide methylation
…. by next generation sequencing

  Figure: # markers (y-axis) against # samples (x-axis); Discovery: MBD_Seq; Verification: 454_BT_Seq; Validation: MSP
Deep Sequencing


                  unmethylated alleles




                  methylated alleles     less methylation




                                         more methylation
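
A hedged sketch of how individual reads could be scored as more or less methylated alleles, assuming the reads come from bisulfite-treated DNA (unmethylated C is read as T, methylated C stays C) and are already aligned to the reference without gaps. The function names and the short toy reference are illustrative assumptions; nothing here reflects the actual pipeline behind the slide.

```python
def cpg_positions(reference: str) -> list:
    """Indices of the C in every CpG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def bisulfite_convert(reference: str, methylated: set) -> str:
    """Simulate a bisulfite read: every C becomes T unless its index is methylated."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(reference)
    )


def read_methylation(reference: str, read: str):
    """Fraction of CpG sites that still read as C (None if no site is callable)."""
    calls = [read[i] for i in cpg_positions(reference) if read[i] in "CT"]
    if not calls:
        return None
    return calls.count("C") / len(calls)


# Toy reference (hypothetical amplicon) with three CpG sites.
REFERENCE = "GCATCGTGACTTACGACTGATCGATGGATGCTA"
sites = cpg_positions(REFERENCE)

fully_methylated = bisulfite_convert(REFERENCE, set(sites))
partly_methylated = bisulfite_convert(REFERENCE, {sites[0]})
unmethylated = bisulfite_convert(REFERENCE, set())

for read in (unmethylated, partly_methylated, fully_methylated):
    print(read, read_methylation(REFERENCE, read))
```

Sorting reads by this per-read fraction reproduces the ordering sketched on the slide, from unmethylated alleles to alleles with more methylation.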

GCATCGTGACTTACGACTGATCGATGGATGCTA
Deep MGMT
Heterogenic complexity




Overview


           Personalized Medicine,
              Biomarkers …
              … Molecular Profiling

           First Generation Molecular Profiling
           Next Generation Molecular Profiling
           Next Generation Epigenetic Profiling

           Concluding Remarks
Translational Medicine: An inconvenient truth


  • 1% of genome codes for proteins, however
    more than 90% is transcribed
  • Less than 10% of protein experimentally
    measured can be “explained” from the
    genome
  • 1 genome ? Structural variation
  • > 200 Epigenomes ??

  • Space/time continuum …
Translational Medicine: An inconvenient truth


  • 1% of genome codes for proteins, however
    more than 90% is transcribed
  • Less than 10% of protein experimentally
    measured can be “explained” from the
    genome
  • 1 genome ? Structural variation
  • > 200 Epigenomes …

  • “space/time” continuum
Cellular programming

               Epigenetic (meta)information = stem cells
Cellular reprogramming




Tumor

                         Tumor
                         Development
                         and
                         Growth


Epigenetically
altered, self-
renewing cancer
stem cells
Cellular reprogramming

              Gene-specific
              Epigenetic
              reprogramming
biobix
wvcrieki




biobix.be
bioinformatics.be

Bioinformatics life sciences_2012

  • 2. Introduction to bioinformatics and computational biology
  • 3. Lab for Bioinformatics and computational genomics 10 “genome hackers” mostly engineers (statistics) 42 scientists technicians, geneticists, clinicians >100 people hardware engineers, mathematicians, molecular biologists
  • 5. What is Bioinformatics ? • Application of information technology to the storage, management and analysis of biological information (Facilitated by the use of computers) – Sequence analysis? – Molecular modeling (HTX) ? – Phylogeny/evolution? – Ecology and population studies? – Medical informatics? – Image Analysis ? – Statistics ? AI ? – Power engineering or electronics ("sterkstroom of zwakstroom") ?
  • 6. Promises of genomics and bioinformatics • Medicine (Pharma) – Genome analysis allows the targeting of genetic diseases – The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated – Knowledge of protein structure facilitates drug design – Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up • The same techniques can be applied to crop (Agro) and livestock improvement (Animal Health)
  • 7. Bioinformatics, a life science discipline … Math (Molecular) Informatics Biology
  • 8. Bioinformatics, a life science discipline … Math Computer Science Theoretical Biology (Molecular) Informatics Biology Computational Biology
  • 9. Bioinformatics, a life science discipline … Math Computer Science Theoretical Biology Bioinformatics (Molecular) Informatics Biology Computational Biology
  • 10. Bioinformatics, a life science discipline … management of expectations Math Computer Science Theoretical Biology NP AI, Image Analysis Datamining structure prediction (HTX) Bioinformatics Interface Design Expert Annotation Sequence Analysis (Molecular) Informatics Biology Computational Biology
  • 11. Bioinformatics, a life science discipline … management of expectations Math Computer Science Theoretical Biology NP AI, Image Analysis Datamining structure prediction (HTX) Bioinformatics Discovery Informatics – Computational Genomics Interface Design Expert Annotation Sequence Analysis (Molecular) Informatics Biology Computational Biology
  • 14. • Timeline: Margaret Dayhoff …
  • 16. PCR + dye termination Suddenly, a flash of insight caused him to pull the car off the road and stop. He awakened his friend dozing in the passenger seat and excitedly explained to her that he had hit upon a solution - not to his original problem, but to one of even greater significance. Kary Mullis had just conceived of a simple method for producing virtually unlimited copies of a specific DNA sequence in a test tube - the polymerase chain reaction (PCR)
  • 17. Setting the stage … nature the Human genome
  • 18. Biological Research Adapted from John McPherson, OICR
  • 19. And this is just the beginning …. Next Generation Sequencing is here
  • 23. Read Length is Not As Important For Resequencing. Plot: % of paired k-mers with uniquely assignable location (0% to 100%) against length of k-mer reads (8 to 20 bp), for E. coli and human. Jay Shendure
  • 24.
  • 26. Paired End Reads are Important! [Figure: Read 1 and Read 2 lie a known distance apart. A single read in repetitive DNA maps to multiple positions; a paired read whose mate falls in unique DNA maps uniquely.]
  • 27. Single Molecule Sequencing (Helicos Biosciences Corp.) [Figure labels: microscope slide, single DNA molecule, super-cooled, primer, TIRF microscope, dNTP-Cy3] Adapted from: Barak Cohen, Washington University, Bio5488 http://tinyurl.com/6zttuq http://tinyurl.com/6k26nh
  • 29. Next next generation sequencing Third generation sequencing Now sequencing
  • 30. Pacific Biosciences: A Third Generation Sequencing Technology Eid et al 2008
  • 33. Genome Size: E. coli = 4.2 × 10^6; Yeast = 18 × 10^6; Arabidopsis = 80 × 10^6; C. elegans = 100 × 10^6; Drosophila = 180 × 10^6; Human/Rat/Mouse = 3000 × 10^6; Lily = 300 000 × 10^6. With ... : 99.9 %. To primates: 99%. DOGS: Database Of Genome Sizes
  • 34.
  • 37. Definitions. Identity: the extent to which two (nucleotide or amino acid) sequences are invariant. Homology: similarity attributed to descent from a common ancestor. Example alignment, RBP (residues 26-84) versus glycodelin (residues 23-81):
       RBP:        26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84
       match line:    + K ++ + + GTW++MA+ L + A V T + +L+ W+
       glycodelin: 23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81
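Identity can be computed directly from two aligned rows. The sketch below assumes one common convention, counting identical residues over the columns where neither row has a gap; other conventions (for example dividing by the full alignment length) exist, and the function name is only illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b)
                if a != "-" and b != "-"]
    if not compared:
        return 0.0
    identical = sum(1 for a, b in compared if a == b)
    return 100.0 * identical / len(compared)


# The two aligned rows from the slide (RBP versus glycodelin).
rbp = "RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD-"
glycodelin = "QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN"
print(round(percent_identity(rbp, glycodelin), 1))
```

Note that a low identity does not by itself rule out homology: identity is a measurement, whereas homology is an inference about common ancestry.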
  • 38. Definitions Orthologous Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function. Paralogous Homologous sequences within a single species that arose by gene duplication.
  • 40. Overview • Simple identity, which scores only identical amino acids as a match. • Genetic code changes, which scores the minimum number of nucleotide changes needed to change a codon for one amino acid into a codon for the other. • Chemical similarity of amino acid side chains, which scores as a match two amino acids that have a similar side chain, such as hydrophobic, charged and polar amino acid groups. • The Dayhoff percent accepted mutation (PAM) family of matrices, which scores amino acid pairs on the basis of the expected frequency of substitution of one amino acid for the other during protein evolution. • The blocks substitution matrix (BLOSUM) amino acid substitution tables, which score amino acid pairs based on the frequency of amino acid substitutions in aligned sequence motifs called blocks, which are found in protein families.
  • 41. BLOSUM (BLOck – SUM) scoring. Block = ungapped alignment. E.g. amino acids D, N, V, A; S = 3 sequences; W = 6 aa; N = (W*S*(S-1))/2 = 18 pairs.
         a b c d e f
       1 D D N A A V
       2 D N A V D D
       3 N N V A V V
  • 42. A. Observed pairs (block columns a-f: 1 DDNAAV, 2 DNAVDD, 3 NNVAVV)
       Observed pair counts f_ij:
            D     N     A     V
         D  1
         N  4     1
         A  1     1     1
         V  3     1     4     1
       Relative frequency table g_ij = f_ij / 18 (probability of obtaining a pair if randomly choosing pairs from the block):
            D     N     A     V
         D  .056
         N  .222  .056
         A  .056  .056  .056
         V  .167  .056  .222  .056
  • 43. B. Expected pairs. Residue frequencies P_i in the block (DDNAAV, DNAVDD, NNVAVV): D 5/18, N 4/18, A 4/18, V 5/18.
       P{Draw DN pair} = P{Draw D, then N or Draw N, then D} = P_D P_N + P_N P_D = 2 * (5/18) * (4/18) = .123
       Random relative frequency table e_ij (probability of obtaining a pair of each amino acid drawn independently from the block; e_ii = P_i^2, e_ij = 2 P_i P_j):
            D     N     A     V
         D  .077
         N  .123  .049
         A  .123  .099  .049
         V  .154  .123  .123  .077
  • 44. C. Summary (A/B): s_ij = log2(g_ij / e_ij); (s_ij) is the basic BLOSUM score matrix. Notes: • Observed pairs in blocks contain information about relationships at all levels of evolutionary distance simultaneously (cf. Dayhoff's close relationships) • The actual algorithm generates observed + expected pair distributions by accumulation over a set of approx. 2000 ungapped blocks of varying width (w) + depth (s)
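Steps A to C can be reproduced on the three-sequence toy block from the slides. This is a minimal sketch of the log-odds calculation only; the function names are illustrative, and the published BLOSUM procedure additionally clusters similar sequences, pools about 2000 blocks, and scales and rounds the scores.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Toy block from the slides: S = 3 sequences, W = 6 columns.
block = ["DDNAAV", "DNAVDD", "NNVAVV"]


def observed_pair_counts(block):
    """Step A: count unordered residue pairs within each column (f_ij)."""
    counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def blosum_scores(block):
    """Steps B and C: s_ij = log2(g_ij / e_ij) for every observed pair."""
    f = observed_pair_counts(block)
    total_pairs = sum(f.values())  # N = W*S*(S-1)/2
    residues = "".join(block)
    p = {aa: n / len(residues) for aa, n in Counter(residues).items()}
    scores = {}
    for (a, b), count in f.items():
        g = count / total_pairs
        e = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = log2(g / e)
    return scores


for pair, score in sorted(blosum_scores(block).items()):
    print(pair, round(score, 2))
```

A positive score means the pair is seen more often in the block than expected by chance; a negative score means it is seen less often.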
  • 45. The BLOSUM Series • blosum30, 35, 40, 45, 50, 55, 60, 62, 65, 70, 75, 80, 85, 90 • transition frequencies observed directly by identifying blocks that are at least – 45% identical (BLOSUM-45) – 50% identical (BLOSUM-50) – 62% identical (BLOSUM-62) etc. • No extrapolation made • High blosum - closely related sequences • Low blosum - distant sequences • blosum45 ≈ pam250 • blosum62 ≈ pam160 • blosum62 is the most popular matrix
  • 47. • Church of the Flying Spaghetti Monster • http://www.venganza.org/about/open-letter
  • 48. Overview – Henikoff and Henikoff have compared the BLOSUM matrices to PAM by evaluating how effectively the matrices can detect known members of a protein family from a database when searching with the ungapped local alignment program BLAST. They conclude that overall the BLOSUM 62 matrix is the most effective. • However, all the substitution matrices investigated perform better than BLOSUM 62 for a proportion of the families. This suggests that no single matrix is the complete answer for all sequence comparisons. • It is probably best to complement the BLOSUM 62 matrix with comparisons using PAM 250 and the Overington structurally derived matrices. – It seems likely that as more protein three-dimensional structures are determined, substitution tables derived from structure comparison will give the most reliable data.
  • 49. Rat versus mouse RBP | Rat versus bacterial lipocalin
  • 50. Alignments • Exhaustive … – All combinations • Algorithm – Dynamic programming (much faster) – Needleman-Wunsch for global alignments (Journal of Molecular Biology, 1970) – Later adapted by Smith and Waterman for local alignment • Heuristics
  • 51. A metric … GACGGATTAG, GATCGGAATAG
       GA-CGGATTAG
       GATCGGAATAG
       +1 (a match), -1 (a mismatch), -2 (gap): 9*1 + 1*(-1) + 1*(-2) = 6
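The metric can be written as a small function that walks over the columns of an existing alignment. This is a sketch assuming a linear gap penalty (each gap column costs the same); the function name and defaults are illustrative, with the defaults taken from the slide.

```python
def alignment_score(row_a: str, row_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score two aligned rows column by column with a linear gap penalty."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


# 9 matches, 1 mismatch, 1 gap: 9*1 + 1*(-1) + 1*(-2) = 6
print(alignment_score("GA-CGGATTAG", "GATCGGAATAG"))
```

Scoring a given alignment is cheap; the hard part, addressed by dynamic programming, is finding the alignment that maximizes this score.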
  • 52. Needleman-Wunsch-edu.pl The Score Matrix
       Seq1 (j):       1   2   3   4   5   6   7
       Seq2 (i)    *   C   K   H   V   F   C   R
         *         0  -1  -2  -3  -4  -5  -6  -7
       1 C        -1   1   0  -1  -2  -3  -4  -5
       2 K        -2   0   2   1   0  -1  -2  -3
       3 K        -3  -1   1   1   0  -1  -2  -3
       4 C        -4  -2   0   0   0  -1   0  -1
       5 F        -5  -3  -1  -1  -1   1   0  -1
       6 C        -6  -4  -2  -2  -2   0   2   1
       7 K        -7  -5  -3  -3  -3  -1   1   1
       8 C        -8  -6  -4  -4  -4  -2   0   0
       9 V        -9  -7  -5  -5  -3  -3  -1  -1
  • 54. Needleman-Wunsch-edu.pl The Score Matrix: filling a cell. matrix(i,j) is the maximum of three candidate scores: A (diagonal): matrix(i-1,j-1) + (MIS)MATCH, a match if substr(seq1,j-1,1) eq substr(seq2,i-1,1); B (up): up_score = matrix(i-1,j) + GAP; C (left): left_score = matrix(i,j-1) + GAP
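The fill step can be sketched in Python as follows. This is not the Needleman-Wunsch-edu.pl script itself; the scoring values (match +1, mismatch -1, gap -1) are an assumption inferred from the numbers in the score matrix shown on these slides, and the function name is illustrative.

```python
def nw_score_matrix(seq1: str, seq2: str,
                    match: int = 1, mismatch: int = -1, gap: int = -1):
    """Fill the Needleman-Wunsch score matrix (seq1 on columns j, seq2 on rows i)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    matrix = [[0] * cols for _ in range(rows)]
    # Initialization: leading gaps along the first row and first column.
    for j in range(1, cols):
        matrix[0][j] = j * gap
    for i in range(1, rows):
        matrix[i][0] = i * gap
    # Recurrence: best of diagonal (A), up (B) and left (C).
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = matrix[i - 1][j - 1] + (
                match if seq1[j - 1] == seq2[i - 1] else mismatch)
            up_score = matrix[i - 1][j] + gap
            left_score = matrix[i][j - 1] + gap
            matrix[i][j] = max(diagonal, up_score, left_score)
    return matrix


for row in nw_score_matrix("CKHVFCR", "CKKCFCKCV"):
    print(" ".join(f"{value:3d}" for value in row))
```

The bottom-right cell holds the optimal global alignment score; the alignment itself is recovered by tracing back from that cell along the moves that produced each maximum.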
  • 57. Needleman-Wunsch-edu.pl
       Seq1: CKHVFCRVCI
       Seq2: CKKCFC-KCV
             ++--++--+-   score = 0
  • 58. • Practicum: use similarity function in initialization step -> scoring tables • Time Complexity • Use random proteins to generate histogram of scores from aligned random sequences
  • 59. Time complexity with needleman-wunsch.pl
       Sequence Length (aa)   Execution Time (s)
       10                     0
       25                     0
       50                     0
       100                    1
       500                    5
       1000                   19
       2500                   559
       5000                   Memory could not be written
  • 60. Average around -64 !
       -80
       -78
       -76
       -74
       -72 **
       -70 *******
       -68 ***************
       -66 *************************
       -64 ************************************************************
       -60 ***********************
       -58 ***************
       -56 ********
       -54 ****
       -52 *
       -50
       -48
       -46
       -44
       -42
       -40
       -38
  • 61. If the sequences are similar, the path of the best alignment should be very close to the main diagonal. Therefore, we may not need to fill the entire matrix; instead, we fill only a narrow band of entries around the main diagonal. A banded algorithm fills in a band of width 2k+1 around the main diagonal.
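A banded fill can be sketched as below. Only cells with |i - j| <= k are computed, and cells outside the band are treated as unreachable. The scoring values (+1, -1, -1) and the function name are assumptions made for illustration; the result equals the full Needleman-Wunsch score only when the optimal path stays inside the band.

```python
NEG_INF = float("-inf")


def banded_nw_score(seq1: str, seq2: str, k: int,
                    match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment score using only cells with |i - j| <= k."""
    n, m = len(seq1), len(seq2)
    if abs(n - m) > k:
        raise ValueError("band too narrow to reach the final cell")
    # Row 0 of the matrix, restricted to the band.
    prev = {j: j * gap for j in range(0, min(m, k) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                cur[j] = i * gap  # first column: leading gaps only
                continue
            diagonal = prev.get(j - 1, NEG_INF) + (
                match if seq1[i - 1] == seq2[j - 1] else mismatch)
            up_score = prev.get(j, NEG_INF) + gap
            left_score = cur.get(j - 1, NEG_INF) + gap
            cur[j] = max(diagonal, up_score, left_score)
        prev = cur
    return prev[m]


print(banded_nw_score("CKHVFCR", "CKHVFCR", k=0))
```

With band half-width k the work drops from roughly n*m cells to roughly n*(2k+1) cells, which is what makes long, similar sequences tractable.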
  • 64. Examples Phylogenetic methods may be used to solve crimes, test purity of products, and determine whether endangered species have been smuggled or mislabeled: – Vogel, G. 1998. HIV strain analysis debuts in murder trial. Science 282(5390): 851-853. – Lau, D. T.-W., et al. 2001. Authentication of medicinal Dendrobium species by the internal transcribed spacer of ribosomal DNA. Planta Med 67:456-460.
  • 65.
  • 66. Examples – Epidemiologists use phylogenetic methods to understand the development of pandemics, patterns of disease transmission, and development of antimicrobial resistance or pathogenicity: • Basler, C.F., et al. 2001. Sequence of the 1918 pandemic influenza virus nonstructural gene (NS) segment and characterization of recombinant viruses bearing the 1918 NS genes. PNAS, 98(5):2746-2751. • Ou, C.-Y., et al. 1992. Molecular epidemiology of HIV transmission in a dental practice. Science 256(5060):1165-1171. • Bacillus anthracis:
  • 68.
  • 69.
  • 70.
  • 74. Modeling • Finding a structural homologue • BLAST versus the PDB database, or PSI-BLAST (E < 0.005) – Domain coverage at least 60% • Avoid gaps – Prefer few gaps and reasonable similarity scores over lots of gaps and high similarity scores
  • 75. Bootstrapping - an example. Ciliate SSU rDNA, parsimony bootstrap, majority-rule consensus tree. [Figure: tree with taxa Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9); bootstrap support values on the branches: 100, 84, 96, 100, 100, 100]
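The resampling step behind bootstrap support values can be sketched as follows: alignment columns are drawn with replacement to build a pseudo-alignment of the same size. The toy alignment and the function name are illustrative assumptions; building a tree from each replicate and forming the majority-rule consensus are not shown.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in picks) for row in alignment]


# Hypothetical toy alignment; a real analysis would use the SSU rDNA alignment.
alignment = ["DDNAAV", "DNAVDD", "NNVAVV"]
rng = random.Random(0)  # fixed seed so the replicate is reproducible
replicate = bootstrap_alignment(alignment, rng)
print(replicate)
```

Repeating this many times (commonly 100 or 1000 replicates) and counting how often each clade appears gives the percentages drawn on the branches.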
  • 76.
  • 77.
  • 78.
  • 79. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86. Personalized Medicine • The use of diagnostic tests (aka biomarkers) to identify in advance which patients are likely to respond well to a therapy • The benefits of this approach are to – avoid adverse drug reactions – improve efficacy – adjust the dose to suit the patient – differentiate a product in a competitive market – meet future legal or regulatory requirements • Potential uses of biomarkers – Risk assessment – Initial/early detection – Prognosis – Prediction/therapy selection – Response assessment – Monitoring for recurrence
  • 87. Biomarker First used in 1971 … An objective and « predictive » measure … at the molecular level … of normal and pathogenic processes and responses to therapeutic interventions Characteristic that is objectively measured and evaluated as an indicator of normal biologic or pathogenic processes or pharmacologic response to a drug A biomarker is valid if: – It can be measured in a test system with well established performance characteristics – Evidence for its clinical significance has been established
  • 88. Rationale 1: Why now ? The regulatory path is becoming clearer. There is more at stake than efficient drug development. FDA « critical path initiative ». Pharmacogenomics guideline. Biomarkers are the foundation of « evidence based medicine » - who should be treated, how and with what. Without biomarkers, advances in targeted therapy will be limited and treatment will remain largely empirical. It is imperative that biomarker development be accelerated along with therapeutics
  • 89. Why now ? First and maturing second generation molecular profiling methodologies allow clinical trial participants to be stratified, including those most likely to benefit from the drug candidate and excluding those who likely will not. Pharmacogenomics-based clinical trials should attain more specific results with smaller numbers of patients. Smaller numbers mean lower costs (by a factor of 2-10). An additional benefit for trial participants and institutional review boards (IRBs) is that stratification, given the correct biomarker, may reduce or eliminate adverse events.
  • 90. Molecular Profiling The study of specific patterns (fingerprints) of proteins, DNA, and/or mRNA and how these patterns correlate with an individual's physical characteristics or symptoms of disease.
  • 91. Generic Health advice • Exercise (Hypertrophic Cardiomyopathy) • Drink your milk (MCM6 Lactose intolerance) • Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency) • & your grains (HLA-DQ2 – Celiac disease) • & your iron (HFE - Hemochromatosis) • Get more rest (HLA-DR2 - Narcolepsy)
  • 92. Generic Health advice (UNLESS) • Exercise (Hypertrophic Cardiomyopathy) • Drink your milk (MCM6 Lactose intolerance) • Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency) • & your grains (HLA-DQ2 – Celiac disease) • & your iron (HFE - Hemochromatosis) • Get more rest (HLA-DR2 - Narcolepsy)
  • 96. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  • 98.
  • 99.
  • 100.
  • 103.
  • 104. First Generation Molecular Profiling • Flow cytometry correlates surface markers, cell size and other parameters • Circulating tumor cell assays (CTC’s) quantitate the number of tumor cells in the peripheral blood. • Exosomes are 30-90 nm vesicles secreted by a wide range of mammalian cell types. • Immunohistochemistry (IHC) measures protein expression, usually on the cell surface.
  • 105.
  • 106.
  • 107.
  • 108. First Generation Molecular Profiling • Gene sequencing for mutation detection • Microarray for m-RNA message detection • RT-PCR for gene expression • FISH analysis for gene copy number • Comparative Genome Hybridization (CGH) for gene copy number
  • 109. Basics of the “old” technology • Clone the DNA. • Generate a ladder of labeled (colored) molecules that differ by 1 nucleotide. • Separate the mixture on some matrix. • Detect the fluorochrome by laser. • Interpret peaks as a string of DNA. • Strings are 500 to 1,000 letters long • 1 machine generates 57,000 nucleotides/run • Assemble all strings into a genome.
  • 110.
  • 111. Genetic Variation Among People Single nucleotide polymorphisms (SNPs) GATTTAGATCGCGATAGAG GATTTAGATCTCGATAGAG 0.1% difference among people
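Finding the differing position between the two example sequences is a simple column-by-column comparison. The sketch below assumes the sequences are already aligned and of equal length (no insertions or deletions); the function name is illustrative.

```python
def snp_positions(seq_a: str, seq_b: str):
    """Return (1-based position, base in seq_a, base in seq_b) for each difference."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a != b]


person_1 = "GATTTAGATCGCGATAGAG"
person_2 = "GATTTAGATCTCGATAGAG"
print(snp_positions(person_1, person_2))  # [(11, 'G', 'T')]
```

In this 19-base toy example a single position differs; across a whole genome the slide's figure of 0.1% corresponds to roughly one difference per thousand bases.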
  • 112. The genome fits as an e-mail attachment
  • 116.
  • 117. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  • 118. Second Generation DNA profiling • Exome sequencing (also known as targeted exome capture) is an efficient strategy to selectively sequence the coding regions of the genome to identify novel genes associated with rare and common disorders. • 160K exons
  • 119. Second Generation DNA profiling
  • 120. Second Generation DNA profiling
  • 121. Second Generation RNA profiling. Besides the 6000 protein-coding genes … 140 ribosomal RNA genes; 275 transfer RNA genes; 40 small nuclear RNA genes; >100 small nucleolar RNA genes. Function of RNA genes: pRNA in the phi29 rotary packaging motor (Simpson et al. Nature 408:745-750, 2000); cartilage-hair hypoplasia mapped to an RNA (Ridanpää et al. Cell 104:195-203, 2001); the human Prader-Willi critical region (Cavaillé et al. PNAS 97:14035-7, 2000)
  • 122. Second Generation RNA profiling. RNA genes can be hard to detect: UGAGGUAGUAGGUUGUAUAGU, C. elegans let-7; 21 nt (Pasquinelli et al. Nature 408:86-89, 2000) • Often small • Sometimes multicopy and redundant • Often not polyadenylated (not represented in ESTs) • Immune to frameshift and nonsense mutations • No open reading frame, no codon bias • Often evolving rapidly in primary sequence
  • 123. ncRNAs in human genome
       tRNA         600        SRP RNA          1
       18S rRNA     200        RNase P RNA      1
       5.8S rRNA    200        Telomerase RNA   1
       28S rRNA     200        RNase MRP        1
       5S rRNA      200        Y RNA            5
       snoRNA       300        Vault            4
       miRNA        250        7SK RNA          1
       U1           40         Xist             1
       U2           30         H19              1
       U4           30         BIC              1
       U5           30         Antisense RNAs   1000s?
       U6           20         Cis reg regions  100s?
       U4atac       5          Others           ?
       U6atac       5
       U11          5
       U12          5
  • 124.
  • 125. Mapping Structural Variation in Humans. CNVs: >1 kb segments - Thought to be common: 12% of the genome (Redon et al. 2006) - Likely involved in phenotype variation and disease - Until recently most methods for detection were low resolution (>50 kb)
  • 126. Size Distribution of CNV in a Human Genome
  • 127.
  • 128. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  • 129. Defining Epigenetics • Reversible changes in gene expression/function • Without changes in DNA sequence • Can be inherited from precursor cells • Allows intrinsic signals to be integrated with environmental signals (including diet) [Diagram: Genome (DNA), Epigenome (Chromatin), Gene Expression, Phenotype] CONFIDENTIAL Methylation | Epigenetics | Oncology | Biomarker | NEXT-GEN | PharmacoDX | CRC
  • 130.
  • 131. Epigenetic Regulation: Post Translational Modifications to Histones and Base Changes in DNA • Epigenetic modifications of histones and DNA include: – Histone acetylation and methylation, and DNA methylation [Figure labels: histone methylation (Me), histone acetylation (Ac), DNA methylation (Me)]
  • 132.
  • 133. MGMT Biology: O6 Methyl-Guanine Methyl Transferase. Essential DNA repair enzyme; removes alkyl groups from damaged guanine bases. Healthy individual: - MGMT is an essential DNA repair enzyme. Loss of MGMT activity makes individuals susceptible to DNA damage and prone to tumor development. Glioblastoma patient on alkylator chemotherapy: - Patients with MGMT promoter methylation have longer PFS and OS with the use of alkylating agents as chemotherapy
  • 134. MGMT Promoter Methylation Predicts Benefit from DNA-Alkylating Chemotherapy. Post-hoc subgroup analysis of a temozolomide clinical trial with primary glioblastoma patients shows benefit for patients with MGMT promoter methylation. [Bar chart: median overall survival in months (axis 0-25); non-methylated MGMT gene 12.7 months, methylated MGMT gene 21.7 months; bar labels: radiotherapy, radiotherapy plus temozolomide] Study with 207 patients. Adapted from Hegi et al. NEJM 2005 352(10):1036-8.
  • 135. Genome-wide methylation by methylation sensitive restriction enzymes
  • 136. Genome-wide methylation by probes
  • 137. Genome-wide methylation … by next generation sequencing [Figure: biomarker funnel from Discovery through Verification to Validation; axes: # markers versus # samples]
  • 138. MBD_Seq [Figure labels: condensed chromatin; DNA sheared; immobilized methyl binding domain]
  • 139. MBD_Seq [Figure labels: immobilized methyl binding domain; MgCl2; next gen sequencing] GA Illumina: 100 million reads
  • 140. MBD_Seq MGMT = dual core
  • 141. Genome-wide methylation … by next generation sequencing [Figure: Discovery stage with 1-2 million MBD_Seq methylation cores; axes: # markers versus # samples]
  • 142. Data integration: correlation tracks [Figure: two scatter plots of expression versus methylation, one with Corr = -1 and one with Corr = 1]
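A correlation track of this kind boils down to one correlation coefficient per locus, computed across samples. The sketch below assumes Pearson correlation between a methylation value and an expression value per sample; the choice of Pearson, the function name and the numbers are illustrative assumptions, not taken from the slides.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-sample values at one locus: more methylation, less expression.
methylation = [0.1, 0.3, 0.5, 0.7, 0.9]
expression = [9.0, 7.0, 5.0, 3.0, 1.0]
print(round(pearson(methylation, expression), 3))
```

A value near -1 is the pattern expected when promoter methylation silences a gene; a value near +1 indicates that methylation and expression rise together.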
  • 143. Correlation track in GBM @ MGMT [Figure: correlation track ranging from +1 to -1]
  • 144. Genome-wide methylation … by next generation sequencing [Figure: MBD_Seq for Discovery, 454_BT_Seq for Verification, MSP for Validation; axes: # markers versus # samples]
  • 145. Deep Sequencing [Figure: reads of unmethylated and methylated alleles over the sequence GCATCGTGACTTACGACTGATCGATGGATGCTA, ordered from less methylation to more methylation]
  • 146. Deep MGMT Heterogenic complexity
  • 147.
  • 148. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  • 149. Translational Medicine: An inconvenient truth • 1% of the genome codes for proteins, however more than 90% is transcribed • Less than 10% of protein experimentally measured can be “explained” from the genome • 1 genome ? Structural variation • > 200 Epigenomes ?? • Space/time continuum …
  • 150. Translational Medicine: An inconvenient truth • 1% of the genome codes for proteins, however more than 90% is transcribed • Less than 10% of protein experimentally measured can be “explained” from the genome • 1 genome ? Structural variation • > 200 Epigenomes … • “space/time” continuum
  • 151.
  • 152.
  • 153. Cellular programming Epigenetic (meta)information = stem cells
  • 154. Cellular reprogramming Tumor Tumor Development and Growth Epigenetically altered, self- renewing cancer stem cells
  • 155. Cellular reprogramming Gene-specific Epigenetic reprogramming