Metabolomics Data Analysis
Johan A. Westerhuis
Swammerdam Institute for Life Sciences, University of Amsterdam
Business Mathematics and Informatics, North-West University, Potchefstroom, South Africa
SeqAhead, Barcelona, February 2013
Metabolomics pipeline: Issues for biostatistics
Biological question → Experimental design → Data acquisition → Data pre-processing → Metabolite identification → Statistical data analysis → Biological interpretation
• Experimental design: Power analysis, Treatment design
• Data acquisition: QC strategy, Measurement design
• Data pre-processing: Normalisation, Quantification
• Metabolite identification: Spectral matching, De novo identification
• Statistical data analysis: Explorative, Predictive, Hypothetical biomarkers
• Biological interpretation: Network inference, MSEA, Pathway analysis
Data Analysis special issue of Metabolomics
   • Data preprocessing methods (make samples more comparable)
   • How to treat non-detects
   • Variable importance in multivariate models
   • Metabolic network analysis
   • Data fusion methods
   • Individual responses
   • Between-metabolite ratios
Guest Editors: Jeroen J. Jansen, Johan A. Westerhuis
Multivariate metabolomics data

Sample  Nontargeted profiling               | Targeted analysis
                                            | hipp  fum  urea  allant  TMAO  citrat
1       67  45   6   3  31  10  44  32  10  |    3    1     8       7    13       4
3       24  12   4  33  23   0   0  99  76  |    5    2    12       6    15       2

Nontargeted profiling: technical correlations and biological correlations.
Targeted analysis: biological correlations.
Multivariate Metabolomics Data analysis
• Explorative
   – Find groups, cluster structure / outliers in metabolites and in samples
• Supervised
   – Discriminate two or more groups to make a predictive model and to find biomarkers.
• Biological Interpretation
   – Metabolite set enrichment, Pathway analysis
   – Metabolic network inference
• Special topics
   – Between metabolite ratios
   – Metabolomics Data Fusion
Metabolomics Data preprocessing
• Optimize biological content of data
• Correct for incorrect sampling, sample workup issues, batch effects
• What is the noise level in the data?
   – Variance stabilization: generalized log transform
• Are high peaks more important than low peaks?
• Multivariate methods love large values: variables with large variance dominate the model!
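A minimal sketch of the two preprocessing steps named above, assuming NumPy: the generalized log (glog) transform log(x + sqrt(x² + λ)) for variance stabilization, followed by autoscaling so that large peaks no longer dominate. The function names, the data and the value λ = 1.0 are illustrative assumptions; in practice λ would be tied to the technical noise level of the measurement platform.

```python
import numpy as np


def glog(X, lam):
    """Generalized log transform: log(x + sqrt(x**2 + lam)).

    For x much larger than sqrt(lam) this behaves like log(2x); near zero it is
    almost linear and stays finite, so zeros (e.g. non-detects) and small noisy
    values are not blown up the way log(x) would blow them up.
    """
    X = np.asarray(X, dtype=float)
    return np.log(X + np.sqrt(X ** 2 + lam))


def autoscale(X):
    """Centre each column (variable) and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Hypothetical intensities: one large peak, one small peak, one containing a zero.
X = np.array([[1000.0, 2.0, 0.0],
              [1200.0, 3.0, 1.0],
              [ 900.0, 2.5, 4.0],
              [1100.0, 1.5, 2.0]])

Xg = glog(X, lam=1.0)   # lam = 1.0 is an arbitrary illustrative choice
Xs = autoscale(Xg)      # every variable now has unit variance
print(np.round(Xs.std(axis=0, ddof=1), 6))
```

Autoscaling is only one answer to "are high peaks more important than low peaks?"; it gives every variable equal weight, which also inflates noise-only variables.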
Metabolic changes during E. coli culture growth using k-means clustering
[Figure axes: time; metabolites.]
(A) Growth curve (optical density) of an unperturbed E. coli culture. The sampling time points are numbered on the curve. Time point 0 minutes marks the application of the respective stress condition.
(B) Relative changes of metabolite pools, normalized to time point 1. Fold change is presented on a log10 scale. To reveal the main trends of metabolic change, 10 k-means clusters are color coded.
Szymanski, Jedrzej et al. PLoS ONE (2009), vol. 4, issue 10
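The clustering in panel (B) can be sketched as follows, assuming NumPy and a hypothetical matrix with one row per metabolite and one column per time point. Each row is converted to a log10 fold change relative to time point 1 and the rows are grouped with plain k-means. The farthest-first initialisation, k = 2 and the simulated rising/falling pools are choices made for this example, not the settings of Szymanski et al.

```python
import numpy as np


def kmeans(Z, k, n_iter=100):
    """Plain Lloyd k-means on the rows of Z (here: one row per metabolite).

    Uses a deterministic farthest-first initialisation instead of random starts.
    """
    Z = np.asarray(Z, dtype=float)
    centers = [Z[0]]
    for _ in range(1, k):
        # squared distance of every row to its nearest existing center
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


# Hypothetical pools: 5 metabolites rising and 5 falling over 8 time points.
rng = np.random.default_rng(0)
t = np.arange(1, 9)
rising = 10 * (1 + 0.3 * t) * rng.lognormal(0, 0.05, size=(5, 8))
falling = 10 / (1 + 0.3 * t) * rng.lognormal(0, 0.05, size=(5, 8))
pools = np.vstack([rising, falling])            # metabolites x time points

fold = np.log10(pools / pools[:, [0]])          # log10 fold change vs time point 1
labels, centers = kmeans(fold, k=2)
print(labels)
```

Each cluster center is itself a time profile, which is what makes the color-coded trends in panel (B) readable.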
Self Organising Map of Metabolites in serum
• 1H NMR spectra of 613 patients with type I diabetes and a diverse spread of complications
• Nonlinear mapping method for a large number of samples.
• Relate position on the map to diagnostic responses.
• Can be made supervised.
1H NMR metabonomics approach to the disease continuum of diabetic complications and premature death
VP Mäkinen et al., Molecular Systems Biology 4:167, 2008
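A self-organising map places samples on a grid so that similar samples land on nearby nodes. The sketch below assumes NumPy, a 4 × 4 grid, a Gaussian neighbourhood, and a learning rate and neighbourhood width that shrink linearly during training; these are illustrative choices, not the configuration used by Mäkinen et al., and the two-group data are simulated.

```python
import numpy as np


def train_som(X, rows, cols, n_epochs=30, lr0=0.5, seed=0):
    """Train a small rectangular self-organising map with online updates.

    Prototype vectors start at randomly chosen samples; the learning rate and
    the Gaussian neighbourhood width both shrink linearly during training.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.choice(len(X), size=rows * cols)].copy()   # one prototype per node
    sigma0, sigma_end = max(rows, cols) / 2.0, 0.5
    n_steps, step = n_epochs * len(X), 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac) + 0.01 * frac
            sigma = sigma0 * (1.0 - frac) + sigma_end * frac
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))   # best-matching unit
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)                # pull the neighbourhood
            step += 1
    return W, grid


def map_samples(X, W):
    """Index of the best-matching map node for every sample (row of X)."""
    X = np.asarray(X, dtype=float)
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)


# Hypothetical "spectra": two groups of 30 samples with 10 variables each.
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.1, size=(30, 10))
group_b = rng.normal(3.0, 0.1, size=(30, 10))
W, grid = train_som(np.vstack([group_a, group_b]), rows=4, cols=4)
nodes_a, nodes_b = map_samples(group_a, W), map_samples(group_b, W)
print(sorted(set(nodes_a.tolist())), sorted(set(nodes_b.tolist())))
```

Relating map position to diagnostic responses then amounts to summarising a clinical variable per node, using the node indices returned by `map_samples`.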
Multivariate Metabolomics Data analysis
• Explorative
   – Find groups, cluster structure / outliers in metabolites and in samples
• Supervised (Differentially expressed)
   – Discriminate two or more groups to make a predictive model and to find biomarkers.
• Biological Interpretation
   – Metabolite set enrichment, Pathway analysis
   – Metabolic network inference
• Special topics
   – Between metabolite ratios
   – Metabolomics Data Fusion
Supervised Metabolomics Data analysis
Case – Control (PLSDA)
[Figure: class vector Y (0/1); PLS-DA scores plot (PC2 versus PC1) of men and women; PLS regression coefficients b versus chemical shift (ppm).]
• Is there really a difference between the groups?
   Statistical validation issues
• Which are the most important peaks for discrimination?
   Variable importance
• Explain the Psihogios example with examples from the paper and MetaboAnalyst examples.
Proton NMR spectra of the urine samples were obtained on a 500 MHz 1H NMR spectrometer.
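Both validation questions can be illustrated with a small PLS-DA sketch, assuming NumPy: PLS1 fitted by NIPALS to a 0/1 class vector Y, the leave-one-out misclassification rate as test statistic, and a permutation test on the class labels to ask whether the groups really differ. The regression vector b then ranks the variables, as in the PLS b plot. Function names, the single component and the simulated data (five shifted variables) are hypothetical choices; this is not the procedure used in the original analysis.

```python
import numpy as np


def pls1(X, y, n_comp):
    """PLS1 regression (NIPALS) on column-centred data.

    Returns the regression vector b plus the X and y means needed to predict.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W = np.zeros((X.shape[1], n_comp))
    P = np.zeros((X.shape[1], n_comp))
    q = np.zeros(n_comp)
    for a in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight vector
        t = E @ w                       # scores
        tt = t @ t
        P[:, a] = E.T @ t / tt          # X loadings
        q[a] = f @ t / tt               # y loading
        W[:, a] = w
        E = E - np.outer(t, P[:, a])    # deflate X
        f = f - q[a] * t                # deflate y
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, x_mean, y_mean


def loo_error(X, y, n_comp):
    """Leave-one-out misclassification rate of PLS-DA with y coded 0/1."""
    wrong = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, xm, ym = pls1(X[keep], y[keep], n_comp)
        y_hat = (X[i] - xm) @ b + ym
        wrong += int((y_hat > 0.5) != (y[i] == 1))
    return wrong / len(y)


def permutation_test(X, y, n_comp=1, n_perm=99, seed=0):
    """Compare the observed CV error with errors after permuting class labels."""
    rng = np.random.default_rng(seed)
    observed = loo_error(X, y, n_comp)
    null = np.array([loo_error(X, rng.permutation(y), n_comp) for _ in range(n_perm)])
    p_value = (np.sum(null <= observed) + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical data: 10 + 10 samples, 30 variables, first 5 shifted in group 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 30))
y = np.repeat([0.0, 1.0], 10)
X[y == 1, :5] += 3.0
err, p = permutation_test(X, y, n_comp=1)
b, _, _ = pls1(X, y, 1)
print(err, p, np.argsort(-np.abs(b))[:5])
```

The cross-validation must sit inside the permutation loop: permuting labels and then judging the model by its fit to the training data would find "differences" between random groups.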
NMR spectra of urine samples
[Figure panels: Nonsupervised; Supervised.]
Experimental Design Example
Experiment: Rats are given bromobenzene (BB), which affects the liver.
Measurements: NMR spectroscopy of urine
Experimental Design:
 Time: 6, 24 and 48 hours
 Groups: 3 doses of BB, vehicle group, control group
 Animals: 3 rats per dose per time point
[Figure: urine NMR spectra of the rats at 6, 24 and 48 hours versus chemical shift (ppm), with selected peaks labelled.]
Different contributions
[Figure: trajectories of metabolite concentration over time, split into the contributions of the experimental design factors Time, Dose and Animal.]
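ANOVA decomposition of each variable
$$x_{hki} = m + \alpha_h + (\alpha\beta)_{hk} + (\alpha\beta\gamma)_{hki}$$
with h the time point, k the dose group and i the animal.
[Figure: the trajectory of one variable split into these contributions.]
MATRICES:
$$\mathbf{X} = \mathbf{1}\mathbf{m}^T + \mathbf{X}_{\alpha} + \mathbf{X}_{\alpha\beta} + \mathbf{X}_{\alpha\beta\gamma}$$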
ANOVA and PCA  ASCA

X  1m  Xα  Xαβ  Xαβγ
          T




                   Pα         Pαβ          Pαβγ

  X                                               E
              Tα        Tαβ         Tαβγ
                                                  Parts of the
                                                  data not
                                                  explained by
                                                  the
                                                  component
 X  1mT  TαPα  TαβPαβ  TαβγPαβγ  E
              T       T         T
                                                  models
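For a balanced design, the ASCA decomposition can be sketched with group means, assuming NumPy: X_α holds the time means, X_αβ the time-by-dose cell means after removing the time effect, and X_αβγ the animal-level remainder; each effect matrix then gets its own component model via the SVD. The function names and random data are illustrative, and this mean-based split is only orthogonal when every time/dose cell holds the same number of animals.

```python
import numpy as np


def asca_effects(X, time, dose):
    """Split X (samples x variables) into ASCA effect matrices.

    Model: X = 1 m^T + X_a + X_ab + X_abg with X_a the time effect, X_ab the
    time-by-dose effect and X_abg the animal (residual) level. Assumes a
    balanced design (equal numbers of animals in every time/dose cell).
    """
    X = np.asarray(X, dtype=float)
    time, dose = np.asarray(time), np.asarray(dose)
    m = X.mean(axis=0)
    Xc = X - m                                   # remove 1 m^T
    X_a = np.zeros_like(X)
    for h in np.unique(time):
        X_a[time == h] = Xc[time == h].mean(axis=0)
    X_ab = np.zeros_like(X)
    for h in np.unique(time):
        for k in np.unique(dose):
            cell = (time == h) & (dose == k)
            X_ab[cell] = (Xc[cell] - X_a[cell]).mean(axis=0)
    X_abg = Xc - X_a - X_ab                      # animal-level remainder
    return m, X_a, X_ab, X_abg


def sca(X_eff, n_comp):
    """Component model of one effect matrix: X_eff ~ T P^T (via SVD)."""
    U, s, Vt = np.linalg.svd(X_eff, full_matrices=False)
    return U[:, :n_comp] * s[:n_comp], Vt[:n_comp].T


# Hypothetical balanced design: 3 time points x 5 groups x 3 animals, 20 variables.
rng = np.random.default_rng(0)
time = np.repeat([6, 24, 48], 15)
dose = np.tile(np.repeat(["control", "vehicle", "low", "medium", "high"], 3), 3)
X = rng.normal(size=(45, 20))
m, X_a, X_ab, X_abg = asca_effects(X, time, dose)
T_ab, P_ab = sca(X_ab, n_comp=2)                 # alpha-beta scores and loadings

ss = [np.sum(E ** 2) for E in (X_a, X_ab, X_abg)]
print("variance %:", np.round(100 * np.array(ss) / np.sum((X - m) ** 2), 1))
```

Because the effect matrices are orthogonal in a balanced design, their sums of squares add up to the total, which gives the percentage of variation per submodel.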
Results
[Figure: variance contributions of the submodels Xα, Xαβ and Xαβγ (one labelled 40 %), and αβ-scores versus time (6, 24 and 48 hours) for the control, vehicle, low, medium and high dose groups.]
Results  biomarkers
                                          3.0475
                        5.38


                                 3.7525
                                  3.675
                                                                        Unique to the α submodel

α                                                                       Differences
                               3.9675           2.735
                                                            2.055
                                                                        between submodels
                                                 2.5425


                                                 2.5825
                                                2.6975
                                                        2.055
                                                                        Interesting for Biology

                                                        2.075
                                                                        Interesting for Statistics /
                                            2.91                        Diagnostics
αβ
                                          3.0275
                                            2.93



                               3.9675           2.735
                                                2.6975
                                                 2.5825


                                        3.285
                                        3.2625


                                                        2.075
                                            2.93

αβγ
                                          3.0475        2.055
                                   3.73
                                3.8875



                                                2.735
                                          3.0275




                                        3.285

      10   8      6             4                       2           0
               chemical shift (ppm)
Multivariate Metabolomics Data analysis
• Explorative
   – Find groups, cluster structure / outliers in metabolites and in samples
• Supervised
   – Discriminate two or more groups to make a predictive model and to find biomarkers.
   – Method comparison
• Biological Interpretation
   – Metabolite set enrichment
   – Pathway analysis
   – Metabolic network inference
• Special topics
   – Between metabolite ratios
   – Metabolomics Data Fusion
Gaucher (NONTARGETED): SELDI measurements of serum samples of 20 Gaucher patients and 20 healthy controls. Gaucher disease is a genetic disease in which a fatty substance (lipid) accumulates in cells and certain organs.
Spiked:
• Human urine and porcine cerebrospinal fluid samples spiked with a range of peptides.
• Variation in the number of samples and in within- and between-group variation.
Feature selection methods: RESULTS
• Complex nontargeted Gaucher profiling data with a highly variable background and a varying difference between case and control: multivariate methods perform best.
• Spiked targeted LC-MS data with less variation in effect size: univariate and semi-univariate methods are best at selecting biomarkers.
Biomarkers:

A: Univariate
B: Multivariate
C: Change in group correlation
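Type C can be made concrete with a test for a change in correlation between two groups. A minimal sketch, assuming NumPy and simulated data for two hypothetical metabolites with equal means in both groups: the Fisher z-transformed correlations are compared, a standard test for two independent correlations. A univariate comparison of means (type A) is not expected to flag this pair.

```python
import numpy as np


def correlation_change(x, y, groups):
    """Per-group Pearson correlation of x and y, plus the Fisher z statistic
    for the difference between the two (independent) groups."""
    labels = np.unique(groups)              # sorted; exactly two groups assumed
    r, n = {}, {}
    for g in labels:
        sel = groups == g
        r[g] = np.corrcoef(x[sel], y[sel])[0, 1]
        n[g] = int(sel.sum())
    g0, g1 = labels
    se = np.sqrt(1 / (n[g0] - 3) + 1 / (n[g1] - 3))
    z = (np.arctanh(r[g1]) - np.arctanh(r[g0])) / se
    return r, z


# Hypothetical pair of metabolites: same means, correlated only in controls.
rng = np.random.default_rng(0)
a0 = rng.normal(size=50)
b0 = 0.9 * a0 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=50)
a1, b1 = rng.normal(size=50), rng.normal(size=50)
x = np.concatenate([a0, a1])
y = np.concatenate([b0, b1])
groups = np.repeat(["control", "case"], 50)
r, z = correlation_change(x, y, groups)
print({str(g): round(float(v), 2) for g, v in r.items()}, round(float(z), 1))
```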
BMR in a green tea intervention study
186 human subjects with abdominal obesity
Validation shows significant changes in BMR between placebo and green tea treatment, together with the most important triacylglycerols TG28-29 and TG41-42.
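Ratios between metabolites, listed among the special topics, can reveal a relative change that a shared source of variation hides in the individual concentrations. A sketch under stated assumptions (NumPy, positive data, hypothetical metabolites M1 to M4, a simulated subject-specific scaling factor, not the green tea data): all pairwise log ratios are formed and ranked with a Welch t statistic.

```python
import numpy as np
from itertools import combinations


def log_ratios(X, names):
    """All pairwise log ratios log(x_i / x_j), i < j, for positive data X."""
    L = np.log(np.asarray(X, dtype=float))
    pairs = list(combinations(range(L.shape[1]), 2))
    R = np.column_stack([L[:, i] - L[:, j] for i, j in pairs])
    return R, [f"{names[i]}/{names[j]}" for i, j in pairs]


def welch_t(a, b):
    """Welch t statistic per column of a versus b."""
    va, vb = a.var(axis=0, ddof=1) / len(a), b.var(axis=0, ddof=1) / len(b)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va + vb)


def simulate(rng, n, effect):
    """Hypothetical data: a subject-specific factor scales all four metabolites."""
    subject = rng.lognormal(0.0, 0.5, size=(n, 1))
    X = subject * rng.lognormal(0.0, 0.05, size=(n, 4)) * np.array([10.0, 20.0, 5.0, 8.0])
    X[:, 0] *= effect                                  # treatment changes M1 only
    return X


rng = np.random.default_rng(0)
names = ["M1", "M2", "M3", "M4"]
placebo, treated = simulate(rng, 30, 1.0), simulate(rng, 30, 1.3)
R_p, labels = log_ratios(placebo, names)
R_t, _ = log_ratios(treated, names)
t_ratio = welch_t(R_t, R_p)
t_single = welch_t(np.log(treated), np.log(placebo))   # per-metabolite comparison
best = int(np.argmax(np.abs(t_ratio)))
print(labels[best], round(float(t_ratio[best]), 1), np.round(t_single, 1))
```

The shared factor cancels in every ratio, so the 30 % change in M1 stands out far more clearly in the ratios than in the single-metabolite statistics.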
Plasma
Differences in blood metabolites due to aging
Aging biomarker metabolites in liver
Special topic: Metabolic networks
Biochemical Network vs Association Network
Figure 7 Marginal correlation network for a set of metabolites in tomato. Volatiles in red, derivatized metabolites in yellow. Solid lines represent positive correlations, dashed lines negative ones. Thickness of line corresponds to magnitude of ...
Margriet M.W.B. Hendriks et al., Data-processing strategies for metabolomics studies, Trends in Analytical Chemistry, 2011
Metabolomics, 2005
Data from potato tubers
[Figure panels: metabolic neighbors; metabolites that do not participate in common reactions.]
High correlation can arise from, e.g., chemical equilibrium or mass conservation.
“a systematic relationship between observed correlation networks and the underlying biochemical pathways.”
Ralf Steuer: Observing and interpreting correlations in metabolomic networks, Bioinformatics, 2003
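The point that high correlation need not mean a shared reaction can be checked with a small simulation of mass conservation, assuming NumPy: two hypothetical metabolites A and B share a nearly constant total pool (for instance a conserved cofactor pair), so any sample-to-sample shift in how the pool is split makes them almost perfectly negatively correlated. Chemical equilibrium is not simulated here.

```python
import numpy as np

# Hypothetical conserved moiety: A + B is almost constant, only the split varies.
rng = np.random.default_rng(0)
n = 200
total = rng.normal(10.0, 0.1, size=n)       # nearly constant total pool A + B
frac = rng.uniform(0.2, 0.8, size=n)        # sample-to-sample variation in the split
A = frac * total
B = (1.0 - frac) * total
r = np.corrcoef(A, B)[0, 1]
print(round(float(r), 3))                   # a strongly negative correlation
```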
Metabolic Network Inference
Search for the link between metabolome data and underlying metabolic networks.
[Figure: a network of metabolites A–F; can the network (??) be recovered from the data?]
As an example: can we distinguish healthy from diseased networks?
[Figure: HEALTHY and DISEASE networks linking glucose and metabolites A–G.]
From data to network
Goal: from data (?) to NETWORK TOPOLOGY and DIRECTIONS
Problems:
• Noise
• Missing metabolites
• Huge amount of possible network structures
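The huge number of possible network structures is easy to quantify. Counting labelled networks without self-loops, an undirected network on n metabolites has n(n-1)/2 possible edges and a directed one n(n-1) possible arrows, each present or absent. A small sketch in plain Python (the function names are hypothetical):

```python
def n_undirected_networks(n):
    """Number of distinct undirected networks (no self-loops) on n labelled metabolites."""
    return 2 ** (n * (n - 1) // 2)


def n_directed_networks(n):
    """Number of distinct directed networks (no self-loops): each ordered pair
    of metabolites either has an arrow or not."""
    return 2 ** (n * (n - 1))


for n in (4, 6, 20):
    print(n, n_undirected_networks(n), n_directed_networks(n))
```

Already for the six metabolites A to F there are 32768 undirected and 2^30 directed candidates, which is why exhaustive search is out of reach for realistic metabolite numbers.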
Inference from static data
1. DATA COLLECTION
   A. Enzymatic variability
   B. Intrinsic variability
   C. Environmental variability
2. SIMILARITY SCORE CALCULATION (all possible pairwise interactions)
   2a. Relevance networks: Pearson Correlation (PC, linear); Mutual Information (MI, non-linear)
   2b. Conditioned networks: Partial Pearson Correlation (PPC, linear); Conditional Mutual Information (CMI, non-linear)
[Figure: example data for each type of variability, and the networks of metabolites A–F inferred with each similarity score.]
ESTIMATION OF CORRELATION NETWORKS
Real pathway: 1. ASPP, 2. ASA, 3. HS, 4. HSP
[Figure: networks among ASPP, ASA, HS and HSP estimated with PC, MI, PPC1, CMI1 and PPCn under three conditions: Vmax variability, intrinsic variability and environmental variability. Edge legend: 100%, > 90%, 10% … 90%, < 10%]
 PC: Pearson Correlation (linear measure)
 MI: Entropy-based Mutual Information (non-linear measure)
 PPC: Partial Pearson Correlation (linear conditioning measure)
 CMI: Conditional Mutual Information (non-linear conditioning measure)
Cakir, Metabolomics 2009
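The difference between marginal and conditioning measures in the legend can be illustrated on simulated data. The sketch below assumes a hypothetical linear chain x1 → x2 → x3 → x4 with Gaussian noise as a stand-in for the ASPP, ASA, HS, HSP pathway; it is not the simulation used by Cakir et al. It computes Pearson correlation (PC), a full-order partial correlation from the inverse covariance matrix (taken here, as an assumption, to correspond to PPCn, i.e. conditioning on all other metabolites) and a simple histogram-based mutual information (MI) estimate. Conditional MI is not shown, and all helper names are illustrative.

```python
import numpy as np


def pearson_network(X):
    """Pearson correlation (PC) between all pairs of columns of X (samples x metabolites)."""
    return np.corrcoef(X, rowvar=False)


def partial_correlation_network(X):
    """Full-order partial Pearson correlation: each pair conditioned on all other columns.

    Uses the precision matrix P = inv(cov(X)): pcor_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    P = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def binned_mutual_information(x, y, bins=10):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


# Hypothetical linear chain x1 -> x2 -> x3 -> x4 (stand-in for ASPP -> ASA -> HS -> HSP)
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
x3 = x2 + rng.normal(size=n)
x4 = x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

pc = pearson_network(X)
ppc = partial_correlation_network(X)
print("PC  x1-x3:", round(pc[0, 2], 2))   # indirect link gives a clearly non-zero correlation
print("PPC x1-x3:", round(ppc[0, 2], 2))  # close to zero after conditioning on x2 and x4
print("MI  x1-x2 vs x1-x4:", binned_mutual_information(x1, x2), binned_mutual_information(x1, x4))
```

With real metabolomics data the number of samples is usually much smaller, and the covariance matrix is not invertible when metabolites outnumber samples; lower-order partial correlations (such as PPC1) or regularized estimates are then the practical choice.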
Multivariate Metabolomics Data analysis
• Explorative
   – Find groups, cluster structure and outliers in metabolites and in samples
• Supervised
   – Discriminate two or more groups to make a predictive model and to find biomarkers
• Biological Interpretation
   – Metabolite set enrichment, pathway analysis
   – Metabolic network inference
• Special topics
   – Between-metabolite ratios
   – Metabolomics data fusion
Metabolomics data fusion
• Account for between-block differences in measurement quality to improve data fusion
• For example, multi-platform data fusion, where platforms differ in quantification, (non)targeted acquisition and error structure
[Figure: amino acid and lipid blocks combined into fused data]
• How can measurement quality be quantified when there are many metabolites and many samples?
Error model for 1 metabolite
[Figure: standard deviation (St.D) versus mean intensity I; the QC sample yields an RSD, while the study samples cover a range of intensities]
• Error models:
   - RSD using 1 QC sample
   - 2-component model using study samples
• A good error description needs:
   - a sufficient number of samples
   - a large intensity range
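The slide contrasts a single-QC-sample RSD with a 2-component error model fitted on study samples. A minimal sketch, assuming the common additive-plus-multiplicative form sd² = a² + b²·I² (the slide does not state the formula), fits both components by ordinary least squares on per-level variances from simulated replicates. The simulation settings and the function name are hypothetical.

```python
import numpy as np


def fit_two_component_error_model(means, sds):
    """Least-squares fit of sd^2 = a^2 + b^2 * I^2 (additive + multiplicative error).

    Returns (a, b); negative fitted variance components are clipped to zero.
    """
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    design = np.column_stack([np.ones_like(means), means**2])
    (c0, c1), *_ = np.linalg.lstsq(design, sds**2, rcond=None)
    return float(np.sqrt(max(c0, 0.0))), float(np.sqrt(max(c1, 0.0)))


# Hypothetical metabolite measured at 50 intensity levels, 50 replicates per level
rng = np.random.default_rng(1)
a_true, b_true = 2.0, 0.1
levels = np.linspace(10, 1000, 50)
shape = (50, levels.size)  # rows = replicates, columns = intensity levels
X = levels + a_true * rng.normal(size=shape) + b_true * levels * rng.normal(size=shape)

a_hat, b_hat = fit_two_component_error_model(X.mean(axis=0), X.std(axis=0, ddof=1))
print(f"a = {a_hat:.2f} (true {a_true}), b = {b_hat:.3f} (true {b_true})")
```

In this unweighted fit the multiplicative term b is well determined, but the additive term a is dominated by noise from the high-intensity levels; a weighted fit or more low-intensity data would be needed to pin it down.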
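Figure of merit for data from 1 platform
[Figure: St.D versus mean intensity I for example variables (Var. 15, Var. 118, Var. 213, Var. 365), and a histogram of the number of peaks against the figure of merit, with F-50 and F-90 marked]
• Median: F-50 = 0.1
• 90th percentile: F-90 = 0.35
(Van Batenburg et al., Analytical Chemistry, 2011)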
Two-step data fusion
[Figure: GC/MS data (J1 = 82 peaks) and LC/MS data (J2 = 49 peaks); peak index j, mean intensity I_j]
• Step 1: Compute figures of merit for each platform
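Step 1 can be sketched as summarizing a per-peak figure of merit over each platform by its median (F-50) and 90th percentile (F-90), as on the single-platform slide. What exactly the per-peak value is (here, hypothetically, the relative error b_j of each peak's error model) and the simulated values are assumptions; only the peak counts (82 GC/MS peaks, 49 LC/MS peaks) come from the slide.

```python
import numpy as np


def figures_of_merit(per_peak_f):
    """Summarize per-peak figures over one platform as (F-50, F-90): median and 90th percentile."""
    f = np.asarray(per_peak_f, dtype=float)
    return float(np.percentile(f, 50)), float(np.percentile(f, 90))


# Hypothetical per-peak values; only the peak counts follow the slide
rng = np.random.default_rng(2)
platforms = {
    "GC/MS": rng.lognormal(mean=np.log(0.1), sigma=0.5, size=82),
    "LC/MS": rng.lognormal(mean=np.log(0.2), sigma=0.5, size=49),
}
fom = {name: figures_of_merit(f) for name, f in platforms.items()}
for name, (f50, f90) in fom.items():
    print(f"{name}: F-50 = {f50:.2f}, F-90 = {f90:.2f}")
```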
                                                 
Two-step data fusion: MB-MLPCA
• Step 2: Multi-block PCA with weighting by figures of merit
[Figure: blocks X1 (amino acids) and X2 (lipids) combined through a fused error covariance built from the estimated error variances $\hat{\sigma}^2_{js}$]
• The method needs a good estimate of the error variance, obtained from
   – Repeats
   – QC samples
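MB-MLPCA builds on maximum likelihood PCA, which uses the full error covariance and is fitted iteratively. The sketch below is a much simpler stand-in, not MB-MLPCA itself: assuming uncorrelated errors with one known variance per variable, each block is divided column-wise by its error standard deviation, and the blocks are then concatenated and decomposed with an ordinary PCA via SVD. Block sizes, noise levels and the function name are hypothetical.

```python
import numpy as np


def error_weighted_multiblock_pca(blocks, error_vars, n_components=2):
    """Scale each block by 1/sigma per variable, concatenate, centre and run PCA via SVD.

    blocks:     list of (samples x variables) arrays that share the same samples
    error_vars: list of per-variable error-variance vectors, one per block
    """
    scaled = [X / np.sqrt(v) for X, v in zip(blocks, error_vars)]
    Z = np.hstack(scaled)
    Z = Z - Z.mean(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings


# Hypothetical data: one latent factor t shared by a precise block and a noisy block
rng = np.random.default_rng(4)
n = 30
t = rng.normal(size=n)
amino_acids = np.outer(t, np.ones(5)) + 0.1 * rng.normal(size=(n, 5))  # error sd 0.1
lipids = np.outer(t, np.ones(4)) + 2.0 * rng.normal(size=(n, 4))       # error sd 2.0
error_vars = [np.full(5, 0.1**2), np.full(4, 2.0**2)]

scores, loadings = error_weighted_multiblock_pca([amino_acids, lipids], error_vars)
print(scores.shape, loadings.shape)
print("|corr(first score, t)|:", round(abs(np.corrcoef(scores[:, 0], t)[0, 1]), 3))
```

Dividing by the error standard deviation lets the precise block dominate the first component instead of the noisy one, which is the intent of weighting by measurement quality; this only works as well as the error variances are estimated.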
Realistic simulations using GC/MS and LC/MS data
[Figure: error variance estimated from duplicates compared with the true error variance]
• Estimating the error variance from duplicates is problematic.
• Use a mix of QC samples and repeats.
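One reason duplicate-based estimates can be problematic, assumed here for illustration, is sampling noise: for a duplicate pair (x_a, x_b) with independent errors of variance σ², E[(x_a − x_b)²] = 2σ², so σ̂² = Σ d_i² / (2n) has only n degrees of freedom for n pairs. The simulation below uses hypothetical settings to compare the spread of this estimate with 3 versus 300 duplicate pairs.

```python
import numpy as np


def variance_from_duplicates(first, second):
    """Pooled error variance from duplicate pairs: sum(d^2) / (2 n), with d = first - second."""
    d = np.asarray(first, dtype=float) - np.asarray(second, dtype=float)
    return float(np.sum(d**2) / (2 * d.size))


rng = np.random.default_rng(3)
true_sd = 0.5
level = 100.0  # hypothetical true intensity of one metabolite


def simulated_estimate(n_pairs):
    """Draw n_pairs duplicate measurements and return the duplicate-based variance."""
    a = level + true_sd * rng.normal(size=n_pairs)
    b = level + true_sd * rng.normal(size=n_pairs)
    return variance_from_duplicates(a, b)


few = np.array([simulated_estimate(3) for _ in range(2000)])
many = np.array([simulated_estimate(300) for _ in range(2000)])
print(f"true variance {true_sd**2:.3f}")
print(f"3 pairs:   mean {few.mean():.3f}, sd {few.std():.3f}")
print(f"300 pairs: mean {many.mean():.3f}, sd {many.std():.3f}")
```

Both settings are unbiased on average; the difference is precision, which is one way to read the recommendation to pool information from QC samples and repeats.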

Metabolomics Data Analysis

  • 1. Metabolomics Data Analysis Johan A. Westerhuis Swammerdam Institute for Life Sciences, University of Amsterdam Business Mathematics and Information, North-West University, Potchefstroom, South Africa egra SeqAhead, Barcelona February 2013
  • 2.
  • 3. Metabolomics pipeline : Issues for biostatistics Biological Data Statistical Biological Experimental Data Metabolite question Pre- Data inter- design acquisition identification processing analysis pretation Power analysis Normalisation Explorative Treatment Quantification Predictive design Hypothetical QC strategy biomarkers Measurement Spectral Network design matching inference, De NOVO MSEA, indentification Pathway analysis 3
  • 4. Data Analysis special issue Metabolomics • Data preprocessing methods (make samples more comparable) • How to treat non-detects • Variable importance in multivariate models • Metabolic network analysis • Data fusion methods • Individual responses • Between metabolite ratio’s Guest Editors Jeroen J. Jansen Johan A. Westerhuis
  • 5. Multivariate metabolomics data NONTARGETED PROFILING TARGETED ANALYSIS hipp fum urea allant TMAO citrat 1 67 45 6 3 31 10 44 32 10 3 1 8 7 13 4 3 24 12 4 33 23 0 0 99 76 5 2 12 6 15 2 Technical correlations Biological correlations Biological correlations
  • 6. Multivariate Metabolomics Data analysis • Explorative – Find groups, clusters structure / outliers in metabolites and in samples • Supervised – Discriminate two or more groups to make predictive model and to find • Special topics biomarkers. – Between metabolite ratios • Biological Interpretation – Metabolite set enrichment, – Metabolomics Data Pathway analysis Fusion – Metabolic network inference
  • 7. Metabolomics Data preprocessing • Optimize biological content of data • Correct for incorrect sampling, sample workup issues, batch effects • What is the noise level in the data? Generalized log transform Variance stabilization. • High peaks more important than low peaks? • Multivariate methods love large values! 7
  • 8.
  • 9. Metabolic changes during E. coli culture growth using k-means clustering. time metabolites (A) Growth curve (optical density) of unperturbed E. coli culture. Numbers of respective sampling time points are marked in the curve. Time point 0 minutes marks the application of the respective stress condition. (B) Relative changes of metabolites pools normalized time point 1. Fold change is presented on log10 scale. To reveal main trends of metabolic changes 10 K means clusters are color coded. Szymanski, Jedrzej et al. PLoS ONE (2009), vol. 4 issue. 10
  • 10. Self Organising Map of Metabolites in serum 1H NMR spectra of 613 patients with type I diabetes and a diverse spread of complications Nonlinear mapping method for large number of samples. Relate position on the map to diagnostic responses. Can be made supervised 1H NMR metabonomics approach to the disease continuum of diabetic complications and premature death VP Mäkinen et al, Molecular Systems Biology 4:167, 2008
  • 11. Multivariate Metabolomics Data analysis • Explorative – Find groups, clusters structure / outliers in metabolites and in samples • Supervised (Differentially expressed) – Discriminate two or more groups to make predictive model and to find • Special topics biomarkers. – Between metabolite ratios • Biological Interpretation – Metabolite set enrichment, Pathway analysis – Metabolomics Data – Metabolic network inference Fusion
  • 12. Supervised Metabolomics Data analysis Case – Control (PLSDA) Y 4 Men 3 0 Women 2 0 1 0 PC2 0 1 -1 1 -2 1 -3 -4 -2 0 2 4 6 PC1 0.04 • Is there really a difference between the groups ? 0.02 Statistical validation issues 0 PLS b • Which are the most important -0.02 peaks for discrimination ? -0.04 Variable importance -0.06 4 3.5 3 2.5 2 1.5 1 0.5 0 Chemical shift (ppm)
  • 13. • Psyhogios example uitleggen met paper voorbeelden en metaboanalyst voorbeelden Proton NMR spectra of the urine samples were obtained on a 500MHz 1H NMR machine. 13
  • 14. NMR spectra of urine samples 14
  • 15. Nonsupervised Supervised UNIVERSITY OF 15 AMSTERDAM
  • 16. Experimental Design Example Experiment: Rats are given Bromobenzene that affects the liver Measurements: NMR spectroscopy of urine Rats Experimental Design: 6 hours 24 hours Time: 6, 24 and 48 hours 48 hours Groups: 3 doses of BB 3.0275 Vehicle group, Control group 2.055 5.38 3.285 3.0475 Animals: 3 rats per dose per time 3.675 3.7525 2.7175 2.075 2.93 point 10 8 6 4 2 0 chemical shift (ppm)
  • 17. Different contributions Experimental Design Time 4 3.5 0 0.2 0.4 time 0.6 0.8 1 Metabolite concentration 3 2.5 Dose 2 1.5 1 0 0.2 0.4 0.6 0.8 1 0.5 time 0 -0.5 0 0.2 0.4 0.6 0.8 1 time Animal Trajectories 0 0.2 0.4 time 0.6 0.8 1
  • 18. ANOVA decomposition of each variable xhkihk    k   hk   hkihk 4 3.5 3 2.5 2 1.5 1 0.5 0 0 0.2 0.4 0.6 0.8 1 -0.5 0.2 0.4 0.6 0.8 1 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 MATRICES: X  1mT  X α  X αβ  X αβγ
  • 19. ANOVA and PCA  ASCA X  1m  Xα  Xαβ  Xαβγ T Pα Pαβ Pαβγ X E Tα Tαβ Tαβγ Parts of the data not explained by the component X  1mT  TαPα  TαβPαβ  TαβγPαβγ  E T T T models
  • 20. Results 0.5 control vehicle 0.4 low Xαβγ medium Xα 0.3 high αβ -scores Xαβ Scores 0.2 0.1 40 % 0 -0.1 -0.2 6 24 48 Time (Hours)
  • 21. Results  biomarkers 3.0475 5.38 3.7525 3.675 Unique to the α submodel α Differences 3.9675 2.735 2.055 between submodels 2.5425 2.5825 2.6975 2.055 Interesting for Biology 2.075 Interesting for Statistics / 2.91 Diagnostics αβ 3.0275 2.93 3.9675 2.735 2.6975 2.5825 3.285 3.2625 2.075 2.93 αβγ 3.0475 2.055 3.73 3.8875 2.735 3.0275 3.285 10 8 6 4 2 0 chemical shift (ppm)
  • 22. Multivariate Metabolomics Data analysis • Explorative – Find groups, clusters structure / outliers in metabolites and in samples • Supervised – Discriminate two or more groups to make predictive model and to find • Special topics biomarkers. – Between metabolite – Method comparison ratios • Biological Interpretation – Metabolite set enrichment – Metabolomics Data – Pathway analysis Fusion – Metabolic network inference
  • 23. NONTARGETED SELDI measurements of serum samples of 20 Gaucher patients and 20 healthy controls. Gaucher is a genetic disease in which a fatty substance (lipid) accumulates in cells and certain organs
  • 24. • human urine and porcine cerebrospinal fluid samples spiked with a range of peptides. • Variation in #samples, within and between group variation
  • 25. Gaucher Spiked
  • 26. Feature selection methods RESULTS • Complex nontargeted Gaucher profiling data with highly variable background and varying difference between case and control: Multivariate methods perform best. • Spiked LCMS targeted data with less variation in effect size: univariate and semi-univariate methods are best in selecting biomarkers.
  • 27. Multivariate Metabolomics Data analysis • Explorative – Find groups, clusters structure / outliers in metabolites and in samples • Supervised – Discriminate two or more groups to make predictive model and to find • Special topics biomarkers. – Between metabolite ratios • Biological Interpretation – Metabolite set enrichment, – Metabolomics Data Pathway analysis Fusion – Metabolic network inference
  • 28. Biomarkers: A: univariate; B: multivariate; C: change in correlation between groups
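Biomarker type C can be illustrated with a hedged sketch: compare the Pearson correlation of one metabolite pair between two groups using Fisher's z-transformation. Reading "change in group correlation" this way is an assumption, and the function name and simulated data are hypothetical.

```python
import math
import numpy as np

def correlation_change(x_a, y_a, x_b, y_b):
    """Fisher z-test for a difference in Pearson correlation between groups A and B.

    Returns (r_a, r_b, z); z is approximately standard normal if the two
    population correlations are equal (needs more than 3 samples per group).
    """
    r_a = float(np.corrcoef(x_a, y_a)[0, 1])
    r_b = float(np.corrcoef(x_b, y_b)[0, 1])
    se = math.sqrt(1.0 / (len(x_a) - 3) + 1.0 / (len(x_b) - 3))
    z = (math.atanh(r_a) - math.atanh(r_b)) / se  # Fisher z-transform
    return r_a, r_b, z

# Hypothetical example: two metabolites correlated in group A only.
rng = np.random.default_rng(1)
x_a = rng.normal(size=100)
y_a = x_a + 0.3 * rng.normal(size=100)
x_b = rng.normal(size=100)
y_b = rng.normal(size=100)
r_a, r_b, z = correlation_change(x_a, y_a, x_b, y_b)
print(round(r_a, 2), round(r_b, 2), round(z, 1))
```

Such a pair can show no mean difference between groups for either metabolite and still differ in how the two metabolites move together.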
  • 29. BMR in a green tea intervention study: 186 human subjects with abdominal obesity. Validation shows significant changes in BMR between placebo and green tea treatment, together with the most important triacylglycerols TG28-29 and TG41-42.
  • 32. Differences in blood metabolites due to aging
  • 37. Special topic: Metabolic networks. Biochemical network vs association network. Figure 7: Marginal correlation network for a set of metabolites in tomato. Volatiles in red, derivatized metabolites in yellow. Solid lines represent positive correlations, dashed lines negative ones. Line thickness corresponds to magnitude of ... (Margriet M.W.B. Hendriks, Data-processing strategies for metabolomics studies, Trends in Analytical Chemistry, 2011)
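A marginal (association) correlation network like the one in Figure 7 can be sketched by computing all pairwise Pearson correlations and keeping the pairs whose absolute correlation exceeds a threshold. The sign of r then gives the solid/dashed distinction and |r| the line thickness. The threshold of 0.7, the function name and the toy data are illustrative assumptions.

```python
import numpy as np

def correlation_network(X, names, threshold=0.7):
    """Marginal correlation network from a samples-by-metabolites matrix X.

    Returns edges (name_i, name_j, r) for all pairs with |r| >= threshold.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) >= threshold:
                # sign of r: solid (positive) or dashed (negative) line;
                # |r|: line thickness
                edges.append((names[i], names[j], float(R[i, j])))
    return edges

# Hypothetical toy data: b rises with a, c falls with a, d is unrelated to a.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([a, 2 * a + 1, -a, [1.0, -1.0, 1.0, -1.0, 1.0]])
for edge in correlation_network(X, ["a", "b", "c", "d"]):
    print(edge)
```

Note that b and c get an edge although neither depends on the other directly: both follow a. Marginal correlation cannot separate direct from indirect associations.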
  • 38. Metabolomics, 2005: data from potato tubers. Highly correlated metabolites may be metabolic neighbors, or may not participate in common reactions at all; high correlation can be due to, e.g., chemical equilibrium, mass conservation, etc. “a systematic relationship between observed correlation networks and the underlying biochemical pathways.” Ralf Steuer: Observing and interpreting correlations in metabolomic networks, Bioinformatics, 2003
  • 39. Metabolic network inference: search for the link between metabolome data and the underlying metabolic networks. [Figure: network of metabolites A to F with unknown (??) connections.] As an example: can we distinguish healthy from diseased networks? [Figure: HEALTHY and DISEASE networks involving glucose and metabolites A to G.]
  • 40. From data to network. Goal: network topology and directions. Problems: noise, missing metabolites, a huge number of possible network structures.
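The "huge number of possible network structures" follows from simple counting. For n metabolites there are n(n-1)/2 pairs, so there are 2^(n(n-1)/2) possible undirected topologies. If each pair can also be directed (absent, i -> j, or j -> i), there are 3^(n(n-1)/2) structures. The sketch below counts labelled graphs without self-loops or reciprocal edges, which is a simplifying assumption. Even seven metabolites give 2,097,152 undirected topologies.

```python
from math import comb

def n_undirected_topologies(n):
    """Undirected graphs on n labelled metabolites: each pair has an edge or not."""
    return 2 ** comb(n, 2)

def n_directed_structures(n):
    """Each pair: no edge, i -> j, or j -> i (no self-loops, no reciprocal edges)."""
    return 3 ** comb(n, 2)

for n in (5, 7, 10):
    print(n, n_undirected_topologies(n), n_directed_structures(n))
```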
  • 41. Inference from static data. 1. Data collection, under A. enzymatic variability, B. intrinsic variability, C. environmental variability. 2. Similarity score calculation: 2a. Relevance networks (all possible pairwise interactions): Pearson correlation (PC, linear) and mutual information (MI, non-linear). 2b. Conditioned networks: partial Pearson correlation (PPC, linear) and conditional mutual information (CMI, non-linear). [Figure: example data and the resulting networks for metabolites A to F.]
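The contrast between a relevance network (PC) and a conditioned network (PPC) can be illustrated with a simulated chain A → B → C, in which A and C are linked only through B. Below is a minimal sketch of full-order partial correlation (each pair conditioned on all other metabolites), computed from the inverse covariance matrix. The simulation and the function name are illustrative assumptions. This estimate needs more samples than metabolites.

```python
import numpy as np

def partial_correlation(X):
    """Full-order partial correlations from a samples-by-metabolites matrix X.

    Uses the precision matrix P = inv(cov(X)): pcor_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    P = np.linalg.inv(np.cov(np.asarray(X, dtype=float), rowvar=False))
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical chain A -> B -> C: A and C are correlated only through B.
rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = a + 0.5 * rng.normal(size=5000)
c = b + 0.5 * rng.normal(size=5000)
X = np.column_stack([a, b, c])
pc = np.corrcoef(X, rowvar=False)      # relevance network (PC)
ppc = partial_correlation(X)           # conditioned network (PPC)
print(round(pc[0, 2], 2), round(ppc[0, 2], 2))
```

The Pearson correlation of A and C is high, whereas their partial correlation is close to zero, so only the conditioned network drops the indirect A–C edge.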
  • 42. Estimation of correlation networks [Figure: networks estimated with PC, MI, PPC1, CMI1 and PPCn for four pathways (1. ASPP, 2. ASA, 3. HS, 4. HSP) under Vmax variability, intrinsic variability and environmental variability, compared with the real pathway; legend classes 100 %, > 90 %, 10 % … 90 %, < 10 %.] PC: Pearson correlation (linear measure). MI: entropy-based mutual information (non-linear measure). PPC: partial Pearson correlation (linear conditioning measure). CMI: conditional mutual information (non-linear conditioning measure). (Cakir, Metabolomics 2009)
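Mutual information (MI) is the non-linear counterpart of Pearson correlation in this comparison. The sketch below is a minimal plug-in estimator built from a 2-D histogram. The bin count and function name are assumptions, and the cited study may use a different MI estimator. It shows that a purely quadratic dependence has near-zero Pearson correlation but clearly positive MI.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in estimate of mutual information (nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # empty cells contribute 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))

# Hypothetical example: y depends on x, but not linearly.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5000)
y_quadratic = x ** 2
y_independent = rng.uniform(-1.0, 1.0, size=5000)
print(round(float(np.corrcoef(x, y_quadratic)[0, 1]), 3))
print(round(mutual_information(x, y_quadratic), 2),
      round(mutual_information(x, y_independent), 3))
```

Histogram estimates of MI are biased upward for independent variables, and the bias grows with the number of bins. This is one reason MI needs more samples than PC.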
  • 44. Metabolomics data fusion • Account for between-block differences in measurement quality to improve data fusion • For example, multi-platform data fusion with differences in quantification, (non)targeted measurement and error structure [Figure: amino acid and lipid blocks combined into fused data] • How can the quality of measurements be quantified with many metabolites and many samples?
  • 45. Error model for 1 metabolite: QC sample -> RSD. [Figure: standard deviation (St.D) versus mean intensity (I), with labels M, A, I and S.] • Error models: – RSD using 1 QC sample – 2-component model using study samples • A good error description needs – a sufficient number of samples – study samples covering a large range
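The two error models on this slide can be sketched as follows. The sketch assumes the 2-component model takes the common additive-plus-multiplicative form σ²(I) = a + b·I². Reading the letters A and M in the figure as the additive and multiplicative parts is also an assumption. Ordinary least squares on raw variances is a simplification: high-intensity peaks dominate it, so a weighted fit may be preferable. All function names are hypothetical.

```python
import numpy as np

def fit_two_component(mean_intensity, variance):
    """Least-squares fit of variance = a + b * I**2.

    a: additive (intensity-independent) variance
    b: squared multiplicative coefficient (relative error at high intensity)
    """
    I = np.asarray(mean_intensity, dtype=float)
    A = np.column_stack([np.ones_like(I), I ** 2])
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(variance, dtype=float), rcond=None)
    return a, b

def predicted_sd(I, a, b):
    """Standard deviation predicted by the two-component model."""
    return np.sqrt(a + b * np.asarray(I, dtype=float) ** 2)

def rsd_single_qc(qc_repeats):
    """RSD per peak from repeated measurements of one QC sample (rows = repeats)."""
    qc = np.asarray(qc_repeats, dtype=float)
    return qc.std(axis=0, ddof=1) / qc.mean(axis=0)

# Hypothetical check on noise-free variances generated from a = 4, b = 0.0025.
I = np.array([10.0, 100.0, 1000.0])
a, b = fit_two_component(I, 4.0 + 0.0025 * I ** 2)
print(a, b)
```

A single QC sample gives one RSD per peak at one intensity level, whereas the two-component fit describes how the error changes over the intensity range spanned by the study samples.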
  • 46. Figure of merit for data from 1 platform [Figure panels: St.D versus I for variables 15, 118, 213 and 365; number of peaks, with F-50 and F-90 marked.] Median: F-50 = 0.1. 90th percentile: F-90 = 0.35. (Van Batenburg et al., Analytical Chemistry, 2011)
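This sketch assumes, as the figure suggests, that F-50 and F-90 are the median and 90th percentile of a per-peak error measure across all peaks of one platform. It uses the QC-based RSD as that measure, which is an assumption; Van Batenburg et al. define the figures of merit in more detail. The function name is hypothetical.

```python
import numpy as np

def figures_of_merit(qc_repeats):
    """F-50 and F-90: median and 90th percentile of the per-peak RSD.

    qc_repeats: (n_repeats, n_peaks) array of QC intensities for one platform.
    """
    qc = np.asarray(qc_repeats, dtype=float)
    rsd = qc.std(axis=0, ddof=1) / qc.mean(axis=0)
    return float(np.median(rsd)), float(np.percentile(rsd, 90))

# Hypothetical QC data: 3 repeats of 100 peaks whose RSDs are 0.01, 0.02, ..., 1.00.
c = np.linspace(0.01, 1.0, 100)            # target RSD per peak
m = np.linspace(50.0, 500.0, 100)          # mean intensity per peak
qc = m * (1.0 + np.outer([-1.0, 0.0, 1.0], c))
f50, f90 = figures_of_merit(qc)
print(f50, f90)
```

Summarising a whole platform by two numbers is what makes the figures of merit usable as block weights in a later fusion step.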
  • 47. Two-step data fusion j GC/MS LC/MS J1= 82 J2= 49 peaks peaks  Ij M M  • Step 1: Compute figures of merit for each platform  
  • 48. Two-step data fusion: MB-MLPCA • Step 2: multi-block PCA with weighting by figures of merit, using the fused error covariance σ̂²_js of blocks X1 (amino acids) and X2 (lipids) • The method needs a good estimate of the error variance from – repeats – QC samples
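MB-MLPCA as used in the study is not reproduced here. As a simpler stand-in, suppose each variable's error variance σ²_j is constant across samples. Then the maximum-likelihood rank-k fit reduces to ordinary PCA after dividing each centred column by σ_j, by Eckart–Young applied to the scaled matrix. General MLPCA also handles element-wise error variances. The sketch fuses two blocks with block sizes taken from the GC/MS (82 peaks) and LC/MS (49 peaks) example. All names and data are hypothetical.

```python
import numpy as np

def weighted_multiblock_pca(blocks, error_sds, n_components=2):
    """PCA of column-centred blocks, each column divided by its error SD.

    blocks: list of (n_samples, J_k) arrays measured on the same samples.
    error_sds: per block, a scalar (one figure of merit for the block) or a
    length-J_k array of per-variable error standard deviations.
    """
    scaled = []
    for X, sd in zip(blocks, error_sds):
        X = np.asarray(X, dtype=float)
        sd = np.broadcast_to(np.asarray(sd, dtype=float), (X.shape[1],))
        scaled.append((X - X.mean(axis=0)) / sd)  # noisy variables get less weight
    Z = np.hstack(scaled)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T            # in the error-scaled space
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

# Hypothetical data: 20 samples, GC/MS block with 82 peaks, LC/MS block with 49.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(20, 82))
X2 = rng.normal(size=(20, 49))
scores, loadings, explained = weighted_multiblock_pca(
    [X1, X2], [0.1, rng.uniform(0.05, 0.5, size=49)])
print(scores.shape, loadings.shape, explained.round(3))
```

Because the weights are inverse error SDs, the result depends directly on how well those SDs are estimated, which is the concern raised on this slide.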
  • 49. Realistic simulations using GC/MS and LC/MS data [Figure: error variance estimated from duplicates versus the true error variance.] • Estimating the error variance from duplicates is problematic. • Use a mix of QC samples and repeats.
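For a single metabolite, n duplicate pairs give an unbiased error variance Σd²/(2n) with only n degrees of freedom. Under normally distributed errors its relative standard error is therefore about √(2/n): 1.0 for two pairs and 0.5 for eight. Pooling with QC repeats, weighted by degrees of freedom, is one simple way to combine the two sources. The sketch below is hypothetical, and the pooling assumes both sources share the same error variance. That assumption is questionable when errors depend on intensity.

```python
import numpy as np

def duplicate_variance(first, second):
    """Error variance from n duplicate pairs: sum(d**2) / (2n), with n degrees of freedom."""
    d = np.asarray(first, dtype=float) - np.asarray(second, dtype=float)
    return float(np.sum(d ** 2) / (2 * d.size)), d.size

def qc_variance(qc_repeats):
    """Sample variance of repeated QC measurements, with n - 1 degrees of freedom."""
    qc = np.asarray(qc_repeats, dtype=float)
    return float(qc.var(ddof=1)), qc.size - 1

def pooled_variance(*estimates):
    """Degrees-of-freedom weighted pooling of (variance, df) pairs."""
    total_df = sum(df for _, df in estimates)
    return sum(v * df for v, df in estimates) / total_df, total_df

def relative_se(df):
    """Approximate relative standard error of a variance estimate (normal errors)."""
    return np.sqrt(2.0 / df)

# Hypothetical measurements of one metabolite.
dup = duplicate_variance([10.2, 12.5, 9.1], [9.8, 12.9, 9.5])
qc = qc_variance([10.0, 10.4, 9.7, 10.1, 9.9])
print(dup, qc, pooled_variance(dup, qc), relative_se(dup[1]))
```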