https://www.youtube.com/watch?v=Y_-o-4rKxUk
Machine learning powered metabolomic network analysis
Dmitry Grapov PhD,
Director of Data Science and Bioinformatics,
CDS - Creative Data Solutions
www.createdatasol.com
Metabolomic network analysis can be used to interpret experimental results within a variety of contexts, including biochemical relationships, structural and spectral similarity, and empirical correlation. Machine learning is useful for modeling relationships through pattern recognition, clustering, classification and regression-based predictive modeling. The combination of these metabolomic networks with machine-learning-based predictive models offers a unique way to visualize empirical relationships while testing key experimental hypotheses. The following presentation focuses on the data analysis, visualization, machine learning and network mapping approaches used to create richly mapped metabolomic networks. Learn more at www.createdatasol.com
2. Creative
Data
Solutions
Predictive Modeling Within a Biochemical Context
Grapov et al., Circ. Cardiovasc. Genet. 2014
Personalized Medicine
Complex Data Integration
Grapov et al., PLoS ONE (2014) doi:10.1371/journal.pone.0084260
J. Proteome Res., 2015, 14 (1), pp 557–566 DOI: 10.1021/pr500782g
Biomarker Discovery
DOI: 10.1126/science.356.6338.646
‘The impact of metabolomics on systems biology is still in its infancy… few people have a solid understanding across all the 'omics, including lipidomics and epigenetics.’
Present and Future
‘There is a need for generalized data analysis methods capable of efficiently integrating across [and within] -Omic domains’
http://www.archaeology.org/issues/207-1603/features/4157-arles-roman-wall-paintings
Materials of Connected Biological Data Analysis and Visualization
Quality Assessment
• use replicated measurements and/or internal standards to estimate analytical variance
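A minimal sketch of the quality-assessment step, assuming each analyte was measured several times in the same pooled QC sample (or is an internal standard): the coefficient of variation (CV) of those replicates serves as an estimate of analytical variance. The function names, the toy values and the 30% cutoff are illustrative assumptions, not taken from the presentation.

```python
from statistics import mean, stdev


def analytical_cv(replicates):
    """Percent coefficient of variation (CV) per analyte.

    `replicates` maps an analyte name to repeated measurements of the same
    pooled QC sample or internal standard (at least two values each).
    Hypothetical helper, for illustration only.
    """
    cvs = {}
    for analyte, values in replicates.items():
        # CV = sample standard deviation divided by the mean, as a percentage
        cvs[analyte] = 100 * stdev(values) / mean(values)
    return cvs


def flag_noisy(cvs, max_cv=30.0):
    """Analytes whose analytical CV exceeds max_cv (a hypothetical cutoff)."""
    return sorted(name for name, cv in cvs.items() if cv > max_cv)


# Toy replicate injections of one QC sample (made-up values)
qc = {
    "glucose": [100.0, 102.0, 98.0, 101.0],
    "lactate": [50.0, 80.0, 35.0, 65.0],
}
cvs = analytical_cv(qc)
print(flag_noisy(cvs))  # ['lactate']
```

Analytes flagged this way are often down-weighted or excluded before statistical testing, since their between-sample differences may be dominated by measurement noise.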
Statistical and Multivariate
• use the experimental design to test hypotheses and/or identify trends in analytes
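For a simple two-group design, the hypothesis-testing step could look like the sketch below: a label-permutation test on the difference in group means for each analyte, followed by Benjamini-Hochberg false discovery rate (FDR) adjustment across analytes. The choice of test, the helper names and the toy values are assumptions; the slide does not say which tests are used.

```python
import random
from statistics import mean


def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # add-one correction counts the observed labelling and avoids p = 0
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so adjusted values stay monotone and <= 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy two-group design: (control, treated) measurements per analyte (made-up values)
design = {
    "citrate": ([1.0, 1.1, 0.9, 1.0, 1.05], [2.0, 2.1, 1.9, 2.2, 2.05]),
    "alanine": ([1.0, 1.2, 0.8, 1.1, 0.9], [1.05, 0.95, 1.1, 0.9, 1.0]),
}
names = list(design)
raw = [permutation_pvalue(*design[name]) for name in names]
fdr = dict(zip(names, benjamini_hochberg(raw)))
print(fdr)
```

A permutation test avoids distributional assumptions at the cost of computation; with real data, a t-test or a linear model that encodes additional design factors may be the more natural choice.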
Functional
• use statistical and multivariate results to identify impacted biochemical domains
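One common way to map significant analytes onto biochemical domains is an over-representation test. The sketch below computes a one-sided hypergeometric p-value per domain from set sizes alone; the domain memberships are toy annotations and the helper names are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_total, n_domain, n_sig, n_sig_domain):
    """One-sided hypergeometric p-value, P(X >= n_sig_domain).

    n_total: analytes measured; n_domain: measured analytes in the domain;
    n_sig: significant analytes; n_sig_domain: significant analytes in the domain.
    """
    tail = sum(
        comb(n_domain, k) * comb(n_total - n_domain, n_sig - k)
        for k in range(n_sig_domain, min(n_domain, n_sig) + 1)
    )
    return tail / comb(n_total, n_sig)


def domain_enrichment(measured, significant, annotations):
    """Over-representation p-value for each annotated domain (hypothetical helper)."""
    measured = set(measured)
    sig = set(significant) & measured
    result = {}
    for domain, members in annotations.items():
        members = set(members) & measured  # only count analytes actually measured
        result[domain] = enrichment_pvalue(
            len(measured), len(members), len(sig), len(sig & members)
        )
    return result


# Toy annotations; real ones would come from a pathway or chemical-class resource
measured = ["citrate", "succinate", "malate", "fumarate", "alanine",
            "glycine", "serine", "lactate", "glucose", "urea"]
annotations = {
    "TCA cycle": {"citrate", "succinate", "malate", "fumarate"},
    "amino acids": {"alanine", "glycine", "serine"},
}
pvals = domain_enrichment(measured, ["citrate", "succinate", "malate"], annotations)
print(pvals)
```

With many domains, these p-values would themselves need multiple-testing correction.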
Network and Predictive
• integrate statistical and multivariate results with the experimental design and analyte metadata
Considerations for Selecting a Modeling Method
Is it appropriate for my data?
• Missing values
• Data types, e.g. numeric or discrete
• Multicollinearity
• Fitting properties, e.g. non-linear relationships
Training and test performance?
• Optimization
• Parameter tuning
• Feature selection
• Feature weights/rank
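Training versus test performance and parameter tuning are commonly handled together with k-fold cross-validation. The sketch below tunes the neighbor count of a k-nearest-neighbor classifier on made-up two-class data; the classifier, candidate values, fold count and helper names are assumptions chosen for brevity, not models from the talk.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, point, k):
    """Majority label among the k training points nearest to `point`."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, point)), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def kfold_indices(n, n_folds, seed=0):
    """Shuffle sample indices once, then deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_accuracy(x, y, k, n_folds=5, seed=0):
    """Mean held-out accuracy of k-NN over the folds."""
    scores = []
    for test_idx in kfold_indices(len(x), n_folds, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(x)) if i not in held_out]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_x, train_y, x[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def tune_k(x, y, candidates=(1, 3, 5, 7), n_folds=5):
    """Best k by CV accuracy (ties go to the smaller k), plus all scores."""
    scores = {k: cv_accuracy(x, y, k, n_folds) for k in candidates}
    best = max(candidates, key=lambda k: (scores[k], -k))
    return best, scores


# Toy data: two well-separated 2-D clusters (made-up, for illustration only)
rng = random.Random(1)
group_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
group_b = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]
x = group_a + group_b
y = ["A"] * 20 + ["B"] * 20

best_k, cv_scores = tune_k(x, y)
# Training accuracy of 1-NN, scored on the same data it was fit to
train_acc = sum(knn_predict(x, y, xi, 1) == yi for xi, yi in zip(x, y)) / len(x)
print(best_k, cv_scores, train_acc)
```

Scoring 1-NN on its own training data returns perfect accuracy because every sample is its own nearest neighbor; that optimistic bias is exactly what the held-out folds are meant to remove.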
https://en.wikipedia.org/wiki/Anscombe%27s_quartet
https://en.wikipedia.org/wiki/Overfitting
Very important to avoid overfitting
[Figure: panels labeled linear, non-linear, outlier and classification]
Visualization is critical for model evaluation
Machine Learning Based Predictive Modeling
• Auto or manual parameter tuning
• Optimize based on a variety of performance metrics
• Control model cross-validation
• Share results with global state
• Choose from over 200 model types
• Calculate classification or regression models
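The tool shown on this slide optimizes models against a choice of performance metrics; its internals are not described here. Independently of any particular tool, a few common classification and regression metrics are sketched below, along with a hypothetical `select_model` helper that ranks candidate models on whichever metric is chosen. Model names and scores are made up.

```python
import math


def classification_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, specificity and balanced accuracy for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def regression_metrics(y_true, y_pred):
    """Root mean squared error (RMSE) and coefficient of determination (R^2)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"rmse": math.sqrt(ss_res / n), "r2": 1 - ss_res / ss_tot}


def select_model(scores, metric, higher_is_better=True):
    """Name of the candidate model with the best value of the chosen metric (hypothetical)."""
    pick = max if higher_is_better else min
    return pick(scores, key=lambda name: scores[name][metric])


# Toy predictions (made-up values)
truth = ["case", "case", "control", "control", "case", "control"]
pred = ["case", "control", "control", "control", "case", "case"]
clf = classification_metrics(truth, pred, positive="case")
reg = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])

# Hypothetical cross-validated scores for two candidate models; lower RMSE is better
cv_results = {"model_a": {"rmse": 0.42}, "model_b": {"rmse": 0.35}}
print(clf, reg, select_model(cv_results, "rmse", higher_is_better=False))
```

Which metric to optimize is a design decision: balanced accuracy is less misleading than plain accuracy when classes are imbalanced, and RMSE penalizes large regression errors more heavily than mean absolute error would.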