The document provides an overview of metabolomics and describes key aspects of the metabolomics workflow and analytical techniques. It defines metabolomics as the study of metabolites in biological systems and discusses areas of application including biomarker discovery. Metabolite profiling using techniques like NMR, GC-MS and LC-MS is described. The document uses Barth syndrome, a mitochondrial disorder, as a case study and discusses how a cardiolipin deficiency can be detected using metabolomics. It outlines the data processing steps in metabolomics including peak detection, matching, retention time correction, and compound identification.
Understanding Metabolomics Through Metabolite Profiling
1. Bioinformatics: Metabolomics
By Mia Pras-Raves, PhD
Prof. dr. A.H.C. van Kampen (Antoine)
Bioinformatics Laboratory
Academic Medical Centre (AMC)
www.bioinformaticslaboratory.nl
m.pras@amc.uva.nl, a.h.vankampen@amc.uva.nl
3. Metabolomics - definition
Scientific study of chemical processes involving metabolites
Metabolites
Intermediates and products of metabolic reactions
Synthesis and breakdown of compounds
Lipids, sugars, peptides, volatile compounds, ......
Hormones and other signalling molecules
Products of metabolism of foreign substances
Exception: proteins (proteomics)
usually < 1500 Da in size
The metabolome represents the collection of all metabolites in a
biological cell, tissue, organ or organism, which are the end
products of cellular processes
Very diverse and complex
4. Metabolomics – areas of application
Multi-parametric metabolic response of living systems to
physiological stimuli or genetic modification
Plants, animals, humans
The global profiling of metabolites in various media:
Biofluids (blood serum, urine, …)
Cellular structures (cultured cells, organoids, …)
Breath components
Detection of biomarkers for diseases or drug- or diet-related
changes
Metabolome is closest representation of phenotype
5. Metabolomics example: Barth syndrome
First described in 1983 by Dr. Barth
X-linked recessive disorder
Cardiac and skeletal myopathy
Neutropenia
Abnormal mitochondria
3-methylglutaconic aciduria
Can be lethal
Barth syndrome is a mitochondrial disorder
6. Mitochondria are subcellular organelles
Mitochondria have two membranes (inner and outer membrane)
7. In Barth syndrome something is wrong with the mitochondrial membrane
Membranes are lipid bilayers and consist of:
Phospholipids
Cholesterol
Proteins
8. Different phospholipids
Basic phospholipid structure:
PC
PG PE
Head groups:
– Choline
– Ethanolamine
– Serine
– Inositol
– Glycerol
– …
Fatty acids
11. Structure of cardiolipin; a special phospholipid
Basic phospholipid structure:
PG: 2 fatty acids
Cardiolipin: 4 fatty acids
12. Cardiolipins
Cardiolipin is an important component of the inner mitochondrial
membrane, where it constitutes about 20% of the total lipid composition.
The name ‘cardiolipin’ is derived from the fact that it was first found in
animal hearts.
In mammalian cells, cardiolipin is essential for the optimal function of
numerous enzymes that are involved in mitochondrial energy metabolism.
CL also has an important role in apoptosis.
R = linoleic acid
14. In Barth syndrome, the remodeling process is disturbed and this results in cardiolipin deficiency.
Caused by mutation in the TAZ (tafazzin) gene
Monolysocardiolipins accumulate in Barth syndrome
Houtkooper RH, Van Lenthe H, Stet FS, Wanders RJ, Kulik W, Vaz FM (2009) Analyse 64(7), 200
16. Metabolomics – different methods
Nuclear Magnetic Resonance Spectroscopy (NMR)
Mass Spectrometry (MS)
Gas Chromatography MS (GC-MS)
Liquid Chromatography MS (LC-MS)
Direct Infusion MS (DI-MS)
Lipidomics
(phospho)lipids
Fluxomics
Incorporation of chemically labeled compound
17. Targeted vs untargeted metabolomics
Targeted: Pre-defined set of metabolites to quantify
Typically carried out in diagnostics
Pros: Technically simple
Cons: Limited scope, missing information
Untargeted: Global analysis of metabolic changes in response to disease,
environmental or genetic perturbations.
Typically carried out for hypothesis generation, followed by targeted
profiling for more confident quantification of relevant metabolites.
Pros: Unbiased (no selection of metabolites)
Cons: Technically challenging (both the analysis and the bioinformatics),
risk of getting too many unknowns
20. Liquid chromatography (LC)
Analytical technique for separating ions or molecules that are in
solution.
Sample solution is injected onto a column
Compounds interact with the column material (stationary phase)
A liquid (mobile phase) is passed over the column
The different compounds interact with the two phases to differing degrees,
due to differences in adsorption, ion-exchange, partitioning, or size.
These differences in interaction allow the mixture components to be separated
21. LC on a column
Sample (black ink) contains three metabolites (blue, red, yellow)
Due to interaction with the stationary phase, these metabolites
are separated
Figure (http://www.waters.com): at time zero the injected sample band (black) sits at the top of the column; after 10 min of mobile phase flow the compounds (blue, red, yellow) are separated
22. HPLC: chromatogram
A chromatogram is a representation of the separation that has occurred
chromatographically in the HPLC system.
A series of peaks rising from a baseline is drawn on a time axis.
Each peak represents the detector response for a different compound.
http://www.waters.com
Figure y-axis: electric signal (detector response)
23. MS measures m/z of ions
The MS principle consists of ionizing chemical compounds to generate
charged molecules or molecule fragments and measurement of their
mass-to-charge ratios (m/z).
e.g. Phenylalanine
H+
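As a small worked example of the m/z principle, the sketch below computes the m/z of protonated phenylalanine. It assumes positive-mode ionization by simple proton addition; the constant and function names are illustrative and not part of the slides.

```python
# Monoisotopic masses in Da (standard reference values; names are illustrative).
PROTON_MASS = 1.007276
PHENYLALANINE_MASS = 165.078979  # C9H11NO2


def mz_protonated(neutral_mass: float, charge: int = 1) -> float:
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


mz_phe = mz_protonated(PHENYLALANINE_MASS)
print(f"[M+H]+ of phenylalanine: m/z {mz_phe:.4f}")
```

For a singly charged ion the m/z is simply the neutral mass plus one proton; for multiply charged ions the same mass appears at a lower m/z.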
25. LC-MS data
3-dimensional space: time, mass (m/z), intensity
TIC (total ion chromatogram): sum of the intensities of all m/z values at each time point
Figure source: NUGO workshop 2007, TNO
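The TIC can be illustrated with a few lines of Python. This is a minimal sketch on invented toy data; the (time, m/z, intensity) triple layout is an assumption about how the raw data are held in memory, not a file format used in the slides.

```python
from collections import defaultdict

# Hypothetical toy data: (retention time in s, m/z, intensity) triples.
points = [
    (10.0, 166.09, 1200.0),
    (10.0, 300.19, 300.0),
    (11.0, 166.09, 2500.0),
    (11.0, 300.19, 450.0),
    (12.0, 166.09, 900.0),
]


def total_ion_chromatogram(points):
    """Sum the intensities of all m/z values at each time point."""
    tic = defaultdict(float)
    for rt, _mz, intensity in points:
        tic[rt] += intensity
    return dict(sorted(tic.items()))


tic = total_ion_chromatogram(points)
print(tic)
```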
28. Overall workflow
LC-MS data
Pre-processing
Peaks
- Peak positions: identified compounds
- Peak intensities: quantities (intensities)
Statistical analysis
Multivariate statistical analysis (PCA, etc)
Pathway analysis
Systems Biology
29. XCMS
XCMS is an LC/MS-based data analysis approach which incorporates
several methods for pre-processing
Smith et al (2006) Anal. Chem, 78, 779-787
R package
http://xcmsonline.scripps.edu/
30. XCMS: overview of preprocessing methodology
Filtering and identification of peaks
Match peaks across samples
Retention time correction
Fill in missing peak data
Statistically analyze results
Visualization of peaks
Assignment of metabolite names to peaks
Additions after XCMS:
• Noise peak filtering
• Identification of PLs (phospholipids)
• Isotope correction
• Normalization on IS (internal standard)
• Statistical analysis
31. Peak detection (feature detection)
Identify all signals caused by true ions and avoid detection of false
positives (i.e., noise, spikes)
Provide accurate quantitative information about ion concentrations
Slicing data to extracted ion chromatograms (EIC)
covering a narrow m/z range (XCMS)
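Slicing the data into an extracted ion chromatogram can be sketched as follows. The toy data and the slice boundaries are assumptions chosen for illustration; this is not the XCMS implementation.

```python
# Hypothetical toy data: (retention time in s, m/z, intensity) triples.
raw_points = [
    (10.0, 166.09, 1200.0),
    (10.0, 300.19, 300.0),
    (11.0, 166.08, 2500.0),
    (11.0, 300.19, 450.0),
    (12.0, 300.20, 700.0),
]


def extract_ion_chromatogram(points, mz_min, mz_max):
    """Return sorted (rt, summed intensity) pairs for mz_min <= m/z < mz_max."""
    eic = {}
    for rt, mz, intensity in points:
        if mz_min <= mz < mz_max:
            eic[rt] = eic.get(rt, 0.0) + intensity
    return sorted(eic.items())


# One narrow m/z slice around 166.1 (the slice width is an illustrative choice).
eic_166 = extract_ion_chromatogram(raw_points, 166.05, 166.15)
print(eic_166)
```

Each slice is then searched for chromatographic peaks independently, which turns the 3-dimensional problem into many 1-dimensional ones.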
32. Peak detection can be difficult
Due to peak shapes
Due to artifacts
White noise: Random background
Flicker: changes in response with changing operation or conditions
Interference: noise spikes of random occurrence and intensity
33. Matched Filtration: second derivative Gaussian model
Filter each extracted ion chromatogram (EIC)
Use second-derivative Gaussian as the model peak shape
Set Full Width at Half-Maximum (fwhm)
Second derivative Gaussian model
Generates new chromatographic profile, which reflects curvature
rather than absolute intensity
Implicit background subtraction
Yields consistent s/n improvement
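The idea of matched filtration can be sketched in plain Python. This is a simplified illustration, not the XCMS code: the kernel truncation at 4 standard deviations, the edge handling, the fwhm value of 30 and the synthetic EIC are all assumptions made for the example.

```python
import math


def second_derivative_gaussian(fwhm, step=1.0, half_width_sigmas=4.0):
    """Negative second derivative of a Gaussian, sampled every `step` seconds."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    n = math.ceil(half_width_sigmas * sigma / step)
    kernel = [
        (1.0 - (i * step / sigma) ** 2)
        * math.exp(-((i * step) ** 2) / (2.0 * sigma**2))
        for i in range(-n, n + 1)
    ]
    # Shift to an exact zero sum so that a constant background filters to zero.
    mean = sum(kernel) / len(kernel)
    return [k - mean for k in kernel]


def matched_filter(signal, kernel):
    """Correlate signal with kernel; edges are handled by repeating end values."""
    half = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(k * signal[min(max(i + j - half, 0), last)] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


# Hypothetical EIC: constant background of 100 plus one Gaussian peak at scan 200.
FWHM = 30.0
sigma = FWHM / (2.0 * math.sqrt(2.0 * math.log(2.0)))
eic = [100.0 + 1000.0 * math.exp(-((t - 200) ** 2) / (2.0 * sigma**2)) for t in range(400)]

filtered = matched_filter(eic, second_derivative_gaussian(FWHM))
apex = max(range(len(filtered)), key=filtered.__getitem__)
print("peak apex at scan", apex)
```

Because the kernel sums to zero, the flat background disappears from the filtered profile, which is the implicit background subtraction mentioned above.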
35. Negative second derivative of Gaussian model
Figure: the Gaussian model and its negative second derivative (smaller/sharper peak)
36. Noise peak filtering
Chromatographic peaks should have >3 consecutive points above noise
level
Filter out 10-60% of peaks as noise
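A possible way to express the consecutive-points rule is sketched below. Reading ">3 consecutive points" as "at least 4" is an interpretation, and the function names and noise level are hypothetical.

```python
def longest_run_above(intensities, noise_level):
    """Length of the longest run of consecutive points above the noise level."""
    longest = current = 0
    for value in intensities:
        current = current + 1 if value > noise_level else 0
        longest = max(longest, current)
    return longest


def is_real_peak(intensities, noise_level, min_points=4):
    """Keep a peak only if it has at least `min_points` consecutive points above noise."""
    return longest_run_above(intensities, noise_level) >= min_points


print(is_real_peak([1, 5, 6, 7, 8, 1], noise_level=2))  # broad peak
print(is_real_peak([1, 9, 1, 9, 1, 9], noise_level=2))  # isolated spikes
```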
38. Peak matching
The peak detection step was applied to individual samples.
Next step: match these peaks across samples
Allows comparison of samples for peak intensities
Allows calculation and correction of retention time deviations
The mass accuracy of mass spectrometers is usually much better than the
reproducibility of LC retention times, so peaks are first matched in the mass domain.
Make use of fixed-interval bins 0.25 m/z wide to match peaks in the mass
domain.
To avoid splitting a group apart because of arbitrary bin borders, use
overlapping bins in which adjacent bins overlap by half
(i.e., 100.0-100.25, 100.125-100.375, etc.).
During binning, each peak is counted twice in two overlapping bins.
Use post-processing step to account for peak groups resulting from
overlapping bins
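The overlapping-bin scheme can be sketched as follows. The peak tuples are invented toy data and the function names are hypothetical; the post-processing step that removes duplicate groups is not shown.

```python
import math
from collections import defaultdict

BIN_WIDTH = 0.25          # m/z width of each bin, as on the slide
BIN_STEP = BIN_WIDTH / 2  # adjacent bins overlap by half


def overlapping_bins(mz):
    """Lower edges of the two half-overlapping bins that contain mz."""
    k = math.floor(mz / BIN_STEP)
    return [(k - 1) * BIN_STEP, k * BIN_STEP]


def bin_peaks(peaks):
    """peaks: iterable of (sample, mz, rt). Returns {bin lower edge: [peaks]}."""
    bins = defaultdict(list)
    for peak in peaks:
        for edge in overlapping_bins(peak[1]):
            bins[edge].append(peak)
    return dict(bins)


# Hypothetical peaks from two samples.
peaks = [("s1", 100.20, 85.0), ("s2", 100.22, 88.0), ("s1", 250.10, 120.0)]
bins = bin_peaks(peaks)
print(sorted(bins))
```

Every peak lands in exactly two bins, so a group that straddles one bin border is still intact in the neighbouring bin.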
39. Matching m/z peaks across samples
Overlapping bins
Sample 1
Sample 2
For each bin identify
the matching peaks
(independent of time!)
40. Peak group boundaries
Figure: peak density / relative intensity against retention time (seconds)
Mass bin: 337.975 – 338.225 m/z (Δm/z = 0.25)
Individual peaks of different samples
Meta-peak (smoothed peak density profile), Gaussian kernel with SD=30s
Boundaries of peak group
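Within one mass bin, the grouping in the time domain can be sketched as a kernel density estimate. The Gaussian kernel with SD = 30 s follows the slide; the retention times, the grid spacing and the "walk downhill to the nearest local minimum" boundary rule are simplifying assumptions, not the XCMS algorithm itself.

```python
import math


def peak_density(rts, grid, sd=30.0):
    """Smoothed peak density: one Gaussian kernel per detected peak."""
    return [sum(math.exp(-0.5 * ((t - rt) / sd) ** 2) for rt in rts) for t in grid]


def group_boundaries(density, grid):
    """Walk downhill from the density maximum to the nearest local minima."""
    top = max(range(len(density)), key=density.__getitem__)
    left = top
    while left > 0 and density[left - 1] < density[left]:
        left -= 1
    right = top
    while right < len(density) - 1 and density[right + 1] < density[right]:
        right += 1
    return grid[left], grid[right]


# Hypothetical retention times (s) of peaks from different samples in one mass bin.
rts = [3400, 3405, 3410, 3415, 3420, 3600, 3605]
grid = list(range(3300, 3701, 5))

left, right = group_boundaries(peak_density(rts, grid), grid)
members = [rt for rt in rts if left <= rt <= right]
print(left, right, members)
```

A smaller SD gives a bumpier density profile and therefore narrower groups, which is why the amount of smoothing changes the number of peak groups found.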
41. Smoothing affects number of peak groups found
Figure: peak density / relative intensity for mass bin 337.975 – 338.225 m/z (Δm/z = 0.25)
Decreased smoothing eliminates a peak from the peak group
42. Why is time alignment necessary?
Figure: Extracted Ion Chromatograms (EICs) from 19 LC-MS runs
Panels: original data; after time alignment
43. Retention time correction
Method simultaneously corrects the retention times of all samples in a
single step.
Depends on initially having a coarse matching of peaks into reasonable
groups.
Peak matching typically results in hundreds of well-behaved peak groups, in which:
very few samples have no peaks assigned
very few samples have more than one peak assigned.
Such well-behaved groups have a high probability of being properly
matched and can be used as references for alignment.
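Selecting well-behaved groups could look like the sketch below. The thresholds for "very few" (at most one sample missing, at most one sample with multiple peaks) are illustrative assumptions, as are the names and data.

```python
from collections import Counter


def is_well_behaved(group, n_samples, max_missing=1, max_extra=1):
    """group: list of (sample_id, rt). Thresholds are illustrative assumptions."""
    counts = Counter(sample for sample, _rt in group)
    missing = n_samples - len(counts)                  # samples with no peak
    extra = sum(1 for c in counts.values() if c > 1)   # samples with >1 peak
    return missing <= max_missing and extra <= max_extra


# Hypothetical peak groups from an experiment with 4 samples.
good_group = [("s1", 100.1), ("s2", 101.3), ("s3", 99.8), ("s4", 100.6)]
bad_group = [("s1", 200.0), ("s1", 203.0), ("s2", 201.0), ("s2", 204.5)]

print(is_well_behaved(good_group, n_samples=4))
print(is_well_behaved(bad_group, n_samples=4))
```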
44. Retention time deviation profiles
476 LC/MS analyses of serum samples
Positive deviation: sample elutes after median retention time
Negative deviation: sample elutes before median retention time
Sample profiles are colored in a rainbow by the order in which they were
run, with red being the first samples and violet being the last samples run
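The deviation plotted in such a profile is the difference between a sample's retention time and the median over all samples in the same peak group. A minimal sketch with invented values follows; the smoothing of these deviations into a correction curve per sample is not shown.

```python
from statistics import median


def retention_time_deviations(group):
    """group: {sample_id: rt in seconds} -> {sample_id: rt - median rt}."""
    med = median(group.values())
    return {sample: rt - med for sample, rt in group.items()}


# Hypothetical well-behaved peak group.
deviations = retention_time_deviations({"s1": 100.0, "s2": 102.0, "s3": 99.0})
print(deviations)
```

Here `s2` has a positive deviation (it elutes after the median) and `s3` a negative one (it elutes before the median).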
46. Fill in missing peak data
Determine which samples are missing from each peak group,
i.e. the samples in which no peak was detected.
Without these values the data are incomplete
Use information from peak detection about where peaks begin and end,
and aligned retention times for each sample
Then integrate the raw LC/MS data to fill in intensity values for each of
the missing data points.
A significant number of potential peaks can be missed during peak
detection.
The step of filling in missing peak data is necessary for robust statistical
analysis.
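Filling in a missing value amounts to integrating the raw signal of that sample between the peak boundaries of the group. The sketch below uses the trapezoidal rule, which is an assumption made for illustration; the toy chromatogram and the boundaries are invented.

```python
def integrate_raw(eic, rt_start, rt_end):
    """Trapezoidal area of (rt, intensity) points with rt_start <= rt <= rt_end."""
    window = [(rt, i) for rt, i in eic if rt_start <= rt <= rt_end]
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(window, window[1:]):
        area += 0.5 * (i0 + i1) * (rt1 - rt0)
    return area


# Hypothetical raw EIC of a sample in which peak detection found nothing.
raw_eic = [(98.0, 0.0), (99.0, 10.0), (100.0, 20.0), (101.0, 10.0), (102.0, 0.0)]

# Peak boundaries (aligned retention times) taken from the other samples in the group.
filled_value = integrate_raw(raw_eic, 99.0, 101.0)
print(filled_value)
```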
47. Peak list from XCMS
Peaks identified by their m/z and rt
Which compound?
48. Compound Identification (HMDB)
Go to http://www.hmdb.ca/
Menu option: Search
Choose: MS search
Input mass 299.19 (mass_min=299.04&mass_max=299.34)
Ionization Neutral
Or input mass 300.19, Ionization Positive mode
Molecular Weight Tolerance: 0.1 Da
Entry 22 is an ethanolamine
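The relation between the two search inputs (299.19 neutral, 300.19 in positive mode) can be sketched as follows. The code assumes a singly protonated ion and does not query HMDB; the function name is hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass_window(observed_mz, tolerance_da=0.1, positive_mode=True):
    """Search window for the neutral mass, assuming a singly protonated [M+H]+ ion."""
    neutral = observed_mz - PROTON_MASS if positive_mode else observed_mz
    return neutral - tolerance_da, neutral + tolerance_da


low, high = neutral_mass_window(300.19)
print(f"search neutral masses between {low:.4f} and {high:.4f} Da")
```

With a tolerance of 0.1 Da the window around the measured ion still contains the neutral mass 299.19, so both ways of entering the search describe the same compound.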
55. Normalization on Internal Standard
Internal Standards: Known quantity
Same chemical class
Same behavior
Adjust for drift in intensities
Correct all intensities of peaks in same class
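Normalization on an internal standard can be sketched as dividing every peak intensity of a lipid class by the intensity of that class's internal standard in the same sample. The data and names below are invented for illustration.

```python
def normalize_on_internal_standard(intensities, is_intensity):
    """Divide all peak intensities of one class by the internal standard intensity."""
    if is_intensity <= 0:
        raise ValueError("internal standard intensity must be positive")
    return {peak: value / is_intensity for peak, value in intensities.items()}


# Hypothetical example: the instrument response halves between two runs,
# but the same amount of internal standard was added to both samples.
run_1 = normalize_on_internal_standard({"peak_a": 200.0, "peak_b": 50.0}, is_intensity=1000.0)
run_2 = normalize_on_internal_standard({"peak_a": 100.0, "peak_b": 25.0}, is_intensity=500.0)
print(run_1, run_2)
```

After normalization both runs give the same relative intensities, so the drift in instrument response no longer shows up as a difference between samples.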
57. Multivariate statistics
Multiplicity problem
Too many false positives
FDR
PCA
PLS
DA
kNN
cross-validation
Data are not independent!
Data are not normally distributed
Transform?
p < 0.05
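One common way to control the FDR when testing many peaks is the Benjamini-Hochberg procedure. The slide only mentions FDR, so the choice of this particular method is an assumption; the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Go from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four peaks.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(q, 4) for q in adjusted])
```

Peaks would then be called significant when the adjusted value, rather than the raw p-value, falls below the chosen threshold.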
59. Now you are ready for down-stream analysis!
Peak Table
Identified compounds, (relative) concentrations
Statistical analysis
Multivariate statistical analysis (PCA, etc)
Pathway analysis
Systems Biology