Understanding Metabolomics Through Metabolite Profiling

Bioinformatics:
Metabolomics
By Mia Pras-Raves, PhD
Prof. dr. A.H.C. van Kampen (Antoine)
Bioinformatics Laboratory
Academic Medical Centre (AMC)
www.bioinformaticslaboratory.nl
m.pras@amc.uva.nl, a.h.vankampen@amc.uva.nl

Metabolomics - definition
 Scientific study of chemical processes involving metabolites
 Metabolites
 Intermediates and products of metabolic reactions
 Synthesis and breakdown of compounds
 Lipids, sugars, peptides, volatile compounds, ......
 Hormones and other signalling molecules
 Products of metabolism of foreign substances
 Exception: proteins (proteomics)
 usually < 1500 Da in size
 The metabolome represents the collection of all metabolites in a
biological cell, tissue, organ or organism, which are the end
products of cellular processes
 Very diverse and complex

Metabolomics – areas of application
 Multi-parametric metabolic response of living systems to
physiological stimuli or genetic modification
 Plants, animals, humans
 The global profiling of metabolites in various media:
 Biofluids (blood serum, urine, …)
 Cellular structures (cultured cells, organoids, …)
 Breath components
 Detection of biomarkers for diseases or drug- or diet-related
changes
 Metabolome is closest representation of phenotype

Metabolomics example: Barth syndrome
 First described in 1983 by
Dr. Barth
 X-linked recessive disorder
 Cardiac and skeletal myopathy
 Neutropenia
 Abnormal mitochondria
 3-methylglutaconic aciduria
 Can be lethal
 Barth syndrome is a
mitochondrial disorder

Mitochondria
 Mitochondria are subcellular organelles
 Mitochondria have two membranes (inner and outer membrane)

Membranes
 In Barth syndrome something is wrong with the mitochondrial membrane
 Membranes are lipid bilayers and consist of:
 Phospholipids
 Cholesterol
 Proteins

Different phospholipids
 Basic phospholipid structure (figure labels: PC, PG, PE, head group, fatty acids)
 Head groups:
– Choline
– Ethanolamine
– Serine
– Inositol
– Glycerol
– …

Fatty acids nomenclature
 Saturated fatty acid: stearic acid, 18:0 (18 carbon atoms, no double bonds)
 Unsaturated fatty acid: oleic acid, 18:1 (18 carbon atoms, 1 double bond)
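The shorthand above can be read mechanically. A minimal sketch, assuming the notation is always "carbons:double bonds"; the function name `parse_fatty_acid` is made up for illustration:

```python
def parse_fatty_acid(shorthand):
    """Split a 'carbons:double_bonds' shorthand such as '18:1' (hypothetical helper)."""
    carbons, double_bonds = (int(part) for part in shorthand.split(":"))
    return {
        "carbons": carbons,
        "double_bonds": double_bonds,
        "saturated": double_bonds == 0,  # no double bonds means saturated
    }

stearic = parse_fatty_acid("18:0")  # stearic acid
oleic = parse_fatty_acid("18:1")    # oleic acid
```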

Cardiolipin deficiency in Barth syndrome
CL = Cardiolipin

Structure of cardiolipin; a special phospholipid
 Basic phospholipid structure:
 PG: 2 fatty acids
 Cardiolipin: 4 fatty acids

Cardiolipins
 Cardiolipin is an important component of the inner mitochondrial
membrane, where it constitutes about 20% of the total lipid composition.
 The name ‘cardiolipin’ is derived from the fact that it was first found in
animal hearts.
 In mammalian cells, cardiolipin is essential for the optimal function of
numerous enzymes that are involved in mitochondrial energy metabolism.
CL also has an important role in apoptosis.
R = linoleic acid

Cardiolipin remodeling by tafazzin
CL → (phospholipase) → MLCL → (tafazzin) → remodeled CL
MLCL = MonoLyso CL

 In Barth syndrome, the remodeling process is disturbed and this
results in cardiolipin deficiency.
 Caused by mutations in the TAZ (tafazzin) gene
 Monolysocardiolipins accumulate in Barth syndrome
Monolysocardiolipins
Houtkooper RH, Van Lenthe H, Stet FS, Wanders RJ, Kulik W, Vaz FM (2009) Analyse 64(7), 200

Cardiolipin in Barth syndrome heart
Control
Barth syndrome

Metabolomics – different methods
 Nuclear Magnetic Resonance Spectroscopy (NMR)
 Mass Spectrometry (MS)
 Gas Chromatography MS (GC-MS)
 Liquid Chromatography MS (LC-MS)
 Direct Infusion MS (DI-MS)
 Lipidomics
 (phospho)lipids
 Fluxomics
 Incorporation of chemically labeled compound

Targeted vs untargeted metabolomics
 Targeted: Pre-defined set of metabolites to quantify
 Typically carried out in diagnostics
 Pros: Technically simple
 Cons: Limited scope, missing information
 Untargeted: Global analysis of metabolic changes in response to disease,
environmental or genetic perturbations.
 typically carried out for hypothesis generation, followed by targeted
profiling for more confident quantification of relevant metabolites.
 Pros: Unbiased (no selection of metabolites)
 Cons: Technically challenging (both the analysis and the bioinformatics),
risk of getting too many unknowns

The metabolomics workflow
Biological Question → Experimental Design → Sampling & Sample Preparation → Data Acquisition → Data Pre-Processing → Data Analysis → Biological Interpretation
Outputs of successive steps: Protocol → Samples → Raw data → List of peaks/Metabolites → Relevant Metabolites & Connectivities
Choice to make: Targeted vs Untargeted?

Main analytical technologies
 Separation
 Gas chromatography (GC)
 High-performance liquid chromatography (HPLC/UPLC)
 Detection
 Nuclear Magnetic Resonance (NMR) Spectroscopy
 Mass spectrometry (MS)

Liquid chromatography (LC)
 Analytical technique for separating ions or molecules that are in solution.
 The sample solution is injected onto a column.
 Compounds interact with the column material (stationary phase).
 A liquid (mobile phase) is passed over the column and carries the sample along.
 The different compounds interact with the two phases to differing degrees, due to differences in adsorption, ion-exchange, partitioning, or size.
 These differences in interaction allow the mixture components to be separated.

LC on a column
 Sample (black ink) contains three metabolites (blue, red, yellow)
 Due to interaction with the stationary phase, these metabolites
are separated
[Figure, source http://www.waters.com. Time Zero: injected sample band (black). Time + 10 min.: separated compounds (blue, red, yellow). The mobile phase flows through the column in both panels.]

HPLC: chromatogram
 A chromatogram is a representation of the separation that has occurred chromatographically in the HPLC system.
 A series of peaks rising from a baseline is drawn on a time axis.
 Each peak represents the detector response for a different compound.
http://www.waters.com
electric
signal

MS measures m/z of ions
 The MS principle consists of ionizing chemical compounds to generate
charged molecules or molecule fragments and measurement of their
mass-to-charge ratios (m/z).
e.g. Phenylalanine
H+

Mass Spectrometry (MS)
mass spectrum
(m/z: for simplicity
assume z=1)
Measurement of
ion current

LC-MS data
 3-dimensional space: time, mass (m/z), intensity
 TIC (total ion current): sum of the intensities of all m/z values at a given time point
Source: NUGO workshop 2007, TNO
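The TIC definition can be written out in a few lines. This is only a sketch: the `scans` layout (one retention time plus a list of (m/z, intensity) pairs per scan) and all numbers are invented for illustration, not taken from a real data file.

```python
# Each scan: (retention time in seconds, [(m/z, intensity), ...]) -- made-up values
scans = [
    (60.0, [(300.20, 1200.0), (338.10, 400.0)]),
    (61.0, [(300.21, 1500.0), (338.11, 900.0)]),
    (62.0, [(300.19, 1100.0), (338.09, 300.0)]),
]

def total_ion_current(scans):
    """Sum the intensities of all m/z values at each time point."""
    return [(rt, sum(intensity for _, intensity in peaks)) for rt, peaks in scans]

tic = total_ion_current(scans)
```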

LC-MS data
lipid plasma map (2D)

Overall workflow
LC-MS data → Pre-processing → Peaks (quantities, intensities)
 Peak positions: identified compounds
 Peak intensities
Peaks → Statistical analysis; Multivariate statistical analysis (PCA, etc) → Pathway analysis → Systems Biology

XCMS
 XCMS is an LC/MS-based data analysis approach which incorporates
several methods for pre-processing
 Smith et al (2006) Anal. Chem, 78, 779-787
 R package
 http://xcmsonline.scripps.edu/

XCMS: overview of preprocessing methodology
 Filtering and identification of peaks
 Match peaks across samples
 Retention time correction
 Fill in missing peak data
 Statistically analyze results
 Visualization of peaks
 Assignment of metabolite names to peaks
Additions after XCMS:
• Noise peak filtering
• Identification of PLs
• Isotope correction
• Normalization on IS
• Statistical analysis

Peak detection (feature detection)
 Identify all signals caused by true ions and avoid detection of false
positives (i.e., noise, spikes)
 Provide accurate quantitative information about ion concentrations
 Slicing data to extracted ion chromatograms (EIC)
 covering a narrow m/z range (XCMS)
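Slicing the data into an extracted ion chromatogram can be sketched as follows. The `scans` layout and the numbers are assumptions made for illustration; this is not the XCMS implementation.

```python
# Each scan: (retention time in seconds, [(m/z, intensity), ...]) -- made-up values
scans = [
    (60.0, [(300.20, 1200.0), (338.10, 400.0)]),
    (61.0, [(300.21, 1500.0), (338.11, 900.0)]),
    (62.0, [(300.19, 1100.0), (338.09, 300.0)]),
]

def extracted_ion_chromatogram(scans, mz_low, mz_high):
    """Per scan, sum the intensities that fall in the narrow range [mz_low, mz_high)."""
    return [
        (rt, sum(i for mz, i in peaks if mz_low <= mz < mz_high))
        for rt, peaks in scans
    ]

eic = extracted_ion_chromatogram(scans, 338.0, 338.25)
```

Each EIC is then an ordinary intensity-versus-time trace on which peak detection can run.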

Peak detection can be difficult
 Due to peak shapes
 Due to artifacts
 White noise: Random background
 Flicker: changes in response with changing operation or conditions
 Interference: noise spikes of random occurrence and intensity

Matched Filtration: second derivative Gaussian model
 Filter each extracted ion chromatogram (EIC)
 Use second-derivative Gaussian as the model peak shape
 Set Full Width at Half-Maximum (fwhm)
 Second derivative Gaussian model
 Generates new chromatographic profile, which reflects curvature
rather than absolute intensity
 Implicit background subtraction
 Yields consistent s/n improvement
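A rough sketch of matched filtration on one EIC, assuming evenly spaced scans and a kernel truncated at four standard deviations. The function names and the synthetic trace are illustrative only and do not reproduce the XCMS code.

```python
import math

def second_derivative_gaussian_kernel(fwhm, step=1.0):
    """Negative second derivative of a Gaussian with the given FWHM, sampled every `step`."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # FWHM -> standard deviation
    half = int(math.ceil(4 * sigma / step))
    kernel = []
    for k in range(-half, half + 1):
        t = k * step
        kernel.append((1 - t * t / sigma ** 2) * math.exp(-t * t / (2 * sigma ** 2)))
    mean = sum(kernel) / len(kernel)
    # Force the weights to sum to zero so a constant background filters to zero
    return [w - mean for w in kernel]

def matched_filter(eic, kernel):
    """Slide the symmetric kernel over the trace (points outside the trace are skipped)."""
    half = len(kernel) // 2
    filtered = []
    for i in range(len(eic)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(eic):
                acc += w * eic[idx]
        filtered.append(acc)
    return filtered

# Synthetic EIC: flat background of 100 plus one Gaussian peak centred at scan 30
eic = [100.0 + 500.0 * math.exp(-((i - 30) ** 2) / (2 * 2.5 ** 2)) for i in range(60)]
kernel = second_derivative_gaussian_kernel(fwhm=6.0)
filtered = matched_filter(eic, kernel)
apex = max(range(len(filtered)), key=filtered.__getitem__)
flat = matched_filter([100.0] * 60, kernel)  # background only
```

Away from the edges of the trace the flat background filters to about zero, which is the implicit background subtraction mentioned above.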

Derivatives of Gaussian model
$$f(t) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{t^2}{2\sigma^2}}$$

Negative second derivative of Gaussian model
Negative second derivative
(smaller/sharper peak)
Gaussian model
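As a worked step, assuming the zero-centred Gaussian model $f(t) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-t^2/(2\sigma^2)}$, differentiating twice gives:

$$f'(t) = -\frac{t}{\sigma^2}\, f(t)$$

$$f''(t) = \left(\frac{t^2}{\sigma^4} - \frac{1}{\sigma^2}\right) f(t)$$

$$-f''(t) = \frac{1}{\sigma^2}\left(1 - \frac{t^2}{\sigma^2}\right) f(t)$$

The negative second derivative is positive only for $|t| < \sigma$, which is why its central lobe is narrower and sharper than the Gaussian itself.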

Noise peak filtering
 Chromatographic peaks should have >3 consecutive points above noise level
 This filter removes 10-60% of peaks as noise
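The consecutive-points rule can be sketched as below. Reading ">3" as "at least 4" and passing the noise level as a plain threshold are assumptions for illustration; the function names are made up.

```python
def longest_run_above(intensities, noise_level):
    """Length of the longest stretch of consecutive points above the noise level."""
    best = run = 0
    for value in intensities:
        run = run + 1 if value > noise_level else 0
        best = max(best, run)
    return best

def passes_noise_filter(intensities, noise_level, min_consecutive=4):
    # ">3 consecutive points" is read here as at least 4 (assumption)
    return longest_run_above(intensities, noise_level) >= min_consecutive

spike = [0.0, 0.0, 50.0, 0.0, 0.0]               # single-point noise spike
peak = [1.0, 5.0, 9.0, 12.0, 9.0, 5.0, 1.0]       # smooth chromatographic peak
spike_kept = passes_noise_filter(spike, noise_level=3.0)
peak_kept = passes_noise_filter(peak, noise_level=3.0)
```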

Peak matching
 The peak detection step was applied to individual samples.
 Next step: match these peaks across samples
 Allows comparison of samples for peak intensities
 Allows calculation and correction of retention time deviations
 Mass measurements are usually far more reproducible between runs than LC retention times, which drift, so peaks are matched in the mass domain first.
 Make use of fixed-interval bins 0.25 m/z wide to match peaks in the mass domain.
 To avoid splitting a group apart because of arbitrary bin borders, use overlapping bins in which adjacent bins overlap by half (i.e., 100.0-100.25, 100.125-100.375, etc.).
 During binning, each peak is therefore counted twice, once in each of two overlapping bins.
 Use a post-processing step to remove the duplicate peak groups that result from overlapping bins.
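The half-overlapping bins can be sketched like this, assuming bins start at multiples of half the bin width. The function name and the peak list are hypothetical.

```python
import math

def overlapping_bins(mz, width=0.25):
    """Return the two half-overlapping bins (low, high) that contain this m/z value."""
    step = width / 2                 # adjacent bins are shifted by half a bin
    k = math.floor(mz / step)        # index of the last bin starting at or below mz
    return [((k - 1) * step, (k - 1) * step + width),
            (k * step, k * step + width)]

bins = overlapping_bins(100.2)

# Group made-up peaks (sample name, m/z) by bin; each peak lands in two bins
peaks = [("sample1", 100.20), ("sample2", 100.24), ("sample2", 100.30)]
groups = {}
for sample, mz in peaks:
    for bin_range in overlapping_bins(mz):
        groups.setdefault(bin_range, []).append((sample, mz))
```

Here the peaks at 100.24 and 100.30 fall on opposite sides of the 100.25 border but still share the bin 100.125-100.375.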

Matching m/z peaks across samples
Overlapping bins
Sample 1
Sample 2
For each bin identify
the matching peaks
(independent of time!)

Peak group boundaries
[Figure: peak density / relative intensity versus retention time (seconds) for mass bin 337.975 – 338.225 m/z (Δm/z = 0.25). The individual peaks of different samples are smoothed with a Gaussian kernel (SD = 30 s) into a meta-peak (smoothed peak density profile), which sets the boundaries of the peak group.]
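One way to picture the meta-peak is a Gaussian kernel density over the retention times of the peaks in one mass bin. The sketch below takes the densest meta-peak and walks outwards to the nearest density minima; the retention times, the grid step and the boundary rule are assumptions, and the real XCMS grouping is more involved.

```python
import math

def density(rts, grid, sd=30.0):
    """Unnormalised Gaussian kernel density of peak retention times on a time grid."""
    return [sum(math.exp(-((t - rt) ** 2) / (2 * sd * sd)) for rt in rts) for t in grid]

def peak_group_boundaries(rts, sd=30.0, step=1.0):
    """Boundaries of the densest meta-peak: walk downhill from the maximum on both sides."""
    start, stop = min(rts) - 3 * sd, max(rts) + 3 * sd
    n = int((stop - start) / step) + 1
    grid = [start + i * step for i in range(n)]
    dens = density(rts, grid, sd)
    top = max(range(n), key=dens.__getitem__)
    left = top
    while left > 0 and dens[left - 1] < dens[left]:
        left -= 1
    right = top
    while right < n - 1 and dens[right + 1] < dens[right]:
        right += 1
    return grid[left], grid[right]

# Made-up retention times (s) of peaks from different samples in one mass bin
rts = [3395.0, 3400.0, 3405.0, 3410.0, 3600.0, 3604.0]
low, high = peak_group_boundaries(rts)
members = [rt for rt in rts if low <= rt <= high]
```

Changing `sd` changes how many separate meta-peaks appear, so the amount of smoothing influences which peaks end up in one group.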

Smoothing affects number of peak groups found
[Figure: peak density / relative intensity for mass bin 337.975 – 338.225 m/z (Δm/z = 0.25). Decreased smoothing eliminates a peak from the peak group.]

Why is time alignment necessary?
[Figure: Extracted Ion Chromatograms (EICs) from 19 LC-MS runs, panel label 328; time axis from 80 to 95, intensity axis scaled ×10^6. Panels: Original data, After time alignment.]

Retention time correction
 Method simultaneously corrects the retention times of all samples in a
single step.
 Depends on initially having a coarse matching of peaks into reasonable
groups.
 Peak matching typically results in hundreds of well-behaved peak groups, in which:
 very few samples have no peaks assigned
 very few samples have more than one peak assigned.
 Such well-behaved groups have a high probability of being properly
matched and can be used as references for alignment.
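A minimal sketch of the idea: for each well-behaved group take the median retention time as reference, record how far one sample deviates from it, and interpolate that deviation for every other retention time in the sample. Linear interpolation is swapped in here as a simple stand-in for a proper smoother, and the data layout and numbers are invented.

```python
from statistics import median

def rt_deviations(groups, sample):
    """(retention time, deviation from group median) for one sample, sorted by time."""
    points = []
    for group in groups:                       # group: {sample name: retention time}
        reference = median(group.values())
        points.append((group[sample], group[sample] - reference))
    return sorted(points)

def interpolate_deviation(points, rt):
    """Piecewise-linear deviation profile, held constant outside the observed range."""
    if rt <= points[0][0]:
        return points[0][1]
    if rt >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= rt <= x1:
            return y0 + (y1 - y0) * (rt - x0) / (x1 - x0)

def correct_rt(points, rt):
    return rt - interpolate_deviation(points, rt)

# Two well-behaved groups: every sample has exactly one peak (made-up values)
groups = [
    {"a": 100.0, "b": 102.0, "c": 104.0},
    {"a": 200.0, "b": 203.0, "c": 206.0},
]
profile_c = rt_deviations(groups, "c")         # sample c elutes late
corrected = correct_rt(profile_c, 155.0)
```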

Retention time deviation profiles
 476 LC/MS analyses of serum samples
 Positive deviation: sample elutes after median retention time
 Negative deviation: sample elutes before median retention time
 Sample profiles are colored in a rainbow by the order in which they were
run, with red being the first samples and violet being the last samples run

Result: before and after retention time correction

Fill in missing peak data
 Determine which samples are missing from each peak group, i.e. samples in which no peak was detected.
 Without these values the data are incomplete.
 Use information from peak detection about where peaks begin and end,
and aligned retention times for each sample
 Then integrate the raw LC/MS data to fill in intensity values for each of
the missing data points.
 A significant number of potential peaks can be missed during peak
detection.
 The step of filling in missing peak data is necessary for robust statistical
analysis.
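Filling in a missing value amounts to integrating the raw signal inside the m/z and retention time window of the peak group. A sketch using trapezoidal integration, which is an assumption here, with an invented `scans` layout:

```python
def fill_missing_peak(scans, mz_range, rt_range):
    """Integrate raw intensity within the m/z window over the aligned rt window."""
    mz_low, mz_high = mz_range
    rt_start, rt_end = rt_range
    points = [
        (rt, sum(i for mz, i in peaks if mz_low <= mz <= mz_high))
        for rt, peaks in scans
        if rt_start <= rt <= rt_end
    ]
    area = 0.0
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)    # trapezoid between neighbouring scans
    return area

# Raw scans of a sample in which peak detection found nothing (made-up values)
scans = [
    (10.0, [(338.10, 0.0)]),
    (11.0, [(338.10, 10.0), (400.00, 99.0)]),
    (12.0, [(338.10, 20.0)]),
    (13.0, [(338.10, 0.0)]),
    (14.0, [(338.10, 5.0)]),
]
filled = fill_missing_peak(scans, mz_range=(338.0, 338.25), rt_range=(10.0, 13.0))
```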

Peak list from XCMS
 Peaks identified by their m/z and rt
 Which compound?

Compound Identification (HMDB)
 Go to http://www.hmdb.ca/
 Menu option: Search
 Choose: MS search
 Input mass 299.19 (mass_min=299.04&mass_max=299.34)
 Ionization Neutral
 Or input mass 300.19, Ionization Positive mode
 Molecular Weight Tolerance: 0.1 Da
 Entry 22 is an ethanolamine
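The two search inputs differ by roughly the mass of one proton. A small sketch of that arithmetic, assuming a singly protonated [M+H]+ ion in positive mode and a deprotonated [M-H]- ion in negative mode; the function names are made up and nothing here calls HMDB itself.

```python
PROTON_MASS = 1.007276  # Da

def neutral_mass(observed_mz, mode, charge=1):
    """Convert an observed m/z to a neutral mass (assumes simple (de)protonation)."""
    if mode == "positive":                    # [M+H]+
        return observed_mz * charge - charge * PROTON_MASS
    if mode == "negative":                    # [M-H]-
        return observed_mz * charge + charge * PROTON_MASS
    if mode == "neutral":
        return observed_mz
    raise ValueError("mode must be 'positive', 'negative' or 'neutral'")

def search_window(mass, tolerance=0.1):
    """Mass range searched for a given tolerance in Da."""
    return (mass - tolerance, mass + tolerance)

mass = neutral_mass(300.19, "positive")
low, high = search_window(mass)
```

With a 0.1 Da tolerance the window around the converted mass still contains 299.19, so both inputs should return overlapping candidate lists.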

Use experience
lipid plasma map (2D)

Identification of phospholipids
 Classes of PLs cluster together
m/z
RT

Patterns in CL peaks
 Same length of side chains
 same number of double bonds

Natural abundances of atoms
Name        Symbol  Atomic mass  % Abundance
Hydrogen    1H       1.0078       99.989
Deuterium   2H       2.0141        0.0115
Carbon      12C     12.0000       98.930
Carbon      13C     13.0034        1.070
Nitrogen    14N     14.0031       99.632
Nitrogen    15N     15.0001        0.368
Oxygen      16O     15.9949       99.757
Oxygen      17O     16.9991        0.038
Oxygen      18O     17.9992        0.205
Phosphorus  31P     30.9738      100.000

CO2 (fraction of molecules; relative peak area):
 mass 44: 98.5%; 100%
 mass 45: 1.1%; 1.2%
 mass 46: 0.4%; 0.4%
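The CO2 numbers follow from combining the isotope abundances of one carbon and two oxygen atoms. A short sketch using nominal masses and the abundances for carbon and oxygen listed in the table; the helper name `combine` is made up.

```python
def combine(dist_a, dist_b):
    """Distribution of the summed nominal mass of two independent atoms or groups."""
    out = {}
    for mass_a, p_a in dist_a.items():
        for mass_b, p_b in dist_b.items():
            out[mass_a + mass_b] = out.get(mass_a + mass_b, 0.0) + p_a * p_b
    return out

CARBON = {12: 0.98930, 13: 0.01070}
OXYGEN = {16: 0.99757, 17: 0.00038, 18: 0.00205}

co2 = combine(CARBON, combine(OXYGEN, OXYGEN))
# Peak areas relative to the most abundant (mass 44) peak, in percent
relative = {mass: 100.0 * p / co2[44] for mass, p in co2.items()}
```

The same convolution applied atom by atom to a larger formula gives the isotope pattern of a whole lipid.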

Isotope correction for quantification
[Figure: isotope pattern with relative peak intensities 100%, 49.1%, 13.3%, 2.8%, 0.4%, shown twice]
 Phospholipids:
 e.g. Phosphatidylcholine (36:4): PO8NC44H80
 add Phosphatidylcholine (36:3): PO8NC44H82
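The species with formula PO8NC44H82 is two hydrogens, so about 2 Da, heavier than PO8NC44H80. Its first peak therefore coincides with the third isotope peak (13.3% in the pattern above) of the lighter species. A sketch of the resulting correction, with invented intensities and a made-up function name:

```python
M_PLUS_2_FRACTION = 0.133  # third isotope peak relative to the first, from the pattern above

def correct_for_lighter_species(observed, lighter_first_peak, fraction=M_PLUS_2_FRACTION):
    """Remove the isotope contribution of the species that is 2 Da lighter."""
    return max(observed - fraction * lighter_first_peak, 0.0)

# Made-up intensities of the first peaks of the (36:4) and (36:3) species
intensity_36_4 = 1000.0
observed_36_3 = 500.0
corrected_36_3 = correct_for_lighter_species(observed_36_3, intensity_36_4)
```

In a series of species differing by one double bond the correction would be applied from light to heavy, so that each subtraction uses an already corrected intensity.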

Isotope correction for quantification
 Subtract the red intensity to obtain the black intensity
 Unique, correct identification is crucial!
[Figure: two mass spectra, relative abundance (0-100) versus m/z. First panel: C77 H142 O17 P2, m/z 700.0-704.0. Second panel: 090902jpm07 #194-199, RT: 6.72-6.90, AV: 6, SM: 7G, NL: 3.27E6, T: - p ESIsid=10.00 Q3MS [380.000-1100.000], m/z 698.0-703.0, with labelled relative abundances 100, 85.8, 39.8, 13.2 and formulas P2O17C73H140 and P2O17C73H142.]

Normalization on Internal Standard
 Internal Standards: Known quantity
 Same chemical class
 Same behavior
 Adjust for drift in intensities
 Correct all intensities of peaks in same class
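Normalization on an internal standard (IS) can be sketched as a simple rescaling, assuming one IS per lipid class and a linear detector response. The names and numbers are invented.

```python
def normalize_on_internal_standard(intensities, is_intensity, is_amount):
    """Convert intensities of one class to amounts via the spiked internal standard."""
    return {
        name: value * is_amount / is_intensity
        for name, value in intensities.items()
    }

# Made-up peak intensities of one class and its internal standard (2.0 nmol spiked)
pc_intensities = {"PC(36:4)": 2000.0, "PC(36:3)": 500.0}
pc_amounts = normalize_on_internal_standard(pc_intensities, is_intensity=1000.0, is_amount=2.0)
```

Because the IS is measured in the same run, a drift in overall intensity affects numerator and denominator alike and cancels out.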

Statistical analysis of peaks
 XCMS: t-test for the comparison of two conditions

Multivariate statistics
 Multiplicity problem
 Too many false positives
 FDR
 PCA
 PLS
 DA
 kNN
 cross-validation
 Data are not independent!
 Data are not normally distributed
 Transform?
p < 0.05
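With thousands of peaks tested at p < 0.05, many false positives are expected, which is where FDR control comes in. The slide does not say which procedure is meant; the sketch below uses the Benjamini-Hochberg adjustment as one common choice, with invented p-values.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.20]   # one made-up t-test p-value per peak
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
```

In this toy example three peaks have p < 0.05, but only one remains below 0.05 after adjustment.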

Now you are ready for down-stream analysis!
Peak Table (identified compounds, (relative) concentrations) → Statistical analysis; Multivariate statistical analysis (PCA, etc) → Pathway analysis → Systems Biology
