SlideShare uma empresa Scribd logo
1 de 37
Baixar para ler offline
Méthodologies d’intégration de données omiques
Nathalie Vialaneix
nathalie.vialaneix@inrae.fr
http://www.nathalievialaneix.eu
PEPR Santé Numérique, December 13th, 2023
“Méthodologies d’intégration de données omiques”
From the title, what you think the
presentation is about...
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 2
“Méthodologies d’intégration de données omiques”
From the title, what you think the
presentation is about...
... versus what it actually turns out to be:
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 2
Collected data at genomic level are increasingly publicly
available
the different levels are not always compatible
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 3
Collected data at genomic level are increasingly publicly
available
the different levels are not always compatible
[Foissac et al., 2019]
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 3
Omics data integration
Type of data to integrate
vertical:
horizontal:
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 4
Omics data integration
Type of data to integrate
vertical:
horizontal:
Type of analysis to perform
supervised:
unsupervised:
Left pictures courtesy Harold Duruflé
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 4
Multiple table analyses
(CCA, MFA, PLS, STATIS, ...)
Image from https://dimensionless.in
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 5
Types of data integration methods [Ritchie et al., 2015]
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 6
Want to know them all?
https://github.com/mikelove/awesome-multi-omics
Some specificities: can account for structure in data (network), are dedicated to a
specific omic (single-cell), can account for temporal/spatial information, can include
biological knowledge (mostly GO), ...
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 7
Scope of the rest of the talk
Unsupervised transformation based integration
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 8
Overview of the talk
Kernel methods
Integrating data with kernels
Conclusion, perspectives
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 9
Main ideas behind kernel methods
Standard (omics) data analyses:
▶ data are (numeric) tables
▶ analyses are based on operations (distances, means, ...) in the variable space
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 10
Main ideas behind kernel methods
Kernel data analyses:
▶ data are arbitrary
▶ analyses are based on transformations of data to “similarities” between samples
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 11
More formally...
n samples (xi )i ∈ X
kernels: symmetric and positive (n × n)-matrix
K that measures a “similarity” between n entities in X
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 12
More formally...
n samples (xi )i ∈ X
kernels: symmetric and positive (n × n)-matrix
K that measures a “similarity” between n entities in X
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 12
More formally...
n samples (xi )i ∈ X
kernels: symmetric and positive (n × n)-matrix
K that measures a “similarity” between n entities in X
K(x, x′) = ⟨ϕ(x), ϕ(x′)⟩
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 12
Principles of learning from kernels
Start from any statistical method (PCA, regression, k-means clustering) and rewrite all
quantities using:
▶ K to compute distances and dot products in the feature space:
▶ dot product: ⟨ϕ(xi ), ϕ(xi′ )⟩ = Kii′
▶ distance: ∥ϕ(xi ) − ϕ(xi′ )∥ =
p
⟨ϕ(xi ) − ϕ(xi′ ), ϕ(xi ) − ϕ(xi′ )⟩ =
√
Kii + Ki′i′ − 2Kii′
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 13
Principles of learning from kernels
Start from any statistical method (PCA, regression, k-means clustering) and rewrite all
quantities using:
▶ K to compute distances and dot products in the feature space:
▶ dot product: ⟨ϕ(xi ), ϕ(xi′ )⟩ = Kii′
▶ distance: ∥ϕ(xi ) − ϕ(xi′ )∥ =
p
⟨ϕ(xi ) − ϕ(xi′ ), ϕ(xi ) − ϕ(xi′ )⟩ =
√
Kii + Ki′i′ − 2Kii′
▶ (implicit) linear or convex combinations of (ϕ(xi ))i to describe all unobserved
elements (centers of gravity and so on...)
e.g., representer of a cluster (k-means): g = 1
|C|
P
i∈C ϕ(xi )
kernel trick: d2(g, ϕ(xℓ)) = Kℓ,ℓ + 1
|C|2
P
i,i′∈C Kii′ − 2 1
|C|
P
i∈C Ki,ℓ
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 13
Kernel examples
1. Rp observations: Gaussian kernel Kii′ = e−γ∥xi −xi′ ∥2
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 14
Kernel examples
1. Rp observations: Gaussian kernel Kii′ = e−γ∥xi −xi′ ∥2
2. nodes of a graph: [Kondor and Lafferty, 2002]
3. sequence kernels (between proteins: spectrum kernel [Jaakkola et al., 2000] or
convolution kernel [Saigo et al., 2004])
4. kernel between graphs (used between metabolites based on their fragmentation
trees): [Shen et al., 2014, Brouard et al., 2016]
5. kernel embedding phylogeny information for metagenomics
[Mariette and Villa-Vialaneix, 2018]
6. ...
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 14
Overview of the talk
Kernel methods
Integrating data with kernels
Conclusion, perspectives
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 15
Multiple kernel (or distance) integration
How to “optimally” combine several kernel datasets?
For kernels K1, . . . , KM obtained on the same n objects, search: Kβ =
PM
m=1 βmKm
with βm ≥ 0 and
P
m βm = 1
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 16
Multiple kernel (or distance) integration
How to “optimally” combine several kernel datasets?
For kernels K1, . . . , KM obtained on the same n objects, search: Kβ =
PM
m=1 βmKm
with βm ≥ 0 and
P
m βm = 1
▶ naive approach: K∗ = 1
M
P
m Km
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 16
Multiple kernel (or distance) integration
How to “optimally” combine several kernel datasets?
For kernels K1, . . . , KM obtained on the same n objects, search: Kβ =
PM
m=1 βmKm
with βm ≥ 0 and
P
m βm = 1
▶ naive approach: K∗ = 1
M
P
m Km
▶ supervised framework: K∗ =
P
m βmKm with βm ≥ 0 and
P
m βm = 1 with βm
chosen so as to minimize the prediction error [Gönen and Alpaydin, 2011]
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 16
Combining kernels in an unsupervised setting
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 17
Multiple kernel integration
Ideas of kernel consensus: find a kernel that performs a consensus of all kernels
[Mariette and Villa-Vialaneix, 2018] - R package mixKernel
with consensus based on:
▶ STATIS [L’Hermier des Plantes, 1976, Lavit et al., 1994]
▶ criterion that preserves local geometry
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 18
Integrating TARA Oceans datasets
▶ For all compositional datasets, include
phylogenetic information (rather than
CLR and alike): weighted Unifrac
distance
▶ Perform PCA (could have been
clustering, linear model, . . . ) in the
feature space.
+ combine with a shuffling approach
to identify most influencing variables
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 19
Application to TARA oceans
Main facts
▶ Oceans typology related to longitude
▶ Rhizaria abundance structure the differences, especially between Arctic Oceans
and Pacific Oceans
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 20
Overview of the talk
Kernel methods
Integrating data with kernels
Conclusion, perspectives
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 21
Making kernel methods more interpretable
Which features are important? (for numerical
features only)
[Brouard et al., 2022] and
mixKernel
▶ extension to the unsupervised framework of the
work [Allen, 2013, Grandvalet and Canu, 2002]
▶ also extension to the kernel output framework
(time series, graph, ... outputs)
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 22
Making integration methods more available to biologists
http://asterics.miat.inrae.fr
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 23
Toward joint signature extraction with NMF
1
2
2
X
j=1
∥X(j)
− WH(j)
∥2
F
| {z }
goodness - of-fit
+
2
X
j=1
λ ∥H(j)
∥1
| {z }
sparsity
+
µ
2
∥W∥2
F
| {z }
regularization
+
γ
2
2
X
j=1
∥y − X(j)
H(j) ⊤
β(j)
∥2
2
| {z }
LDA
(1)
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 24
Future needs for data integration
▶ improve interpretability of methods (integrate more biological knowledge)
▶ reduce computational needs to achieve the challenge of a more sustainable
research
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 25
References
(unofficial) Beamer template made with the help of Thomas Schiex, Matthias Zytnicki and Andreea Dreau:
https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae
Allen, G. I. (2013).
Automatic feature selection via weighted kernels and regularization.
Journal of Computational and Graphical Statistics, 22(2):284–299.
Brouard, C., Mariette, J., Flamary, R., and Vialaneix, N. (2022).
Feature selection for kernel methods in systems biology.
NAR Genomics and Bioinformatics, 4(1):lqac014.
Brouard, C., Shen, H., Dürkop, K., d’Alché Buc, F., Böcker, S., and Rousu, J. (2016).
Fast metabolite identification with input output kernel regression.
Bioinformatics, 32(12):i28–i36.
Foissac, S., Djebali, S., Munyard, K., Vialaneix, N., Rau, A., Muret, K., Esquerre, D., Zytnicki, M., Derrien, T., Bardou, P., Blanc, F., Cabau,
C., Crisci, E., Dhorne-Pollet, S., Drouet, F., Faraut, T., Gonzáles, I., Goubil, A., Lacroix-Lamande, S., Laurent, F., Marthey, S.,
Marti-Marimon, M., Mormal-Leisenring, R., Mompart, F., Quere, P., Robelin, D., SanCristobal, M., Tosser-Klopp, G., Vincent-Naulleau, S.,
Fabre, S., Pinard-Van der Laan, M.-H., Klopp, C., Tixier-Boichard, M., Acloque, H., Lagarrigue, S., and Giuffra, E. (2019).
Multi-species annotation of transcriptome and chromatin structure in domesticated animals.
BMC Biology, 17:108.
Gönen, M. and Alpaydin, E. (2011).
Multiple kernel learning algorithms.
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 25
Journal of Machine Learning Research, 12:2211–2268.
Grandvalet, Y. and Canu, S. (2002).
Adaptive scaling for feature selection in SVMs.
In Becker, S., Thrun, S., and Obermayer, K., editors, Proceedings of Advances in Neural Information Processing Systems (NIPS 2002), pages
569–576. MIT Press.
Jaakkola, T., Diekhans, M., and Haussler, D. (2000).
A discriminative framework for detecting remote protein homologies.
Journal of Computational Biology, 7(1-2):95–114.
Kondor, R. I. and Lafferty, J. (2002).
Diffusion kernels on graphs and other discrete structures.
In Sammut, C. and Hoffmann, A., editors, Proceedings of the 19th International Conference on Machine Learning, pages 315–322, Sydney,
Australia. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA.
Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994).
The ACT (STATIS method).
Computational Statistics and Data Analysis, 18(1):97–119.
L’Hermier des Plantes, H. (1976).
Structuration des tableaux à trois indices de la statistique.
PhD thesis, Université de Montpellier.
Thèse de troisième cycle.
Mariette, J. and Villa-Vialaneix, N. (2018).
Unsupervised multiple kernel learning for heterogeneous data integration.
Bioinformatics, 34(6):1009–1015.
Ritchie, M. D., Holzinger, E. R., Li, R., Pendergrass, S. A., and Kim, D. (2015).
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 25
Methods of integrating data to uncover genotype-phenotype interactions.
Nature Reviews Genetics, 16(2):85–97.
Saigo, H., Vert, J.-P., Ueda, N., and Akutsu, T. (2004).
Protein homology detection using string alignment kernels.
Bioinformatics, 20(11):1682–1689.
Shen, H., Dührkop, K., Böcher, S., and Rousu, J. (2014).
Metabolite identification through multiple kernel learning on fragmentation trees.
Bioinformatics, 30(12):i157–i64.
Multi-omics data integration methods
2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix
p. 25

Mais conteúdo relacionado

Semelhante a Méthodologies d'intégration de données omiques

MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORINGMACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING VisionGEOMATIQUE2014
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksTomaso Aste
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquestuxette
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology tuxette
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 
Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathstuxette
 
Interactive Learning of Bayesian Networks
Interactive Learning of Bayesian NetworksInteractive Learning of Bayesian Networks
Interactive Learning of Bayesian NetworksNTNU
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...tuxette
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...IJECEIAES
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1arogozhnikov
 
Metric-learn, a Scikit-learn compatible package
Metric-learn, a Scikit-learn compatible packageMetric-learn, a Scikit-learn compatible package
Metric-learn, a Scikit-learn compatible packageWilliam de Vazelhes
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...Paolo Missier
 
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval IJECEIAES
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Mostafa G. M. Mostafa
 

Semelhante a Méthodologies d'intégration de données omiques (20)

MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORINGMACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
 
0-introduction.pdf
0-introduction.pdf0-introduction.pdf
0-introduction.pdf
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering Networks
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
 
Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Data mining
Data mining Data mining
Data mining
 
Interactive Learning of Bayesian Networks
Interactive Learning of Bayesian NetworksInteractive Learning of Bayesian Networks
Interactive Learning of Bayesian Networks
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...
 
Project PPT
Project PPTProject PPT
Project PPT
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1
 
Metric-learn, a Scikit-learn compatible package
Metric-learn, a Scikit-learn compatible packageMetric-learn, a Scikit-learn compatible package
Metric-learn, a Scikit-learn compatible package
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
 
ML .pptx
ML .pptxML .pptx
ML .pptx
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)
 

Mais de tuxette

Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-Ctuxette
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?tuxette
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquestuxette
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeantuxette
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...tuxette
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation datatuxette
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?tuxette
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysistuxette
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricestuxette
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Predictiontuxette
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelstuxette
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random foresttuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNNtuxette
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practicetuxette
 
La famille *down
La famille *downLa famille *down
La famille *downtuxette
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC datatuxette
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelstuxette
 
'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysis'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysistuxette
 

Mais de tuxette (20)

Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNN
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practice
 
La famille *down
La famille *downLa famille *down
La famille *down
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC data
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
 
'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysis'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysis
 

Último

Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionPriyansha Singh
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 

Último (20)

Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 

Méthodologies d'intégration de données omiques

  • 1. Méthodologies d’intégration de données omiques Nathalie Vialaneix nathalie.vialaneix@inrae.fr http://www.nathalievialaneix.eu PEPR Santé Numérique, December 13th, 2023
  • 2. “Méthodologies d’intégration de données omiques” From the title, what you think the presentation is about... Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 2
  • 3. “Méthodologies d’intégration de données omiques” From the title, what you think the presentation is about... ... versus what it actually turns out to be: Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 2
  • 4. Collected data at genomic level are increasingly publicly available the different levels are not always compatible Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 3
  • 5. Collected data at genomic level are increasingly publicly available the different levels are not always compatible [Foissac et al., 2019] Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 3
  • 6. Omics data integration Type of data to integrate vertical: horizontal: Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 4
  • 7. Omics data integration Type of data to integrate vertical: horizontal: Type of analysis to perform supervised: unsupervised: Left pictures courtesy Harold Duruflé Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 4
  • 8. Multiple table analyses (CCA, MFA, PLS, STATIS, ...) Image from https://dimensionless.in Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 5
  • 9. Types of data integration methods [Ritchie et al., 2015] Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 6
  • 10. Want to know them all? https://github.com/mikelove/awesome-multi-omics Some specificities: can account for structure in data (network), are dedicated to a specific omic (single-cell), can account for temporal/spatial information, can include biological knowledge (mostly GO), ... Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 7
  • 11. Scope of the rest of the talk Unsupervised transformation based integration Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 8
  • 12. Overview of the talk Kernel methods Integrating data with kernels Conclusion, perspectives Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 9
  • 13. Main ideas behind kernel methods Standard (omics) data analyses: ▶ data are (numeric) tables ▶ analyses are based on operations (distances, means, ...) in the variable space Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 10
  • 14. Main ideas behind kernel methods Kernel data analyses: ▶ data are arbitrary ▶ analyses are based on transformations of data to “similarities” between samples Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 11
  • 15. More formally... n samples (xi )i ∈ X kernels: symmetric and positive (n × n)-matrix K that measures a “similarity” between n entities in X Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 12
  • 16. More formally... n samples (xi )i ∈ X kernels: symmetric and positive (n × n)-matrix K that measures a “similarity” between n entities in X Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 12
  • 17. More formally... n samples (xi )i ∈ X kernels: symmetric and positive (n × n)-matrix K that measures a “similarity” between n entities in X K(x, x′) = ⟨ϕ(x), ϕ(x′)⟩ Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 12
  • 18. Principles of learning from kernels Start from any statistical method (PCA, regression, k-means clustering) and rewrite all quantities using: ▶ K to compute distances and dot products in the feature space: ▶ dot product: ⟨ϕ(xi ), ϕ(xi′ )⟩ = Kii′ ▶ distance: ∥ϕ(xi ) − ϕ(xi′ )∥ = p ⟨ϕ(xi ) − ϕ(xi′ ), ϕ(xi ) − ϕ(xi′ )⟩ = √ Kii + Ki′i′ − 2Kii′ Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 13
  • 19. Principles of learning from kernels Start from any statistical method (PCA, regression, k-means clustering) and rewrite all quantities using: ▶ K to compute distances and dot products in the feature space: ▶ dot product: ⟨ϕ(xi ), ϕ(xi′ )⟩ = Kii′ ▶ distance: ∥ϕ(xi ) − ϕ(xi′ )∥ = p ⟨ϕ(xi ) − ϕ(xi′ ), ϕ(xi ) − ϕ(xi′ )⟩ = √ Kii + Ki′i′ − 2Kii′ ▶ (implicit) linear or convex combinations of (ϕ(xi ))i to describe all unobserved elements (centers of gravity and so on...) e.g., representer of a cluster (k-means): g = 1 |C| P i∈C ϕ(xi ) kernel trick: d2(g, ϕ(xℓ)) = Kℓ,ℓ + 1 |C|2 P i,i′∈C Kii′ − 2 1 |C| P i∈C Ki,ℓ Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 13
  • 20. Kernel examples 1. Rp observations: Gaussian kernel Kii′ = e−γ∥xi −xi′ ∥2 Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 14
  • 21. Kernel examples 1. Rp observations: Gaussian kernel Kii′ = e−γ∥xi −xi′ ∥2 2. nodes of a graph: [Kondor and Lafferty, 2002] 3. sequence kernels (between proteins: spectrum kernel [Jaakkola et al., 2000] or convolution kernel [Saigo et al., 2004]) 4. kernel between graphs (used between metabolites based on their fragmentation trees): [Shen et al., 2014, Brouard et al., 2016] 5. kernel embedding phylogeny information for metagenomics [Mariette and Villa-Vialaneix, 2018] 6. ... Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 14
  • 22. Overview of the talk Kernel methods Integrating data with kernels Conclusion, perspectives Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 15
  • 23. Multiple kernel (or distance) integration How to “optimally” combine several kernel datasets? For kernels K1, . . . , KM obtained on the same n objects, search: Kβ = PM m=1 βmKm with βm ≥ 0 and P m βm = 1 Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 16
  • 24. Multiple kernel (or distance) integration How to “optimally” combine several kernel datasets? For kernels K1, . . . , KM obtained on the same n objects, search: Kβ = PM m=1 βmKm with βm ≥ 0 and P m βm = 1 ▶ naive approach: K∗ = 1 M P m Km Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 16
  • 25. Multiple kernel (or distance) integration How to “optimally” combine several kernel datasets? For kernels K1, . . . , KM obtained on the same n objects, search: Kβ = PM m=1 βmKm with βm ≥ 0 and P m βm = 1 ▶ naive approach: K∗ = 1 M P m Km ▶ supervised framework: K∗ = P m βmKm with βm ≥ 0 and P m βm = 1 with βm chosen so as to minimize the prediction error [Gönen and Alpaydin, 2011] Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 16
  • 26. Combining kernels in an unsupervised setting Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 17
  • 27. Multiple kernel integration Ideas of kernel consensus: find a kernel that performs a consensus of all kernels [Mariette and Villa-Vialaneix, 2018] - R package mixKernel with consensus based on: ▶ STATIS [L’Hermier des Plantes, 1976, Lavit et al., 1994] ▶ criterion that preserves local geometry Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 18
  • 28. Integrating TARA Oceans datasets ▶ For all compositional datasets, include phylogenetic information (rather than CLR and alike): weighted Unifrac distance ▶ Perform PCA (could have been clustering, linear model, . . . ) in the feature space. + combine with a shuffling approach to identify most influencing variables Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 19
  • 29. Application to TARA oceans Main facts ▶ Oceans typology related to longitude ▶ Rhizaria abundance structure the differences, especially between Arctic Oceans and Pacific Oceans Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 20
  • 30. Overview of the talk Kernel methods Integrating data with kernels Conclusion, perspectives Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 21
  • 31. Making kernel methods more interpretable Which features are important? (for numerical features only) [Brouard et al., 2022] and mixKernel ▶ extension to the unsupervised framework of the work [Allen, 2013, Grandvalet and Canu, 2002] ▶ also extension to the kernel output framework (time series, graph, ... outputs) Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 22
  • 32. Making integration methods more available to biologists http://asterics.miat.inrae.fr Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 23
  • 33. Toward joint signature extraction with NMF 1 2 2 X j=1 ∥X(j) − WH(j) ∥2 F | {z } goodness - of-fit + 2 X j=1 λ ∥H(j) ∥1 | {z } sparsity + µ 2 ∥W∥2 F | {z } regularization + γ 2 2 X j=1 ∥y − X(j) H(j) ⊤ β(j) ∥2 2 | {z } LDA (1) Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 24
  • 34. Future needs for data integration ▶ improve interpretability of methods (integrate more biological knowledge) ▶ reduce computational needs to achieve the challenge of a more sustainable research Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 25
  • 35. References (unofficial) Beamer template made with the help of Thomas Schiex, Matthias Zytnicki and Andreea Dreau: https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae Allen, G. I. (2013). Automatic feature selection via weighted kernels and regularization. Journal of Computational and Graphical Statistics, 22(2):284–299. Brouard, C., Mariette, J., Flamary, R., and Vialaneix, N. (2022). Feature selection for kernel methods in systems biology. NAR Genomics and Bioinformatics, 4(1):lqac014. Brouard, C., Shen, H., Dürkop, K., d’Alché Buc, F., Böcker, S., and Rousu, J. (2016). Fast metabolite identification with input output kernel regression. Bioinformatics, 32(12):i28–i36. Foissac, S., Djebali, S., Munyard, K., Vialaneix, N., Rau, A., Muret, K., Esquerre, D., Zytnicki, M., Derrien, T., Bardou, P., Blanc, F., Cabau, C., Crisci, E., Dhorne-Pollet, S., Drouet, F., Faraut, T., Gonzáles, I., Goubil, A., Lacroix-Lamande, S., Laurent, F., Marthey, S., Marti-Marimon, M., Mormal-Leisenring, R., Mompart, F., Quere, P., Robelin, D., SanCristobal, M., Tosser-Klopp, G., Vincent-Naulleau, S., Fabre, S., Pinard-Van der Laan, M.-H., Klopp, C., Tixier-Boichard, M., Acloque, H., Lagarrigue, S., and Giuffra, E. (2019). Multi-species annotation of transcriptome and chromatin structure in domesticated animals. BMC Biology, 17:108. Gönen, M. and Alpaydin, E. (2011). Multiple kernel learning algorithms. Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 25
  • 36. Journal of Machine Learning Research, 12:2211–2268. Grandvalet, Y. and Canu, S. (2002). Adaptive scaling for feature selection in SVMs. In Becker, S., Thrun, S., and Obermayer, K., editors, Proceedings of Advances in Neural Information Processing Systems (NIPS 2002), pages 569–576. MIT Press. Jaakkola, T., Diekhans, M., and Haussler, D. (2000). A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, 7(1-2):95–114. Kondor, R. I. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In Sammut, C. and Hoffmann, A., editors, Proceedings of the 19th International Conference on Machine Learning, pages 315–322, Sydney, Australia. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA. Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994). The ACT (STATIS method). Computational Statistics and Data Analysis, 18(1):97–119. L’Hermier des Plantes, H. (1976). Structuration des tableaux à trois indices de la statistique. PhD thesis, Université de Montpellier. Thèse de troisième cycle. Mariette, J. and Villa-Vialaneix, N. (2018). Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics, 34(6):1009–1015. Ritchie, M. D., Holzinger, E. R., Li, R., Pendergrass, S. A., and Kim, D. (2015). Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 25
  • 37. Methods of integrating data to uncover genotype-phenotype interactions. Nature Reviews Genetics, 16(2):85–97. Saigo, H., Vert, J.-P., Ueda, N., and Akutsu, T. (2004). Protein homology detection using string alignment kernels. Bioinformatics, 20(11):1682–1689. Shen, H., Dührkop, K., Böcher, S., and Rousu, J. (2014). Metabolite identification through multiple kernel learning on fragmentation trees. Bioinformatics, 30(12):i157–i64. Multi-omics data integration methods 2023-12-13, PEPR Santé Numérique / Nathalie Vialaneix p. 25