Kernel methods for heterogeneous data integration
Nathalie Vialaneix
nathalie.vialaneix@inrae.fr
http://www.nathalievialaneix.eu
Séminaires du LaBRI, January 11th, 2024
Disclaimer
From the title, what you think the
presentation is about...
... versus what it actually turns out to be:
Multi-omics data integration methods
2024-01-11, Séminaires du LaBRI / Nathalie Vialaneix
Data collected at the genomic level are increasingly publicly available
the different levels are not always compatible
[Foissac et al., 2019]
Omics data integration
Type of data to integrate
vertical:
horizontal:
Type of analysis to perform
supervised:
unsupervised:
Left pictures courtesy Harold Duruflé
Multiple table analyses
(CCA, MFA, PLS, STATIS, ...)
Image from https://dimensionless.in
Types of data integration methods [Ritchie et al., 2015]
Want to know them all?
https://github.com/mikelove/awesome-multi-omics
Some specificities: can account for structure in data (network), are dedicated to a
specific omic (single-cell), can account for temporal/spatial information, can include
biological knowledge (mostly GO), ...
Scope of the rest of the talk
Unsupervised transformation-based integration
Overview of the talk
Kernel methods
Integrating data with kernels
Constrained kernel hierarchical clustering and applications to Hi-C data
Main ideas behind kernel methods
Standard (omics) data analyses:
▶ data are (numeric) tables
▶ analyses are based on operations (distances, means, ...) in the variable space
Kernel data analyses:
▶ data are arbitrary
▶ analyses are based on transformations of data to “similarities” between samples
More formally...
n samples $(x_i)_{i=1,\dots,n}$ in $\mathcal{X}$
kernel: a symmetric and positive $(n \times n)$-matrix $K$ that measures a “similarity” between the $n$ entities in $\mathcal{X}$
Equivalently, there is a feature map $\phi: \mathcal{X} \to \mathcal{H}$ into a Hilbert space $\mathcal{H}$ such that
$K(x, x') = \langle \phi(x), \phi(x') \rangle$
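As an illustration of the definition, the sketch below checks the two required properties (symmetry and positivity) numerically. The function name `is_valid_kernel`, the tolerance and the toy data are assumptions made for this example and do not come from the talk. The linear kernel is used because its feature map is simply the identity.

```python
import numpy as np

def is_valid_kernel(K, tol=1e-8):
    """Return True if K is square, symmetric and positive semi-definite (up to tol)."""
    K = np.asarray(K, dtype=float)
    if K.ndim != 2 or K.shape[0] != K.shape[1]:
        return False
    if not np.allclose(K, K.T, atol=tol):
        return False
    # eigvalsh assumes a symmetric matrix and returns real eigenvalues
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # toy data: n = 6 samples, p = 3 variables
K_lin = X @ X.T               # linear kernel: phi is the identity map
K_bad = -np.eye(6)            # symmetric but not positive

print(is_valid_kernel(K_lin), is_valid_kernel(K_bad))
```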
Principles of learning from kernels
Start from any statistical method (PCA, regression, k-means clustering) and rewrite all quantities using:
▶ $K$ to compute distances and dot products in the feature space
▶ (implicit) linear or convex combinations of $(\phi(x_i))_i$ to describe all unobserved elements (centers of gravity and so on...)
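A minimal sketch of these two principles, assuming only that the kernel matrix is available: squared distances between samples follow from $\|\phi(x_i) - \phi(x_j)\|^2 = K_{ii} + K_{jj} - 2K_{ij}$, and the distance to the (never explicitly computed) center of gravity follows from expanding the same inner product. The function names are hypothetical. A linear kernel is used so that the results can be compared with ordinary Euclidean computations.

```python
import numpy as np

def kernel_sq_distances(K):
    """Squared feature-space distances: D2[i, j] = K[i, i] + K[j, j] - 2 K[i, j]."""
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2.0 * K

def sq_distance_to_mean(K):
    """Squared distance of each phi(x_i) to the centre of gravity of all samples."""
    return np.diag(K) - 2.0 * K.mean(axis=1) + K.mean()

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))
K = X @ X.T  # linear kernel, so feature-space distances are Euclidean ones

D2 = kernel_sq_distances(K)
D2_direct = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
d_mean = sq_distance_to_mean(K)
d_mean_direct = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
```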
In practice: kernel k-means example
Images taken from Mquantin’s work on WikiMedia Commons “Illustration du déroulement de l’algorithme des k-means”
Standard: n points $(x_i)_i$ in $\mathbb{R}^p$
Kernel: n points (implicitly obtained from a kernel $K$), $(\phi(x_i))_i$ in $\mathcal{H}$ (also implicit)
Initialization:
Standard: randomly set $K$ “class centers” in $\mathbb{R}^p$: $(c_k)_k$
Kernel: (implicitly) randomly set $K$ “class centers” in $\mathcal{H}$: $(c_k)_k$, with
$c_k := \sum_{i=1}^n \alpha_{ki} \phi(x_i)$, $\alpha_{ki} \geq 0$ and $\sum_i \alpha_{ki} = 1$
Assignment step:
Standard: $\forall\, i = 1, \dots, n$, $f(x_i) \leftarrow \arg\min_{k=1,\dots,K} d(x_i; c_k)$
Kernel: $\forall\, i = 1, \dots, n$, $f(x_i) \leftarrow \arg\min_{k=1,\dots,K} d_{\mathcal{H}}(\phi(x_i); c_k)$
But:
$d^2_{\mathcal{H}}(\phi(x_i); c_k) = \langle \phi(x_i) - \sum_\ell \alpha_{k\ell} \phi(x_\ell), \phi(x_i) - \sum_\ell \alpha_{k\ell} \phi(x_\ell) \rangle_{\mathcal{H}}$
$= K_{ii} - 2 \sum_\ell \alpha_{k\ell} K_{i\ell} + \sum_{\ell\ell'} \alpha_{k\ell} \alpha_{k\ell'} K_{\ell\ell'}$
Representation step:
Standard: $\forall\, k = 1, \dots, K$, update centers: $c_k \leftarrow \frac{1}{\sharp\{i : f(x_i) = k\}} \sum_{i:\, f(x_i) = k} x_i$
Kernel: $\forall\, k = 1, \dots, K$, (implicitly) update centers: $c_k \leftarrow \sum_{i:\, f(x_i) = k} \frac{1}{\sharp\{i' : f(x_{i'}) = k\}} \phi(x_i)$
that is, $\alpha_{ki} = 1 / \sharp\{i' : f(x_{i'}) = k\}$ if $f(x_i) = k$ and $\alpha_{ki} = 0$ otherwise
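The steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in the talk: `n_clusters` stands for the number of classes (written $K$ on the slides, like the kernel), the initialization draws a random assignment rather than random centers, and empty clusters are simply never selected. The centers are never formed explicitly; only the coefficients `alpha` and the kernel matrix are used.

```python
import numpy as np

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    """Kernel k-means on an (n x n) kernel matrix K; returns integer labels."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)   # random initial assignment
    for _ in range(n_iter):
        # Representation step: alpha[k, i] = 1/|cluster k| if f(x_i) = k, else 0
        alpha = np.zeros((n_clusters, n))
        alpha[labels, np.arange(n)] = 1.0
        sizes = alpha.sum(axis=1, keepdims=True)
        empty = sizes[:, 0] == 0
        alpha = alpha / np.where(sizes == 0, 1.0, sizes)
        # Assignment step:
        # d2[i, k] = K_ii - 2 sum_l a_kl K_il + sum_{l,l'} a_kl a_kl' K_ll'
        cross = K @ alpha.T                               # shape (n, n_clusters)
        quad = np.einsum("kl,lm,km->k", alpha, K, alpha)  # shape (n_clusters,)
        d2 = np.diag(K)[:, None] - 2.0 * cross + quad[None, :]
        d2[:, empty] = np.inf                             # never pick an empty cluster
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy usage: two well-separated groups on the real line, linear kernel.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-10, 0.5, 20), rng.normal(10, 0.5, 20)])
labels = kernel_kmeans(np.outer(x, x), n_clusters=2)
```

With a linear kernel this reduces to standard k-means; replacing `np.outer(x, x)` by any other kernel matrix leaves the function unchanged, which is the point of the kernel formulation.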
Kernel examples
1. $\mathbb{R}^p$ observations: Gaussian kernel $K_{ii'} = e^{-\gamma \|x_i - x_{i'}\|^2}$
2. nodes of a graph: [Kondor and Lafferty, 2002]
3. sequence kernels (between proteins: spectrum kernel [Jaakkola et al., 2000] or
convolution kernel [Saigo et al., 2004])
4. kernel between graphs (used between metabolites based on their fragmentation
trees): [Shen et al., 2014, Brouard et al., 2016]
5. kernel embedding phylogeny information for metagenomics
[Mariette and Villa-Vialaneix, 2018]
6. ...
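The first two examples can be sketched in a few lines. The Gaussian kernel follows the formula of item 1. For item 2, the sketch assumes the usual form of the diffusion kernel, $e^{-\beta L}$ with $L$ the Laplacian of an undirected graph. Function names, parameter values and toy inputs are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(X, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2) for the rows x_i of X."""
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off

def diffusion_kernel(A, beta=1.0):
    """Graph diffusion kernel exp(-beta * L), with L = D - A the graph Laplacian."""
    L = np.diag(A.sum(axis=1)) - A
    eigval, eigvec = np.linalg.eigh(L)  # L is symmetric for an undirected graph
    return (eigvec * np.exp(-beta * eigval)) @ eigvec.T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
K_gauss = gaussian_kernel(X, gamma=0.5)

# Adjacency matrix of a path graph with 4 nodes: 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
K_diff = diffusion_kernel(A, beta=0.5)
```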
Also somehow a kernel... (but not positive)
Hi-C matrices (slide taken from SF’s work)
Multiple kernel (or distance) integration
How to “optimally” combine several kernel datasets?
For kernels $K_1, \dots, K_M$ obtained on the same $n$ objects, search: $K_\beta = \sum_{m=1}^M \beta_m K_m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$
▶ naive approach: $K^* = \frac{1}{M} \sum_m K_m$
▶ supervised framework: $K^* = \sum_m \beta_m K_m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$, with $\beta_m$ chosen so as to minimize the prediction error [Gönen and Alpaydin, 2011]
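A small sketch of the convex combination $K_\beta$, with the naive uniform weights as default. The function name `combine_kernels` and the toy kernels are assumptions for illustration only. Since each $K_m$ is positive and the weights are non-negative, the combination is again a valid kernel.

```python
import numpy as np

def combine_kernels(kernels, beta=None):
    """Return sum_m beta_m K_m; beta defaults to uniform weights 1/M (naive approach)."""
    kernels = [np.asarray(K, dtype=float) for K in kernels]
    M = len(kernels)
    beta = np.full(M, 1.0 / M) if beta is None else np.asarray(beta, dtype=float)
    if beta.shape != (M,) or np.any(beta < 0) or not np.isclose(beta.sum(), 1.0):
        raise ValueError("beta must hold M non-negative weights summing to 1")
    return sum(b * K for b, K in zip(beta, kernels))

# Two toy kernels computed on the same n = 8 objects from different tables.
rng = np.random.default_rng(2)
X1 = rng.normal(size=(8, 3))
X2 = rng.normal(size=(8, 4))
K1, K2 = X1 @ X1.T, X2 @ X2.T

K_avg = combine_kernels([K1, K2])                 # naive approach
K_w = combine_kernels([K1, K2], beta=[0.8, 0.2])  # user-chosen weights
```

In practice the kernels are often rescaled to comparable magnitudes before being combined; that step is left out here.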
Combining kernels in an unsupervised setting
Multiple kernel integration
Ideas of kernel consensus: find a kernel that performs a consensus of all kernels
[Mariette and Villa-Vialaneix, 2018] - R package mixKernel
with consensus based on:
▶ STATIS [L’Hermier des Plantes, 1976, Lavit et al., 1994]
▶ criterion that preserves local geometry
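The STATIS idea can be sketched as follows, under stated assumptions: kernels are compared through their cosine similarity for the Frobenius inner product, and the weights are taken from the leading eigenvector of the resulting $M \times M$ matrix, rescaled to sum to one. This is an illustration in the spirit of STATIS, not the mixKernel implementation, and it does not cover the local-geometry criterion. The name `statis_weights` and the toy data are hypothetical.

```python
import numpy as np

def statis_weights(kernels):
    """Consensus weights from the leading eigenvector of the kernel cosine matrix."""
    M = len(kernels)
    C = np.empty((M, M))
    for a in range(M):
        for b in range(M):
            inner = np.sum(kernels[a] * kernels[b])  # Frobenius inner product
            C[a, b] = inner / (np.linalg.norm(kernels[a]) * np.linalg.norm(kernels[b]))
    eigval, eigvec = np.linalg.eigh(C)  # eigenvalues in ascending order
    v = np.abs(eigvec[:, -1])           # leading eigenvector; its sign is arbitrary
    return v / v.sum()                  # beta_m >= 0 and sum_m beta_m = 1

# Two kernels that nearly agree and one unrelated kernel, on the same 30 objects.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 5))
X_noisy = X + 0.05 * rng.normal(size=X.shape)
Y = rng.normal(size=(30, 5))
kernels = [X @ X.T, X_noisy @ X_noisy.T, Y @ Y.T]

beta = statis_weights(kernels)
K_consensus = sum(b * K for b, K in zip(beta, kernels))
```

Kernels that agree with the others receive larger weights, so the unrelated third kernel contributes least to the consensus.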
Integrating TARA Oceans datasets
▶ For all compositional datasets, include phylogenetic information (rather than CLR and the like): weighted UniFrac distance
▶ Perform PCA (could have been clustering, linear model, . . . ) in the feature space.
+ combine with a shuffling approach to identify the most influential variables
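PCA in the feature space (kernel PCA) only needs the kernel matrix: center it, take its eigendecomposition and scale the eigenvectors. The sketch below is a generic version under that assumption, with a hypothetical function name and toy data; it is not the TARA Oceans analysis and leaves out the shuffling step.

```python
import numpy as np

def kernel_pca(K, n_components=2):
    """Sample coordinates on the leading principal axes of the feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centring matrix
    Kc = H @ K @ H                            # kernel of the centred features
    eigval, eigvec = np.linalg.eigh(Kc)
    order = np.argsort(eigval)[::-1][:n_components]
    eigval = np.clip(eigval[order], 0.0, None)
    return eigvec[:, order] * np.sqrt(eigval)  # scores, shape (n, n_components)

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 4))
scores = kernel_pca(X @ X.T, n_components=2)

# With a linear kernel, the scores match ordinary PCA up to the sign of each axis.
Xc = X - X.mean(axis=0)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U[:, :2] * S[:2]
```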
Application to TARA oceans
Main facts
▶ Oceans typology related to longitude
▶ Rhizaria abundance structures the differences, especially between Arctic Oceans and Pacific Oceans
Hi-C matrices and hierarchical clustering
Idea used in: [Fraser et al., 2015,
Soler-Vila et al., 2020, Zhang et al., 2021]
among others
Clustering based on Hi-C matrices
How is hierarchical clustering usually performed on Hi-C matrices?
1. a bin =
2. Euclidean distance + (constrained) HC (+ post/pre-processing)
Why is it not satisfactory?
1. Hi-C matrices are already distances!
2. sparse vs full representations
Fast kernel constrained HC
[Ambroise et al., 2019] R package adjclust
▶ extend Ward’s constrained HC to kernel/similarity/arbitrary distance matrices
▶ fast version using a band computation (band sparsity assumption, min-heap, linear in p)
With Hi-C:
1. a bin = a bin
2. “distance” directly induced from Hi-C data (no transformation)
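To make the idea concrete, here is a deliberately naive sketch of adjacency-constrained Ward clustering driven by a kernel (or similarity) matrix only. Ward's linkage between two clusters $A$ and $B$ is $\frac{|A||B|}{|A|+|B|} \|\mu_A - \mu_B\|^2$, and the squared distance between centers is expanded with sums of kernel entries. This is not the adjclust algorithm: it recomputes every linkage at each step and uses neither the band assumption nor a min-heap. Function names and the toy signal are assumptions.

```python
import numpy as np

def ward_linkage(K, a, b):
    """Ward dissimilarity between index lists a and b, computed from K only."""
    na, nb = len(a), len(b)
    s_aa = K[np.ix_(a, a)].sum()
    s_bb = K[np.ix_(b, b)].sum()
    s_ab = K[np.ix_(a, b)].sum()
    sq_dist_centres = s_aa / na**2 + s_bb / nb**2 - 2.0 * s_ab / (na * nb)
    return na * nb / (na + nb) * sq_dist_centres

def constrained_hc(K):
    """Merge only adjacent clusters; returns the list of (left, right, height) merges."""
    clusters = [[i] for i in range(K.shape[0])]
    merges = []
    while len(clusters) > 1:
        costs = [ward_linkage(K, clusters[j], clusters[j + 1])
                 for j in range(len(clusters) - 1)]
        j = int(np.argmin(costs))
        merges.append((clusters[j], clusters[j + 1], costs[j]))
        clusters[j:j + 2] = [clusters[j] + clusters[j + 1]]  # fuse the two neighbours
    return merges

# Ordered toy signal with two plateaus; linear kernel on the values.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
merges = constrained_hc(np.outer(x, x))
left, right, height = merges[-1]
```

On this toy signal the last merge joins the two plateaus. When the input is an arbitrary similarity rather than a positive kernel, the merge heights need not increase along the hierarchy.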
R package adjclust
▶ fully compatible with HiTC format
▶ very flexible (and beautiful!) plots
Mathematical analysis
[Randriamihamison et al., 2021]
Mathematical analysis of the different cases:
▶ Euclidean case (including kernel)
▶ non Euclidean case (including arbitrary dissimilarity and Hi-C matrices)
with exhaustive analysis of crossovers and reversals
And numerical experiments
[Randriamihamison et al., 2021]
To be continued...
[Neuvial et al., 2023]
Keep listening! (teasing for next talk)
Thank you for your attention!
Questions?
References
(unofficial) Beamer template made with the help of Thomas Schiex, Matthias Zytnicki and Andreea Dreau:
https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae
Allen, G. I. (2013).
Automatic feature selection via weighted kernels and regularization.
Journal of Computational and Graphical Statistics, 22(2):284–299.
Ambroise, C., Dehman, A., Neuvial, P., Rigaill, G., and Vialaneix, N. (2019).
Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics.
Algorithms for Molecular Biology, 14:22.
Brouard, C., Mariette, J., Flamary, R., and Vialaneix, N. (2022).
Feature selection for kernel methods in systems biology.
NAR Genomics and Bioinformatics, 4(1):lqac014.
Brouard, C., Shen, H., Dührkop, K., d’Alché-Buc, F., Böcker, S., and Rousu, J. (2016).
Fast metabolite identification with input output kernel regression.
Bioinformatics, 32(12):i28–i36.
Foissac, S., Djebali, S., Munyard, K., Vialaneix, N., Rau, A., Muret, K., Esquerre, D., Zytnicki, M., Derrien, T., Bardou, P., Blanc, F., Cabau,
C., Crisci, E., Dhorne-Pollet, S., Drouet, F., Faraut, T., Gonzáles, I., Goubil, A., Lacroix-Lamande, S., Laurent, F., Marthey, S.,
Marti-Marimon, M., Mormal-Leisenring, R., Mompart, F., Quere, P., Robelin, D., SanCristobal, M., Tosser-Klopp, G., Vincent-Naulleau, S.,
Fabre, S., Pinard-Van der Laan, M.-H., Klopp, C., Tixier-Boichard, M., Acloque, H., Lagarrigue, S., and Giuffra, E. (2019).
Multi-species annotation of transcriptome and chromatin structure in domesticated animals.
BMC Biology, 17:108.
Fraser, J., Ferrai, C., Chiariello, A. M., Schueler, M., Rito, T., Laudanno, G., Barbieri, M., Moore, B. L., Kraemer, D. C., Aitken, S., Xie,
S. Q., Morris, K. J., Itoh, M., Kawaji, H., Jaeger, I., Hayashizaki, Y., Carninci, P., Forrest, A. R., The FANTOM Consortium, Semple, C. A.,
Dostie, J., Pombo, A., and Nicodemi, M. (2015).
Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation.
Molecular Systems Biology, 11:852.
Gönen, M. and Alpaydin, E. (2011).
Multiple kernel learning algorithms.
Journal of Machine Learning Research, 12:2211–2268.
Grandvalet, Y. and Canu, S. (2002).
Adaptive scaling for feature selection in SVMs.
In Becker, S., Thrun, S., and Obermayer, K., editors, Proceedings of Advances in Neural Information Processing Systems (NIPS 2002), pages
569–576. MIT Press.
Jaakkola, T., Diekhans, M., and Haussler, D. (2000).
A discriminative framework for detecting remote protein homologies.
Journal of Computational Biology, 7(1-2):95–114.
Kondor, R. I. and Lafferty, J. (2002).
Diffusion kernels on graphs and other discrete structures.
In Sammut, C. and Hoffmann, A., editors, Proceedings of the 19th International Conference on Machine Learning, pages 315–322, Sydney,
Australia. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA.
Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994).
The ACT (STATIS method).
Computational Statistics and Data Analysis, 18(1):97–119.
L’Hermier des Plantes, H. (1976).
Structuration des tableaux à trois indices de la statistique.
PhD thesis, Université de Montpellier.
Thèse de troisième cycle.
Mariette, J. and Villa-Vialaneix, N. (2018).
Unsupervised multiple kernel learning for heterogeneous data integration.
Bioinformatics, 34(6):1009–1015.
Neuvial, P., Randriamihamison, N., Chavent, M., Foissac, S., and Vialaneix, N. (2023).
A two-sample tree-based test for hierarchically organized genomic signals.
Preprint submitted for publication. In revision.
Randriamihamison, N., Vialaneix, N., and Neuvial, P. (2021).
Applicability and interpretability of Ward’s hierarchical agglomerative clustering with or without contiguity constraints.
Journal of Classification, 38:363–389.
Ritchie, M. D., Holzinger, E. R., Li, R., Pendergrass, S. A., and Kim, D. (2015).
Methods of integrating data to uncover genotype-phenotype interactions.
Nature Reviews Genetics, 16(2):85–97.
Saigo, H., Vert, J.-P., Ueda, N., and Akutsu, T. (2004).
Protein homology detection using string alignment kernels.
Bioinformatics, 20(11):1682–1689.
Shen, H., Dührkop, K., Böcker, S., and Rousu, J. (2014).
Metabolite identification through multiple kernel learning on fragmentation trees.
Bioinformatics, 30(12):i157–i164.
Soler-Vila, P., Cuscó, P., Farabella, I., Di Stefano, M., and Marti-Renom, M. A. (2020).
Hierarchical chromatin organization detected by TADpole.
Nucleic Acids Research, 48(7):e39.
Zhang, Y. W., Wang, M. B., and Li, S. C. (2021).
SuperTAD: robust detection of hierarchical topologically associated domains with optimized structural information.
Genome Biology, 22:45.
Making kernel methods more interpretable
Which features are important? (for numerical
features only)
[Brouard et al., 2022] and
mixKernel
▶ extension to the unsupervised framework of the
work [Allen, 2013, Grandvalet and Canu, 2002]
▶ also extension to the kernel output framework
(time series, graph, ... outputs)
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNN
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practice
 

Último

Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 

Último (20)

Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 

Kernel methods for heterogeneous data integration

  • 1. Kernel methods for heterogeneous data integration Nathalie Vialaneix nathalie.vialaneix@inrae.fr http://www.nathalievialaneix.eu Séminaires du LaBRI, January 11th, 2024
  • 2. Disclaimer From the title, what you think the presentation is about... Multi-omics data integration methods 2024-01-11, Séminaires du LaBRI / Nathalie Vialaneix p. 2
  • 3. Disclaimer From the title, what you think the presentation is about... ... versus what it actually turns out to be:
  • 4. Collected data at genomic level are increasingly publicly available; the different levels are not always compatible
  • 5. Collected data at genomic level are increasingly publicly available; the different levels are not always compatible [Foissac et al., 2019]
  • 6. Omics data integration Type of data to integrate vertical: horizontal:
  • 7. Omics data integration Type of data to integrate vertical: horizontal: Type of analysis to perform supervised: unsupervised: Left pictures courtesy Harold Duruflé
  • 8. Multiple table analyses (CCA, MFA, PLS, STATIS, ...) Image from https://dimensionless.in
  • 9. Types of data integration methods [Ritchie et al., 2015]
  • 10. Want to know them all? https://github.com/mikelove/awesome-multi-omics Some specificities: can account for structure in data (network), are dedicated to a specific omic (single-cell), can account for temporal/spatial information, can include biological knowledge (mostly GO), ...
  • 11. Scope of the rest of the talk Unsupervised transformation-based integration
  • 12. Overview of the talk Kernel methods Integrating data with kernels Constrained kernel hierarchical clustering and applications to Hi-C data
  • 13. Main ideas behind kernel methods Standard (omics) data analyses: ▶ data are (numeric) tables ▶ analyses are based on operations (distances, means, ...) in the variable space
  • 14. Main ideas behind kernel methods Kernel data analyses: ▶ data are arbitrary ▶ analyses are based on transformations of data to “similarities” between samples
  • 15. More formally... $n$ samples $(x_i)_i$ in $\mathcal{X}$ kernels: symmetric and positive $(n \times n)$-matrix $K$ that measures a “similarity” between the $n$ entities in $\mathcal{X}$
  • 17. More formally... $n$ samples $(x_i)_i$ in $\mathcal{X}$ kernels: symmetric and positive $(n \times n)$-matrix $K$ that measures a “similarity” between the $n$ entities in $\mathcal{X}$ $K(x, x') = \langle \phi(x), \phi(x') \rangle$
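The definition above can be illustrated with a minimal sketch, assuming NumPy and a hypothetical explicit feature map `phi` (in practice the map usually stays implicit). It only checks that a matrix of inner products is symmetric and positive semi-definite:

```python
import numpy as np

def phi(x):
    """Hypothetical explicit feature map x -> (x, x^2), for illustration only."""
    return np.concatenate([x, x ** 2])

def gram_matrix(samples):
    """K[i, j] = <phi(x_i), phi(x_j)> for every pair of samples."""
    features = np.array([phi(x) for x in samples])
    return features @ features.T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # n = 5 samples in R^3 (made-up data)
K = gram_matrix(X)

is_symmetric = np.allclose(K, K.T)
min_eigenvalue = np.linalg.eigvalsh(K).min()   # non-negative up to rounding error
```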
  • 18. Principles of learning from kernels Start from any statistical method (PCA, regression, k-means clustering) and rewrite all quantities using: ▶ $K$ to compute distances and dot products in the feature space
  • 19. Principles of learning from kernels Start from any statistical method (PCA, regression, k-means clustering) and rewrite all quantities using: ▶ $K$ to compute distances and dot products in the feature space ▶ (implicit) linear or convex combinations of $(\phi(x_i))_i$ to describe all unobserved elements (centers of gravity and so on...)
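Both principles can be sketched in a few lines, assuming NumPy; the function names are hypothetical. The first function recovers pairwise feature-space distances from $K$ alone, and the second computes the distance of each sample to the center of gravity of all samples, which is never built explicitly:

```python
import numpy as np

def kernel_to_squared_distances(K):
    """d2[i, j] = K[i, i] + K[j, j] - 2 K[i, j], the squared feature-space distance."""
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2.0 * K

def distance_to_center_of_gravity(K):
    """Squared distance of each phi(x_i) to the mean of all phi(x_j), from K only."""
    return np.diag(K) - 2.0 * K.mean(axis=1) + K.mean()
```

With a linear kernel `K = X @ X.T`, both functions give the usual Euclidean quantities, which is a convenient sanity check.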
  • 20. In practice: kernel k-means example Images taken from Mquantin’s work on Wikimedia Commons “Illustration du déroulement de l’algorithme des k-means” Standard: $n$ points $(x_i)_i$ in $\mathbb{R}^p$ Kernel: $n$ points (implicitly obtained from a kernel $K$), $(\phi(x_i))_i$ in $\mathcal{H}$ (also implicit)
  • 21. In practice: kernel k-means example Initialization: Standard: randomly set $K$ “class centers” in $\mathbb{R}^p$: $(c_k)_k$
  • 22. In practice: kernel k-means example Initialization: Standard: randomly set $K$ “class centers” in $\mathbb{R}^p$: $(c_k)_k$ Kernel: (implicitly) randomly set $K$ “class centers” in $\mathcal{H}$: $(c_k)_k$, $c_k := \sum_{i=1}^n \alpha_{ki} \phi(x_i)$, with $\alpha_{ki} \geq 0$ and $\sum_i \alpha_{ki} = 1$
  • 23. In practice: kernel k-means example Assignment step: Standard: $\forall\, i = 1, \ldots, n$, $f(x_i) \leftarrow \arg\min_{k=1,\ldots,K} d(x_i; c_k)$
  • 24. In practice: kernel k-means example Assignment step: Standard: $\forall\, i = 1, \ldots, n$, $f(x_i) \leftarrow \arg\min_{k=1,\ldots,K} d(x_i; c_k)$ Kernel: $\forall\, i = 1, \ldots, n$, $f(x_i) \leftarrow \arg\min_{k=1,\ldots,K} d_{\mathcal{H}}(\phi(x_i); c_k)$
  • 25. In practice: kernel k-means example Assignment step: Standard: $\forall\, i = 1, \ldots, n$, $f(x_i) \leftarrow \arg\min_{k=1,\ldots,K} d(x_i; c_k)$ Kernel: $\forall\, i = 1, \ldots, n$, $f(x_i) \leftarrow \arg\min_{k=1,\ldots,K} d_{\mathcal{H}}(\phi(x_i); c_k)$ But: $d^2_{\mathcal{H}}(\phi(x_i); c_k) = \langle \phi(x_i) - \sum_\ell \alpha_{k\ell} \phi(x_\ell),\ \phi(x_i) - \sum_\ell \alpha_{k\ell} \phi(x_\ell) \rangle_{\mathcal{H}} = K_{ii} - 2 \sum_\ell \alpha_{k\ell} K_{i\ell} + \sum_{\ell\ell'} \alpha_{k\ell} \alpha_{k\ell'} K_{\ell\ell'}$
  • 26. In practice: kernel k-means example Representation step: Standard: $\forall\, k = 1, \ldots, K$, update centers: $c_k \leftarrow \frac{1}{\sharp\{i : f(x_i) = k\}} \sum_{i:\, f(x_i) = k} x_i$
  • 27. In practice: kernel k-means example Representation step: Standard: $\forall\, k = 1, \ldots, K$, update centers: $c_k \leftarrow \frac{1}{\sharp\{i : f(x_i) = k\}} \sum_{i:\, f(x_i) = k} x_i$ Kernel: $\forall\, k = 1, \ldots, K$, (implicitly) update centers: $c_k \leftarrow \sum_{i:\, f(x_i) = k} \frac{1}{\sharp\{i : f(x_i) = k\}} \phi(x_i)$
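The three steps above can be put together in a short sketch, assuming NumPy. The function name and its parameters are hypothetical, and the initialization is a random assignment of samples to classes (which amounts to setting $\alpha_{ki} = 1/\sharp\{\text{class } k\}$ on a random partition) rather than an arbitrary random convex combination:

```python
import numpy as np

def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Kernel k-means on an (n x n) kernel matrix K; returns one label per sample."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)      # random initial assignment
    for _ in range(n_iter):
        # Representation step: alpha[k, i] = 1 / |class k| if i is in class k, else 0
        alpha = np.zeros((n_clusters, n))
        alpha[labels, np.arange(n)] = 1.0
        sizes = alpha.sum(axis=1, keepdims=True)
        alpha /= np.maximum(sizes, 1.0)            # empty classes keep alpha = 0
        # Assignment step:
        # d2[i, k] = K_ii - 2 sum_l alpha_kl K_il + sum_{l,l'} alpha_kl alpha_kl' K_ll'
        cross = K @ alpha.T                                     # shape (n, k)
        within = np.einsum("kl,lm,km->k", alpha, K, alpha)      # shape (k,)
        d2 = np.diag(K)[:, None] - 2.0 * cross + within[None, :]
        d2[:, sizes.ravel() == 0] = np.inf         # never assign to an empty class
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                  # assignments are stable
        labels = new_labels
    return labels
```

The centers never appear as vectors: only the coefficients `alpha` and the kernel matrix are used, so the same code runs on any kernel, whatever the nature of the original data.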
  • 28. Kernel examples 1. $\mathbb{R}^p$ observations: Gaussian kernel $K_{ii'} = e^{-\gamma \| x_i - x_{i'} \|^2}$
  • 29. Kernel examples 1. $\mathbb{R}^p$ observations: Gaussian kernel $K_{ii'} = e^{-\gamma \| x_i - x_{i'} \|^2}$ 2. nodes of a graph: [Kondor and Lafferty, 2002] 3. sequence kernels (between proteins: spectrum kernel [Jaakkola et al., 2000] or convolution kernel [Saigo et al., 2004]) 4. kernel between graphs (used between metabolites based on their fragmentation trees): [Shen et al., 2014, Brouard et al., 2016] 5. kernel embedding phylogeny information for metagenomics [Mariette and Villa-Vialaneix, 2018] 6. ...
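The first example of the list, the Gaussian kernel, can be computed as in the following sketch (NumPy assumed; the function name and the default value of `gamma` are arbitrary choices made here):

```python
import numpy as np

def gaussian_kernel(X, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2) for the rows x_i of X."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)   # clip tiny negatives due to rounding
    return np.exp(-gamma * sq_dists)
```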
  • 30. Also somehow a kernel... (but not positive) Hi-C matrices (slide taken from SF’s work)
  • 31. Overview of the talk Kernel methods Integrating data with kernels Constrained kernel hierarchical clustering and applications to Hi-C data
  • 32. Multiple kernel (or distance) integration How to “optimally” combine several kernel datasets? For kernels $K_1, \ldots, K_M$ obtained on the same $n$ objects, search: $K_\beta = \sum_{m=1}^M \beta_m K_m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$
  • 33. Multiple kernel (or distance) integration How to “optimally” combine several kernel datasets? For kernels $K_1, \ldots, K_M$ obtained on the same $n$ objects, search: $K_\beta = \sum_{m=1}^M \beta_m K_m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$ ▶ naive approach: $K^* = \frac{1}{M} \sum_m K_m$
  • 34. Multiple kernel (or distance) integration How to “optimally” combine several kernel datasets? For kernels $K_1, \ldots, K_M$ obtained on the same $n$ objects, search: $K_\beta = \sum_{m=1}^M \beta_m K_m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$ ▶ naive approach: $K^* = \frac{1}{M} \sum_m K_m$ ▶ supervised framework: $K^* = \sum_m \beta_m K_m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$, with $\beta_m$ chosen so as to minimize the prediction error [Gönen and Alpaydin, 2011]
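A minimal sketch of the convex combination $K_\beta$, assuming NumPy and kernels computed on the same $n$ objects in the same order. The function name is hypothetical; leaving `weights` unset gives the naive average:

```python
import numpy as np

def combine_kernels(kernels, weights=None):
    """Return sum_m beta_m K_m; uniform weights (the naive average) by default."""
    kernels = np.asarray(kernels, dtype=float)          # shape (M, n, n)
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return np.tensordot(weights, kernels, axes=1)       # shape (n, n)
```

Because the weights are non-negative, a combination of positive kernels is again a positive kernel, so any kernel method can be applied to the result.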
  • 35. Combining kernels in an unsupervised setting
  • 36. Multiple kernel integration Ideas of kernel consensus: find a kernel that performs a consensus of all kernels [Mariette and Villa-Vialaneix, 2018] - R package mixKernel with consensus based on: ▶ STATIS [L’Hermier des Plantes, 1976, Lavit et al., 1994] ▶ criterion that preserves local geometry
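One way to picture a STATIS-style consensus is sketched below, assuming NumPy. This is an illustration of the general idea (weights taken from the leading eigenvector of the matrix of cosine similarities between kernels), not the mixKernel implementation, and the function name is hypothetical:

```python
import numpy as np

def statis_like_weights(kernels):
    """Weights from the leading eigenvector of the kernel cosine-similarity matrix."""
    kernels = [np.asarray(K, dtype=float) for K in kernels]
    norms = [np.linalg.norm(K) for K in kernels]            # Frobenius norms
    M = len(kernels)
    C = np.empty((M, M))
    for a in range(M):
        for b in range(M):
            # cosine similarity between kernels a and b (Frobenius inner product)
            C[a, b] = np.sum(kernels[a] * kernels[b]) / (norms[a] * norms[b])
    _, eigvecs = np.linalg.eigh(C)          # eigenvalues in ascending order
    v = np.abs(eigvecs[:, -1])              # leading eigenvector, sign fixed to be >= 0
    return v / v.sum()                      # rescaled so that the weights sum to 1
```

Kernels that agree with most of the others receive larger weights, and the resulting weights can then be used in the convex combination $\sum_m \beta_m K_m$.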
  • 37. Integrating TARA Oceans datasets ▶ For all compositional datasets, include phylogenetic information (rather than CLR and the like): weighted UniFrac distance ▶ Perform PCA (could have been clustering, linear model, . . . ) in the feature space. + combine with a shuffling approach to identify the most influential variables
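PCA in the feature space only needs the kernel matrix. A minimal sketch, assuming NumPy (the function name is hypothetical and the shuffling step used to rank variables is not covered):

```python
import numpy as np

def kernel_pca(K, n_components=2):
    """Sample coordinates on the leading principal axes of the feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    Kc = H @ K @ H                          # kernel of the centered phi(x_i)
    eigvals, eigvecs = np.linalg.eigh(Kc)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.clip(eigvals[order], 0.0, None)   # guard against rounding noise
    return eigvecs[:, order] * np.sqrt(eigvals)
```

With a linear kernel this reproduces the sample scores of a standard PCA, up to the sign of each axis.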
  • 38. Application to TARA Oceans Main facts ▶ Oceans typology related to longitude ▶ Rhizaria abundance structures the differences, especially between Arctic Oceans and Pacific Oceans
  • 39. Overview of the talk Kernel methods Integrating data with kernels Constrained kernel hierarchical clustering and applications to Hi-C data
  • 40. Hi-C matrices and hierarchical clustering Idea used in: [Fraser et al., 2015, Soler-Vila et al., 2020, Zhang et al., 2021] among others
  • 41. Clustering based on Hi-C matrices How is hierarchical clustering usually performed on Hi-C matrices? 1. a bin = 2. Euclidean distance + (constrained) HC (+ post/pre-processing)
  • 42. Clustering based on Hi-C matrices How is hierarchical clustering usually performed on Hi-C matrices? 1. a bin = 2. Euclidean distance + (constrained) HC (+ post/pre-processing) Why is it not satisfactory? 1. Hi-C matrices are already distances! 2. sparse vs full representations
  • 43. Fast kernel constrained HC [Ambroise et al., 2019] R package adjclust ▶ extend Ward’s constrained HC to kernel/similarity/arbitrary distance matrices ▶ fast version using a band computation (band sparsity assumption, min-heap, linear in p)
  • 44. Fast kernel constrained HC [Ambroise et al., 2019] R package adjclust ▶ extend Ward’s constrained HC to kernel/similarity/arbitrary distance matrices ▶ fast version using a band computation (band sparsity assumption, min-heap, linear in p) With Hi-C: 1. a bin = a bin 2. “distance” directly induced from Hi-C data (no transformation)
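The principle of adjacency-constrained Ward clustering on a similarity matrix can be sketched as follows, assuming NumPy. This is a deliberately naive version written for illustration: it recomputes all candidate fusions at every step and uses neither the band sparsity assumption nor the min-heap, so it is not the adjclust algorithm. The function name is hypothetical:

```python
import numpy as np

def adjacency_constrained_ward(K):
    """Naive adjacency-constrained Ward clustering on a similarity matrix K.

    Only neighbouring clusters (contiguous index ranges) may merge.
    Returns the merges as (start, end_exclusive, linkage_value) tuples.
    """
    K = np.asarray(K, dtype=float)
    # 2D cumulative sums give the sum of K over any block in constant time
    S = np.zeros((K.shape[0] + 1, K.shape[1] + 1))
    S[1:, 1:] = K.cumsum(axis=0).cumsum(axis=1)

    def block_sum(a, b, c, d):
        """Sum of K[a:b, c:d]."""
        return S[b, d] - S[a, d] - S[b, c] + S[a, c]

    def ward(a, b, c):
        """Ward linkage between clusters [a, b) and [b, c), computed from K only."""
        n1, n2 = b - a, c - b
        within1 = block_sum(a, b, a, b) / n1 ** 2
        within2 = block_sum(b, c, b, c) / n2 ** 2
        between = block_sum(a, b, b, c) / (n1 * n2)
        return n1 * n2 / (n1 + n2) * (within1 + within2 - 2.0 * between)

    bounds = list(range(K.shape[0] + 1))   # cluster i spans [bounds[i], bounds[i+1])
    merges = []
    while len(bounds) > 2:
        costs = [ward(bounds[i], bounds[i + 1], bounds[i + 2])
                 for i in range(len(bounds) - 2)]
        i = int(np.argmin(costs))
        merges.append((bounds[i], bounds[i + 2], costs[i]))
        del bounds[i + 1]                  # fuse clusters i and i + 1
    return merges
```

Since the linkage is written with sums of entries of `K` only, the same code accepts a kernel, a general similarity or a Hi-C contact matrix, with bins kept in genomic order.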
  • 45. R package adjclust ▶ fully compatible with HiTC format ▶ very flexible (and beautiful!) plots
  • 46. Mathematical analysis [Randriamihamison et al., 2021] Mathematical analysis of the different cases: ▶ Euclidean case (including kernel) ▶ non Euclidean case (including arbitrary dissimilarity and Hi-C matrices) with exhaustive analysis of crossovers and reversals
  • 47. And numerical experiments [Randriamihamison et al., 2021]
  • 48. To be continued... [Neuvial et al., 2023] Keep listening! (teasing for next talk)
  • 49. Thank you for your attention! Questions?
  • 50. References. (Unofficial) Beamer template made with the help of Thomas Schiex, Matthias Zytnicki and Andreea Dreau: https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae Allen, G. I. (2013). Automatic feature selection via weighted kernels and regularization. Journal of Computational and Graphical Statistics, 22(2):284–299. Ambroise, C., Dehman, A., Neuvial, P., Rigaill, G., and Vialaneix, N. (2019). Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics. Algorithms for Molecular Biology, 14:22. Brouard, C., Mariette, J., Flamary, R., and Vialaneix, N. (2022). Feature selection for kernel methods in systems biology. NAR Genomics and Bioinformatics, 4(1):lqac014. Brouard, C., Shen, H., Dührkop, K., d’Alché Buc, F., Böcker, S., and Rousu, J. (2016). Fast metabolite identification with input output kernel regression. Bioinformatics, 32(12):i28–i36. Foissac, S., Djebali, S., Munyard, K., Vialaneix, N., Rau, A., Muret, K., Esquerre, D., Zytnicki, M., Derrien, T., Bardou, P., Blanc, F., Cabau, C., Crisci, E., Dhorne-Pollet, S., Drouet, F., Faraut, T., Gonzáles, I., Goubil, A., Lacroix-Lamande, S., Laurent, F., Marthey, S., Marti-Marimon, M., Mormal-Leisenring, R., Mompart, F., Quere, P., Robelin, D., SanCristobal, M., Tosser-Klopp, G., Vincent-Naulleau, S., Fabre, S., Pinard-Van der Laan, M.-H., Klopp, C., Tixier-Boichard, M., Acloque, H., Lagarrigue, S., and Giuffra, E. (2019). Multi-species annotation of transcriptome and chromatin structure in domesticated animals. BMC Biology, 17:108.
  • 51. Fraser, J., Ferrai, C., Chiariello, A. M., Schueler, M., Rito, T., Laudanno, G., Barbieri, M., Moore, B. L., Kraemer, D. C., Aitken, S., Xie, S. Q., Morris, K. J., Itoh, M., Kawaji, H., Jaeger, I., Hayashizaki, Y., Carninci, P., Forrest, A. R., The FANTOM Consortium, Semple, C. A., Dostie, J., Pombo, A., and Nicodemi, M. (2015). Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Molecular Systems Biology, 11:852. Gönen, M. and Alpaydin, E. (2011). Multiple kernel learning algorithms. Journal of Machine Learning Research, 12:2211–2268. Grandvalet, Y. and Canu, S. (2002). Adaptive scaling for feature selection in SVMs. In Becker, S., Thrun, S., and Obermayer, K., editors, Proceedings of Advances in Neural Information Processing Systems (NIPS 2002), pages 569–576. MIT Press. Jaakkola, T., Diekhans, M., and Haussler, D. (2000). A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, 7(1-2):95–114. Kondor, R. I. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In Sammut, C. and Hoffmann, A., editors, Proceedings of the 19th International Conference on Machine Learning, pages 315–322, Sydney, Australia. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA. Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994). The ACT (STATIS method). Computational Statistics and Data Analysis, 18(1):97–119.
  • 52. L’Hermier des Plantes, H. (1976). Structuration des tableaux à trois indices de la statistique. PhD thesis, Université de Montpellier. Thèse de troisième cycle. Mariette, J. and Villa-Vialaneix, N. (2018). Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics, 34(6):1009–1015. Neuvial, P., Randriamihamison, N., Chavent, M., Foissac, S., and Vialaneix, N. (2023). A two-sample tree-based test for hierarchically organized genomic signals. Preprint submitted for publication. In revision. Randriamihamison, N., Vialaneix, N., and Neuvial, P. (2021). Applicability and interpretability of Ward’s hierarchical agglomerative clustering with or without contiguity constraints. Journal of Classification, 38:363–389. Ritchie, M. D., Holzinger, E. R., Li, R., Pendergrass, S. A., and Kim, D. (2015). Methods of integrating data to uncover genotype-phenotype interactions. Nature Reviews Genetics, 16(2):85–97. Saigo, H., Vert, J.-P., Ueda, N., and Akutsu, T. (2004). Protein homology detection using string alignment kernels. Bioinformatics, 20(11):1682–1689. Shen, H., Dührkop, K., Böcker, S., and Rousu, J. (2014). Metabolite identification through multiple kernel learning on fragmentation trees. Bioinformatics, 30(12):i157–i164.
  • 53. Soler-Vila, P., Cuscó, P., Farabella, I., Di Stefano, M., and Marti-Renom, M. A. (2020). Hierarchical chromatin organization detected by TADpole. Nucleic Acids Research, 48(7):e39. Zhang, Y. W., Wang, M. B., and Li, S. C. (2021). SuperTAD: robust detection of hierarchical topologically associated domains with optimized structural information. Genome Biology, 22:45.
  • 54. Making kernel methods more interpretable Which features are important? (for numerical features only) [Brouard et al., 2022] and mixKernel ▶ extension to the unsupervised framework of the work of [Allen, 2013, Grandvalet and Canu, 2002] ▶ also extension to the kernel output framework (time series, graph, ... outputs)