Network-based machine learning approach for aggregating multi-modal data
1. Network-based machine learning approach
for aggregating multi-modal data
So Yeon Kim
December 12, 2019
Advisor: Professor Kyung-Ah Sohn
Ph.D. Dissertation Defense
3. Integrating multi-modal data
• How to effectively integrate multi-modal data?
• We can aggregate multi-modal data to better represent the data
• Aggregate similar data into clusters
• Transform them into a high-level feature matrix
* Images from Wikipedia, http://blog.voicebase.com/big-data-aggregation
Introduction
4. Key challenges
• Data heterogeneity
• Noisy data can be included in some views
• Inconsistent results (correlated data in one view may not stay
correlated under other views)
• Complex inter- and intra-relationships between data in multiple views
• Developing a model that is robust to noise and effectively handles
complexity and heterogeneity is the key!
5. Network-based integrative approach
• Complex and heterogeneous information can be utilized in a network
* Image from Twitter, OmicsNet (https://www.omicsnet.ca/)
6. Network clustering
• Goal: Given the similarity graph built from the affinity matrix W of the data, we
want to find a partition of the graph that corresponds to k clusters
Background
[Figure: data points → affinity matrix W → similarity graph G(V, E, W) (undirected weighted graph) → spectral clustering (graph partitioning algorithm). Example networks: social-tag network, image network, tagged-image network.]
W(i, j) = \exp\left( -\frac{\rho(v_i, v_j)}{2\alpha^2} \right)
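The affinity matrix and graph partitioning step on this slide can be illustrated with a minimal sketch. It assumes that ρ is the Euclidean distance, that α = 1, and that only two clusters are wanted, so the split is read off the sign of the second eigenvector of the normalized graph Laplacian. The function names and toy points are hypothetical and are not taken from the thesis.

```python
import numpy as np

def affinity_matrix(points, alpha=1.0):
    """W(i, j) = exp(-rho(v_i, v_j) / (2 * alpha^2)).

    Assumption: rho is the Euclidean distance between data points.
    """
    diff = points[:, None, :] - points[None, :, :]
    rho = np.sqrt(np.sum(diff ** 2, axis=-1))
    W = np.exp(-rho / (2.0 * alpha ** 2))
    np.fill_diagonal(W, 0.0)  # drop self-loops
    return W

def spectral_bipartition(W):
    """Split the graph into two clusters (k = 2) using the eigenvector of the
    second-smallest eigenvalue of the symmetric normalized Laplacian."""
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)

# Toy data: two well-separated groups of three points each (hypothetical).
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
                   [6.0, 6.0], [6.1, 5.9], [5.9, 6.2]])
W = affinity_matrix(points, alpha=1.0)
labels = spectral_bipartition(W)
print(labels)
```

For k > 2 clusters, the usual extension is to keep the first k eigenvectors and run k-means on their rows; that step is omitted here to keep the sketch short.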
7. Network-based pathway activity inference
• Genomic data can be represented in a network using pathway knowledge
[Figure: gene-by-sample matrix (genes g_1, g_2, g_3, …, g_k) mapped to pathways P_1, P_2, P_3, …; for example, genes g_1, g_2, g_3 belong to pathway P_1.]
8. Network-based pathway activity inference
• Pathway activity inference
• It transforms a single genomic profile into a pathway profile using
an activity scoring measure
• Early methods simply summarized expression values of genes
within pathways
• e.g. PLAGE (Tomfohr et al. Bioinformatics 2005), CORG (Lee et al. PLoS Comput Biol
2008)
• Network-based pathway activity inference
• DART estimates pathway activities based on relevance network
topology (Jiao et al. Bioinformatics, 2011)
• DRW infers pathway activities using a directed random walk-based
method (Liu et al. Bioinformatics, 2013)
[Figure: gene-by-sample profile (genes g_1, g_2, g_3, …, g_k) transformed, via a pathway-based gene-gene graph, into a pathway-by-sample profile (pathways P_1, P_2, P_3, …, P_n).]
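The idea behind the early methods, summarizing the expression values of the genes within each pathway, can be sketched as follows. This is a deliberately simplified illustration that averages z-scored expression per pathway. It is not the PLAGE or CORG scoring procedure, and the gene names, pathway memberships, and values are hypothetical.

```python
import numpy as np

def mean_pathway_activity(expr, gene_names, pathways):
    """Turn a genes x samples matrix into a pathways x samples matrix by
    averaging the z-scored expression of each pathway's member genes."""
    index = {g: i for i, g in enumerate(gene_names)}
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (expr - mu) / sd
    names = list(pathways)
    activity = np.vstack([
        z[[index[g] for g in pathways[p] if g in index]].mean(axis=0)
        for p in names
    ])
    return activity, names

# Toy input: 4 genes x 3 samples (hypothetical values).
gene_names = ["g1", "g2", "g3", "g4"]
expr = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 9.0],
                 [5.0, 5.0, 8.0],
                 [0.0, 1.0, 0.0]])
pathways = {"P1": ["g1", "g2", "g3"], "P2": ["g3", "g4"]}

activity, names = mean_pathway_activity(expr, gene_names, pathways)
print(names)
print(activity)
```

The result has one row per pathway and one column per sample, which is the shape of the pathway profile that the network-based methods also produce.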
9. Thesis goal
• Develop a general and flexible network-based integrative model to
effectively aggregate multi-modal data
• Focus on the algorithm that is robust to noise and is able to handle
complexity and data heterogeneity
• Facilitate an integrative analysis based on the multi-modal network
16. Multi-view network analysis
• Robust to noise when combining different types of networks
[Figure: social-tag network clustering, image network clustering, and multi-view network-based tagged-image clustering for the example “Wurzburg Residence Germany”; panels: social-tag network, image network, tagged-image network.]
Multi-view network clustering
17. Discussion
• It is effective in capturing the structure of the multiple types of
networks for a challenging clustering problem in social media data
• It does not need to know the exact network structure of each view
• It is robust to noise when combining different types of data
• SNF (similarity network fusion) assumes that the networks for all views
share the same nodes but differ in representation
• It is not scalable to large networks
18. Multi-layered network based
pathway activity inference
BMC Medical Genomics (2018); Biology Direct (2019); Bioinformatics (in preparation)
20. Integrative directed random walk-based pathway activity
inference (iDRW) on multi-layered network
[Figure: gene-by-sample profiles are mapped onto a multi-layered gene-gene graph (layers G_1, G_2, G_3) traversed by a random walker; the output is a pathway-by-sample profile (pathways P_1, P_2, P_3, …, P_n).]
Random Walk with Restart (RWR) on directed graph:
W_0 = -\log(p_v + \epsilon)
W_{t+1} = (1 - r) M^T W_t + r W_0
Pathway activity, for the n_i significant genes (p_v < 0.05) within pathway P_i:
P_i = \frac{\sum_{k=1}^{n_i} W(v_k) \times \mathrm{sgn}(z_k) \times x_k}{\sqrt{\sum_{k=1}^{n_i} W(v_k)^2}}
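The random walk with restart and the pathway activity score on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the iDRW implementation: M is taken to be the row-normalized adjacency matrix of the directed gene-gene graph, W_0 is scaled to unit sum, the restart probability r = 0.7 and ε = 1e-5 are arbitrary choices, and the denominator of P_i is taken as the square root of the summed squared weights. The toy graph, p-values, z-scores, and profile values are hypothetical.

```python
import numpy as np

def transition_matrix(adj):
    """Row-normalize a directed adjacency matrix (adj[i, j] = 1 for i -> j).
    Rows of nodes without outgoing edges stay all-zero."""
    adj = np.asarray(adj, dtype=float)
    out_deg = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg > 0)

def random_walk_with_restart(adj, w0, r=0.7, tol=1e-12, max_iter=1000):
    """Iterate W_{t+1} = (1 - r) * M^T W_t + r * W_0 until the update is tiny."""
    M = transition_matrix(adj)
    w0 = w0 / w0.sum()  # assumption: initial weights scaled to unit sum
    w = w0.copy()
    for _ in range(max_iter):
        w_next = (1 - r) * M.T @ w + r * w0
        if np.abs(w_next - w).sum() < tol:
            return w_next
        w = w_next
    return w

def pathway_activity(w, z, x, members):
    """P_i = sum_k W(v_k) * sgn(z_k) * x_k / sqrt(sum_k W(v_k)^2),
    where x is a genes x samples matrix and members indexes the pathway's
    significant genes."""
    wk = w[members]
    return (wk * np.sign(z[members])) @ x[members] / np.sqrt(np.sum(wk ** 2))

# Toy directed graph on 4 genes: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 3 (hypothetical).
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 0]])
p_values = np.array([0.001, 0.01, 0.04, 0.2])
z = np.array([2.1, -1.5, 0.8, 0.3])
x = np.array([[1.2, -0.4, 0.3],
              [0.5, 0.9, -1.1],
              [-0.7, 0.2, 0.6],
              [0.1, 0.0, -0.2]])

w0 = -np.log(p_values + 1e-5)           # W_0 = -log(p_v + eps)
w = random_walk_with_restart(adj, w0)   # converged gene weights
members = [k for k in range(4) if p_values[k] < 0.05]  # significant genes only
activity = pathway_activity(w, z, x, members)
print(activity)  # one activity score per sample
```

Because the restart term keeps pulling the walker back to W_0, genes with small p-values retain high weight, while the M^T term passes weight along directed edges to downstream genes.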
21. Experiment I
• Investigate the causal relationships between gene expression and
DNA methylation on the pathway-based gene-gene graph in breast
cancer data
24. Experiment II
• Investigate the effectiveness of iDRW considering the interactions
between gene expression and copy number variation in breast cancer
and neuroblastoma datasets
25. iDRW improved survival group classification performance over
benchmark methods in both cancer datasets
[Figure panels: Breast cancer; Neuroblastoma]
26. iDRW showed predictive power robust to the number of
pathway features (k) and samples (n)
32. iDRW facilitates the integrative gene-gene network
analysis
33. Discussion
• Multi-layered gene-gene network-based pathway activity inference
transforms multiple genomic profiles into a single pathway profile
• Showed the effectiveness of iDRW in various experimental settings
for several types of cancer data
• Contributes to improved outcome prediction performance
• Jointly identifies cancer-associated pathways and genes
• Facilitates integrative network analysis on the multi-omics network
34. Conclusion
▪ Network-based integrative approaches for aggregating multi-modal data
• Multi-view network clustering
• Multi-layered network-based pathway activity inference
▪ Thesis contributions
• Effectively aggregate heterogeneous information by utilizing the interactions
between different modalities of data based on the network
• Facilitate integrated network analysis by representing multi-modal data on an
integrated network
• Generally applicable to any number and type of data in various domains
35. Conclusion
▪ Future directions
• Can be applied to multi-modal data represented as networks
in other domains
• A hybrid approach that aggregates multi-modal data into clusters and
generates a new input matrix using the clusters as features
• A multi-modal data network that scales to larger networks and considers
different types of modalities
36. Publications
• “Multi-view network-based social-tagged landmark image clustering”
So Yeon Kim and Kyung-Ah Sohn. In Proceedings of ICIP 2017
• “Integrative Pathway based Survival Prediction utilizing Interaction between Gene Expression and
DNA Methylation in Breast Cancer” So Yeon Kim, Tae Rim Kim, Hyun-Hwan Jeong, Kyung-Ah Sohn.
BMC Medical Genomics 2018 (presented at TBC 2017)
• “Robust Pathway-based Multi-Omics Data Integration using Directed Random Walk for Survival
Prediction in Multiple Cancer Studies” So Yeon Kim, Hyun-Hwan Jeong, Jaesik Kim, Jeong-Hyeon
Moon, Kyung-Ah Sohn. Biology Direct 2019 (presented at CAMDA 2018 – ISMB COSI track)
• “iDRW: integrative directed random walks on multi-layered gene-gene graph to infer pathway
activity for outcome prediction in urologic cancer” So Yeon Kim, Eun Kyung Choe, Manu
Shivakumar, Dokyoon Kim, Kyung-Ah Sohn. Bioinformatics (in preparation) (presented at MABC
2019 / ASHG 2019)
38. Multi-view network analysis
[Figure: tagged-image network with example landmarks: Abbey of Saint Gall; Louvre pyramid, Paris; Canadian National Vimy Memorial, France; George Washington Birthplace, Virginia; Hosios Loukas Monastery, Greece; Casas Grandes Chihuahua, Mexico.]
39. Incorporating pathway
information showed better
survival group classification
performance
[Figure: mean AUC and mean accuracy (%)]
40. iDRW shows distinctive pathway activity patterns across
cancers