This document summarizes work on estimating mutual information for discrete and continuous data, and applying it to construct Chow-Liu forests. It then describes two experiments: one analyzing gene expression differences between breast cancer groups, finding one gene strongly connected to the class; another combining gene expression and SNPs, finding SNPs and genes interconnected in the causal network rather than separated.
1. Forest Learning based on the Chow-Liu Algorithm
and its Application to Genome Differential Analysis:
A Novel Mutual Information Estimation
Nov. 16-18, 2015
Joe Suzuki
(Osaka Univ.)
@joe_suzuki
Prof-joe
2. Road Map
• MI Estimation for Discrete Data (warm-up)
• MI Estimation for Discrete/Continuous Data (proposed)
• Experiment 1 (gene differential analysis)
• Experiment 2 (combining SNPs and gene expression)
• Concluding Remarks
3. How do you estimate MI given data?
For discrete data, a naïve way is to plug the relative frequencies of the data into the definition of MI.
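As a minimal sketch of the naïve plug-in estimate for discrete data, the relative frequencies are substituted directly into I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))]. Natural logarithms are assumed, and the name `plugin_mi` is hypothetical, not taken from the slides.

```python
from collections import Counter
import math


def plugin_mi(xs, ys):
    """Naive (plug-in) mutual information estimate, in nats, from paired discrete samples."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))  # c(x, y): joint counts
    cx = Counter(xs)              # c(x): marginal counts of X
    cy = Counter(ys)              # c(y): marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        # (c/n) * log[(c/n) / ((c(x)/n) * (c(y)/n))] simplifies to (c/n) * log[n*c / (c(x)*c(y))]
        mi += (c / n) * math.log(n * c / (cx[x] * cy[y]))
    return mi


# Perfectly dependent toy sample: the estimate is log 2 (about 0.693).
print(plugin_mi([0, 0, 1, 1], [0, 0, 1, 1]))
```

The plug-in value is never negative, and on a finite sample it is typically positive even when X and Y are independent, so thresholding it at zero would connect almost every pair of variables.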
4. MI Estimation based on MDL (Suzuki, UAI93)
P. Liang and N. Srebro 2004, K. Panayidou 2010, and Edwards et al. 2010 revisited the same idea.
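A sketch of an MDL-based estimate, assuming the commonly cited form of the penalty: subtract (α−1)(β−1)·log n / (2n) from the plug-in value, where α and β are the numbers of values X and Y can take and n is the sample size. Defaulting α and β to the number of distinct observed values is an additional assumption, and `mdl_mi` is a hypothetical name.

```python
from collections import Counter
import math


def plugin_mi(xs, ys):
    """Plug-in MI estimate (nats) from paired discrete samples."""
    n = len(xs)
    joint, cx, cy = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(n * c / (cx[x] * cy[y])) for (x, y), c in joint.items())


def mdl_mi(xs, ys, alpha=None, beta=None):
    """Plug-in MI minus an MDL penalty (assumed form: (alpha-1)(beta-1) log n / (2n)).

    alpha, beta: numbers of values of X and Y; by assumption they default to
    the number of distinct values actually observed.
    """
    n = len(xs)
    if alpha is None:
        alpha = len(set(xs))
    if beta is None:
        beta = len(set(ys))
    penalty = (alpha - 1) * (beta - 1) * math.log(n) / (2 * n)
    return plugin_mi(xs, ys) - penalty


# Tiny sample with zero plug-in MI: the penalized value is negative.
print(mdl_mi([0, 0, 1, 1], [0, 1, 0, 1]))
# Strong dependence on 100 samples: the penalized value stays positive.
print(mdl_mi([0, 1] * 50, [0, 1] * 50))
```

A pair is then treated as dependent only when the penalized value is positive, which is what allows a Chow-Liu procedure to return a forest rather than a single spanning tree.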
18. Experiment 1:
Gene expression profiling in breast cancer patients
• 58 samples with a p53 mutation and 192 without
• 1000 genes
Why use only Bonferroni and FDR corrections rather than causality and regression?
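For reference, the two multiple-testing corrections named above can be sketched as follows. The per-gene p-values are assumed to come from some two-sample test that the slides do not specify, "FDR" is assumed to mean the Benjamini-Hochberg procedure, and the function names are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for gene i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Sort p-values ascending, find the largest rank k with p_(k) <= k * alpha / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


pvals = [0.01, 0.02, 0.03, 0.5]  # toy p-values, not from the study
print(bonferroni(pvals))          # [True, False, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, False]
```

Both procedures test each gene against the class separately, so neither says anything about how the genes depend on one another; that is the gap a graphical model such as a Chow-Liu forest addresses.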
21. • 20 seconds to compute the MI values and 30 seconds to build the forest (1000 nodes)
The class variable has
only one connection
with gene variables
We conclude that
regression may be more
appropriate than
a graphical model.
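Once pairwise weights are available, a Chow-Liu forest can be sketched as a maximum-weight spanning forest that only uses positive-weight edges. The slides do not say which spanning-tree algorithm was used; Kruskal's algorithm with union-find is an assumption here, and `chow_liu_forest` is a hypothetical name.

```python
def chow_liu_forest(num_nodes, weights):
    """Maximum-weight spanning forest over positive-weight edges (Kruskal).

    weights: dict mapping (i, j) node pairs to an MI estimate, e.g. an
    MDL-penalized value; edges with weight <= 0 are never added.
    """
    parent = list(range(num_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    forest = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if w <= 0:
            break  # all remaining edges are non-positive as well
        ri, rj = find(i), find(j)
        if ri != rj:  # (i, j) joins two different trees, so no cycle is created
            parent[ri] = rj
            forest.append((i, j))
    return forest


weights = {(0, 1): 0.5, (1, 2): 0.4, (0, 2): 0.3, (2, 3): -0.1, (3, 4): 0.2}
print(chow_liu_forest(5, weights))  # [(0, 1), (1, 2), (3, 4)]
```

Sorting the candidate edges dominates the cost; for 1000 variables there are about 500,000 pairs.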
22. Experiment 2:
300 gene expression levels (continuous) and 300 SNPs (3 values each)
• SNPs (HapMap) of 90 Utah residents with northern and western European ancestry
• R package (Bioconductor) GGdata
ftp://ftp.sanger.ac.uk/pub/genevar/CEU_parents_norm_march2007.zip
24. Causality among genes and SNPs can be explored!!
Insights obtained from the experiment:
• In the actual causal structure, SNPs and genes are not separated, contrary to what Edwards et al. assumed!!
• Both SNPs and gene expression levels act as hubs of the mixed network.
Variable cardinality:
• SNP: 3 values
• Gene expression: continuous
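The proposed discrete/continuous estimator is not spelled out in these slides. As a deliberately simple stand-in (equal-frequency quantization followed by a penalized discrete estimate; not the author's method), a score between one SNP and one expression level could be computed as below. The function names, the choice k = 3, and the MDL-style penalty form are all assumptions.

```python
from collections import Counter
import math


def plugin_mi(xs, ys):
    """Plug-in MI estimate (nats) from paired discrete samples."""
    n = len(xs)
    joint, cx, cy = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(n * c / (cx[x] * cy[y])) for (x, y), c in joint.items())


def equal_frequency_bins(values, k):
    """Quantize continuous values into k bins of (nearly) equal size by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n  # ranks 0..n-1 map to bins 0..k-1
    return bins


def snp_expression_score(snp, expr, k=3):
    """Penalized MI between a SNP (discrete) and a quantized expression level.

    Baseline only: k equal-frequency bins and the penalty
    (alpha-1)(beta-1) log n / (2n) are assumptions, not the proposed estimator.
    """
    n = len(snp)
    q = equal_frequency_bins(expr, k)
    alpha, beta = len(set(snp)), len(set(q))
    penalty = (alpha - 1) * (beta - 1) * math.log(n) / (2 * n)
    return plugin_mi(snp, q) - penalty


snp = [0, 1, 2] * 30
expr = [s + 0.01 * i for i, s in enumerate(snp)]  # toy data tracking the genotype
print(snp_expression_score(snp, expr))  # positive: the two look dependent
```

Any quantization baseline depends on k: a coarse k discards information in the continuous variable, while a fine k inflates the penalty.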
25. Summary
• MI estimation
• Application to Chow-Liu
• Gene Differential Analysis
• Causality among SNPs and Gene Expressions
Future Work
Beyond forests:
• BNs (Bayesian networks) with bounded treewidth
• MNs (Markov networks) that are not necessarily forests