Machine Learning

Graphical Models

Eric Xing
Lecture 10, August 14, 2010
Reading:

© Eric Xing @ CMU, 2006-2010

[Figure: a cellular signal transduction pathway with nodes Receptor A (X1), Receptor B (X2), Kinase C (X3), Kinase D (X4), Kinase E (X5), TF F (X6), Gene G (X7), Gene H (X8)]

Multivariate Distribution in High-D Space

• A possible world for cellular signal transduction:

[Figure: the pathway with Receptor A (X1), Receptor B (X2), Kinase C (X3), Kinase D (X4), Kinase E (X5), TF F (X6), Gene G (X7), Gene H (X8)]

Recap of Basic Prob. Concepts

• Representation: what is the joint probability distribution on multiple variables?

    P(X1, X2, X3, X4, X5, X6, X7, X8)

  • How many state configurations in total? --- 2^8
  • Are they all needed to be represented?
  • Do we get any scientific/medical insight?

[Figure: candidate graph structures over nodes A through H]

• Learning: where do we get all these probabilities?
  • Maximum-likelihood estimation? But how much data do we need?
  • Where do we put domain knowledge, in terms of plausible relationships between variables and plausible values of the probabilities?

• Inference: if not all variables are observable, how do we compute the conditional distribution of latent variables given evidence?
  • Computing p(H|A) would require summing over all 2^6 configurations of the unobserved variables.

What is a Graphical Model?
--- example from a signal transduction pathway

• A possible world for cellular signal transduction:

[Figure: the pathway with Receptor A (X1), Receptor B (X2), Kinase C (X3), Kinase D (X4), Kinase E (X5), TF F (X6), Gene G (X7), Gene H (X8)]

GM: Structure Simplifies Representation

• Dependencies among variables

[Figure: the same pathway, layered by cellular compartment. Membrane: Receptor A (X1), Receptor B (X2); Cytosol: Kinase C (X3), Kinase D (X4), Kinase E (X5), TF F (X6); Nucleus: Gene G (X7), Gene H (X8)]

Probabilistic Graphical Models, con'd

• If the Xi's are conditionally independent (as described by a PGM), the joint can be factored into a product of simpler terms, e.g.,

    P(X1, X2, X3, X4, X5, X6, X7, X8)
      = P(X1) P(X2) P(X3|X1) P(X4|X2) P(X5|X2) P(X6|X3,X4) P(X7|X6) P(X8|X5,X6)

• Why we may favor a PGM:
  • Representation cost: how many probability statements are needed? 2+2+4+4+4+8+4+8 = 36, an 8-fold reduction from 2^8! (See the sketch after this slide.)
  • Algorithms for systematic and efficient inference/learning computation
    • Exploiting the graph structure and probabilistic (e.g., Bayesian, Markovian) semantics
  • Incorporation of domain knowledge and causal (logical) structures

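To make the representation-cost argument concrete, here is a minimal Python sketch (not from the lecture; the CPT values are random placeholders, only the graph structure is the slide's) that stores the factored form of the 8-variable network and checks that its 36 table entries still define a normalized joint over all 2^8 configurations:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(n_parents):
    """Rows index parent configurations; columns are P(X=0), P(X=1)."""
    t = rng.random((2 ** n_parents, 2))
    return t / t.sum(axis=1, keepdims=True)

# parents[i] lists the parents of X_{i+1}: X3<-X1, X4<-X2, X5<-X2,
# X6<-{X3,X4}, X7<-X6, X8<-{X5,X6} (0-indexed below).
parents = {0: [], 1: [], 2: [0], 3: [1], 4: [1], 5: [2, 3], 6: [5], 7: [4, 5]}
cpts = {i: random_cpt(len(pa)) for i, pa in parents.items()}

def joint(x):
    """P(x) = prod_i P(x_i | x_pa(i)) for a binary configuration x."""
    p = 1.0
    for i, pa in parents.items():
        row = sum(x[j] << k for k, j in enumerate(pa))  # encode parent config
        p *= cpts[i][row, x[i]]
    return p

print("entries stored:", sum(c.size for c in cpts.values()))  # 36, not 2^8
total = sum(joint(x) for x in itertools.product([0, 1], repeat=8))
print("sum over all 2^8 configurations:", total)              # 1.0
```
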
Specification of a BN

• There are two components to any GM:
  • the qualitative specification
  • the quantitative specification

[Figure: the qualitative part is a graph over nodes A through H; the quantitative part attaches numbers, e.g., a CPT P(F | C,D) whose rows over the four parent configurations are (0.9, 0.1), (0.2, 0.8), (0.01, 0.99), (0.9, 0.1)]

Qualitative Specification

• Where does the qualitative specification come from?
  • Prior knowledge of causal relationships
  • Prior knowledge of modular relationships
  • Assessment from experts
  • Learning from data
  • We simply link a certain architecture (e.g., a layered graph)
  • …

Two types of GMs

• Directed edges give causality relationships (Bayesian Network or Directed Graphical Model):

    P(X1, X2, X3, X4, X5, X6, X7, X8)
      = P(X1) P(X2) P(X3|X1) P(X4|X2) P(X5|X2) P(X6|X3,X4) P(X7|X6) P(X8|X5,X6)

• Undirected edges simply give correlations between variables (Markov Random Field or Undirected Graphical Model):

    P(X1, X2, X3, X4, X5, X6, X7, X8)
      = 1/Z exp{E(X1) + E(X2) + E(X3,X1) + E(X4,X2) + E(X5,X2) + E(X6,X3,X4) + E(X7,X6) + E(X8,X5,X6)}

[Figure: the signal transduction pathway drawn with directed edges (top) and undirected edges (bottom)]

Bayesian Network

• A BN is a directed graph whose nodes represent the random variables and whose edges represent direct influence of one variable on another.
• It is a data structure that provides the skeleton for representing a joint distribution compactly, in a factorized way.
• It offers a compact representation for a set of conditional independence assumptions about a distribution.
• We can view the graph as encoding a generative sampling process executed by nature, where the value for each variable is selected by nature using a distribution that depends only on its parents. In other words, each variable is a stochastic function of its parents.

Bayesian Network: Factorization Theorem

• Theorem: Given a DAG, the most general form of the probability distribution that is consistent with the graph factors according to "node given its parents":

    P(X) = ∏_{i=1}^{d} P(X_i | X_{π_i})

  where X_{π_i} is the set of parents of X_i and d is the number of nodes (variables) in the graph.

• Example:

    P(X1, X2, X3, X4, X5, X6, X7, X8)
      = P(X1) P(X2) P(X3|X1) P(X4|X2) P(X5|X2) P(X6|X3,X4) P(X7|X6) P(X8|X5,X6)

Bayesian Network: Conditional Independence Semantics

• Structure: DAG
• Meaning: a node is conditionally independent of every other node in the network outside its Markov blanket.
• Local conditional distributions (CPDs) and the DAG completely determine the joint distribution.
• Gives causality relationships, and facilitates a generative process.

[Figure: a node X with its Markov blanket: its parents (Y1, Y2), its children, and its children's co-parents; ancestors and descendants lie outside the blanket]

Local Structures & Independencies

• Common parent (A ← B → C)
  • Fixing B decouples A and C:
    "given the level of gene B, the levels of A and C are independent"

• Cascade (A → B → C)
  • Knowing B decouples A and C:
    "given the level of gene B, the level of gene A provides no extra prediction value for the level of gene C"

• V-structure (A → C ← B)
  • Knowing C couples A and B, because A can "explain away" B w.r.t. C:
    "if A correlates to C, then the chance for B to also correlate to C will decrease"
    (A numeric check follows this slide.)

• The language is compact, the concepts are rich!

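As a concrete illustration of explaining away, here is a small Python sketch (mine, not from the slides; the priors and the noisy-OR CPT are made-up numbers) for the v-structure A → C ← B with independent binary causes:

```python
from itertools import product

# V-structure A -> C <- B with independent binary causes and a noisy-OR
# effect; all numbers are made up for illustration.
pA, pB = 0.3, 0.3                       # P(A=1), P(B=1)

def p_c1(a, b):
    """Hypothetical noisy-OR CPT: P(C=1 | A=a, B=b)."""
    return 1 - (1 - 0.05) * (1 - 0.8) ** a * (1 - 0.8) ** b

def joint(a, b, c):
    pc = p_c1(a, b)
    return ((pA if a else 1 - pA) * (pB if b else 1 - pB)
            * (pc if c else 1 - pc))

def p_b1(**given):
    """P(B=1 | given) by brute-force enumeration; given may fix a and/or c."""
    match = lambda a, b, c: all(dict(a=a, b=b, c=c)[k] == v
                                for k, v in given.items())
    num = sum(joint(a, 1, c) for a, c in product([0, 1], repeat=2)
              if match(a, 1, c))
    den = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3)
              if match(a, b, c))
    return num / den

print(p_b1())          # 0.300: the prior
print(p_b1(a=1))       # 0.300: A alone says nothing about B (marginal indep.)
print(p_b1(c=1))       # ~0.57: observing the effect C raises belief in B
print(p_b1(c=1, a=1))  # ~0.34: also seeing A "explains away" C, lowering B
```
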
A simple justification

[Figure: the common-parent structure A ← B → C]

Graph separation criterion

• D-separation criterion for Bayesian networks (D for Directed edges):

  Definition: variables x and y are D-separated (conditionally independent) given z if they are separated in the moralized ancestral graph.

• Example: [figure omitted]

Global Markov properties of DAGs

• X is d-separated (directed-separated) from Z given Y if we can't send a ball from any node in X to any node in Z using the "Bayes-ball" algorithm illustrated below (plus some boundary conditions):

[Figure: Bayes-ball rules, omitted]

• Defn: I(G) = all independence properties that correspond to d-separation:

    I(G) = { X ⊥ Z | Y : dsep_G(X; Z | Y) }

• D-separation is sound and complete.

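The moralized-ancestral-graph definition from the previous slide can be implemented directly, giving the same answers as Bayes-ball. Below is a self-contained Python sketch (my own illustration, not lecture code); the test queries use the 8-variable pathway from the earlier slides:

```python
def ancestors(dag, nodes):
    """The given nodes plus all of their ancestors in the DAG."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag.get(n, []))    # dag maps node -> list of parents
    return seen

def d_separated(dag, X, Z, Y):
    """True iff X and Z are d-separated given Y."""
    keep = ancestors(dag, set(X) | set(Z) | set(Y))
    und = {n: set() for n in keep}          # moralized, undirected adjacency
    for n in keep:
        pa = dag.get(n, [])
        for p in pa:
            und[n].add(p); und[p].add(n)
        for i, p in enumerate(pa):          # "marry" co-parents
            for q in pa[i + 1:]:
                und[p].add(q); und[q].add(p)
    # Separated iff every path from X to Z in the moral graph hits Y.
    frontier, seen = list(set(X) - set(Y)), set(Y)
    while frontier:
        n = frontier.pop()
        if n in Z:
            return False
        if n not in seen:
            seen.add(n)
            frontier.extend(und[n] - seen)
    return True

# node -> parents, for the slides' 8-variable pathway
dag = {"X3": ["X1"], "X4": ["X2"], "X5": ["X2"],
       "X6": ["X3", "X4"], "X7": ["X6"], "X8": ["X5", "X6"]}
print(d_separated(dag, {"X1"}, {"X2"}, set()))    # True: marginally indep.
print(d_separated(dag, {"X1"}, {"X2"}, {"X6"}))   # False: conditioning on
                                                  # the collider X6 couples them
```
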
Example:

• Complete the I(G) of this graph:

[Figure: a four-node DAG over x1, x2, x3, x4]

Towards quantitative specification of probability distribution

• Separation properties in the graph imply independence properties about the associated variables.
• For the graph to be useful, any conditional independence properties we can derive from the graph should hold for the probability distribution that the graph represents.

• The Equivalence Theorem
  For a graph G,
  let D1 denote the family of all distributions that satisfy I(G),
  let D2 denote the family of all distributions that factor according to G.
  Then D1 ≡ D2.

Example

• Speech recognition

[Figure: a Hidden Markov Model: hidden chain X1 → X2 → X3 → … → XT, with observations Y1, Y2, Y3, …, YT]

Knowledge Engineering

• Picking variables
  • Observed
  • Hidden
• Picking structure
  • CAUSAL
  • Generative
• Picking probabilities
  • Zero probabilities
  • Orders of magnitude
  • Relative values

Example, con'd

• Evolution

[Figure: a tree model: an ancestor sequence evolving over T years into descendant nucleotide sequences (A, C, G, ...), with substitution processes Qh and Qm on the branches]

Example, con'd

• Genetic pedigree

[Figure: a pedigree network over nodes A0, A1, Ag, B0, B1, Bg, F0, F1, Fg, M0, M1, Sg, C0, C1]

Conditional probability tables (CPTs)

    P(a,b,c,d) = P(a) P(b) P(c|a,b) P(d|c)

  P(a):  a0  0.75        P(b):  b0  0.33
         a1  0.25               b1  0.67

  P(c|a,b):
        a0b0   a0b1   a1b0   a1b1
    c0  0.45   1      0.9    0.7
    c1  0.55   0      0.1    0.3

  P(d|c):
        c0    c1
    d0  0.3   0.5
    d1  0.7   0.5

[Figure: the DAG A → C ← B, C → D]

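The factored joint can be evaluated mechanically from the four tables above. A short Python sketch (the array indexing convention, 0 for a0/b0/c0/d0 and 1 for a1/b1/c1/d1, is my own choice):

```python
import numpy as np

# The slide's CPTs as arrays.
Pa = np.array([0.75, 0.25])
Pb = np.array([0.33, 0.67])
Pc = np.array([[[0.45, 0.55],           # P(c | a, b), axes (a, b, c)
                [1.00, 0.00]],
               [[0.90, 0.10],
                [0.70, 0.30]]])
Pd = np.array([[0.3, 0.7],              # P(d | c), axes (c, d)
               [0.5, 0.5]])

def joint(a, b, c, d):
    """P(a,b,c,d) = P(a) P(b) P(c|a,b) P(d|c)."""
    return Pa[a] * Pb[b] * Pc[a, b, c] * Pd[c, d]

print(joint(1, 0, 0, 1))                # P(a1,b0,c0,d1) = 0.25*0.33*0.9*0.7
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(total)                            # 1.0: the factorization is normalized
```
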
Conditional probability density func. (CPDs)

    P(a,b,c,d) = P(a) P(b) P(c|a,b) P(d|c)

    A ~ N(μa, Σa)      B ~ N(μb, Σb)
    C ~ N(A + B, Σc)
    D ~ N(μa + C, Σa)

[Figure: the DAG A → C ← B, C → D, with the density P(D|C) sketched]

Conditionally Independent Observations

[Figure: model parameters θ at the top; observed data nodes y1, y2, …, yn-1, yn each depend on θ]

“Plate” Notation

[Figure: model parameters θ → yi, with the yi node enclosed in a plate labeled i = 1:n; Data = {y1, …, yn}]

  Plate = rectangle in graphical model;
  variables within a plate are replicated in a conditionally independent manner.

Example: Gaussian Model

• Generative model:

    p(y1, …, yn | μ, σ) = ∏_i p(yi | μ, σ)
                        = p(data | parameters)
                        = p(D | θ)       where θ = {μ, σ}

• Likelihood = p(data | parameters) = p(D | θ) = L(θ)

• Likelihood tells us how likely the observed data are conditioned on a particular setting of the parameters.
• Often easier to work with log L(θ). (A short sketch follows.)

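A minimal numeric companion to this slide: the i.i.d. Gaussian log-likelihood evaluated on synthetic data (all numbers below are illustrative, not from the lecture):

```python
import numpy as np

# Synthetic data D = {y_1, ..., y_n}; the generating parameters are made up.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=100)

def log_likelihood(mu, sigma, y):
    """log L(theta) = sum_i log p(y_i | mu, sigma), Gaussian observations."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                  - (y - mu) ** 2 / (2 * sigma ** 2))

# The (log-)likelihood scores parameter settings against the fixed data:
print(log_likelihood(2.0, 1.5, y))   # near the generating parameters: high
print(log_likelihood(0.0, 1.5, y))   # wrong mean: much lower
print(y.mean())                      # the MLE of mu is the sample mean
```
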
Example: Bayesian Gaussian Model

[Figure: the plate model extended with prior nodes on μ and σ, which generate yi, i = 1:n]

  Note: priors and parameters are assumed independent here.

Markov Random Fields

• Structure: an undirected graph
• Meaning: a node is conditionally independent of every other node in the network given its direct neighbors.
• Local contingency functions (potentials) and the cliques in the graph completely determine the joint distribution.
• Gives correlations between variables, but no explicit way to generate samples.

[Figure: an undirected network around a node X with neighbors Y1, Y2]

Global Markov property

• Let H be an undirected graph:

• B separates A and C if every path from a node in A to a node in C passes through a node in B:

    sep_H(A; C | B)

• A probability distribution satisfies the global Markov property if for any disjoint A, B, C, such that B separates A and C, A is independent of C given B:

    I(H) = { A ⊥ C | B : sep_H(A; C | B) }

Soundness and completeness of global Markov property

• Defn: An UG H is an I-map for a distribution P if I(H) ⊆ I(P), i.e., P entails I(H).

• Defn: P is a Gibbs distribution over H if it can be represented as

    P(x1, …, xn) = (1/Z) ∏_{c∈C} ψ_c(x_c)

• Thm (soundness): If P is a Gibbs distribution over H, then H is an I-map of P.

• Thm (completeness): If sep_H(X; Z | Y) does not hold, then X ⊥ Z | Y fails in some P that factorizes over H.

Representation

• Defn: an undirected graphical model represents a distribution P(X1, …, Xn) defined by an undirected graph H and a set of positive potential functions ψ_c associated with the cliques of H, s.t.

    P(x1, …, xn) = (1/Z) ∏_{c∈C} ψ_c(x_c)

  where Z is known as the partition function:

    Z = Σ_{x1,…,xn} ∏_{c∈C} ψ_c(x_c)

• Also known as Markov Random Fields, Markov networks, …
• The potential function can be understood as a contingency function of its arguments, assigning a "pre-probabilistic" score to their joint configuration.

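To ground the definition, here is a brute-force Python sketch of a small MRF: the diamond graph of the next two slides, with two made-up positive max-clique potentials and the partition function computed by explicit enumeration:

```python
import itertools
import numpy as np

# Two made-up positive max-clique potentials psi_124(x1,x2,x4) and
# psi_234(x2,x3,x4) over binary variables.
rng = np.random.default_rng(0)
psi_124 = rng.random((2, 2, 2)) + 0.1   # keep entries strictly positive
psi_234 = rng.random((2, 2, 2)) + 0.1

def score(x):
    """Unnormalized score: product of clique potentials at configuration x."""
    x1, x2, x3, x4 = x
    return psi_124[x1, x2, x4] * psi_234[x2, x3, x4]

states = list(itertools.product([0, 1], repeat=4))
Z = sum(score(x) for x in states)       # partition function, by enumeration

def P(x):
    return score(x) / Z                 # a properly normalized distribution

print(Z)
print(sum(P(x) for x in states))        # 1.0
# Each table here is 2x2x2; with k-ary nodes the factored form needs
# 2*k**3 entries versus k**4 for one full joint table.
```
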
Cliques

• For G = {V, E}, a complete subgraph (clique) is a subgraph G' = {V' ⊆ V, E' ⊆ E} such that the nodes in V' are fully interconnected.
• A (maximal) clique is a complete subgraph s.t. any superset V'' ⊇ V' is not complete.
• A sub-clique is a not-necessarily-maximal clique.

• Example (diamond graph over A, B, C, D):
  • max-cliques = {A,B,D}, {B,C,D}
  • sub-cliques = {A,B}, {C,D}, … all edges and singletons

Example UGM – using max cliques

[Figure: the diamond graph over A, B, C, D]

    P(x1, x2, x3, x4) = (1/Z) ψ_124(x_124) ψ_234(x_234)

    Z = Σ_{x1,x2,x3,x4} ψ_124(x_124) ψ_234(x_234)

• For discrete nodes, we can represent P(X_1:4) as two 3D tables instead of one 4D table.

Example UGM – using subcliques

[Figure: the same diamond graph]

    P(x1, x2, x3, x4) = (1/Z) ∏_{ij} ψ_ij(x_ij)
                      = (1/Z) ψ_12(x_12) ψ_14(x_14) ψ_23(x_23) ψ_24(x_24) ψ_34(x_34)

    Z = Σ_{x1,x2,x3,x4} ∏_{ij} ψ_ij(x_ij)

• For discrete nodes, we can represent P(X_1:4) as five 2D tables instead of one 4D table.

Interpretation of Clique Potentials

[Figure: the chain X - Y - Z]

• The model implies X ⊥ Z | Y. This independence statement implies (by definition) that the joint must factorize as:

    p(x,y,z) = p(y) p(x|y) p(z|y)

• We can write this as

    p(x,y,z) = p(x,y) p(z|y)    or    p(x,y,z) = p(x|y) p(z,y)

  but we
  • cannot have all potentials be marginals
  • cannot have all potentials be conditionals

• The positive clique potentials can only be thought of as general "compatibility", "goodness" or "happiness" functions over their variables, but not as probability distributions.

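The factorization p(x,y,z) = p(x,y) p(z|y) can be checked numerically. Below is a small Python sketch (a made-up binary distribution constructed to satisfy X ⊥ Z | Y) in which one pairwise potential is a marginal and the other a conditional, but they cannot all be one or the other:

```python
import numpy as np

# Build a random p(x,y,z) with X _|_ Z | Y, then factor it into two
# pairwise chain potentials psi_xy and psi_yz.
rng = np.random.default_rng(0)
py = rng.random(2); py /= py.sum()                              # p(y)
px_y = rng.random((2, 2)); px_y /= px_y.sum(0, keepdims=True)   # p(x|y)
pz_y = rng.random((2, 2)); pz_y /= pz_y.sum(0, keepdims=True)   # p(z|y)

p = np.einsum("y,xy,zy->xyz", py, px_y, pz_y)   # p(x,y,z)

psi_xy = px_y * py                  # = p(x,y): a marginal
psi_yz = pz_y.T                     # = p(z|y): a conditional, axes (y,z)
recon = np.einsum("xy,yz->xyz", psi_xy, psi_yz)
print(np.allclose(p, recon))        # True (and the partition function is 1)
# Note the asymmetry: one potential is a marginal, the other a conditional;
# they cannot all be marginals or all be conditionals.
```
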
Exponential Form

• Constraining clique potentials to be positive could be inconvenient (e.g., the interactions between a pair of atoms can be either attractive or repulsive). We represent a clique potential ψ_c(x_c) in an unconstrained form using a real-valued "energy" function φ_c(x_c):

    ψ_c(x_c) = exp{ -φ_c(x_c) }

  For convenience, we will call φ_c(x_c) a potential when no confusion arises from the context.

• This gives the joint a nice additive structure:

    p(x) = (1/Z) exp{ -Σ_{c∈C} φ_c(x_c) } = (1/Z) exp{ -H(x) }

  where the sum in the exponent is called the "free energy":

    H(x) = Σ_{c∈C} φ_c(x_c)

• In physics, this is called the "Boltzmann distribution".
• In statistics, this is called a log-linear model.

Example: Boltzmann machines

[Figure: a fully connected graph over nodes 1, 2, 3, 4]

• A fully connected graph with pairwise (edge) potentials on binary-valued nodes (for x_i ∈ {-1, +1} or x_i ∈ {0, 1}) is called a Boltzmann machine:

    P(x1, x2, x3, x4) = (1/Z) exp{ Σ_ij θ_ij x_i x_j }
                      = (1/Z) exp{ Σ_ij θ_ij x_i x_j + Σ_i α_i x_i + C }

• Hence the overall energy function has the form:

    H(x) = Σ_ij (x_i - μ) θ_ij (x_j - μ) = (x - μ)^T Θ (x - μ)

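Since a Boltzmann machine over four binary nodes has only 2^4 states, its distribution can be enumerated exactly. A sketch (the symmetric weights Θ are random placeholders of my choosing; bias terms are omitted, i.e., α = 0):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
Theta = rng.normal(size=(4, 4))
Theta = (Theta + Theta.T) / 2           # symmetric pairwise weights
np.fill_diagonal(Theta, 0.0)            # no self-interactions

def energy(x):
    """H(x) = -sum_{i<j} theta_ij x_i x_j; low energy = high probability."""
    x = np.asarray(x)
    return -0.5 * x @ Theta @ x         # 0.5 corrects for double counting

states = list(itertools.product([-1, 1], repeat=4))
Z = sum(np.exp(-energy(x)) for x in states)
probs = {x: float(np.exp(-energy(x)) / Z) for x in states}

print(sum(probs.values()))              # 1.0
best = max(probs, key=probs.get)
print(best, probs[best])                # most probable joint configuration
```
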
Example: Ising (spin-glass) models

• Nodes are arranged in a regular topology (often a regular packing grid) and connected only to their geometric neighbors.
• Same as a sparse Boltzmann machine, where θ_ij ≠ 0 iff i, j are neighbors.
  • e.g., nodes are pixels, and the potential function encourages nearby pixels to have similar intensities (see the sketch after this slide).
• Potts model: multi-state Ising model.

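The pixel example maps onto a classic use of the Ising model: binary image denoising by Gibbs sampling. The sketch below is my own illustration (grid size, coupling J, observation strength, and the noise level are all made-up values); each update samples a spin from its exact conditional given its Markov blanket:

```python
import numpy as np

# Spins s in {-1,+1}; coupling J to 4-neighbors; a field ties each spin to
# its noisy observation y.
rng = np.random.default_rng(0)
H, W, J, field = 16, 16, 1.0, 0.8
truth = np.where(np.add.outer(np.arange(H), np.arange(W)) < (H + W) / 2, 1, -1)
y = truth * np.where(rng.random((H, W)) < 0.9, 1, -1)   # flip 10% of pixels

s = y.copy()
for sweep in range(20):
    for i in range(H):
        for j in range(W):
            nb = sum(s[a, b]                            # neighboring spins
                     for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                     if 0 <= a < H and 0 <= b < W)
            # Exact conditional of s_ij given its Markov blanket:
            logit = 2 * (J * nb + field * y[i, j])
            s[i, j] = 1 if rng.random() < 1 / (1 + np.exp(-logit)) else -1

print("noisy pixels wrong:   ", (y != truth).mean())    # about 0.10
print("denoised pixels wrong:", (s != truth).mean())    # typically far lower
```
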
Example: Modeling Go

[Figure omitted]

GMs are your old friends

• Density estimation: parametric and nonparametric methods
• Regression: linear, conditional mixture, nonparametric
• Classification: generative and discriminative approach

[Figure: the corresponding graphical models, e.g., (μ,σ) → X for density estimation, X → Y for regression, Q → X for classification]

An (incomplete) genealogy of graphical models

[Figure: genealogy chart; picture by Zoubin Ghahramani and Sam Roweis]

Why graphical models

• Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data.
• The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms.
• Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of the general graphical model formalism
  • examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models.
• The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism.

--- M. Jordan