1. An Algorithm for Bayesian Network Construction from Data
by: Jie Cheng
David A. Bell
Weiru Liu
University of Ulster, UK
Presented by: Jian Xu
2. Outline
• Introduction
• Some basic concepts
• The proposed algorithm for BN
construction
• Experiment results
• Discussions & comments
3. What is a Bayesian Network?
• Cancer BN example (figure): Metastatic Cancer (M) has arcs to Serum Calcium (S) and Brain Tumor (B); S and B both point to Coma (C); B also points to Headaches (H).
• The conditional probability tables attached to the nodes:
– P(M=+) = .20
– P(S=+ | M): M=+ .80, M=- .20
– P(B=+ | M): M=+ .20, M=- .05
– P(C=+ | S,B): (+,+) .80, (+,-) .80, (-,+) .80, (-,-) .05
– P(H=+ | B): B=+ .80, B=- .60
4. Bayesian Network (BN)
• A Bayesian network is a compact graphical representation of a probability distribution over a set of domain random variables X = {X1, X2, …, Xn}
• Two components
– Structure: a directed acyclic graph (DAG) over the nodes, which encodes the causal relations in the domain
– CPD: each node carries a conditional probability distribution given its parents
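Given the structure and CPDs, the joint distribution factorizes along the DAG; for the cancer example, P(M,S,B,C,H) = P(M) P(S|M) P(B|M) P(C|S,B) P(H|B). A minimal Python sketch of this factorization, using the CPTs from slide 3 (the dictionary encoding and names are mine, not from the slides):

```python
from itertools import product

# CPTs from the cancer example above; True stands for '+'.
P_M = 0.20                                        # P(M=+)
P_S = {True: 0.80, False: 0.20}                   # P(S=+ | M)
P_B = {True: 0.20, False: 0.05}                   # P(B=+ | M)
P_C = {(True, True): 0.80, (True, False): 0.80,   # P(C=+ | S, B)
       (False, True): 0.80, (False, False): 0.05}
P_H = {True: 0.80, False: 0.60}                   # P(H=+ | B)

def joint(m, s, b, c, h):
    """P(M,S,B,C,H) = P(M) P(S|M) P(B|M) P(C|S,B) P(H|B)."""
    val = lambda p, x: p if x else 1.0 - p        # P(X=+|..) -> P(X=x|..)
    return (val(P_M, m) * val(P_S[m], s) * val(P_B[m], b)
            * val(P_C[(s, b)], c) * val(P_H[b], h))

# Sanity check: the factorized joint sums to 1 over all 2^5 states.
assert abs(sum(joint(*x) for x in product([True, False], repeat=5)) - 1.0) < 1e-9
```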
5. BN Learning
• Structure learning
– To identify the topology of the network
– Score based methods
– Dependency analysis methods
• Parameter learning
– To learn the conditional probabilities for a
given network topology
– MLE, Bayesian approach, etc
6. BN Structure Learning
• Search & scoring methods:
– Search for the structure most likely to have generated the data
– Use a heuristic search method to construct a model and evaluate it with a scoring metric, such as MDL, a Bayesian score, etc.
– May not find the best solution
– Random restarts help avoid getting stuck in a local maximum
– Lower worst-case time complexity than dependency analysis, i.e., when the underlying DAG is densely connected
7. BN Learning Algorithms (Cont’d)
• Dependency analysis methods:
– Use conditional independence (CI) tests to analyze the dependency relationships among nodes.
– Usually asymptotically correct when the data is
DAG-faithful
– Works efficiently when the underlying network is
sparse
– CI tests with large condition sets may be
unreliable unless the volume of data is enormous.
– Used in this proposed algorithm
8. Basic Concepts
• D-separation: two nodes X and Y are d-separated given C if and only if there exists no adjacency path P between X and Y such that:
– every collider on P is in C or has a descendant in C, and
– no other node on path P is in C
– C is called a condition-set
• Open path: a path between X and Y is open if every node on the path is active.
• Closed path: a path is closed if any node on it is inactive.
• Collider node: a node on a path into which both adjacent arcs point (→ v ←)
• Non-collider node: any other node on the path
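A hedged Python sketch of the openness test implied by these definitions; `parents` maps each node to its parent set, and the helper names are mine rather than from the paper:

```python
def descendants(node, parents):
    """All nodes reachable from `node` by following child edges."""
    children = {}
    for v, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(v)
    out, stack = set(), [node]
    while stack:
        for ch in children.get(stack.pop(), ()):
            if ch not in out:
                out.add(ch)
                stack.append(ch)
    return out

def path_is_open(path, parents, C):
    """A path is open given C iff every intermediate node on it is active."""
    C = set(C)
    for i in range(1, len(path) - 1):
        prev, v, nxt = path[i - 1], path[i], path[i + 1]
        ps = parents.get(v, set())
        if prev in ps and nxt in ps:              # collider: prev -> v <- nxt
            if v not in C and not (descendants(v, parents) & C):
                return False  # blocked: neither v nor any descendant is in C
        elif v in C:                              # non-collider conditioned on
            return False
    return True

# X and Y are then d-separated given C iff no adjacency path between
# them satisfies path_is_open.
```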
9. Basic Concepts (Cont’d)
• DAG-faithful: the underlying distribution is DAG-faithful when there exists a DAG that can represent all of its conditional independence relations.
• D-map: a graph G is a dependency map (D-map) of M if every independence relationship in M is true in G. (a BN with no edges is trivially a D-map)
• I-map: a graph G is an independency map (I-map) of M if every independence relationship in G is true in M. (a fully connected BN is trivially an I-map)
• Minimal I-map: a graph G that is an I-map of M, where the removal of any arc from G yields a graph that is not an I-map of M.
• P-map: a graph G is a perfect map of M if it is both a D-map and an I-map of M.
10. Mutual Information
• The mutual information of two nodes Xi, Xj is defined as:
  I(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i)\,P(x_j)}
• The conditional mutual information, given a condition-set C, is defined as:
  I(X_i, X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\,P(x_j \mid c)}
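Both quantities can be estimated directly from frequency counts. A sketch in Python, with records represented as dicts mapping attribute names to values (the function names are mine):

```python
from collections import Counter
from math import log

def mutual_info(data, xi, xj):
    """I(Xi, Xj) estimated from a list of records."""
    n = len(data)
    pxy = Counter((r[xi], r[xj]) for r in data)
    px = Counter(r[xi] for r in data)
    py = Counter(r[xj] for r in data)
    # p(x,y) log [ p(x,y) / (p(x) p(y)) ], with the 1/n factors cancelled
    return sum((c / n) * log((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def cond_mutual_info(data, xi, xj, cond):
    """I(Xi, Xj | C), with C given as a tuple of attribute names."""
    n = len(data)
    key = lambda r: tuple(r[a] for a in cond)
    pxyc = Counter((r[xi], r[xj], key(r)) for r in data)
    pxc = Counter((r[xi], key(r)) for r in data)
    pyc = Counter((r[xj], key(r)) for r in data)
    pc = Counter(key(r) for r in data)
    return sum((cnt / n) * log((cnt * pc[c]) / (pxc[(x, c)] * pyc[(y, c)]))
               for (x, y, c), cnt in pxyc.items())
```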
11. Assumptions
• All attributes are discrete
• No missing values in any record
• All records are drawn independently from a single probability model
• The dataset is large enough for reliable CI tests
• The ordering of the attributes is available before the network construction
12. An Algorithm for BN Construction
• Drafting
– Compute the mutual information of each pair of nodes and create a draft of the model
• Thickening
– Add arcs when pairs of nodes cannot be d-separated, yielding an I-map of the model
• Thinning
– Examine each arc of the I-map with CI tests and remove it if the two nodes of the arc are conditionally independent; a skeleton of the three phases is sketched below
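A high-level skeleton tying the three phases together, assuming the helper functions sketched on the following slides; the function names and signatures are mine, not the paper's:

```python
def construct_bn(data, nodes, ordering, eps=0.003):
    # eps is the mutual information threshold (0.003 in the experiments).
    E, R = drafting(data, nodes, ordering, eps)    # Phase I: draft graph
    E = thickening(data, E, R, ordering, eps)      # Phase II: yields an I-map
    E = thinning(data, E, eps)                     # Phase III: prune extra arcs
    return E                                       # set of directed arcs (u, v)
```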
13. Drafting Phase
1. Initiate a graph G(V, E) where V = {all nodes}, E = { }. Initiate two empty lists S and R.
2. For each pair of nodes (vi, vj), i ≠ j, compute I(vi, vj). Sort all pairs with I(vi, vj) ≥ ε from large to small, and put them into an ordered set S.
3. Take the first two pairs of nodes from S and remove them from S. Add the corresponding arcs to E. (The direction of each arc is determined by the available node ordering.)
4. Take the first pair of nodes remaining in S and remove it from S. If there is no open path between the two nodes (they are d-separated given the empty set), add the corresponding arc to E; otherwise add the pair of nodes to the end of an ordered set R.
5. Repeat step 4 until S is empty.
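A sketch of steps 1-5 in Python, assuming mutual_info from the mutual-information slide; orient and path_exists are my hypothetical helpers, and path_exists uses undirected reachability as a cheap stand-in for "there is an open path given the empty set":

```python
def orient(a, b, ordering):
    """Direct the arc (a, b) according to the given node ordering."""
    return (a, b) if ordering.index(a) < ordering.index(b) else (b, a)

def path_exists(E, a, b):
    """Undirected reachability between a and b in the arc set E."""
    adj = {}
    for u, v in E:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {a}, [a]
    while stack:
        for w in adj.get(stack.pop(), ()):
            if w == b:
                return True
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

def drafting(data, nodes, ordering, eps):
    # Steps 1-2: all pairwise mutual informations >= eps, sorted descending.
    S = sorted(((mutual_info(data, a, b), a, b)
                for i, a in enumerate(nodes) for b in nodes[i + 1:]),
               reverse=True)
    S = [(v, a, b) for v, a, b in S if v >= eps]
    E, R = set(), []
    for _, a, b in S[:2]:              # step 3: first two pairs always enter
        E.add(orient(a, b, ordering))
    for _, a, b in S[2:]:              # steps 4-5: connect or defer to R
        if not path_exists(E, a, b):
            E.add(orient(a, b, ordering))
        else:
            R.append((a, b))
    return E, R
```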
14. Drafting Example
• Figure (a) is the underlying BN structure
• I(B,D) ≥ I(C,E) ≥ I(B,E) ≥ I(A,B) ≥ I(B,C) ≥ I(C,D) ≥ I(D,E) ≥ I(A,D) ≥ I(A,E) ≥ I(A,C) ≥ ε
• Figure (b) is the draft graph
15. Thickening Phase
6. Take the first pair of nodes from R and remove it from R.
7. Find a block set: a minimum-size set of nodes that blocks every open path between the two nodes. Conduct a CI test; if the two nodes are still dependent on each other given the block set, connect them with an arc.
8. Go to step 6 until R is empty.
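A sketch of steps 6-8, assuming cond_mutual_info from the mutual-information slide and treating find_block_set, the paper's minimum-blocking-set procedure, as a given; a pair is judged still dependent when its conditional mutual information stays above the threshold:

```python
def thickening(data, E, R, ordering, eps=0.003):
    while R:                                   # steps 6 and 8: drain R in order
        a, b = R.pop(0)
        C = find_block_set(E, a, b)            # step 7: minimal blocking set
        if cond_mutual_info(data, a, b, tuple(C)) >= eps:
            E.add(orient(a, b, ordering))      # still dependent given C: add arc
    return E
```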
16. Thickening Example
• Figure (b) is the draft graph
• Examine the pair (D,E): the minimum set that blocks all the open paths between D and E is {B}
• The CI test reveals that D and E are dependent given {B}, so arc (D,E) is added
• (A,C) is not added because A and C are independent given {B}
17. Thinning Phase
9. For each arc in E, if there are open paths between the two nodes besides this arc, remove the arc from E temporarily and call the procedure find_block_set(current graph, node1, node2). Conduct a CI test conditioned on the block set. If the two nodes are dependent, add the arc back to E; otherwise remove the arc permanently.
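The corresponding sketch of step 9, reusing the same assumed helpers (path_exists, find_block_set, cond_mutual_info):

```python
def thinning(data, E, eps=0.003):
    for arc in list(E):
        a, b = arc
        E.remove(arc)                          # remove the arc temporarily
        if path_exists(E, a, b):               # other open paths remain
            C = find_block_set(E, a, b)
            if cond_mutual_info(data, a, b, tuple(C)) >= eps:
                E.add(arc)                     # still dependent: restore the arc
        else:
            E.add(arc)                         # the arc was the only connection
    return E
```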
18. Thinning Example
• Figure (c) is the I-map of the underlying BN
• Arc (B,E) is removed because B and E are independent of each other given {C,D}.
• Figure (d) is a perfect map (P-map) of the underlying dependency model (a).
20. Complexity Analysis
• For a dataset with N attributes, each taking at most r values, and at most k parents per node:
– Phase I: O(N²) mutual information computations, each requiring O(r²) basic operations: O(N²r²) in total
– Phase II: at most O(N²) CI tests, each with at most O(r^(k+2)) basic operations: O(N²r^(k+2)); worst case O(N²r^N)
– Phase III: same as Phase II.
22. Experiment setup
• ALARM BN (A Logical Alarm Reduction
Mechanism): a medical diagnosis system for
patient monitoring
– 37 nodes, 46 arcs
– 3 versions: same structure, different CPDs
• 10,000 cases for each dataset
• The conditional mutual information calculation was modified to take each variable's degrees of freedom into account, making the CI tests more reliable
• ε = 0.003
24. Discussions & Comments
• About the assumptions
– All attributes are discrete
– No missing values in any record
– The dataset is large enough for reliable CI tests
– The ordering of the attributes is available before the network construction
25. Discussions & Comments
• Threshold ε
– ε = 0.003
– How do we pick an appropriate ε?
– How do different choices of ε affect accuracy and running time?
• Modification in the experiment part
– The conditional mutual information calculation was modified to take each variable's degrees of freedom into account, making the CI tests more reliable
– Does this modification affect the result in any way
other than increasing the accuracy?