Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
Matthias Bröcheler and Lise Getoor (http://www.cs.umd.edu/linqs) · Supported by NSF Grant No. 0937094

Continuous Markov random fields are a general formalism to model joint probability distributions over events with continuous outcomes. We prove that marginal computation for constrained continuous MRFs is #P-hard in general and present a polynomial-time approximation scheme under mild assumptions on the structure of the random field. Moreover, we introduce a sampling algorithm to compute marginal distributions and develop novel techniques to increase its efficiency. Continuous MRFs are a general-purpose probabilistic modeling tool, and we demonstrate how they can be applied to statistical relational learning. On the problem of collective classification, we evaluate our algorithm and show that the standard deviation of marginals serves as a useful measure of confidence.
Problem?

Computing marginal distributions in constrained continuous MRFs (CCMRFs).

Motivation?

Many applications of CCMRFs, probabilistic soft logic being one of them.

Contributions?

Analysis of the theoretical and practical aspects of computing marginals in CCMRFs.
What’s a CCMRF?
A constrained continuous Markov random field is defined by:

  X = {X1, .., Xn} :  Xi has domain Di ⊂ R,  D = ×_{i=1..n} Di
  φ = {φ1, .., φm} :  φj : D → [0, M]
  Λ = {λ1, .., λm}
  Equality constraints:    A : D → R^kA,  a ∈ R^kA
  Inequality constraints:  B : D → R^kB,  b ∈ R^kB
  D̃ = D ∩ {x | A(x) = a ∧ B(x) ≤ b}

The probability measure P over X is defined through the density

  f(x) = (1/Z(Λ)) exp[ −Σ_{j=1..m} λj φj(x) ],   Z(Λ) = ∫_{D̃} exp[ −Σ_{j=1..m} λj φj(x) ] dx,   f(x) = 0  ∀ x ∉ D̃.

What does it look like?

  X = {X1, X2, X3},  Λ = {1, 2, 1}
  φ1(x) = x1
  φ2(x) = max(0, x1 − x2)
  φ3(x) = max(0, x2 − x3)
  Constraint: x1 + x3 ≤ 1

[Figure: density over (X1, X2, X3), highlighting the highest-probability region and the marginal query P(0.4 ≤ X2 ≤ 0.6).]
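As a minimal illustration (not part of the poster), the sketch below evaluates the unnormalized density exp(−Σj λj φj(x)) of the example above and enforces the constrained domain D̃. The unit-box domain [0, 1]³, the function names, and the use of NumPy are assumptions of the sketch; the partition function Z(Λ) is left out.

```python
import numpy as np

# Example CCMRF from the panel above: Lambda = (1, 2, 1),
# phi_1(x) = x1, phi_2(x) = max(0, x1 - x2), phi_3(x) = max(0, x2 - x3),
# inequality constraint x1 + x3 <= 1. The box domain [0, 1]^3 is an assumption.
LAMBDA = np.array([1.0, 2.0, 1.0])

def potentials(x):
    """Potential values phi_j(x) for a point x = (x1, x2, x3)."""
    x1, x2, x3 = x
    return np.array([x1, max(0.0, x1 - x2), max(0.0, x2 - x3)])

def feasible(x):
    """Membership test for the constrained domain D-tilde."""
    in_box = np.all((x >= 0.0) & (x <= 1.0))
    return bool(in_box and (x[0] + x[2] <= 1.0))

def unnormalized_density(x):
    """exp(-sum_j lambda_j * phi_j(x)) on D-tilde, 0 elsewhere (Z(Lambda) omitted)."""
    x = np.asarray(x, dtype=float)
    if not feasible(x):
        return 0.0
    return float(np.exp(-LAMBDA @ potentials(x)))

print(unnormalized_density([0.2, 0.5, 0.7]))   # inside D-tilde: exp(-0.2)
print(unnormalized_density([0.8, 0.5, 0.7]))   # violates x1 + x3 <= 1 -> 0.0
```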
Computing the marginal probability density function

For a subset X̃ ⊂ X,

  f_X̃(x̃) = ∫_{y ∈ ×Di, s.t. Xi ∉ X̃} f(x̃, y) dy.

In Theory…

Computing the marginal probability density function under the probability measure defined by a CCMRF is #P-hard in the worst case.

Let's approximate! (Lovász & Vempala '04)

The complexity of computing an approximate distribution σ* using hit-and-run sampling, such that the total variation distance of σ* and P is less than ε, is

  Õ( ñ³ (kB + ñ + m) )

where ñ = n − kA, under the assumption that we start from an initial distribution σ such that the density function dσ/dP is bounded by M except on a set S with σ(S) ≤ ε/2.
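Because exact marginalization is intractable in general, marginal probabilities such as P(0.4 ≤ X2 ≤ 0.6) in the example can only be approximated. The sketch below uses plain self-normalized importance sampling with a uniform proposal over the assumed unit box; this is not the poster's method (that is hit-and-run, next panel) and it scales poorly with dimension, but it shows concretely what quantity is being estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
LAMBDA = np.array([1.0, 2.0, 1.0])

def unnorm_density(x):
    """Unnormalized density of the running example (assumed unit-box domain)."""
    x1, x2, x3 = x
    if not (0 <= x1 <= 1 and 0 <= x2 <= 1 and 0 <= x3 <= 1 and x1 + x3 <= 1):
        return 0.0
    phi = np.array([x1, max(0.0, x1 - x2), max(0.0, x2 - x3)])
    return float(np.exp(-LAMBDA @ phi))

# Self-normalized importance sampling with a uniform proposal over the box:
# P(0.4 <= X2 <= 0.6) ~= sum_i w_i * 1{0.4 <= x2_i <= 0.6} / sum_i w_i,
# where w_i is the unnormalized density at the i-th uniform draw.
samples = rng.uniform(size=(100_000, 3))
weights = np.array([unnorm_density(x) for x in samples])
in_event = (samples[:, 1] >= 0.4) & (samples[:, 1] <= 0.6)
print(f"P(0.4 <= X2 <= 0.6) ~= {weights[in_event].sum() / weights.sum():.3f}")
```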
Hit-and-Run Sampling

In Theory…
1. Sample random direction
2. Compute line segment
3. Induce density on line
4. Sample from induced density

[Figure: illustration of a single hit-and-run step.]
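A minimal sketch of one hit-and-run step following the four steps above, assuming NumPy, a box domain, linear inequality constraints Bx ≤ b, and no equality constraints (the poster removes those beforehand via dimensionality reduction). Sampling the induced line density by discretization is an implementation shortcut of this sketch, not the poster's technique.

```python
import numpy as np

rng = np.random.default_rng(1)

def hit_and_run_step(x, log_density, B, b, lo=0.0, hi=1.0, grid=256):
    """One hit-and-run step: (1) sample a random direction, (2) intersect the line
    through x with the box [lo, hi]^n and the inequalities B @ x <= b, (3) evaluate
    the induced density on that segment, (4) sample the next point from it."""
    # 1. Uniformly random direction on the unit sphere.
    d = rng.normal(size=x.size)
    d /= np.linalg.norm(d)
    # 2. Feasible segment x + t * d with t in [t_lo, t_hi].
    t_lo, t_hi = -np.inf, np.inf
    for xi, di in zip(x, d):                 # box constraints lo <= xi + t*di <= hi
        if abs(di) < 1e-12:
            continue
        t1, t2 = (lo - xi) / di, (hi - xi) / di
        t_lo, t_hi = max(t_lo, min(t1, t2)), min(t_hi, max(t1, t2))
    for bd, s in zip(B @ d, b - B @ x):      # linear inequalities B(x + t*d) <= b
        if abs(bd) < 1e-12:
            continue
        if bd > 0:
            t_hi = min(t_hi, s / bd)
        else:
            t_lo = max(t_lo, s / bd)
    # 3./4. Discretize the segment and sample from the normalized induced density.
    ts = np.linspace(t_lo, t_hi, grid)
    logw = np.array([log_density(x + t * d) for t in ts])
    w = np.exp(logw - logw.max())
    return x + rng.choice(ts, p=w / w.sum()) * d

# Usage on the running example (x1 + x3 <= 1 as the single linear inequality).
LAMBDA = np.array([1.0, 2.0, 1.0])
def log_density(x):
    x1, x2, x3 = x
    return -(LAMBDA @ np.array([x1, max(0.0, x1 - x2), max(0.0, x2 - x3)]))

B, b = np.array([[1.0, 0.0, 1.0]]), np.array([1.0])
x = np.array([0.2, 0.2, 0.2])                        # feasible starting point
chain = []
for _ in range(2000):
    x = hit_and_run_step(x, log_density, B, b)
    chain.append(x)
print(np.mean([0.4 <= s[1] <= 0.6 for s in chain]))  # estimate of P(0.4 <= X2 <= 0.6)
```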
In Practice…

Algorithm:
1. Start = MAP state.
2. Dimensionality reduction and linear algebra (the equality constraints reduce the dimension to ñ = n − kA).
3. How do we get out of corners? Corner heuristic: reflect di across the violated constraint hyperplane Wk^T x = zk,

   di+1 = di + 2 (zk − Wk^T di) / ‖Wk‖² · Wk

4. Induce f efficiently.
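The reflection step of the corner heuristic is a one-line computation; below is a direct transcription of the formula above, where the `reflect` helper name and the example hyperplane are illustrative assumptions.

```python
import numpy as np

def reflect(d, W_k, z_k):
    """Reflect d across the hyperplane {x : W_k^T x = z_k}:
    d' = d + 2 * (z_k - W_k^T d) / ||W_k||^2 * W_k (the formula above)."""
    d, W_k = np.asarray(d, dtype=float), np.asarray(W_k, dtype=float)
    return d + 2.0 * (z_k - W_k @ d) / (W_k @ W_k) * W_k

# Example: reflecting across the plane x1 + x3 = 1 of the running example.
d = np.array([0.9, 0.5, 0.4])          # violates x1 + x3 <= 1 (sum = 1.3)
d_ref = reflect(d, [1.0, 0.0, 1.0], 1.0)
print(d_ref, d_ref[0] + d_ref[2])      # reflected point: [0.6, 0.5, 0.1], sum 0.7 <= 1
```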
Why CCMRF?

Probabilistic soft logic (PSL) is a declarative language for collective probabilistic reasoning about similarity or uncertainty in relational domains. PSL focuses on statistical relational learning problems with continuous RVs and supports sets and aggregation. PSL programs get grounded into CCMRFs for inference.

  w1 : class(B,C) ∧ A.text ≈ B.text → class(A,C)
  w2 : class(B,C) ∧ link(A,B) → class(A,C)
  Constraint: functional(class)
Experimental Results

Collective classification of 1717 Wikipedia articles with 20% seed documents. Setup uses tf/idf-weighted cosine similarity as the baseline, comparing against a PSL program with learned weights over K-fold cross validation.

  Folds | Improvement over baseline | P(Null Hypothesis) | Relative Confidence Difference ∆(σ)
  20    | 41.4%                     | 1.95E-09           | 38.3%
  25    | 31.7%                     | 2.40E-13           | 41.2%
  30    | 39.1%                     | 1.00E-16           | 43.5%
  35    | 46.1%                     | 4.54E-08           | 39.0%

Std. Deviation as Indicator of Confidence

  ∆(σ) = 2 (σ− − σ+) / (σ+ + σ−),   Hypothesis: ∆(σ) > 0.
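The relative confidence difference is straightforward to compute from the two standard deviations; a small helper with illustrative inputs:

```python
def relative_confidence_difference(sigma_plus, sigma_minus):
    """Delta(sigma) = 2 * (sigma_minus - sigma_plus) / (sigma_plus + sigma_minus).
    Positive values mean the '+' group has the smaller standard deviation, i.e.
    its marginals are estimated with higher confidence."""
    return 2.0 * (sigma_minus - sigma_plus) / (sigma_plus + sigma_minus)

print(relative_confidence_difference(0.05, 0.08))  # ~0.46, i.e. about a 46% relative difference
```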
Convergence Analysis

[Figure: KL divergence (log scale, roughly 0.05 to 5) versus number of samples (30,000 to 3,000,000), showing the average KL divergence, the lowest-quartile KL divergence (322-413 RVs), and the highest-quartile KL divergence (174-224 RVs).]