A modified version of the manuscript published on Feb 3, 2017.
1. Use a gamma prior for $r_k$.
2. Use the same shape parameter $s$ for all gamma distributions.
Topic modeling with Poisson factorization (2)
Tomonari Masada @ Nagasaki University
March 7, 2017
1 ELBO
To obtain update equations, we introduce auxiliary latent variables $Z$ [1, 2, 3, 4]. $z_{dkv}$ is the
number of tokens of the $v$th word in the $d$th document assigned to the $k$th topic. $z_{dkv}$ is
sampled from the Poisson distribution $\mathrm{Poisson}(\theta_{dk}\beta_{kv})$.
The constraint $\sum_k z_{dkv} = n_{dv}$ can be expressed with the probability mass function $I(n_{dv} = \sum_k z_{dkv})$.
The full joint distribution is given below.
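$$
\begin{aligned}
p(N, Z, \Theta, \beta, \phi; \alpha, s, r)
&= p(\beta; \alpha)\, p(\phi; s, r)\, p(\Theta; s, \phi)\, p(N \mid Z)\, p(Z \mid \Theta, \beta) \\
&= \prod_k p(\beta_k; \alpha) \times \prod_k p(\phi_k; s, r) \times \prod_k p(\theta_k; s, \phi_k) \times \prod_d p(n_d \mid z_d)\, p(z_d \mid \theta_d, \beta) \\
&= \prod_k \frac{\Gamma(V\alpha)}{\Gamma(\alpha)^V} \prod_v \beta_{kv}^{\alpha-1}
\times \prod_k \frac{r^s}{\Gamma(s)} \phi_k^{s-1} e^{-r\phi_k}
\times \prod_k \prod_d \frac{\phi_k^s}{\Gamma(s)} \theta_{dk}^{s-1} e^{-\phi_k\theta_{dk}} \\
&\quad \times \prod_d \prod_v I\Bigl(n_{dv} = \sum_k z_{dkv}\Bigr) \prod_k \frac{(\theta_{dk}\beta_{kv})^{z_{dkv}} e^{-\theta_{dk}\beta_{kv}}}{z_{dkv}!}
\end{aligned}
\tag{1}
$$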
The generative model is fully described in Eq. (1).
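The generative process in Eq. (1) can be sketched as a small sampler. This is a minimal illustration using only the Python standard library, not the author's implementation; the function names `sample_poisson` and `sample_corpus`, the argument names, and the seed handling are assumptions made for this example. The Poisson sampler uses Knuth's multiplication method, which is only suitable for moderate rates.

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate only for moderate rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_corpus(D, K, V, alpha, s, r, seed=0):
    """Draw (n, z, theta, beta, phi) following the factors of Eq. (1)."""
    rng = random.Random(seed)
    # beta_k ~ Dirichlet(alpha): normalize independent Gamma(alpha, 1) draws.
    beta = []
    for _ in range(K):
        g = [rng.gammavariate(alpha, 1.0) for _ in range(V)]
        total = sum(g)
        beta.append([x / total for x in g])
    # phi_k ~ Gamma(shape s, rate r); gammavariate expects a scale, hence 1/r.
    phi = [rng.gammavariate(s, 1.0 / r) for _ in range(K)]
    # theta_dk ~ Gamma(shape s, rate phi_k).
    theta = [[rng.gammavariate(s, 1.0 / phi[k]) for k in range(K)]
             for _ in range(D)]
    # z_dkv ~ Poisson(theta_dk * beta_kv), stored as z[d][k][v].
    z = [[[sample_poisson(theta[d][k] * beta[k][v], rng) for v in range(V)]
          for k in range(K)] for d in range(D)]
    # The indicator in Eq. (1) forces n_dv = sum_k z_dkv.
    n = [[sum(z[d][k][v] for k in range(K)) for v in range(V)]
         for d in range(D)]
    return n, z, theta, beta, phi
```

Calling `sample_corpus(D=3, K=2, V=4, alpha=0.5, s=2.0, r=1.0)` returns a count matrix `n` whose entries equal the topic-wise sums of `z` by construction, which is exactly the constraint encoded by the indicator function.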
We adopt variational Bayesian inference for posterior inference. The evidence lower
bound (ELBO) of the model is obtained as follows.
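$$
\begin{aligned}
\log p(N) &= \log \sum_Z \int p(N, Z, \Theta, \beta, \phi)\, d\Theta\, d\beta\, d\phi \\
&\geq \sum_Z \int q(Z)q(\Theta)q(\beta)q(\phi) \log p(N, Z, \Theta, \beta, \phi)\, d\Theta\, d\beta\, d\phi \\
&\quad - \sum_Z \int q(Z)q(\Theta)q(\beta)q(\phi) \log \bigl[ q(Z)q(\Theta)q(\beta)q(\phi) \bigr]\, d\Theta\, d\beta\, d\phi \\
&= \sum_Z \int q(Z)q(\Theta)q(\beta) \log p(Z \mid \Theta, \beta)\, d\Theta\, d\beta + \sum_Z q(Z) \log p(N \mid Z) \\
&\quad + \int q(\Theta)q(\phi) \log p(\Theta \mid \phi)\, d\Theta\, d\phi + \int q(\beta) \log p(\beta)\, d\beta + \int q(\phi) \log p(\phi)\, d\phi \\
&\quad - \sum_Z q(Z) \log q(Z) - \int q(\Theta) \log q(\Theta)\, d\Theta - \int q(\beta) \log q(\beta)\, d\beta - \int q(\phi) \log q(\phi)\, d\phi ,
\end{aligned}
\tag{2}
$$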
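where the approximate posterior is assumed to factorize as $q(Z, \Theta, \beta, \phi) = q(Z)q(\Theta)q(\beta)q(\phi)$.
We assume the following for the factorized approximate posterior.
• $q(z_{dv})$ is the multinomial distribution $\mathrm{Mult}(n_{dv}, \omega_{dv})$. $\omega_{dkv}$ is the probability that a token
of the $v$th word in the $d$th document is assigned to the $k$th topic among the $K$ topics. Note
that $\sum_k z_{dkv} = n_{dv}$ holds.
• $q(\theta_{dk})$ is the gamma distribution $\mathrm{Gamma}(a_{dk}, b_{dk})$ with shape $a_{dk}$ and rate $b_{dk}$.
• $q(\beta_k)$ is the asymmetric Dirichlet distribution $\mathrm{Dirichlet}(\xi_k)$.
• $q(\phi_k)$ is the gamma distribution $\mathrm{Gamma}(\mu_k, \nu_k)$ with shape $\mu_k$ and rate $\nu_k$.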
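The derivations that follow repeatedly use a few standard expectations under these factors. They are collected here for reference; $\psi$ denotes the digamma function, and the gamma distributions are read in the shape-rate parameterization, which is the reading consistent with Eqs. (8) and (13).

$$
\begin{aligned}
\mathbb{E}_q[z_{dkv}] &= n_{dv}\,\omega_{dkv}, \\
\mathbb{E}_q[\theta_{dk}] &= \frac{a_{dk}}{b_{dk}}, & \mathbb{E}_q[\log\theta_{dk}] &= \psi(a_{dk}) - \log b_{dk}, \\
\mathbb{E}_q[\beta_{kv}] &= \frac{\xi_{kv}}{\sum_{v'}\xi_{kv'}}, & \mathbb{E}_q[\log\beta_{kv}] &= \psi(\xi_{kv}) - \psi\Bigl(\sum_{v'}\xi_{kv'}\Bigr), \\
\mathbb{E}_q[\phi_k] &= \frac{\mu_k}{\nu_k}, & \mathbb{E}_q[\log\phi_k] &= \psi(\mu_k) - \log\nu_k .
\end{aligned}
$$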
2 Auxiliary latent variables
The update equation for $\omega_{dkv}$ can be obtained as follows. The second term of the ELBO in Eq. (2)
can be rewritten as follows:
$$
\sum_Z q(Z) \log p(N \mid Z) = \sum_d \sum_v \sum_{z_{dv}} q(z_{dv}) \log I\Bigl(n_{dv} = \sum_k z_{dkv}\Bigr) = 0 ,
\tag{3}
$$
because $\sum_k z_{dkv} = n_{dv}$. Even when $q(z_{dv})$ is not assumed to be a multinomial, there is no
problem with this term as long as every sample from $q(z_{dv})$ satisfies $\sum_k z_{dkv} = n_{dv}$.
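The sixth term of the ELBO in Eq. (2) can be rewritten as follows:
$$
\begin{aligned}
\sum_Z q(Z) \log q(Z)
&= \sum_d \sum_v \sum_{z_{dv}} q(z_{dv}) \log \left[ \frac{n_{dv}!}{\prod_k z_{dkv}!} \prod_k \omega_{dkv}^{z_{dkv}} \right] \\
&= \sum_d \sum_v \log(n_{dv}!) - \sum_d \sum_v \sum_{z_{dv}} q(z_{dv}) \sum_k \log(z_{dkv}!) + \sum_d \sum_v \sum_{z_{dv}} q(z_{dv}) \sum_k z_{dkv} \log \omega_{dkv} \\
&= \sum_d \sum_v \log(n_{dv}!) - \sum_d \sum_v \sum_{z_{dv}} q(z_{dv}) \sum_k \log(z_{dkv}!) + \sum_d \sum_v \sum_k n_{dv}\omega_{dkv} \log \omega_{dkv} ,
\end{aligned}
\tag{4}
$$
where the last equality uses the multinomial mean $\sum_{z_{dv}} q(z_{dv})\, z_{dkv} = n_{dv}\omega_{dkv}$.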
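The first term of the ELBO in Eq. (2) can be rewritten as follows:
$$
\begin{aligned}
&\sum_Z \int q(Z)q(\Theta)q(\beta) \log p(Z \mid \Theta, \beta)\, d\Theta\, d\beta \\
&= \sum_Z \int q(Z)q(\Theta)q(\beta) \sum_d \sum_v \sum_k \log \Bigl[ (\theta_{dk}\beta_{kv})^{z_{dkv}} e^{-\theta_{dk}\beta_{kv}} \Bigr]\, d\Theta\, d\beta
 - \sum_Z q(Z) \sum_d \sum_v \sum_k \log(z_{dkv}!) \\
&= \sum_d \sum_v \sum_k \sum_{z_{dv}} q(z_{dv})\, z_{dkv} \int q(\theta_{dk}) \log \theta_{dk}\, d\theta_{dk}
 + \sum_d \sum_v \sum_k \sum_{z_{dv}} q(z_{dv})\, z_{dkv} \int q(\beta_k) \log \beta_{kv}\, d\beta_k \\
&\quad - \sum_d \sum_v \sum_k \int q(\beta_k) \left[ \int q(\theta_{dk})\, \theta_{dk}\, d\theta_{dk} \right] \beta_{kv}\, d\beta_k
 - \sum_d \sum_v \sum_{z_{dv}} q(z_{dv}) \sum_k \log(z_{dkv}!) \\
&= \sum_d \sum_v \sum_k n_{dv}\omega_{dkv} \bigl\{ \psi(a_{dk}) - \log b_{dk} \bigr\}
 + \sum_d \sum_v \sum_k n_{dv}\omega_{dkv} \Bigl\{ \psi(\xi_{kv}) - \psi\Bigl(\sum_{v'} \xi_{kv'}\Bigr) \Bigr\} \\
&\quad - \sum_d \sum_v \sum_k \frac{a_{dk}}{b_{dk}} \frac{\xi_{kv}}{\sum_{v'} \xi_{kv'}}
 - \sum_d \sum_v \sum_{z_{dv}} q(z_{dv}) \sum_k \log(z_{dkv}!)
\end{aligned}
\tag{5}
$$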
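Therefore, the terms relevant to $\omega$ in the ELBO are summed up as follows:
$$
\begin{aligned}
L(\omega)
&= \sum_d \sum_v \sum_k n_{dv}\omega_{dkv} \bigl\{ \psi(a_{dk}) - \log b_{dk} \bigr\}
 + \sum_d \sum_v \sum_k n_{dv}\omega_{dkv} \Bigl\{ \psi(\xi_{kv}) - \psi\Bigl(\sum_{v'} \xi_{kv'}\Bigr) \Bigr\} \\
&\quad - \sum_d \sum_v \sum_{z_{dv}} q(z_{dv}) \sum_k \log(z_{dkv}!)
 + \sum_d \sum_v \sum_{z_{dv}} q(z_{dv}) \sum_k \log(z_{dkv}!)
 - \sum_d \sum_v \sum_k n_{dv}\omega_{dkv} \log \omega_{dkv} \\
&= \sum_d \sum_v \sum_k n_{dv}\omega_{dkv} \bigl\{ \psi(a_{dk}) - \log b_{dk} \bigr\}
 + \sum_d \sum_v \sum_k n_{dv}\omega_{dkv} \Bigl\{ \psi(\xi_{kv}) - \psi\Bigl(\sum_{v'} \xi_{kv'}\Bigr) \Bigr\} \\
&\quad - \sum_d \sum_v \sum_k n_{dv}\omega_{dkv} \log \omega_{dkv}
\end{aligned}
\tag{6}
$$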
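By introducing Lagrange multipliers $\lambda_{dv}$ for the constraints $\sum_k \omega_{dkv} = 1$ and setting the derivative of
$L(\omega) + \sum_d \sum_v \lambda_{dv} \bigl( \sum_k \omega_{dkv} - 1 \bigr)$ with respect to $\omega_{dkv}$ to zero, we find that
$\log \omega_{dkv}$ equals $\psi(a_{dk}) - \log b_{dk} + \psi(\xi_{kv}) - \psi\bigl(\sum_{v'} \xi_{kv'}\bigr)$ up to a constant that does not depend on $k$.
We thus obtain the update equation
$$
\omega_{dkv} \propto \frac{\exp \psi(a_{dk})}{b_{dk}} \cdot \frac{\exp \psi(\xi_{kv})}{\exp \psi\bigl(\sum_{v'} \xi_{kv'}\bigr)} .
$$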
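The update for $\omega_{dkv}$ can be sketched for a single document-word pair as follows. This is an illustrative sketch rather than the author's code: the function names `digamma` and `update_omega` and the list-based data layout (`xi[k][v]`, `a_d[k]`, `b_d[k]`) are assumptions. The pure-Python `digamma` is a simple approximation (recurrence plus asymptotic series) included only to keep the block self-contained; in practice a library routine such as `scipy.special.digamma` would be the usual choice.

```python
import math

def digamma(x):
    """Approximate digamma for x > 0: recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def update_omega(a_d, b_d, xi, v):
    """Return [omega_dkv for k = 0..K-1] for one document d and one word v."""
    log_w = []
    for k in range(len(a_d)):
        e_log_theta = digamma(a_d[k]) - math.log(b_d[k])
        e_log_beta = digamma(xi[k][v]) - digamma(sum(xi[k]))
        log_w.append(e_log_theta + e_log_beta)
    # Normalize in log space; this plays the role of the Lagrange multiplier.
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]
```

Working in log space and subtracting the maximum before exponentiating avoids underflow when the unnormalized weights differ by many orders of magnitude.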
3 Gamma posterior 1
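The third term of the ELBO in Eq. (2) can be rewritten as follows:
$$
\begin{aligned}
\int\!\!\int q(\phi_k) q(\theta_{dk}) \log p(\theta_{dk}; s, \phi_k)\, d\theta_{dk}\, d\phi_k
&= \int\!\!\int q(\phi_k) q(\theta_{dk}) \log \left[ \frac{\phi_k^s}{\Gamma(s)} \theta_{dk}^{s-1} e^{-\phi_k \theta_{dk}} \right] d\theta_{dk}\, d\phi_k \\
&= s \bigl\{ \psi(\mu_k) - \log \nu_k \bigr\} - \log \Gamma(s) + (s - 1) \bigl\{ \psi(a_{dk}) - \log b_{dk} \bigr\} - \frac{a_{dk}}{b_{dk}} \frac{\mu_k}{\nu_k}
\end{aligned}
\tag{7}
$$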
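The seventh term of the ELBO in Eq. (2) can be rewritten as follows:
$$
\begin{aligned}
\int q(\theta_{dk}) \log q(\theta_{dk})\, d\theta_{dk}
&= \int q(\theta_{dk}) \log \left[ \frac{b_{dk}^{a_{dk}}}{\Gamma(a_{dk})} \theta_{dk}^{a_{dk}-1} e^{-b_{dk}\theta_{dk}} \right] d\theta_{dk} \\
&= -a_{dk} + \log b_{dk} - \log \Gamma(a_{dk}) + (a_{dk} - 1)\psi(a_{dk})
\end{aligned}
\tag{8}
$$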
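Therefore, the terms relevant to $a_{dk}$ and $b_{dk}$ in the ELBO are summed up as follows, where the second equality uses $\sum_v \xi_{kv} / \sum_{v'} \xi_{kv'} = 1$:
$$
\begin{aligned}
L(a_{dk}, b_{dk})
&= \sum_v n_{dv}\omega_{dkv} \bigl\{ \psi(a_{dk}) - \log b_{dk} \bigr\} - \sum_v \frac{a_{dk}}{b_{dk}} \frac{\xi_{kv}}{\sum_{v'} \xi_{kv'}} \\
&\quad + (s - 1) \bigl\{ \psi(a_{dk}) - \log b_{dk} \bigr\} - \frac{a_{dk}}{b_{dk}} \frac{\mu_k}{\nu_k} \\
&\quad + a_{dk} - \log b_{dk} + \log \Gamma(a_{dk}) - (a_{dk} - 1)\psi(a_{dk}) \\
&= \Bigl( \sum_v n_{dv}\omega_{dkv} - a_{dk} + s \Bigr) \psi(a_{dk}) + \log \Gamma(a_{dk}) + a_{dk} \\
&\quad - \Bigl( \sum_v n_{dv}\omega_{dkv} + s \Bigr) \log b_{dk} - \frac{a_{dk}}{b_{dk}} \Bigl( \frac{\mu_k}{\nu_k} + 1 \Bigr)
\end{aligned}
\tag{9}
$$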
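$$
\frac{\partial L(a_{dk}, b_{dk})}{\partial a_{dk}}
= \Bigl( \sum_v n_{dv}\omega_{dkv} - a_{dk} + s \Bigr) \psi'(a_{dk}) + 1 - \frac{1}{b_{dk}} \Bigl( \frac{\mu_k}{\nu_k} + 1 \Bigr)
\tag{10}
$$
$$
\frac{\partial L(a_{dk}, b_{dk})}{\partial b_{dk}}
= - \Bigl( \sum_v n_{dv}\omega_{dkv} + s \Bigr) \frac{1}{b_{dk}} + \frac{a_{dk}}{b_{dk}^2} \Bigl( \frac{\mu_k}{\nu_k} + 1 \Bigr)
\tag{11}
$$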
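Both $\partial L(a_{dk}, b_{dk}) / \partial a_{dk} = 0$ and $\partial L(a_{dk}, b_{dk}) / \partial b_{dk} = 0$ are satisfied when $a_{dk} = \sum_v n_{dv}\omega_{dkv} + s$ and $b_{dk} = \frac{\mu_k}{\nu_k} + 1$.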
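As a numerical sanity check, the objective in Eq. (9) can be evaluated at the stated solution and at nearby points. The sketch below is illustrative only: the names `objective` and `closed_form`, and the shorthand `c` for $\sum_v n_{dv}\omega_{dkv}$ and `m` for $\mu_k/\nu_k$, are assumptions introduced here, and the pure-Python `digamma` is an approximation used to avoid external dependencies.

```python
import math

def digamma(x):
    """Approximate digamma for x > 0: recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def objective(a, b, c, s, m):
    """Final form of Eq. (9); c = sum_v n_dv * omega_dkv, m = mu_k / nu_k."""
    return ((c - a + s) * digamma(a) + math.lgamma(a) + a
            - (c + s) * math.log(b) - (a / b) * (m + 1.0))

def closed_form(c, s, m):
    """Stationary point stated in the text: a = c + s, b = m + 1."""
    return c + s, m + 1.0
```

For example, with `c, s, m = 3.7, 1.5, 0.8`, evaluating `objective` at `closed_form(c, s, m)` gives a larger value than at points where `a` or `b` is shifted by a few tenths, as expected for a maximizer.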
4 Gamma posterior 2
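The fifth term of the ELBO in Eq. (2) can be rewritten as follows:
$$
\begin{aligned}
\int q(\phi_k) \log p(\phi_k; s, r)\, d\phi_k
&= \int q(\phi_k) \log \left[ \frac{r^s}{\Gamma(s)} \phi_k^{s-1} e^{-r\phi_k} \right] d\phi_k \\
&= s \log r - \log \Gamma(s) + (s - 1) \bigl\{ \psi(\mu_k) - \log \nu_k \bigr\} - r \frac{\mu_k}{\nu_k}
\end{aligned}
\tag{12}
$$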
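The ninth term of the ELBO in Eq. (2) can be rewritten as follows:
$$
\begin{aligned}
\int q(\phi_k) \log q(\phi_k)\, d\phi_k
&= \int q(\phi_k) \log \left[ \frac{\nu_k^{\mu_k}}{\Gamma(\mu_k)} \phi_k^{\mu_k-1} e^{-\nu_k\phi_k} \right] d\phi_k \\
&= -\mu_k + \log \nu_k - \log \Gamma(\mu_k) + (\mu_k - 1)\psi(\mu_k)
\end{aligned}
\tag{13}
$$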
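Therefore, the terms relevant to $\mu_k$ and $\nu_k$ in the ELBO are summed up as follows, where $D$ is the number of documents:
$$
\begin{aligned}
L(\mu_k, \nu_k)
&= Ds \bigl\{ \psi(\mu_k) - \log \nu_k \bigr\} - \frac{\mu_k}{\nu_k} \sum_d \frac{a_{dk}}{b_{dk}} \\
&\quad + (s - 1) \bigl\{ \psi(\mu_k) - \log \nu_k \bigr\} - r \frac{\mu_k}{\nu_k} \\
&\quad + \mu_k - \log \nu_k + \log \Gamma(\mu_k) - (\mu_k - 1)\psi(\mu_k)
\end{aligned}
\tag{14}
$$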
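$$
\frac{\partial L(\mu_k, \nu_k)}{\partial \mu_k}
= (Ds + s - \mu_k)\, \psi'(\mu_k) - \frac{1}{\nu_k} \Bigl( \sum_d \frac{a_{dk}}{b_{dk}} + r \Bigr) + 1
\tag{15}
$$
$$
\frac{\partial L(\mu_k, \nu_k)}{\partial \nu_k}
= - \frac{Ds + s}{\nu_k} + \frac{\mu_k}{\nu_k^2} \Bigl( \sum_d \frac{a_{dk}}{b_{dk}} + r \Bigr)
\tag{16}
$$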
Both $\partial L(\mu_k, \nu_k) / \partial \mu_k = 0$ and $\partial L(\mu_k, \nu_k) / \partial \nu_k = 0$ are satisfied when $\mu_k = Ds + s$ and $\nu_k = \sum_d \frac{a_{dk}}{b_{dk}} + r$.
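The claim can be checked by direct substitution into Eqs. (15) and (16). Writing $\mu_k^\ast = Ds + s$ and $\nu_k^\ast = \sum_d \frac{a_{dk}}{b_{dk}} + r$ (the starred symbols are shorthand introduced only for this check):

$$
\begin{aligned}
\left. \frac{\partial L(\mu_k, \nu_k)}{\partial \mu_k} \right|_{\mu_k^\ast, \nu_k^\ast}
&= (Ds + s - \mu_k^\ast)\, \psi'(\mu_k^\ast) - \frac{\nu_k^\ast}{\nu_k^\ast} + 1 = 0 - 1 + 1 = 0 , \\
\left. \frac{\partial L(\mu_k, \nu_k)}{\partial \nu_k} \right|_{\mu_k^\ast, \nu_k^\ast}
&= - \frac{Ds + s}{\nu_k^\ast} + \frac{\mu_k^\ast}{(\nu_k^\ast)^2}\, \nu_k^\ast = \frac{\mu_k^\ast - (Ds + s)}{\nu_k^\ast} = 0 .
\end{aligned}
$$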
5 Dirichlet posterior
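The fourth term of the ELBO in Eq. (2) can be rewritten as follows:
$$
\begin{aligned}
\int q(\beta_k) \log p(\beta_k)\, d\beta_k
&= \int q(\beta_k) \log \left[ \frac{\Gamma(V\alpha)}{\Gamma(\alpha)^V} \prod_v \beta_{kv}^{\alpha-1} \right] d\beta_k \\
&= \log \Gamma(V\alpha) - V \log \Gamma(\alpha) + (\alpha - 1) \sum_v \Bigl\{ \psi(\xi_{kv}) - \psi\Bigl(\sum_{v'} \xi_{kv'}\Bigr) \Bigr\}
\end{aligned}
\tag{17}
$$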
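The eighth term of the ELBO in Eq. (2) can be rewritten as follows:
$$
\begin{aligned}
\int q(\beta_k) \log q(\beta_k)\, d\beta_k
&= \int q(\beta_k) \log \left[ \frac{\Gamma\bigl(\sum_v \xi_{kv}\bigr)}{\prod_v \Gamma(\xi_{kv})} \prod_v \beta_{kv}^{\xi_{kv}-1} \right] d\beta_k \\
&= \log \Gamma\Bigl(\sum_v \xi_{kv}\Bigr) - \sum_v \log \Gamma(\xi_{kv}) + \sum_v (\xi_{kv} - 1) \Bigl\{ \psi(\xi_{kv}) - \psi\Bigl(\sum_{v'} \xi_{kv'}\Bigr) \Bigr\}
\end{aligned}
\tag{18}
$$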
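Therefore, the terms relevant to $\xi_k$ in the ELBO are summed up as follows:
$$
\begin{aligned}
L(\xi_k)
&= \sum_v \sum_d n_{dv}\omega_{dkv} \Bigl\{ \psi(\xi_{kv}) - \psi\Bigl(\sum_{v'} \xi_{kv'}\Bigr) \Bigr\}
 + (\alpha - 1) \sum_v \Bigl\{ \psi(\xi_{kv}) - \psi\Bigl(\sum_{v'} \xi_{kv'}\Bigr) \Bigr\} \\
&\quad - \log \Gamma\Bigl(\sum_v \xi_{kv}\Bigr) + \sum_v \log \Gamma(\xi_{kv})
 - \sum_v (\xi_{kv} - 1) \Bigl\{ \psi(\xi_{kv}) - \psi\Bigl(\sum_{v'} \xi_{kv'}\Bigr) \Bigr\}
\end{aligned}
\tag{19}
$$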
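$$
\frac{\partial L(\xi_k)}{\partial \xi_{kv}}
= \sum_{v'} \Bigl( \sum_d n_{dv'}\omega_{dkv'} + \alpha - \xi_{kv'} \Bigr) \frac{\partial}{\partial \xi_{kv}} \Bigl\{ \psi(\xi_{kv'}) - \psi\Bigl(\sum_{v''} \xi_{kv''}\Bigr) \Bigr\}
\tag{20}
$$
Every coefficient in parentheses vanishes when $\xi_{kv'} = \sum_d n_{dv'}\omega_{dkv'} + \alpha$ for all $v'$. Therefore, we obtain the update equation $\xi_{kv} = \sum_d n_{dv}\omega_{dkv} + \alpha$.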
6 Summary
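$$
\omega_{dkv} \propto \frac{\exp \psi(a_{dk})}{b_{dk}} \cdot \frac{\exp \psi(\xi_{kv})}{\exp \psi\bigl(\sum_{v'} \xi_{kv'}\bigr)}
\tag{21}
$$
$$
a_{dk} = \sum_v n_{dv}\omega_{dkv} + s
\tag{22}
$$
$$
b_{dk} = \frac{\mu_k}{\nu_k} + 1
\tag{23}
$$
$$
\xi_{kv} = \sum_d n_{dv}\omega_{dkv} + \alpha
\tag{24}
$$
$$
\mu_k = Ds + s
\tag{25}
$$
$$
\nu_k = \sum_d \frac{a_{dk}}{b_{dk}} + r
\tag{26}
$$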
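The six update equations can be combined into a coordinate-ascent loop. The following is a minimal sketch under stated assumptions, not the author's implementation: the function name `fit`, the dense list-of-lists layout of the count matrix `n`, the deterministic initialization, the fixed iteration count, and the order in which the updates are applied are all choices made for this illustration. The pure-Python `digamma` is an approximation used to keep the example dependency-free.

```python
import math

def digamma(x):
    """Approximate digamma for x > 0: recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def fit(n, K, alpha=0.1, s=1.0, r=1.0, iters=50):
    """Iterate Eqs. (21)-(26) on a D x V count matrix n (list of lists)."""
    D, V = len(n), len(n[0])
    # Slightly asymmetric initial values so that topics can differentiate.
    a = [[1.0 + 0.1 * ((d + k) % K) for k in range(K)] for d in range(D)]
    b = [[1.0] * K for _ in range(D)]
    xi = [[1.0 + 0.1 * ((k + v) % K) for v in range(V)] for k in range(K)]
    mu = [D * s + s] * K
    nu = [1.0] * K
    for _ in range(iters):
        e_log_theta = [[digamma(a[d][k]) - math.log(b[d][k])
                        for k in range(K)] for d in range(D)]
        e_log_beta = []
        for k in range(K):
            psi_sum = digamma(sum(xi[k]))
            e_log_beta.append([digamma(x) - psi_sum for x in xi[k]])
        doc_counts = [[0.0] * K for _ in range(D)]   # sum_v n_dv * omega_dkv
        word_counts = [[0.0] * V for _ in range(K)]  # sum_d n_dv * omega_dkv
        for d in range(D):
            for v in range(V):
                if n[d][v] == 0:
                    continue  # zero counts contribute nothing to either sum
                # Eq. (21), normalized over k in log space.
                log_w = [e_log_theta[d][k] + e_log_beta[k][v] for k in range(K)]
                m = max(log_w)
                w = [math.exp(x - m) for x in log_w]
                total = sum(w)
                for k in range(K):
                    c = n[d][v] * w[k] / total
                    doc_counts[d][k] += c
                    word_counts[k][v] += c
        for d in range(D):
            for k in range(K):
                a[d][k] = doc_counts[d][k] + s   # Eq. (22)
                b[d][k] = mu[k] / nu[k] + 1.0    # Eq. (23)
        for k in range(K):
            for v in range(V):
                xi[k][v] = word_counts[k][v] + alpha  # Eq. (24)
        for k in range(K):
            mu[k] = D * s + s                                      # Eq. (25)
            nu[k] = sum(a[d][k] / b[d][k] for d in range(D)) + r   # Eq. (26)
    return a, b, xi, mu, nu
```

Because $\omega_{dkv}$ enters the other updates only through the sums $\sum_v n_{dv}\omega_{dkv}$ and $\sum_d n_{dv}\omega_{dkv}$, the sketch accumulates those sums directly instead of storing the full $D \times K \times V$ array. A practical implementation would also monitor the ELBO in Eq. (2) to decide when to stop.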
References
[1] Allison June-Barlow Chaney, Hanna M. Wallach, Matthew Connelly, and David M. Blei.
Detecting and characterizing events. EMNLP, pp. 1142–1152, 2016.
[2] David B. Dunson and Amy H. Herring. Bayesian latent variable models for mixed discrete
outcomes. Biostatistics, Vol. 6, No. 1, pp. 11–25, 2005.
[3] Prem Gopalan, Laurent Charlin, and David M. Blei. Content-based recommendations with
Poisson factorization. NIPS, pp. 3176–3184, 2014.
[4] Prem Gopalan, Jake M. Hofman, and David M. Blei. Scalable recommendation with hierarchical
Poisson factorization. UAI, pp. 326–335, 2015.