Mining at scale with latent factor models for matrix completion
Fabio Petroni
Recommender systems
Marco wants to watch a movie.
DVD rental store: A B C D E F G H I J K ... Z
A (1998), A-ge-man (1990), A.K.G. (2007), A Nous Amours (1983), ... many pages later ..., Azumi (2003)
But there are so many movies! Which ones will he like?
2 of 58
Collaborative filtering
- problem
  - a set of users
  - a set of items (movies, books, songs, ...)
  - feedback
    - explicit (ratings, ...)
    - implicit (purchases, click-through, ...)
- predict the preference of each user for each item
  - assumption: similar feedback ⇔ similar taste
- example (explicit feedback), where ? marks the entries to predict:

            Avatar   The Matrix   Up
    Marco     ?          4         2
    Luca      3          2         ?
    Anna      5          ?         3

3 of 58
Collaborative filtering taxonomy
[taxonomy diagram]
- memory based: neighborhood models, either user based or item based
- model based:
  - dimensionality reduction / matrix completion (SVD, PMF)
  - probabilistic methods (PLS(A/I), latent Dirichlet allocation)
  - other machine learning methods (Bayesian networks, Markov decision processes, neural networks)
- matrix completion is currently considered the best approach
- advantages with respect to both scalability and accuracy
4 of 58
Matrix completion for collaborative filtering
- the completion is driven by a factorization: $R \approx P Q$
- associate a latent factor vector with each user and each item
- missing entries are estimated through the dot product:
  $r_{ij} \approx p_i \cdot q_j$
5 of 58
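As a concrete instance with $d = 1$ (using the factors estimated on the next slide), the missing Marco-Avatar entry is predicted as
$\hat r_{\text{Marco,Avatar}} = p_{\text{Marco}} \, q_{\text{Avatar}} = 1.98 \times 2.24 \approx 4.4$.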
Latent factor models (Koren et al., 2009)
6 of 58
Latent factor models - explicit feedback
- discover latent factors (d = 1); the estimated factor of each user and item is shown in parentheses

                    Avatar (2.24)   The Matrix (1.92)   Up (1.18)
    Marco (1.98)      ? → 4.4          4 (3.8)            2 (2.3)
    Luca  (1.21)      3 (2.7)          2 (2.3)            ? → 1.4
    Anna  (2.30)      5 (5.2)          ? → 4.4            3 (2.7)

- each cell shows the observed rating with the model's reconstruction in parentheses; missing entries (?) are filled with the dot product of the corresponding factors
- optimization criterion: minimize the squared loss over the observed entries O
  $\min_{P,Q} \sum_{(i,j) \in O} (r_{ij} - p_i \cdot q_j)^2$
7 of 58
Latent factor models - explicit feedback
- in practice, bias terms and regularization are added to the squared loss:
  $\min_{P,Q} \sum_{(i,j) \in O} (r_{ij} - \hat\mu - \hat{b}_{u_i} - \hat{b}_{x_j} - p_i \cdot q_j)^2 + \lambda \, (\|P\|_F + \|Q\|_F)$
  where $\hat\mu$ is the global rating mean and $\hat{b}_{u_i}$, $\hat{b}_{x_j}$ are user and item biases (a minimal SGD sketch for this objective follows below)
7 of 58
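A minimal sketch of minimizing this objective with stochastic gradient descent (the procedure detailed two slides below); plain Python on hypothetical toy data, with biases omitted and squared-norm regularization used for brevity:

```python
import random

# toy observed ratings (user, item, rating): hypothetical data
ratings = [(0, 1, 4), (0, 2, 2), (1, 0, 3), (1, 1, 2), (2, 0, 5), (2, 2, 3)]
n_users, n_items, d = 3, 3, 1
lr, lam, epochs = 0.01, 0.05, 500

# latent factor vectors, randomly initialized
P = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_users)]
Q = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_items)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

for _ in range(epochs):
    random.shuffle(ratings)
    for i, j, r in ratings:            # one update per training point
        err = r - dot(P[i], Q[j])      # residual r_ij - p_i . q_j
        for k in range(d):
            p, q = P[i][k], Q[j][k]
            P[i][k] += lr * (err * q - lam * p)
            Q[j][k] += lr * (err * p - lam * q)

print(round(dot(P[0], Q[0]), 2))       # prediction for a missing entry
```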
Latent factor models - implicit feedback
- discover latent factors (d = 1)

            Avatar   The Matrix   Up
    Marco     ?          1         1
    Luca      1          1         ?
    Anna      1          ?         1

- the squared loss minimization criterion is not effective!
- with only positive (all-1) observations, every factor is driven to 1 and the system simply completes the matrix with all 1s
8 of 58
Bayesian personalized ranking
- the problem is one of ranking rather than rating prediction
  - e.g., produce the ranked list of items that the user might like the most
- the BPR criterion adopts a pairwise approach
  - predict whether item $x_j$ is more likely to be liked than item $x_k$
  - assumption: the user prefers items for which feedback exists
  $D_T := \{(u_i, x_j, x_k) \mid u_i \in U \wedge x_j, x_k \in I \wedge j \neq k \wedge r_{ij} = 1 \wedge r_{ik} = \,?\}$
- user $u_i$ is assumed to prefer item $x_j$ over $x_k$ (a one-step sketch follows below):
  $\max_{P,Q} \sum_{(u_i, x_j, x_k) \in D_T} \ln \sigma(p_i \cdot q_j - p_i \cdot q_k)$
9 of 58
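A minimal sketch of stochastic ascent on this criterion (hypothetical toy setup; σ is the logistic function, and negatives are sampled among items without feedback):

```python
import math, random

n_users, n_items, d, lr = 3, 3, 1, 0.05
P = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_users)]
Q = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_items)]

# positive feedback per user (user -> items with r = 1)
positives = {0: [1, 2], 1: [0, 1], 2: [0, 2]}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_step(i, j, k):
    """One ascent step on ln sigma(p_i.q_j - p_i.q_k): j preferred over k."""
    x = dot(P[i], Q[j]) - dot(P[i], Q[k])
    g = 1.0 / (1.0 + math.exp(x))      # d/dx ln sigma(x) = sigma(-x)
    for f in range(d):
        p = P[i][f]
        P[i][f] += lr * g * (Q[j][f] - Q[k][f])
        Q[j][f] += lr * g * p
        Q[k][f] -= lr * g * p

for _ in range(1000):
    i = random.randrange(n_users)
    j = random.choice(positives[i])                   # item with feedback
    k = random.choice([x for x in range(n_items)
                       if x not in positives[i]])     # item without feedback
    bpr_step(i, j, k)
```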
Stochastic gradient descent
- parameters $\Theta = \{P, Q\}$
- find the minimum $\Theta^*$ of the loss function L (or its maximum for BPR, via gradient ascent)
- pick a starting point $\Theta_0$
- iteratively update the current estimate of $\Theta$:
  $\Theta_{n+1} \leftarrow \Theta_n - \eta \, \frac{\partial L}{\partial \Theta}$
- $\eta$ is the learning rate
- one update for each given training point
[plot: loss (×10^7) decreasing over 30 iterations]
10 of 58
Overview matrix completion
[diagram: explicit feedback is handled with the regularized squared loss, implicit feedback with Bayesian personalized ranking (the optimization criteria); the optimization problem is solved with stochastic gradient descent (ascent) to train the model, which consists of P (user latent factor vectors) and Q (item latent factor vectors)]
11 of 58
Challenges of matrix completion
(1) scalability: handle large-scale data
  - 2B purchases on Amazon.com by 200M customers (2014)
  - parallel and distributed approaches are essential!
(2) quality: improve prediction performance
  - Netflix awarded a 10% improvement with $1M (2006)
  - the performance of matrix completion can be boosted by:
    - feeding the system with more data
    - integrating contextual information in the model
12 of 58
Overview - state of the art
[chart arranging prior work along two axes: scalability (centralized / parallel / distributed) and quality (context-agnostic / context-aware), for both optimization criteria: the regularized squared loss (positive and negative evidence, multi-value revealed entries) and Bayesian personalized ranking (only positive evidence, single-value revealed entries). Cited work: Koren, 2008; Koren et al., 2009; Rendle et al., 2009; Takács et al., 2009; Karatzoglou et al., 2010; Menon et al., 2011; Niu et al., 2011; Rendle et al., 2011; Ricci et al., 2011; Chen et al., 2012; Rendle, 2012; Ahmed et al., 2013; Recht et al., 2013; Riedel et al., 2013; Zhuang et al., 2013; Makari et al., 2014; Shi et al., 2014]
13 of 58
Overview - contributions
[same chart as the previous slide, with the thesis contributions added: Petroni et al., 2014 and Petroni et al., 2015a on the distributed side, Petroni et al., 2015b on the context-aware side]
13 of 58
Contents
1. Distributed Matrix Completion
 1.1 Distributed Stochastic Gradient Descent
 1.2 Input Partitioner
 1.3 Evaluation
2. Context-Aware Matrix Completion
 2.1 Open Relation Extraction
 2.2 Context-Aware Open Relation Extraction
 2.3 Evaluation
3. Conclusion and Outlook
14 of 58
Problems of parallel and distributed SGD
- divide the training points (SGD updates) among threads
- SGD updates might depend on each other!
[diagram: thread1 processes rating r22 (user u2) and thread2 processes rating r42 (user u4); both ratings involve the same item i2]
- both threads concurrently update the same latent vector
- lock-based approaches adversely affect concurrency
15 of 58
SGD Taxonomy
[taxonomy: parallel SGD methods include HogWild (Niu et al., 2011), FPSGD (Zhuang et al., 2013) and Jellyfish (Recht et al., 2013); distributed methods include DSGD-MR (Gemulla et al., 2011), SSGD (Teflioudi et al., 2012), and CSGD, DSGD++ and ASGD (Makari et al., 2014)]
- parallel SGD is hardly applicable to very large datasets
  - the time to convergence may be too long
  - the input data may not fit into main memory
- ASGD has advantages in scalability and efficiency
16 of 58
Asynchronous stochastic gradient descent
- distributed shared-nothing environment (cluster of machines)
[diagram: the rating matrix R is split into blocks R1-R4 across four computing nodes; each node holds its block together with local copies P1,Q1 ... P4,Q4 of the factor matrices]
- R is split across the nodes
- factor vectors are replicated
- replicas are concurrently updated
- replicas deviate and become inconsistent
- periodic synchronization reconciles them
17 of 58
Bulk Synchronous Processing Model
[diagram: computing nodes repeat supersteps of 1. local computation, 2. communication, 3. barrier synchronization]
- currently used by most ASGD implementations (a toy superstep sketch follows below)
18 of 58
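A toy, single-process illustration of one such superstep for ASGD (assumptions: two simulated nodes, each training on its own block of ratings with a local replica of the item factors Q, reconciled by averaging at the barrier; a sketch, not an actual cluster implementation):

```python
import random

d, n_items, lr = 1, 3, 0.01

def local_computation(P_shared, Q_replica, block):
    """Superstep phase 1: SGD over this node's block of ratings."""
    for i, j, r in block:
        err = r - sum(p * q for p, q in zip(P_shared[i], Q_replica[j]))
        for k in range(d):
            p, q = P_shared[i][k], Q_replica[j][k]
            P_shared[i][k] += lr * err * q
            Q_replica[j][k] += lr * err * p

def barrier_sync(replicas):
    """Phases 2-3: communicate and reconcile the Q replicas by averaging."""
    n = len(replicas)
    for j in range(n_items):
        for k in range(d):
            avg = sum(rep[j][k] for rep in replicas) / n
            for rep in replicas:
                rep[j][k] = avg

# two "nodes": user factors are partitioned with the blocks (each user
# appears in one block only), item factors are replicated per node
blocks = [[(0, 1, 4), (0, 2, 2)], [(1, 0, 3), (2, 0, 5), (2, 2, 3)]]
P = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(3)]
Qs = [[[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_items)]
      for _ in range(2)]

for superstep in range(200):
    for node in range(2):                    # would run concurrently
        local_computation(P, Qs[node], blocks[node])
    barrier_sync(Qs)                         # barrier synchronization
```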
Challenges
[diagram: the rating matrix R split into blocks R1-R4 across four nodes]
1. workload balance
  - ensure that the computing nodes are fed with the same load
  - improves efficiency
2. minimize communication
  - minimize vector replicas
  - improves scalability
19 of 58
Graph representation
- the rating matrix describes a bipartite graph (a small sketch follows below)
[diagram: users 1-4 and items a-d as vertices; each observed rating (x) in the matrix becomes an edge]
- vertices represent users and items
- edges represent training points (e.g., ratings)
- find a partitioning of the graph
- assign each part to a different machine
20 of 58
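A tiny sketch of this representation (hypothetical 4 × 4 rating matrix; None marks a missing entry):

```python
# rows = users u1-u4, columns = items a-d (hypothetical ratings)
R = [
    [4, None, 5, None],
    [None, 3, None, 2],
    [4, None, None, 1],
    [None, 2, 3, None],
]
users = ["u1", "u2", "u3", "u4"]
items = ["a", "b", "c", "d"]

# every observed entry becomes an edge (a training point) between
# a user vertex and an item vertex
edges = [(users[i], items[j], r)
         for i, row in enumerate(R)
         for j, r in enumerate(row)
         if r is not None]
print(edges)
```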
Balanced graph partitioning
- partition G into smaller parts of (ideally) equal size
[diagram: edge-cut vs vertex-cut partitioning]
- a vertex can be cut in multiple ways and span several parts, while a cut edge connects only two parts
- computation steps are associated with edges
- vertex-cut works better on power-law graphs (Gonzalez et al., 2012)
21 of 58
Power-law graphs
- characteristic of real graphs: power-law degree distribution
  - most vertices have few connections while a few have many
[log-log plot: degree distribution of a 2010 Twitter connection crawl]
- the probability that a vertex has degree d is $P(d) \propto d^{-\alpha}$
- $\alpha$ controls the "skewness" of the degree distribution
22 of 58
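A quick way to see this skew (a sketch using NumPy's Zipf sampler, which draws degrees with $P(d) \propto d^{-\alpha}$; the printed figures are illustrative, not the slide's data):

```python
import numpy as np

rng = np.random.default_rng(0)
for alpha in (1.8, 2.6, 3.4):
    degrees = rng.zipf(alpha, size=100_000)
    # smaller alpha = heavier tail: a few vertices get huge degrees
    print(f"alpha={alpha}: max degree={degrees.max()}, "
          f"share with degree 1={np.mean(degrees == 1):.2f}")
```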
Balanced Vertex-Cut Graph Partitioning
- $v \in V$ vertex; $e \in E$ edge; $p \in P$ part
- $A(v)$: set of parts where vertex v is replicated
- $\sigma \geq 1$: tolerance to load imbalance
- the size |p| of part p is its edge cardinality
- minimize replicas: reduces (1) bandwidth, (2) memory usage and (3) synchronization
- balance the load: efficient usage of the available computing resources

  $\min \frac{1}{|V|} \sum_{v \in V} |A(v)| \quad \text{s.t.} \quad \max_{p \in P} |p| < \sigma \, \frac{|E|}{|P|}$

- the objective function is the replication factor (RF)
  - average number of replicas per vertex (see the sketch below)
23 of 58
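A minimal sketch that evaluates both quantities for a given edge-to-part assignment (the assignment and the tolerance σ = 1.5 are hypothetical):

```python
from collections import defaultdict

# hypothetical assignment: edge (u, v) -> part id
assignment = {("u1", "a"): 0, ("u1", "c"): 0, ("u2", "b"): 1,
              ("u3", "a"): 1, ("u3", "d"): 0, ("u4", "b"): 1}

size = defaultdict(int)     # part -> edge cardinality |p|
A = defaultdict(set)        # vertex -> parts where it is replicated
for (u, v), p in assignment.items():
    size[p] += 1
    A[u].add(p)
    A[v].add(p)

rf = sum(len(parts) for parts in A.values()) / len(A)  # replication factor
sigma, E, P = 1.5, len(assignment), len(size)
balanced = max(size.values()) < sigma * E / P
print(f"RF = {rf:.2f}, balanced: {balanced}")
```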
Streaming Setting
- the input data is a list of edges, consumed in streaming fashion, requiring only a single pass
[diagram: stream of edges feeding the partitioning algorithm, which builds the partitioned graph]
✓ handles graphs that don't fit in main memory
✓ imposes minimum overhead in time
✓ scalable, easy parallel implementations
✗ an assignment decision, once taken, cannot be changed later
24 of 58
Streaming algorithms
[chart arranging one-pass partitioners along two axes, history-agnostic vs history-aware and power-law-agnostic vs power-law-aware: Hashing (Gonzalez et al., 2012); Grid and PDS (Jain et al., 2013); Greedy (Gonzalez et al., 2012); DBH (Xie et al., 2014); history- and power-law-aware methods produce fewer replicas, history-agnostic ones give better balance]
- contribution: HDRF (Petroni et al., 2015), both history-aware and power-law-aware
25 of 58
HDRF: High Degree are Replicated First
- favor the replication of high-degree vertices
- the number of high-degree vertices in power-law graphs is very small
- overall reduction of the replication factor
- intuition from robustness to network failures: removing a few high-degree vertices turns a power-law graph into a set of isolated clusters
- hence, focus on the locality of low-degree vertices
26 of 58
The HDRF Algorithm
- edges arrive one at a time; a vertex may have no replicas yet, or replicas in one or more parts
- case 1: neither vertex of the incoming edge e has been assigned to any part yet: place e in the least loaded part
- case 2: only one vertex has been assigned: place e in a part that already holds that vertex
- case 3: both vertices have been assigned and share a common part: place e in the intersection
27 of 58
Create Replicas
- case 4: both vertices have been assigned but the intersection of their parts is empty
  - standard Greedy solution: place e in the least loaded part in the union
  - HDRF: replicate the vertex with the highest degree; if δ(v1) > δ(v2), create a new replica of v1, keeping the low-degree v2 local (a sketch of the full placement rule follows below)
28 of 58
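A compact sketch of the HDRF placement rule, following the scoring scheme of Petroni et al., 2015 but simplified: each candidate part is scored by a replication term, which rewards parts already holding an endpoint and favors keeping the low-degree endpoint local, plus a balance term steering toward lightly loaded parts; the constants λ, ε and the tie-breaking here are assumptions:

```python
from collections import defaultdict

class HDRF:
    def __init__(self, num_parts, lam=1.0, eps=1e-3):
        self.lam, self.eps = lam, eps
        self.size = [0] * num_parts      # |p|: edge count per part
        self.degree = defaultdict(int)   # partial degree delta(v)
        self.A = defaultdict(set)        # A(v): parts holding vertex v

    def _g(self, v, theta, p):
        # replication score: nonzero only if part p already holds v;
        # 1 + (1 - theta) is larger for the LOW-degree endpoint
        return (1 + (1 - theta)) if p in self.A[v] else 0.0

    def place(self, u, v):
        self.degree[u] += 1
        self.degree[v] += 1
        theta_u = self.degree[u] / (self.degree[u] + self.degree[v])
        theta_v = 1 - theta_u
        mx, mn = max(self.size), min(self.size)
        best, best_score = 0, float("-inf")
        for p in range(len(self.size)):
            bal = self.lam * (mx - self.size[p]) / (self.eps + mx - mn)
            score = self._g(u, theta_u, p) + self._g(v, theta_v, p) + bal
            if score > best_score:
                best_score, best = score, p
        self.size[best] += 1
        self.A[u].add(best)     # creates a replica if u lives elsewhere
        self.A[v].add(best)
        return best

partitioner = HDRF(num_parts=4)
for u, v in [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b")]:
    partitioner.place(u, v)    # one-pass consumption of the edge stream
print(dict(partitioner.A))
```

When neither endpoint has replicas, all replication scores are zero and the balance term alone selects the least loaded part, which matches case 1 above.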
Experiments - Settings
- standalone partitioner
  - VGP, a software package for one-pass vertex-cut balanced graph partitioning
  - measures the partitioning performance: replication and balancing
- GraphLab
  - HDRF has been integrated into GraphLab PowerGraph 2.2
  - measures the impact on the execution time of graph computation in a distributed graph-computing framework
- stream of edges in random order
29 of 58
Experiments - Datasets
- real-world graphs:

    Dataset         |V|      |E|
    MovieLens 10M   80.6K    10M
    Netflix         497.9K   100.4M
    Tencent Weibo   1.4M     140M
    twitter-2010    41.7M    1.47B

- synthetic graphs: 1M vertices, 3M to 60M edges (depending on α)
[log-log plot: synthetic degree distributions for α = 1.8, 2.2, 2.6, 3.0, 3.4, 4.0]
30 of 58
Results - Synthetic Graphs Replication Factor
- 128 parts (PDS |P|=133, Grid |P|=121)
[plot: replication factor vs α for HDRF, DBH, Greedy, Grid and PDS]
- low α (skewed degree distribution): denser graphs with more edges, difficult to partition
- high α (homogeneous degree distribution): less dense graphs with fewer edges, easier to partition
- Grid and PDS are power-law-agnostic; HDRF and DBH are power-law-aware
31 of 58
Results - Real-World Graphs Replication Factor
- 133 parts; replication factor per partitioner and dataset:

    Partitioner   Tencent Weibo   Netflix   MovieLens 10M   twitter-2010
    PDS           7.9             11.3      11.5            8.0
    Greedy        2.8             8.1       10.7            5.9
    DBH           1.5             7.1       12.9            6.8
    HDRF          1.3             4.9       9.1             4.8

32 of 58
Results - Load Relative Standard Deviation
- MovieLens 10M, 8 to 256 parts
[plot: load relative standard deviation (%) vs number of parts for PDS, Grid, Greedy, DBH, HDRF and Hashing; Hashing, Grid, PDS and DBH are history-agnostic, while Greedy and HDRF are history-aware]
33 of 58
Results - Graph Algorithm Runtime Speedup
- ASGD algorithm for collaborative filtering on Tencent Weibo
[plot: runtime speedup obtained with HDRF over Greedy and over PDS (|P| = 31, 57, 133) for 32, 64 and 128 parts]
- the speedup is proportional to both:
  - the advantage in replication factor
  - the actual network usage of the algorithm
34 of 58
Summary
- HDRF is a simple and remarkably effective one-pass vertex-cut graph partitioning algorithm
  - achieves on average a replication factor:
    - about 40% smaller than DBH
    - more than 50% smaller than Greedy
    - almost 3× smaller than PDS
    - more than 4× smaller than Grid
    - almost 14× smaller than Hashing
- close to optimal load balance
- ASGD execution time is up to 2× faster when using HDRF
- HDRF has been included in GraphLab!
35 of 58
Contents
1. Distributed Matrix Completion
1.1 Distributed Stochastic Gradient Descent
1.2 Input Partitioner
1.3 Evaluation
2. Context-Aware Matrix Completion
2.1 Open Relation Extraction
2.2 Context-Aware Open Relation Extraction
2.3 Evaluation
3. Conclusion and Outlook
36 of 58
Knowledge bases
- new-generation algorithms for web information retrieval make use of a knowledge base (KB) to increase their accuracy
- match the words in a query to real-world entities (e.g., a person, a location, etc.)
- use the real-world connections among these entities
- improve the task of providing the user with the content they were looking for
- the KB is represented as a graph
37 of 58
Knowledge base challenges
- large scale: for example, LOD contains over 60B facts (subject-relation-object)
- despite their size, KBs are far from complete
  - 75% of the people in Freebase have unknown nationality
  - 71% of the people in Freebase are missing the place-of-birth attribute
- the data is not only incomplete but also uncertain, noisy or false
- challenges:
  - fill in missing information
  - remove incorrect facts
- idea: scan the web, extracting new information
38 of 58
Open relation extraction
- open relation extraction is the task of extracting new facts for a potentially unbounded set of relations from various sources
[diagram: natural language text and knowledge bases feed the extractor]
39 of 58
Input data: facts from natural language text
"Enrico Fermi was a professor in theoretical physics at Sapienza University of Rome."
- an open information extractor extracts all the facts in the text as surface facts, e.g.
  "professor at"(Fermi, Sapienza)
  where "professor at" is the surface relation and (Fermi, Sapienza) is the tuple (subject, object)
40 of 58
Input data: facts from knowledge bases
- KBs contribute KB facts, e.g. employee(Fermi, Sapienza), where employee is a KB relation
- KB facts complement surface facts such as "professor at"(Fermi, Sapienza)
41 of 58
Matrix completion for open relation extraction
- build a tuples × relations matrix: rows are tuples, columns are surface and KB relations; observed facts are 1s, and matrix completion predicts the missing entries (?)

                         "born in"   "professor at"   "mayor of"   employee
    (Caesar, Rome)           1             ?               ?           ?
    (Fermi, Rome)            1             ?               ?           ?
    (Fermi, Sapienza)        ?             1               ?           1
    (de Blasio, NY)          ?             ?               1           ?

42 of 58
Contributions
- setting as on the previous slide: complete the tuples × relations matrix
- we propose CORE (context-aware open relation extraction), which integrates contextual information into such models to improve prediction performance
43 of 58
Contextual information
"Tom Peloso joined Modest Mouse to record their fifth studio album."
- surface fact: "join"(Peloso, Modest Mouse), an unspecific relation
- contextual information attached to the fact:
  - entity types: a named entity recognizer labels the entities with coarse-grained types (Peloso: person, Modest Mouse: organization)
  - words: record, album
  - article topic
44 of 58
CORE - latent representation of variables
- associates a latent factor vector $f_v$ with each variable $v \in V$
- variables: the tuple (Peloso, Modest Mouse), the relation "join", the entities Peloso and Modest Mouse, and the context (types person and organization, words record and album)
45 of 58
CORE - modeling facts
- models the input data in terms of a matrix in which each row corresponds to a fact x and each column to a variable v
- groups columns according to the type of the variables: relations (surface and KB), tuples, entities, tuple types, tuple topics
- in each row, the values within each column group sum up to unity:

  x1: "born in"(x,y) = 1 | (Caesar, Rome) = 1 | Caesar = 0.5, Rome = 0.5 | person,location = 1 | history = 1
  x2: employee(x,y) = 1 | (Fermi, Sapienza) = 1 | Fermi = 0.5, Sapienza = 0.5 | person,organization = 1 | physics = 1
  x3: "born in"(x,y) = 1 | (Fermi, Rome) = 1 | Rome = 0.5, Fermi = 0.5 | person,location = 1 | physics = 0.6, history = 0.4
  x4: "professor at"(x,y) = 1 | (Fermi, Sapienza) = 1 | Fermi = 0.5, Sapienza = 0.5 | person,organization = 1 | physics = 1

46 of 58
CORE - modeling context
- (same matrix as on the previous slide)
- aggregates and normalizes contextual information by tuple
  - a fact can be observed multiple times with different context
  - there is no context for new facts (never observed in the input)
- this approach provides comprehensive contextual information for both observed and unobserved facts (a construction sketch follows below)
47 of 58
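A small sketch of how one such row could be assembled (hypothetical helper mirroring row x3 above: indicator groups for the relation, the tuple and its entities, plus tuple-level topic context aggregated over mentions and normalized to sum to one):

```python
from collections import Counter

def make_row(relation, tup, mention_topics):
    """Build the grouped feature dict for one fact."""
    row = {("rel", relation): 1.0, ("tuple", tup): 1.0}
    for e in tup:                              # entity group: 0.5 / 0.5
        row[("ent", e)] = 1.0 / len(tup)
    topics = Counter()
    for topics_of_mention in mention_topics:   # aggregate over mentions
        topics.update(topics_of_mention)
    total = sum(topics.values())
    for t, c in topics.items():                # normalize to sum to 1
        row[("topic", t)] = c / total
    return row

# fact x3: "born in"(Fermi, Rome), observed in 3 physics articles
# and 2 history articles -> topic weights 0.6 / 0.4
x3 = make_row("born in", ("Fermi", "Rome"),
              [["physics"]] * 3 + [["history"]] * 2)
print(x3)
```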
CORE - factorization model
- (matrix as on the previous slides)
- uses factorization machines as the underlying framework
- associates a score s(x) with each fact x:

  $s(x) = \sum_{v_1 \in V} \sum_{v_2 \in V \setminus \{v_1\}} x_{v_1} x_{v_2} \, f_{v_1}^T f_{v_2}$

- weighted pairwise interactions of the latent factor vectors (a direct sketch follows below)
48 of 58
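A direct sketch of this score over the sparse grouped rows built above (naive double loop, exactly as the formula reads; the latent vectors here are hypothetical):

```python
def fm_score(row, factors):
    """s(x) = sum over ordered pairs v1 != v2 of x_v1 x_v2 <f_v1, f_v2>."""
    score = 0.0
    for v1, x1 in row.items():
        for v2, x2 in row.items():
            if v1 != v2:
                score += x1 * x2 * sum(a * b for a, b in
                                       zip(factors[v1], factors[v2]))
    return score

# row for fact x3 and hypothetical d=2 latent vectors
row = {("rel", "born in"): 1.0, ("tuple", ("Fermi", "Rome")): 1.0,
       ("ent", "Fermi"): 0.5, ("ent", "Rome"): 0.5,
       ("topic", "physics"): 0.6, ("topic", "history"): 0.4}
factors = {v: [0.1 * (i + 1), -0.05 * i] for i, v in enumerate(row)}
print(fm_score(row, factors))
```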
CORE - parameter estimation
- parameters: $\Theta = \{f_v \mid v \in V\}$
- Bayesian personalized ranking: all observations are positive
- goal: produce a ranked list of tuples for each relation
- for every observed fact x, sample negative evidence x⁻: the same relation paired with a tuple for which it was never observed, together with that tuple's entities and context (e.g., observed "professor at"(Fermi, Sapienza) vs sampled "professor at"(Caesar, Rome))
- pairwise approach: x is assumed more likely to be true than x⁻

  $\max_{\Theta} \sum_{x} \ln \sigma(s(x) - s(x^-))$

- stochastic gradient ascent (one-step sketch below): $\Theta \leftarrow \Theta + \eta \, \nabla_\Theta \ln \sigma(s(x) - s(x^-))$
49 of 58
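A self-contained sketch of one such ascent step (toy rows; it uses the factorization-machine gradient $\partial s / \partial f_v = 2 x_v \sum_{v' \neq v} x_{v'} f_{v'}$ for the ordered-pair score above; all names are hypothetical):

```python
import math, random

d, lr = 2, 0.05
factors = {}   # variable -> latent vector f_v, lazily initialized

def f(v):
    if v not in factors:
        factors[v] = [random.gauss(0, 0.1) for _ in range(d)]
    return factors[v]

def score_and_grads(row):
    """s(x) over ordered pairs v1 != v2, plus its gradient wrt each f_v."""
    total = [sum(x * f(v)[k] for v, x in row.items()) for k in range(d)]
    s, grads = 0.0, {}
    for v, x in row.items():
        fv = f(v)
        rest = [total[k] - x * fv[k] for k in range(d)]  # sum without v
        s += x * sum(fv[k] * rest[k] for k in range(d))
        grads[v] = [2 * x * rest[k] for k in range(d)]
    return s, grads

def bpr_step(pos_row, neg_row):
    s_pos, g_pos = score_and_grads(pos_row)
    s_neg, g_neg = score_and_grads(neg_row)
    w = 1.0 / (1.0 + math.exp(s_pos - s_neg))   # sigma(-(s(x) - s(x-)))
    for v, g in g_pos.items():
        for k in range(d):
            factors[v][k] += lr * w * g[k]
    for v, g in g_neg.items():
        for k in range(d):
            factors[v][k] -= lr * w * g[k]

pos = {("rel", "professor at"): 1.0, ("tuple", ("Fermi", "Sapienza")): 1.0,
       ("ent", "Fermi"): 0.5, ("ent", "Sapienza"): 0.5}
neg = {("rel", "professor at"): 1.0, ("tuple", ("Caesar", "Rome")): 1.0,
       ("ent", "Caesar"): 0.5, ("ent", "Rome"): 0.5}
bpr_step(pos, neg)
```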
Experiments - dataset
- 440k surface facts extracted from a corpus with an open information extractor; 15k KB facts; entity mentions linked using string matching
- contextual information (letters indicate the contextual information considered):
  - m: article metadata, i.e., news desk (e.g., foreign desk), descriptors (e.g., finances), online section (e.g., sports), section (e.g., a, d), publication year
  - t: entity types, i.e., person, organization, location, miscellaneous
  - w: bag-of-words from the sentences where the fact has been extracted
50 of 58
Experiments - methodology
- we consider (to keep the experiments feasible): 10k tuples, 19 Freebase relations, 10 surface relations
- for each relation and method:
  - we rank the tuple subsample
  - we take the top-100 predictions and label them manually
- evaluation metrics: number of true facts, MAP (quality of the ranking)
- methods:
  - PITF: tensor factorization method (designed to work in-KB)
  - NFE: matrix completion method (best context-agnostic)
  - CORE: uses relations, tuples and entities as variables
  - CORE+m, +t, +w, +mt, +mtw: CORE with the corresponding contextual information added
51 of 58
Results - Freebase relations
(number of true facts in the top-100, MAP in parentheses)

Relation                    #    PITF      NFE       CORE      CORE+m    CORE+t    CORE+w    CORE+mt   CORE+mtw
person/company              208  70 (0.47) 92 (0.81) 91 (0.83) 90 (0.84) 91 (0.87) 92 (0.87) 95 (0.93) 96 (0.94)
person/place_of_birth       117  1 (0.0)   92 (0.9)  90 (0.88) 92 (0.9)  92 (0.9)  89 (0.87) 93 (0.9)  92 (0.9)
location/containedby        102  7 (0.0)   63 (0.47) 62 (0.47) 63 (0.46) 61 (0.47) 61 (0.44) 62 (0.49) 68 (0.55)
parent/child                88   9 (0.01)  64 (0.6)  64 (0.56) 64 (0.59) 64 (0.62) 64 (0.57) 67 (0.67) 68 (0.63)
person/place_of_death       71   1 (0.0)   67 (0.93) 67 (0.92) 69 (0.94) 67 (0.93) 67 (0.92) 69 (0.94) 67 (0.92)
person/parents              67   20 (0.1)  51 (0.64) 52 (0.62) 51 (0.61) 49 (0.64) 47 (0.6)  53 (0.67) 53 (0.65)
author/works_written        65   24 (0.08) 45 (0.59) 49 (0.62) 51 (0.69) 50 (0.68) 50 (0.68) 51 (0.7)  52 (0.67)
person/nationality          61   21 (0.08) 25 (0.19) 27 (0.17) 28 (0.2)  26 (0.2)  29 (0.19) 27 (0.18) 27 (0.21)
neighbor./neighborhood_of   39   3 (0.0)   24 (0.44) 23 (0.45) 26 (0.5)  27 (0.47) 27 (0.49) 30 (0.51) 30 (0.52)
...
Average MAP100:                  0.09      0.46      0.47      0.49      0.47      0.49      0.49      0.51
Weighted Average MAP100:         0.14      0.64      0.64      0.66      0.67      0.66      0.70      0.70

[bar chart: weighted average MAP for NFE and the CORE variants, as in the last row]
52 of 58
Results - surface relations
(number of true facts in the top-100, MAP in parentheses)

Relation    #    PITF      NFE       CORE      CORE+m    CORE+t    CORE+w    CORE+mt   CORE+mtw
head        162  34 (0.18) 80 (0.66) 83 (0.66) 82 (0.63) 76 (0.57) 77 (0.57) 83 (0.69) 88 (0.73)
scientist   144  44 (0.17) 76 (0.6)  74 (0.55) 73 (0.56) 74 (0.6)  73 (0.59) 78 (0.66) 78 (0.69)
base        133  10 (0.01) 85 (0.71) 86 (0.71) 86 (0.78) 88 (0.79) 85 (0.75) 83 (0.76) 89 (0.8)
visit       118  4 (0.0)   73 (0.6)  75 (0.61) 76 (0.64) 80 (0.68) 74 (0.64) 75 (0.66) 82 (0.74)
attend      92   11 (0.02) 65 (0.58) 64 (0.59) 65 (0.63) 62 (0.6)  66 (0.63) 62 (0.58) 69 (0.64)
adviser     56   2 (0.0)   42 (0.56) 47 (0.58) 44 (0.58) 43 (0.59) 45 (0.63) 43 (0.53) 44 (0.63)
criticize   40   5 (0.0)   31 (0.66) 33 (0.62) 33 (0.7)  33 (0.67) 33 (0.61) 35 (0.69) 37 (0.69)
support     33   3 (0.0)   19 (0.27) 22 (0.28) 18 (0.21) 19 (0.28) 22 (0.27) 23 (0.27) 21 (0.27)
praise      5    0 (0.0)   2 (0.0)   2 (0.01)  4 (0.03)  3 (0.01)  3 (0.02)  5 (0.03)  2 (0.01)
vote        3    2 (0.01)  3 (0.63)  3 (0.63)  3 (0.32)  3 (0.49)  3 (0.51)  3 (0.59)  3 (0.64)
Average MAP100:  0.04      0.53      0.53      0.51      0.53      0.53      0.55      0.59
Weighted Average MAP100: 0.08  0.62  0.61      0.63      0.63      0.61      0.65      0.70

[bar chart: weighted average MAP for NFE and the CORE variants, as in the last row]
53 of 58
Anecdotal results
- author(x,y), ranked list of tuples:
  1 (Winston Groom, Forrest Gump); 2 (D. M. Thomas, White Hotel); 3 (Roger Rosenblatt, Life Itself); 4 (Edmund White, Skinned Alive); 5 (Peter Manso, Brando: The Biography)
  similar relations: 0.98 "reviews x by y"(x,y); 0.97 "book by"(x,y); 0.95 "author of"(x,y); 0.95 "'s novel"(x,y); 0.95 "'s book"(x,y)
- "scientist at"(x,y), ranked list of tuples:
  1 (Riordan Roett, Johns Hopkins University); 2 (Dr. R. M. Roberts, University of Missouri); 3 (Linda Mayes, Yale University); 4 (Daniel T. Jones, Cardiff Business School); 5 (Russell Ross, University of Iowa)
  similar relations: 0.87 "scientist"(x,y); 0.84 "scientist with"(x,y); 0.80 "professor at"(x,y); 0.79 "scientist for"(x,y); 0.78 "neuroscientist at"(x,y)
- semantic similarity of relations is one aspect of our model
- similar relations are treated differently in different contexts
54 of 58
Contents
1. Distributed Matrix Completion
1.1 Distributed Stochastic Gradient Descent
1.2 Input Partitioner
1.3 Evaluation
2. Context-Aware Matrix Completion
2.1 Open Relation Extraction
2.2 Context-Aware Open Relation Extraction
2.3 Evaluation
3. Conclusion and Outlook
55 of 58
Conclusion
- we tackle the scalability and the quality of matrix completion
- we investigate graph partitioning techniques for ASGD
- we propose HDRF, a one-pass vertex-cut graph partitioning algorithm
  - exploits the power-law nature of real-world graphs
  - provides minimum replicas with close-to-optimal load balance
  - significantly reduces the time needed to perform the computation
- we propose CORE, a matrix completion model for open relation extraction that incorporates contextual information
  - based on factorization machines and BPR
  - extensible model: additional information can be integrated
  - exploiting context significantly improves prediction quality
- all code released: https://github.com/fabiopetroni
56 of 58
Overview - Future work
[the scalability/quality chart of slide 13, with the contributions in place, indicating where future work fits]
56 of 58
Future Directions
- distribute the training for context-aware matrix completion
  - asynchronous approach: local vector copies, synchronization
  - challenging input placement: the input describes a hypergraph
- adaptation of the HDRF algorithm to hypergraphs
  - do high-degree vertices play a crucial role there too?
[diagram: facts as hyperedges e1, e2 connecting "born in"(x,y) with the tuples (Caesar,Rome) and (Fermi,Rome), the entities Caesar, Rome and Fermi, and context such as person,location / physics / history]
- distributed Bayesian personalized ranking
  - requires sampling a negative counterpart for each training point
  - sample from the local portion of the dataset in the current node?
57 of 58
Thank you!
Questions?
Fabio Petroni
Sapienza University of Rome, Italy
58 of 58

Mais conteúdo relacionado

Mais procurados

Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...MLAI2
 
IRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET Journal
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)James McMurray
 
Comparison of Various RCNN techniques for Classification of Object from Image
Comparison of Various RCNN techniques for Classification of Object from ImageComparison of Various RCNN techniques for Classification of Object from Image
Comparison of Various RCNN techniques for Classification of Object from ImageIRJET Journal
 
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABFAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABJournal For Research
 
Traffic sign classification
Traffic sign classificationTraffic sign classification
Traffic sign classificationBill Kromydas
 
Size measurement and estimation
Size measurement and estimationSize measurement and estimation
Size measurement and estimationLouis A. Poulin
 
Paper id 37201520
Paper id 37201520Paper id 37201520
Paper id 37201520IJRAT
 
CFM Challenge - Course Project
CFM Challenge - Course ProjectCFM Challenge - Course Project
CFM Challenge - Course ProjectKhalilBergaoui
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Sangamesh Ragate
 
Sigmod11 outsource shortest path
Sigmod11 outsource shortest pathSigmod11 outsource shortest path
Sigmod11 outsource shortest pathredhatdb
 
APPLYING TRANSFORMATION CHARACTERISTICS TO SOLVE THE MULTI OBJECTIVE LINEAR F...
APPLYING TRANSFORMATION CHARACTERISTICS TO SOLVE THE MULTI OBJECTIVE LINEAR F...APPLYING TRANSFORMATION CHARACTERISTICS TO SOLVE THE MULTI OBJECTIVE LINEAR F...
APPLYING TRANSFORMATION CHARACTERISTICS TO SOLVE THE MULTI OBJECTIVE LINEAR F...ijcsit
 

Mais procurados (20)

Fourier Transform Assignment Help
Fourier Transform Assignment HelpFourier Transform Assignment Help
Fourier Transform Assignment Help
 
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
 
IRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature Descriptor
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)
 
Comparison of Various RCNN techniques for Classification of Object from Image
Comparison of Various RCNN techniques for Classification of Object from ImageComparison of Various RCNN techniques for Classification of Object from Image
Comparison of Various RCNN techniques for Classification of Object from Image
 
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABFAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
 
03 hdrf presentation
03 hdrf presentation03 hdrf presentation
03 hdrf presentation
 
Unit 3
Unit 3Unit 3
Unit 3
 
Rn d presentation_gurulingannk
Rn d presentation_gurulingannkRn d presentation_gurulingannk
Rn d presentation_gurulingannk
 
Computation Assignment Help
Computation Assignment Help Computation Assignment Help
Computation Assignment Help
 
Neural Style Transfer in practice
Neural Style Transfer in practiceNeural Style Transfer in practice
Neural Style Transfer in practice
 
Traffic sign classification
Traffic sign classificationTraffic sign classification
Traffic sign classification
 
Size measurement and estimation
Size measurement and estimationSize measurement and estimation
Size measurement and estimation
 
Paper id 37201520
Paper id 37201520Paper id 37201520
Paper id 37201520
 
CFM Challenge - Course Project
CFM Challenge - Course ProjectCFM Challenge - Course Project
CFM Challenge - Course Project
 
Slide1
Slide1Slide1
Slide1
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Sigmod11 outsource shortest path
Sigmod11 outsource shortest pathSigmod11 outsource shortest path
Sigmod11 outsource shortest path
 
APPLYING TRANSFORMATION CHARACTERISTICS TO SOLVE THE MULTI OBJECTIVE LINEAR F...
APPLYING TRANSFORMATION CHARACTERISTICS TO SOLVE THE MULTI OBJECTIVE LINEAR F...APPLYING TRANSFORMATION CHARACTERISTICS TO SOLVE THE MULTI OBJECTIVE LINEAR F...
APPLYING TRANSFORMATION CHARACTERISTICS TO SOLVE THE MULTI OBJECTIVE LINEAR F...
 

Destaque

HSIENA: a hybrid publish/subscribe system
HSIENA: a hybrid publish/subscribe systemHSIENA: a hybrid publish/subscribe system
HSIENA: a hybrid publish/subscribe systemFabio Petroni, PhD
 
CORE: Context-Aware Open Relation Extraction with Factorization Machines
CORE: Context-Aware Open Relation Extraction with Factorization MachinesCORE: Context-Aware Open Relation Extraction with Factorization Machines
CORE: Context-Aware Open Relation Extraction with Factorization MachinesFabio Petroni, PhD
 
Comparison of Matrix Completion Algorithms for Background Initialization in V...
Comparison of Matrix Completion Algorithms for Background Initialization in V...Comparison of Matrix Completion Algorithms for Background Initialization in V...
Comparison of Matrix Completion Algorithms for Background Initialization in V...ActiveEon
 
slides_low_rank_matrix_optim_farhad
slides_low_rank_matrix_optim_farhadslides_low_rank_matrix_optim_farhad
slides_low_rank_matrix_optim_farhadFarhad Gholami
 
Recent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detectionRecent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detectionActiveEon
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learningViet-Trung TRAN
 
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Which Is Deeper - Comparison Of Deep Learning Frameworks On SparkSpark Summit
 

Destaque (7)

HSIENA: a hybrid publish/subscribe system
HSIENA: a hybrid publish/subscribe systemHSIENA: a hybrid publish/subscribe system
HSIENA: a hybrid publish/subscribe system
 
CORE: Context-Aware Open Relation Extraction with Factorization Machines
CORE: Context-Aware Open Relation Extraction with Factorization MachinesCORE: Context-Aware Open Relation Extraction with Factorization Machines
CORE: Context-Aware Open Relation Extraction with Factorization Machines
 
Comparison of Matrix Completion Algorithms for Background Initialization in V...
Comparison of Matrix Completion Algorithms for Background Initialization in V...Comparison of Matrix Completion Algorithms for Background Initialization in V...
Comparison of Matrix Completion Algorithms for Background Initialization in V...
 
slides_low_rank_matrix_optim_farhad
slides_low_rank_matrix_optim_farhadslides_low_rank_matrix_optim_farhad
slides_low_rank_matrix_optim_farhad
 
Recent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detectionRecent advances on low-rank and sparse decomposition for moving object detection
Recent advances on low-rank and sparse decomposition for moving object detection
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learning
 
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 

Semelhante a Mining at scale with latent factor models for matrix completion

Mm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsMm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsEellekwameowusu
 
new optimization algorithm for topology optimization
new optimization algorithm for topology optimizationnew optimization algorithm for topology optimization
new optimization algorithm for topology optimizationSeonho Park
 
Good Hunting: Locating, Prioritizing, and Fixing Bugs Automatically (Keynote,...
Good Hunting: Locating, Prioritizing, and Fixing Bugs Automatically (Keynote,...Good Hunting: Locating, Prioritizing, and Fixing Bugs Automatically (Keynote,...
Good Hunting: Locating, Prioritizing, and Fixing Bugs Automatically (Keynote,...Dongsun Kim
 
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)Thoma Itoh
 
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...IJERA Editor
 
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...IJERA Editor
 
Optimization Techniques.pdf
Optimization Techniques.pdfOptimization Techniques.pdf
Optimization Techniques.pdfanandsimple
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Fabian Pedregosa
 
Enhancing Partition Crossover with Articulation Points Analysis
Enhancing Partition Crossover with Articulation Points AnalysisEnhancing Partition Crossover with Articulation Points Analysis
Enhancing Partition Crossover with Articulation Points Analysisjfrchicanog
 
New Directions in Mahout's Recommenders
New Directions in Mahout's RecommendersNew Directions in Mahout's Recommenders
New Directions in Mahout's Recommenderssscdotopen
 
Aerospace Engineering (AE) - Gate Previous Question Paper 2011 Download
Aerospace Engineering (AE) - Gate Previous Question Paper 2011 DownloadAerospace Engineering (AE) - Gate Previous Question Paper 2011 Download
Aerospace Engineering (AE) - Gate Previous Question Paper 2011 DownloadRakesh Bhupathi
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationFeynman Liang
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector MachineShao-Chuan Wang
 
Pi gate-2018-p (gate2016.info)
Pi gate-2018-p (gate2016.info)Pi gate-2018-p (gate2016.info)
Pi gate-2018-p (gate2016.info)DrRojaAbrahamRaju
 
Digital signal and image processing FAQ
Digital signal and image processing FAQDigital signal and image processing FAQ
Digital signal and image processing FAQMukesh Tekwani
 
Gate 2016-2018-production-and-industrial-engineering-question-paper-and-answe...
Gate 2016-2018-production-and-industrial-engineering-question-paper-and-answe...Gate 2016-2018-production-and-industrial-engineering-question-paper-and-answe...
Gate 2016-2018-production-and-industrial-engineering-question-paper-and-answe...nasitravi41
 
IRJET- Performance Analysis of Optimization Techniques by using Clustering
IRJET- Performance Analysis of Optimization Techniques by using ClusteringIRJET- Performance Analysis of Optimization Techniques by using Clustering
IRJET- Performance Analysis of Optimization Techniques by using ClusteringIRJET Journal
 

Semelhante a Mining at scale with latent factor models for matrix completion (20)

Final Report
Final ReportFinal Report
Final Report
 
Mm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsMm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithms
 
new optimization algorithm for topology optimization
new optimization algorithm for topology optimizationnew optimization algorithm for topology optimization
new optimization algorithm for topology optimization
 
Good Hunting: Locating, Prioritizing, and Fixing Bugs Automatically (Keynote,...
Good Hunting: Locating, Prioritizing, and Fixing Bugs Automatically (Keynote,...Good Hunting: Locating, Prioritizing, and Fixing Bugs Automatically (Keynote,...
Good Hunting: Locating, Prioritizing, and Fixing Bugs Automatically (Keynote,...
 
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
Friedlander et al. Evolution of Bow-Tie Architectures in Biology (2015)
 
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
 
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
 
Optimization Techniques.pdf
Optimization Techniques.pdfOptimization Techniques.pdf
Optimization Techniques.pdf
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
 
Enhancing Partition Crossover with Articulation Points Analysis
Enhancing Partition Crossover with Articulation Points AnalysisEnhancing Partition Crossover with Articulation Points Analysis
Enhancing Partition Crossover with Articulation Points Analysis
 
New Directions in Mahout's Recommenders
New Directions in Mahout's RecommendersNew Directions in Mahout's Recommenders
New Directions in Mahout's Recommenders
 
Aerospace Engineering (AE) - Gate Previous Question Paper 2011 Download
Aerospace Engineering (AE) - Gate Previous Question Paper 2011 DownloadAerospace Engineering (AE) - Gate Previous Question Paper 2011 Download
Aerospace Engineering (AE) - Gate Previous Question Paper 2011 Download
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference Compilation
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector Machine
 
Pi gate-2018-p (gate2016.info)
Pi gate-2018-p (gate2016.info)Pi gate-2018-p (gate2016.info)
Pi gate-2018-p (gate2016.info)
 
Lecture2a algorithm
Lecture2a algorithmLecture2a algorithm
Lecture2a algorithm
 
Gate-Cs 1993
Gate-Cs 1993Gate-Cs 1993
Gate-Cs 1993
 
Digital signal and image processing FAQ
Digital signal and image processing FAQDigital signal and image processing FAQ
Digital signal and image processing FAQ
 
Gate 2016-2018-production-and-industrial-engineering-question-paper-and-answe...
Gate 2016-2018-production-and-industrial-engineering-question-paper-and-answe...Gate 2016-2018-production-and-industrial-engineering-question-paper-and-answe...
Gate 2016-2018-production-and-industrial-engineering-question-paper-and-answe...
 
IRJET- Performance Analysis of Optimization Techniques by using Clustering
IRJET- Performance Analysis of Optimization Techniques by using ClusteringIRJET- Performance Analysis of Optimization Techniques by using Clustering
IRJET- Performance Analysis of Optimization Techniques by using Clustering
 

Último

KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01KreezheaRecto
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spaintimesproduction05
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Christo Ananth
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 

Último (20)

KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 

Mining at scale with latent factor models for matrix completion

B e.g., produce a ranked list of the items that the user might like the most
I the BPR criterion adopts a pairwise approach
B predict whether item x_j is more likely to be liked than item x_k
B assumption: the user prefers items for which feedback exists
$D_T := \{(u_i, x_j, x_k) \mid u_i \in U \wedge x_j, x_k \in I,\ j \neq k \wedge r_{ij} = 1 \wedge r_{ik} = \,?\}$
I user u_i is assumed to prefer item x_j over x_k (see the sketch below)
$\max_{P,Q} \sum_{(u_i, x_j, x_k) \in D_T} \ln \sigma(p_i q_j - p_i q_k)$
9 of 58
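To make the pairwise update above concrete, here is a minimal sketch of a single BPR stochastic-ascent step for this model; the plain-dict parameter store `P`, `Q` and the hyperparameter values are illustrative, not the implementation behind the thesis results.

```python
import math

def bpr_step(P, Q, user, pos_item, neg_item, lr=0.05, reg=0.01):
    """One BPR stochastic gradient ascent step on a sampled triple.

    Ascends ln sigma(p_u . q_j - p_u . q_k) with L2 regularization;
    illustrative sketch only."""
    p, qj, qk = P[user], Q[pos_item], Q[neg_item]
    x = sum(pf * (a - b) for pf, a, b in zip(p, qj, qk))  # p . (qj - qk)
    x = max(-35.0, min(35.0, x))           # clamp to avoid exp overflow
    g = 1.0 / (1.0 + math.exp(x))          # d/dx ln sigma(x) = 1 - sigma(x)
    for f in range(len(p)):
        pf, dj, dk = p[f], qj[f], qk[f]
        p[f]  += lr * (g * (dj - dk) - reg * pf)
        qj[f] += lr * (g * pf - reg * dj)
        qk[f] += lr * (-g * pf - reg * dk)
```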
Stochastic gradient descent
I parameters Θ = {P, Q}
I find the minimum Θ* of the loss function L, or the maximum for BPR (ascent)
I pick a starting point Θ_0
I iteratively update the current estimate of Θ (a sketch follows below)
[plot: loss (×10^7) vs. iterations]
$\Theta_{n+1} \leftarrow \Theta_n - \eta \frac{\partial L}{\partial \Theta}$
I learning rate η
I one update for each training point
10 of 58
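For comparison with the BPR sketch, a matching sketch of one SGD epoch for the regularized squared loss of the explicit-feedback model; `ratings` and the hyperparameter values are placeholders.

```python
import random

def sgd_epoch(P, Q, ratings, lr=0.01, reg=0.05):
    """One SGD pass over the observed entries (u, i, r_ui) of R,
    minimizing the regularized squared loss; illustrative sketch."""
    random.shuffle(ratings)               # one update per training point
    for u, i, r in ratings:
        err = r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))
        for f in range(len(P[u])):
            pf, qf = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qf - reg * pf)
            Q[i][f] += lr * (err * pf - reg * qf)
```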
Overview
[diagram: the matrix completion pipeline — the model (P, Q: user and item latent factor vectors), the optimization criteria (regularized squared loss for explicit feedback, Bayesian personalized ranking for implicit feedback), and training via stochastic gradient descent (ascent)]
11 of 58
Challenges of matrix completion
(1) scalability: handle large-scale data
B 2B purchases on Amazon.com by 200M customers (2014)
B parallel and distributed approaches are essential!
(2) quality: improve prediction performance
B Netflix awarded a 10% improvement with $1M (2006)
B the performance of matrix completion can be boosted by:
I feeding the system with more data
I integrating contextual information in the model
12 of 58
Overview - state-of-the-art
[diagram: state-of-the-art methods arranged by scalability (centralized / parallel / distributed) and quality (context-agnostic / context-aware), split between regularized squared loss (multi-value revealed entries, positive and negative evidence) and Bayesian personalized ranking (single-value revealed entries, only positive evidence); cited works: Koren, 2008; Koren et al., 2009; Rendle et al., 2009; Takács et al., 2009; Karatzoglou et al., 2010; Menon et al., 2011; Niu et al., 2011; Rendle et al., 2011; Ricci et al., 2011; Chen et al., 2012; Rendle, 2012; Ahmed et al., 2013; Recht et al., 2013; Riedel et al., 2013; Zhuang et al., 2013; Makari et al., 2014; Shi et al., 2014]
13 of 58
Overview - contributions
[diagram: the same map as the previous slide, with the thesis contributions added — Petroni et al., 2014; Petroni et al., 2015a; Petroni et al., 2015b]
13 of 58
Contents
1. Distributed Matrix Completion
   1.1 Distributed Stochastic Gradient Descent
   1.2 Input Partitioner
   1.3 Evaluation
2. Context-Aware Matrix Completion
   2.1 Open Relation Extraction
   2.2 Context-Aware Open Relation Extraction
   2.3 Evaluation
3. Conclusion and Outlook
14 of 58
Problems of parallel and distributed SGD
I divide the training points (SGD updates) among threads
I SGD updates might depend on each other!
[diagram: thread1 and thread2 processing ratings r_22 and r_42, which share item i_2]
I both threads concurrently update the same latent vector
I lock-based approaches adversely affect concurrency
15 of 58
SGD Taxonomy
[diagram: SGD variants — parallel: HogWild (Niu et al., 2011), FPSGD (Zhuang et al., 2013), Jellyfish (Recht et al., 2013); distributed: SSGD and DSGD-MR (Gemulla et al., 2011; Teflioudi et al., 2012), DSGD++, ASGD and CSGD (Makari et al., 2014)]
I parallel SGD is hardly applicable to very large datasets
B the time-to-convergence may be too long
B the input data may not fit into main memory
I ASGD has advantages in scalability and efficiency
16 of 58
Asynchronous stochastic gradient descent
I distributed shared-nothing environment (cluster of machines)
[diagram: R partitioned into R1..R4 across four computing nodes, each holding local copies P1..P4, Q1..Q4 of the factor matrices]
I R is split across nodes
I latent vectors are replicated
I replicas are updated concurrently
I replicas deviate inconsistently
I periodic synchronization reconciles them
17 of 58
Bulk Synchronous Parallel (BSP) model
[diagram: computing nodes proceeding in supersteps]
1. local computation
2. communication
3. barrier synchronization
I currently used by most ASGD implementations (a superstep sketch follows below)
18 of 58
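A minimal sketch of one BSP superstep of ASGD, assuming a hypothetical `comm.allreduce_mean` helper for the communication phase (standing in for whatever averaging primitive the cluster provides, e.g., an MPI Allreduce divided by the node count); names are illustrative, not the thesis implementation.

```python
def asgd_superstep(local_ratings, P, Q, comm, lr=0.01, reg=0.05):
    """One BSP superstep of ASGD on this node's partition of R.

    P and Q are this node's replicas of the factor matrices (numpy
    arrays, users x d and items x d). Illustrative sketch only."""
    # 1. local computation: SGD over the local training points
    for u, i, r in local_ratings:
        err = r - P[u] @ Q[i]
        P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                      Q[i] + lr * (err * P[u] - reg * Q[i]))
    # 2. communication: reconcile diverged replicas across nodes
    # 3. barrier synchronization: implicit in the collective call
    P[:] = comm.allreduce_mean(P)
    Q[:] = comm.allreduce_mean(Q)
```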
Challenges
[diagram: R partitioned into R1..R4 across four nodes]
I 1. workload balance
B ensure that computing nodes are fed with the same load
B improves efficiency
I 2. minimize communication
B minimize vector replicas
B improves scalability
19 of 58
Graph representation
I the rating matrix describes a graph
[diagram: a 4×4 rating matrix (users 1-4, items a-d) and the equivalent bipartite graph]
I vertices represent users and items
I edges represent training points (e.g., ratings)
20 of 58
Graph representation
I the rating matrix describes a graph (see the sketch below)
[diagram: the same bipartite graph, partitioned]
I find a partitioning of the graph
I assign each part to a different machine
20 of 58
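The mapping from matrix to graph is mechanical; a tiny sketch with illustrative names:

```python
def ratings_to_edges(ratings):
    """Turn observed matrix entries into a bipartite edge stream.

    Each rating (user u, item i, value r) becomes an edge between
    vertex ('u', u) and vertex ('i', i); the prefixes keep the two
    vertex sets disjoint. Illustrative sketch."""
    return [(('u', u), ('i', i)) for u, i, r in ratings]

edges = ratings_to_edges([(1, 'a', 4), (1, 'b', 2), (2, 'a', 3)])
```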
Balanced graph partitioning
I partition G into smaller parts of (ideally) equal size
[diagram: edge-cut vs. vertex-cut partitioning]
I a vertex can be cut in multiple ways and span several parts, while a cut edge connects only two parts
I computation steps are associated with edges
I vertex-cuts perform better on power-law graphs (Gonzalez et al., 2012)
21 of 58
Power-law graphs
I characteristic of real graphs: power-law degree distribution
B most vertices have few connections while a few have many
[plot: degree distribution of the 2010 Twitter connection crawl, number of vertices vs. degree, log-log scale]
I the probability that a vertex has degree d is $P(d) \propto d^{-\alpha}$ (see the sampling sketch below)
I α controls the “skewness” of the degree distribution
22 of 58
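One quick way to get a feel for α, using NumPy's Zipf sampler as a stand-in for a power-law degree generator (an assumption for illustration, not the talk's data):

```python
import numpy as np

# Sample degrees with P(d) proportional to d^(-alpha) and inspect the tail:
# lower alpha means a heavier tail (a few very-high-degree vertices).
rng = np.random.default_rng(0)
for alpha in (1.8, 2.6, 3.4):
    degrees = rng.zipf(alpha, size=1_000_000)
    print(alpha, degrees.mean(), degrees.max(), (degrees == 1).mean())
```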
Balanced Vertex-Cut Graph Partitioning
I v ∈ V vertex; e ∈ E edge; p ∈ P part
I A(v): set of parts where vertex v is replicated
I σ ≥ 1: tolerance to load imbalance
I the size |p| of part p is its edge cardinality
I minimize replicas: reduces (1) bandwidth, (2) memory usage and (3) synchronization
I balance the load: efficient usage of the available computing resources
$\min \frac{1}{|V|} \sum_{v \in V} |A(v)| \quad \text{s.t.} \quad \max_{p \in P} |p| < \sigma \frac{|E|}{|P|}$
I the objective function is the replication factor (RF)
B average number of replicas per vertex (see the helper sketch below)
23 of 58
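Both quantities in the optimization problem are easy to measure for a given assignment; a small helper sketch (illustrative, not the VGP package's code):

```python
from collections import defaultdict

def replication_factor(assignment):
    """Compute RF and load imbalance for an edge -> part assignment.

    `assignment` maps each edge (u, v) to a part id; RF is the average
    |A(v)| over vertices, imbalance is max load over average load."""
    parts_of = defaultdict(set)   # A(v): parts where v is replicated
    load = defaultdict(int)       # |p|: edge cardinality of each part
    for (u, v), p in assignment.items():
        parts_of[u].add(p)
        parts_of[v].add(p)
        load[p] += 1
    rf = sum(len(s) for s in parts_of.values()) / len(parts_of)
    imbalance = max(load.values()) / (sum(load.values()) / len(load))
    return rf, imbalance
```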
Streaming Setting
I input data is a list of edges, consumed in streaming fashion, requiring only a single pass
[diagram: stream of edges → partitioning algorithm → partitioned graph]
✓ handles graphs that don't fit in main memory
✓ imposes minimal time overhead
✓ scalable, easy parallel implementations
✗ an assignment decision, once taken, cannot be changed later
24 of 58
Streaming algorithms
[diagram: streaming vertex-cut algorithms arranged by history-awareness and power-law-awareness — Hashing, Grid and PDS (Gonzalez et al., 2012; Jain et al., 2013), DBH (Xie et al., 2014), Greedy (Gonzalez et al., 2012); history-aware and power-law-aware methods yield fewer replicas, history-agnostic ones better balance]
25 of 58
Streaming algorithms - contributions
[diagram: the same map, with HDRF (Petroni et al., 2015) added as both history-aware and power-law-aware]
25 of 58
HDRF: High Degree are Replicated First
I favor the replication of high-degree vertices
I the number of high-degree vertices in power-law graphs is very small
I overall reduction of the replication factor
26 of 58
HDRF: High Degree are Replicated First
I intuition from the robustness of networks to failures
I if a few high-degree vertices are removed from a power-law graph, it breaks into a set of isolated clusters
I hence, preserve the locality of low-degree vertices
26 of 58
The HDRF Algorithm
[diagram legend: filled vertices already have replicas, hollow ones do not]
I case 1: neither endpoint of the incoming edge e has been assigned to any part
B place e in the least loaded part
I case 2: only one endpoint has been assigned
B place e in a part of that vertex
I case 3: both endpoints are assigned and share a common part
B place e in the intersection
27 of 58
Create Replicas
I case 4: both endpoints are assigned but the intersection is empty, so one vertex must be replicated (see the sketch below)
B standard Greedy solution: place e in the least loaded part in the union
B HDRF: place e so as to replicate the vertex with the highest degree, e.g., v1 if δ(v1) > δ(v2)
28 of 58
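Putting the four cases together, a simplified sketch of the placement rule; the published HDRF scores candidate parts with a combined degree/load objective, so treat this case-by-case rendition (and all names in it) as illustrative rather than the actual algorithm.

```python
from collections import defaultdict

class HDRFPartitioner:
    """Simplified one-pass vertex-cut partitioner following the four
    placement cases on the slides; illustrative sketch."""

    def __init__(self, num_parts):
        self.load = [0] * num_parts          # |p| for each part
        self.parts_of = defaultdict(set)     # A(v)
        self.degree = defaultdict(int)       # partial degree delta(v)

    def place(self, u, v):
        self.degree[u] += 1
        self.degree[v] += 1
        Au, Av = self.parts_of[u], self.parts_of[v]
        common = Au & Av
        if common:                           # case 3: common part exists
            p = min(common, key=lambda q: self.load[q])
        elif Au and Av:                      # case 4: replicate the
            hi, lo = (u, v) if self.degree[u] >= self.degree[v] else (v, u)
            p = min(self.parts_of[lo], key=lambda q: self.load[q])
        elif Au or Av:                       # case 2: one vertex assigned
            p = min(Au or Av, key=lambda q: self.load[q])
        else:                                # case 1: both vertices new
            p = min(range(len(self.load)), key=lambda q: self.load[q])
        Au.add(p); Av.add(p)                 # in case 4, `hi` gets a replica
        self.load[p] += 1
        return p
```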
Experiments - Settings
I standalone partitioner
B VGP, a software package for one-pass vertex-cut balanced graph partitioning
B measures the partitioning performance: replication and balance
I GraphLab
B HDRF has been integrated into GraphLab PowerGraph 2.2
B measures the impact on the execution time of graph computations in a distributed graph computing framework
I stream of edges in random order
29 of 58
Experiments - Datasets
I real-world graphs:
  Dataset         |V|     |E|
  MovieLens 10M   80.6K   10M
  Netflix         497.9K  100.4M
  Tencent Weibo   1.4M    140M
  twitter-2010    41.7M   1.47B
I synthetic graphs: 1M vertices, 60M to 3M edges
[plot: synthetic degree distributions for α = 1.8, 2.2, 2.6, 3.0, 3.4, 4.0, number of vertices vs. degree, log-log scale]
30 of 58
Results - Synthetic Graphs, Replication Factor
I 128 parts (|P| = 133 for PDS, |P| = 121 for grid)
[plot: replication factor vs. α for HDRF, PDS, grid, greedy and DBH, from skewed (low α) to homogeneous (high α) graphs]
I skewed graphs are more dense (more edges) and difficult to partition; homogeneous graphs are less dense (fewer edges) and easier to partition
I power-law-aware algorithms outperform power-law-agnostic ones on skewed graphs
31 of 58
Results - Real-World Graphs, Replication Factor
I 133 parts; rows are algorithms, columns datasets
  Method  Tencent Weibo  Netflix  MovieLens 10M  twitter-2010
  PDS     7.9            11.3     11.5           8.0
  greedy  2.8            8.1      10.7           5.9
  DBH     1.5            7.1      12.9           6.8
  HDRF    1.3            4.9      9.1            4.8
32 of 58
Results - Load Relative Standard Deviation
I MovieLens 10M
[plot: load relative standard deviation (%) vs. number of parts (8-256) for PDS, grid, greedy, DBH, HDRF and hashing]
I history-agnostic algorithms achieve near-perfect balance; history-aware ones trade a little balance for fewer replicas
33 of 58
Results - Graph Algorithm Runtime Speedup
I ASGD algorithm for collaborative filtering on Tencent Weibo
[plot: HDRF speedup over greedy and PDS vs. number of parts (32, 64, 128; |P| = 31, 57, 133 for PDS)]
I the speedup is proportional to both:
B the advantage in replication factor
B the actual network usage of the algorithm
34 of 58
Summary
I HDRF is a simple and remarkably effective one-pass vertex-cut graph partitioning algorithm
B achieves on average a replication factor:
  I about 40% smaller than DBH
  I more than 50% smaller than greedy
  I almost 3× smaller than PDS
  I more than 4× smaller than grid
  I almost 14× smaller than hashing
B close to optimal load balance
I ASGD execution time is up to 2× faster when using HDRF
I HDRF has been included in GraphLab!
35 of 58
Contents
1. Distributed Matrix Completion
   1.1 Distributed Stochastic Gradient Descent
   1.2 Input Partitioner
   1.3 Evaluation
2. Context-Aware Matrix Completion
   2.1 Open Relation Extraction
   2.2 Context-Aware Open Relation Extraction
   2.3 Evaluation
3. Conclusion and Outlook
36 of 58
Knowledge bases
I new-generation algorithms for web information retrieval use a knowledge base (KB) to increase their accuracy
I match the words in a query to real-world entities (e.g., a person, a location)
I exploit real-world connections among these entities
I better provide the user with the content they are looking for
I the KB is represented as a graph
37 of 58
Knowledge base challenges
I large-scale example: LOD contains over 60B facts (subject-relation-object)
I despite the size, KBs are far from complete
B 75% of people in Freebase have unknown nationality
B 71% of people in Freebase are missing the place-of-birth attribute
I the data is not only incomplete but also uncertain, noisy or false
I challenges:
B fill in missing information
B remove incorrect facts
I idea: scan the web, extracting new information
38 of 58
Open relation extraction
I open relation extraction is the task of extracting new facts for a potentially unbounded set of relations from various sources
[diagram: two sources feeding the extractor — natural language text and knowledge bases]
39 of 58
Input data: facts from natural language text
I example sentence: "Enrico Fermi was a professor in theoretical physics at Sapienza University of Rome."
I an open information extractor extracts all facts in the text
I extracted surface fact: "professor at"(Fermi, Sapienza)
B "professor at" is the surface relation; (Fermi, Sapienza) is the tuple (subject, object)
40 of 58
Input data: facts from knowledge bases
I the KB stores the fact employee(Fermi, Sapienza), where employee is a KB relation
I it complements the surface fact "professor at"(Fermi, Sapienza) extracted from natural language text
41 of 58
Matrix completion for open relation extraction
I build a tuples × relations matrix with a 1 for every observed fact (see the sketch below)
[matrix: rows are tuples such as (Caesar, Rome), (Fermi, Rome), (Fermi, Sapienza), (de Blasio, NY); columns are surface relations ("born in", "professor at", "mayor of") and KB relations (employee); unobserved cells are missing entries (?)]
I the task is to fill in the missing entries
42 of 58
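Constructing this matrix is straightforward bookkeeping; a sketch with hypothetical names:

```python
def build_fact_matrix(facts):
    """Build the sparse tuples x relations matrix from observed facts.

    `facts` is an iterable of (tuple_id, relation) pairs, where the
    relation is either a surface relation or a KB relation; observed
    cells get 1, everything else stays missing. Illustrative sketch."""
    cells = {}
    for t, rel in facts:
        cells[(t, rel)] = 1
    return cells

R = build_fact_matrix([
    (('Caesar', 'Rome'), 'born in'),
    (('Fermi', 'Sapienza'), 'professor at'),
    (('Fermi', 'Sapienza'), 'employee'),        # KB relation
])
```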
Contributions
I we propose CORE (context-aware open relation extraction), which integrates contextual information into such matrix completion models to improve prediction performance
43 of 58
Contextual information
I example sentence: "Tom Peloso joined Modest Mouse to record their fifth studio album."
I surface fact: "join"(Peloso, Modest Mouse) — an unspecific relation on its own
I contextual information helps to disambiguate:
B entity types: a named entity recognizer labels each entity with a coarse-grained type (person, organization)
B article topic words: e.g., "record", "album"
44 of 58
CORE - latent representation of variables
I associates a latent representation f_v with each variable v ∈ V
[diagram: latent factor vectors for the relation ("join"), the tuple ((Peloso, Modest Mouse)), the entities (Peloso, Modest Mouse) and the context (person, organization, record, album)]
45 of 58
CORE - modeling facts
[matrix: one row per fact x1..x4, column groups for relations ("born in", "professor at", employee), tuples, entities, tuple types and tuple topics; e.g., the fact "born in"(Caesar, Rome) has 1s in the "born in", (Caesar, Rome), Caesar and Rome columns, 0.5/0.5 over its two type columns, and weights over topic columns]
I models the input data as a matrix in which each row corresponds to a fact x and each column to a variable v
I columns are grouped according to the type of the variables
I in each row, the values of each column group sum to one
46 of 58
CORE - modeling context
[matrix: same layout as the previous slide, with context columns aggregated per tuple]
I aggregates and normalizes contextual information by tuple
B a fact can be observed multiple times with different context
B there is no context for new facts (never observed in the input)
I this approach provides comprehensive contextual information for both observed and unobserved facts
47 of 58
CORE - factorization model
[matrix: same layout as the previous slides]
I uses factorization machines as the underlying framework
I associates a score s(x) with each fact x (see the sketch below):
$s(x) = \sum_{v_1 \in V} \sum_{v_2 \in V \setminus \{v_1\}} x_{v_1} x_{v_2} \, f_{v_1}^T f_{v_2}$
I weighted pairwise interactions of latent factor vectors
48 of 58
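A sketch of the scoring function as written on the slide, using the standard linear-time rewriting of the pairwise sum; `x` and `F` are illustrative structures, not CORE's actual API.

```python
import numpy as np

def fm_score(x, F):
    """Factorization-machine score s(x) of a fact.

    `x` maps variable ids to their nonzero weights x_v in the fact's
    row; `F` maps variable ids to latent vectors f_v. Uses the identity
      sum_{v1} sum_{v2 != v1} x_v1 x_v2 f_v1 . f_v2
        = ||sum_v x_v f_v||^2 - sum_v x_v^2 ||f_v||^2,
    so the cost is linear in the number of nonzeros."""
    s = sum(w * F[v] for v, w in x.items())
    return float(s @ s) - sum(w * w * float(F[v] @ F[v]) for v, w in x.items())

# hypothetical usage: relation, tuple, entity and type variables
rng = np.random.default_rng(1)
names = ["professor at", "(Fermi,Sapienza)", "Fermi", "Sapienza",
         "person", "organization"]
F = {v: rng.standard_normal(10) for v in names}
x = {"professor at": 1.0, "(Fermi,Sapienza)": 1.0,
     "Fermi": 0.5, "Sapienza": 0.5, "person": 0.5, "organization": 0.5}
print(fm_score(x, F))
```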
CORE - parameter estimation
I parameters: Θ = {f_v | v ∈ V}
I Bayesian personalized ranking: all observations are positive
I goal: produce a ranked list of tuples for each relation
[diagram: an observed fact x, e.g., "professor at"(Fermi, Sapienza) with its entities and context, paired with a sampled negative counterpart x⁻, e.g., "professor at"(Caesar, Rome)]
I pairwise approach: x is more likely to be true than x⁻ (a sampling sketch follows below)
$\max_{\Theta} \sum_{x} \ln \sigma\left(s(x) - s(x^-)\right)$
I stochastic gradient ascent: $\Theta \leftarrow \Theta + \eta \, \nabla_\Theta \ln \sigma\left(s(x) - s(x^-)\right)$
49 of 58
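The x⁻ counterpart can be obtained, for instance, by corrupting the tuple while keeping the relation fixed; a sketch under that assumption, with illustrative names:

```python
import random

def sample_negative(fact, all_tuples, observed):
    """Sample a negative counterpart x- for an observed fact.

    Keeps the relation fixed and draws a tuple such that
    (tuple, relation) was never observed; the pair (x, x-) then feeds
    the BPR update. Assumes unobserved tuples exist for the relation."""
    t, rel = fact
    while True:
        t_neg = random.choice(all_tuples)
        if (t_neg, rel) not in observed:
            return (t_neg, rel)
```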
Experiments - dataset
I 440k facts extracted from a text corpus; 15k facts from the KB, with entity mentions linked using string matching
I contextual information (letters indicate the information considered):
B m: article metadata — news desk (e.g., foreign desk), descriptors (e.g., finances), online section (e.g., sports), section (e.g., a, d), publication year
B t: entity types — person, organization, location, miscellaneous
B w: bag-of-words from the sentences where the fact has been extracted
50 of 58
Experiments - methodology
I we consider (to keep experiments feasible): a subsample of 10k tuples, 10 surface relations and 19 Freebase relations
I for each relation and method:
B we rank the tuple subsample
B we take the top-100 predictions and label them manually
I evaluation metrics: number of true facts and MAP (quality of the ranking)
I methods:
B PITF, a tensor factorization method (designed to work in-KB)
B NFE, a matrix completion method (best context-agnostic method)
B CORE, which uses relations, tuples and entities as variables
B CORE+m, +t, +w, +mt, +mtw (with contextual information added)
51 of 58
Results - Freebase relations
I cells report the number of true facts among the top-100 predictions (MAP in parentheses)
  Relation                   #    PITF       NFE        CORE       CORE+m     CORE+t     CORE+w     CORE+mt    CORE+mtw
  person/company             208  70 (0.47)  92 (0.81)  91 (0.83)  90 (0.84)  91 (0.87)  92 (0.87)  95 (0.93)  96 (0.94)
  person/place_of_birth      117  1 (0.0)    92 (0.9)   90 (0.88)  92 (0.9)   92 (0.9)   89 (0.87)  93 (0.9)   92 (0.9)
  location/containedby       102  7 (0.0)    63 (0.47)  62 (0.47)  63 (0.46)  61 (0.47)  61 (0.44)  62 (0.49)  68 (0.55)
  parent/child               88   9 (0.01)   64 (0.6)   64 (0.56)  64 (0.59)  64 (0.62)  64 (0.57)  67 (0.67)  68 (0.63)
  person/place_of_death      71   1 (0.0)    67 (0.93)  67 (0.92)  69 (0.94)  67 (0.93)  67 (0.92)  69 (0.94)  67 (0.92)
  person/parents             67   20 (0.1)   51 (0.64)  52 (0.62)  51 (0.61)  49 (0.64)  47 (0.6)   53 (0.67)  53 (0.65)
  author/works_written       65   24 (0.08)  45 (0.59)  49 (0.62)  51 (0.69)  50 (0.68)  50 (0.68)  51 (0.7)   52 (0.67)
  person/nationality         61   21 (0.08)  25 (0.19)  27 (0.17)  28 (0.2)   26 (0.2)   29 (0.19)  27 (0.18)  27 (0.21)
  neighbor./neighborhood_of  39   3 (0.0)    24 (0.44)  23 (0.45)  26 (0.5)   27 (0.47)  27 (0.49)  30 (0.51)  30 (0.52)
  ...
  Average MAP100                  0.09       0.46       0.47       0.49       0.47       0.49       0.49       0.51
  Weighted Average MAP100         0.14       0.64       0.64       0.66       0.67       0.66       0.70       0.70
52 of 58
Results - surface relations
  Relation   #    PITF       NFE        CORE       CORE+m     CORE+t     CORE+w     CORE+mt    CORE+mtw
  head       162  34 (0.18)  80 (0.66)  83 (0.66)  82 (0.63)  76 (0.57)  77 (0.57)  83 (0.69)  88 (0.73)
  scientist  144  44 (0.17)  76 (0.6)   74 (0.55)  73 (0.56)  74 (0.6)   73 (0.59)  78 (0.66)  78 (0.69)
  base       133  10 (0.01)  85 (0.71)  86 (0.71)  86 (0.78)  88 (0.79)  85 (0.75)  83 (0.76)  89 (0.8)
  visit      118  4 (0.0)    73 (0.6)   75 (0.61)  76 (0.64)  80 (0.68)  74 (0.64)  75 (0.66)  82 (0.74)
  attend     92   11 (0.02)  65 (0.58)  64 (0.59)  65 (0.63)  62 (0.6)   66 (0.63)  62 (0.58)  69 (0.64)
  adviser    56   2 (0.0)    42 (0.56)  47 (0.58)  44 (0.58)  43 (0.59)  45 (0.63)  43 (0.53)  44 (0.63)
  criticize  40   5 (0.0)    31 (0.66)  33 (0.62)  33 (0.7)   33 (0.67)  33 (0.61)  35 (0.69)  37 (0.69)
  support    33   3 (0.0)    19 (0.27)  22 (0.28)  18 (0.21)  19 (0.28)  22 (0.27)  23 (0.27)  21 (0.27)
  praise     5    0 (0.0)    2 (0.0)    2 (0.01)   4 (0.03)   3 (0.01)   3 (0.02)   5 (0.03)   2 (0.01)
  vote       3    2 (0.01)   3 (0.63)   3 (0.63)   3 (0.32)   3 (0.49)   3 (0.51)   3 (0.59)   3 (0.64)
  Average MAP100  0.04       0.53       0.53       0.51       0.53       0.53       0.55       0.59
  Weighted Average MAP100    0.08       0.62       0.61       0.63       0.63       0.61       0.65       0.70
53 of 58
Anecdotal results
I author(x,y) — top-ranked tuples:
  1 (Winston Groom, Forrest Gump)
  2 (D. M. Thomas, White Hotel)
  3 (Roger Rosenblatt, Life Itself)
  4 (Edmund White, Skinned Alive)
  5 (Peter Manso, Brando: The Biography)
B similar relations: 0.98 "reviews x by y"(x,y); 0.97 "book by"(x,y); 0.95 "author of"(x,y); 0.95 "'s novel"(x,y); 0.95 "'s book"(x,y)
I "scientist at"(x,y) — top-ranked tuples:
  1 (Riordan Roett, Johns Hopkins University)
  2 (Dr. R. M. Roberts, University of Missouri)
  3 (Linda Mayes, Yale University)
  4 (Daniel T. Jones, Cardiff Business School)
  5 (Russell Ross, University of Iowa)
B similar relations: 0.87 "scientist"(x,y); 0.84 "scientist with"(x,y); 0.80 "professor at"(x,y); 0.79 "scientist for"(x,y); 0.78 "neuroscientist at"(x,y)
I semantic similarity of relations is one aspect of our model
I similar relations are treated differently in different contexts
54 of 58
Contents
1. Distributed Matrix Completion
   1.1 Distributed Stochastic Gradient Descent
   1.2 Input Partitioner
   1.3 Evaluation
2. Context-Aware Matrix Completion
   2.1 Open Relation Extraction
   2.2 Context-Aware Open Relation Extraction
   2.3 Evaluation
3. Conclusion and Outlook
55 of 58
Conclusion
I we tackle both the scalability and the quality of matrix completion
I we investigate graph partitioning techniques for ASGD
I we propose HDRF, a one-pass vertex-cut graph partitioning algorithm
B exploits the power-law nature of real-world graphs
B provides minimal replication with close to optimal load balance
B significantly reduces the time needed to perform computation
I we propose CORE, a matrix completion model for open relation extraction that incorporates contextual information
B based on factorization machines and BPR
B extensible model: additional information can be integrated
B exploiting context significantly improves prediction quality
I all code released at https://github.com/fabiopetroni
56 of 58
Overview - Future work
[diagram: the same state-of-the-art map as the contributions slide (Petroni et al., 2014; 2015a; 2015b), highlighting the remaining open quadrants]
56 of 58
Future Directions
I distribute training for context-aware matrix completion
B asynchronous approach: local vector copies, periodic synchronization
B challenging input placement: the input describes a hypergraph
I adaptation of the HDRF algorithm to hypergraphs
B do high-degree vertices still play a crucial role?
[diagram: hyperedges e1, e2 connecting facts such as "born in"(Caesar, Rome) and "born in"(Fermi, Rome) with their tuples, entities and context]
I distributed Bayesian personalized ranking
B requires sampling a negative counterpart for each training point
B sample from the local portion of the dataset on the current node?
57 of 58
Thank you! Questions?
Fabio Petroni
Sapienza University of Rome, Italy
58 of 58