Expert Finding in Social Networks

Matteo Silvestri Giuliano Vesci
Politecnico di Milano
25-07-2012

Outline
1 Introduction
2 Definition of the problem
3 Techniques for Expertise Retrieval
4 Tests
5 Conclusions

Expert Finding in Social Networks
Several problems require looking for expert users
inside online social networks. For example:
friends who are experts in cinema
friends who know about a particular disease
friends able to use a particular technology
The task of finding experts able to answer specific information
needs is called expert finding.
In particular, we studied this problem in the human computation
setting of CrowdSearcher, an approach that bridges conventional
search experiences and crowdsourcing.
Problem: assign CrowdSearcher tasks to expert users

Research Questions
Can the analysis of social actions (e.g. posts, tweets,
interaction with social groups, etc.) help in providing a better
characterization of users for search tasks?
Is the combined use of social network information useful to
better characterize a user?
Among the available approaches to expert finding, which one
is better suited to the context of social networks?
Are social networks oriented toward specific domains of
expertise?
Goal: methodologies and tools for selecting the best experts in a
set of trusted users in (multiple) social networks.


Definition of the problem
Automatically reach experts for crowdsourced queries:
Given a query q and a set CE = {ce_1, ce_2, ..., ce_m} of social
users that are candidate experts, find an ordered subset
S(CE) ⊂ CE of the n users with the highest scores score(q, ce_i).
[Figure: the query q and the candidate set CE are passed to score(q, ce_i), which produces the ranked subset S(CE)]
Estimating the scoring function score(q, ce_i) is the main task of
this work
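
The selection step in the definition above can be sketched in a few lines of Python. This is only an illustration: `top_n_experts`, the `profiles` dictionary and the keyword-overlap `toy_score` are hypothetical names introduced here, and the toy score is a stand-in for the real score(q, ce_i), not the scoring function studied in this work.

```python
def top_n_experts(query, candidates, score, n):
    """Return the n candidates with the highest score(query, ce), best first."""
    scored = [(ce, score(query, ce)) for ce in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical candidate profiles, used only to make the example runnable.
profiles = {
    "alice": {"php", "string"},
    "bob": {"milan", "restaurant"},
    "carol": {"php"},
}


def toy_score(query, candidate):
    """Stand-in scoring function: number of query words found in the profile."""
    return float(len(set(query.lower().split()) & profiles[candidate]))


ranked = top_n_experts("php string length", profiles, toy_score, n=2)
print(ranked)
```

Any scoring function with the same signature can be plugged in, which is why the rest of the deck concentrates on how score(q, ce_i) is estimated.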

Social Network Characterization
Two types of social information characterize users:
explicit information: static profiles
implicit information: dynamic social activities
Social network users can perform several activities and publish
informative material, which we call resources.
The idea is to collect evidence of expertise from the multiple resources
associated with a candidate.

Resources Levels
Resources are related to the user through a path in the graph.
We consider resources connected to a user through a path of
length <= 2.
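[Figure: example resource graph for the user ALICE. Nodes: Post@09.00, Post@09.05, Post@09.10, a Facebook Group, Post@08.00, Post@08.05. Edge labels: owns, owns / creates, annotates (likes), relatesTo (belongs), contains, creates. Resources are grouped into Level 0, Level 1 and Level 2.]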
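
Collecting the resources connected to a user through a path of length <= 2 amounts to a bounded breadth-first search. The sketch below is an assumption-laden illustration: the adjacency list is made up, it only borrows node names from the figure, and it does not claim to reproduce the figure's edges or the exact mapping between hop distance and resource level.

```python
from collections import deque


def resources_within(graph, user, max_hops=2):
    """Return {node: hop distance} for nodes reachable from user in <= max_hops edges."""
    dist = {user: 0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the allowed path length
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    del dist[user]  # the user is not a resource
    return dist


# Hypothetical graph: ALICE is linked to three posts and one group,
# and the group contains two further posts.
graph = {
    "ALICE": ["Post@09.00", "Post@09.05", "Post@09.10", "Facebook Group"],
    "Facebook Group": ["Post@08.00", "Post@08.05"],
}

reachable = resources_within(graph, "ALICE")
print(reachable)
```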


Analysis
Resources have to be analyzed to infer expertise information
[Figure: analysis pipeline with the stages Crawling (API), Url Extraction, Text Preprocessing, Language Detection, Named Entity Extraction]
Crawling: extraction of the resources' textual content through the Social
Networks' APIs
Url Extraction: extract the content of any linked external web pages and
append it to the resource's text
Language Detection: mixing different languages in the same index is not
recommended in information retrieval systems, so we detect the
language of each resource.
Named Entity Extraction: extraction of entities such as people, cities and
movies
Textual Preprocessing: we remove stop-words (common words), filter
out HTML tags and perform stemming.
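
As a rough illustration of the Url Extraction and Textual Preprocessing steps, the sketch below uses only the Python standard library. Everything in it is an assumption made for the example: the stop-word list is a tiny sample, `naive_stem` is a toy suffix stripper standing in for a real stemmer, and fetching the linked pages is left out.

```python
import html
import re

# Tiny illustrative stop-word list; a real system would use a full one per language.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on"}
URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_urls(text):
    """Return the URLs found in a plain-text resource (fetching them is not shown)."""
    return URL_RE.findall(text)


def naive_stem(token):
    """Toy stemmer: strip one common English suffix, keeping at least 3 characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Filter out HTML tags and URLs, lowercase, drop stop-words, then stem."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    text = URL_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(extract_urls("see http://example.com/a now"))
print(preprocess("<p>The users are posting reviews of movies</p>"))
```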

Model 1: Resource Based
[Figure: Query, Resource and Candidate in a chain; the Query-Resource link is labelled score(q, r) and the Resource-Candidate link weight(r, c)]
The model is based on resources, treated as documents in a classic Vector Space
Model. Resources are represented both as term vectors and as entity-id vectors.
1 First, the similarity between the query and each resource is computed:
$$\mathrm{score}(q, r) = \alpha \sum_{t \in q} \mathrm{tf}(t, r) \cdot \mathrm{idf}(t)^2 + \beta \sum_{e \in q} \mathrm{tf}(e, r) \cdot \mathrm{idf}(e)^2 \cdot \mathrm{eConf}(e, r)$$
2 Then, the users related to the best resources are extracted as possible experts:
$$\mathrm{score}(q, ce) = \sum_{r_i \in S(R)} \frac{\mathrm{score}(q, r_i)}{\max_{r_j \in S(R)} \mathrm{score}(q, r_j)} \cdot \mathrm{weight}(r_i, ce)$$
Varying α and β, we obtain three matching methods:
Mixed: α > 0, β > 0
TextOnly: α = 1, β = 0
EntityOnly: α = 0, β = 1
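
The two steps of the resource based model can be sketched as follows. This is a minimal illustration under stated assumptions: the tf, idf and eConf values are given as plain dictionaries with made-up numbers, entity ids are written with a hypothetical `ent:` prefix, and `window` plays the role of the number of best resources in S(R).

```python
def resource_score(q_terms, q_entities, resource, idf, alpha=1.0, beta=2.0):
    """score(q, r): alpha * term part + beta * entity part."""
    text_part = sum(resource["tf"].get(t, 0) * idf.get(t, 0.0) ** 2 for t in q_terms)
    entity_part = sum(
        resource["tf"].get(e, 0) * idf.get(e, 0.0) ** 2 * resource["econf"].get(e, 1.0)
        for e in q_entities
    )
    return alpha * text_part + beta * entity_part


def candidate_scores(resource_scores, weights, window=50):
    """score(q, ce): sum over the top resources of normalized score * weight(r, ce)."""
    top = sorted(resource_scores.items(), key=lambda kv: kv[1], reverse=True)[:window]
    if not top or top[0][1] <= 0:
        return {}
    best = top[0][1]  # max score among the selected resources
    totals = {}
    for rid, s in top:
        for ce, w in weights.get(rid, {}).items():
            totals[ce] = totals.get(ce, 0.0) + (s / best) * w
    return totals


# Made-up statistics for two resources and one query.
idf = {"php": 2.0, "string": 1.0, "ent:PHP": 2.0}
resources = {
    "r1": {"tf": {"php": 2, "string": 1}, "econf": {}},
    "r2": {"tf": {"php": 1, "ent:PHP": 1}, "econf": {"ent:PHP": 1.5}},
}
weights = {"r1": {"alice": 1.0}, "r2": {"bob": 1.0, "alice": 0.2}}

scores = {
    rid: resource_score(["php", "string"], ["ent:PHP"], r, idf)
    for rid, r in resources.items()
}
experts = candidate_scores(scores, weights)
print(scores, experts)
```

Setting `beta=0` or `alpha=0` in `resource_score` gives the TextOnly and EntityOnly matching methods.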

Model 2: User Based
[Figure: user-based model linking Entity, Resource, Candidate and Domain Expertise through the scores s(d, e), s(d, r) and s(d, ce); the Query is matched against the domain expertise to obtain score(q, ce)]
We refer to about 70 Freebase domains such as sports, location,
education, book, comics, videogames, tv.
For each entity e in a resource, a score s(d, e) is computed, denoting how
much the entity is related to a domain of expertise d:
$$s(d, e) = \frac{\sum_{j \in I(d)} \frac{1}{\log_2(1 + j)}}{\sum_{i=1}^{v} \frac{1}{\log_2(1 + i)}},$$
Then a similar score s(d, r) is computed for each resource, given all the
entities in the resource related to the domain d:
$$s(d, r) = \sum_{e \in E(r)} s(d, e) \cdot \mathrm{rel}(e, r),$$
where rel(e, r) is a measure of the relevance of the entity in the resource
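
The two formulas above translate directly into code. The slides do not define I(d) and v, so the sketch below makes an explicit assumption: I(d) is treated as a set of 1-based positions in a ranked list of length v. The function names are hypothetical.

```python
import math


def entity_domain_score(positions_in_domain, v):
    """s(d, e): discounted sum over the positions in I(d), normalized over 1..v.

    Assumption: positions_in_domain holds 1-based ranks and v is the list length.
    """
    if v <= 0:
        return 0.0
    numerator = sum(1.0 / math.log2(1 + j) for j in positions_in_domain)
    denominator = sum(1.0 / math.log2(1 + i) for i in range(1, v + 1))
    return numerator / denominator


def resource_domain_score(entity_scores):
    """s(d, r): sum of s(d, e) * rel(e, r) over the entities of the resource.

    entity_scores is a list of (s(d, e), rel(e, r)) pairs.
    """
    return sum(s_de * rel for s_de, rel in entity_scores)


full = entity_domain_score({1, 2, 3}, v=3)   # every position belongs to d
first_only = entity_domain_score({1}, v=3)   # only the top position belongs to d
print(full, first_only)
print(resource_domain_score([(full, 0.5), (first_only, 1.0)]))
```

Under this reading the score is 1 when every position belongs to the domain and 0 when none does, with earlier positions counting more than later ones.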

Model 2: User Based - User/Domain Matrix
Finally, the score s(d, ce) is computed for each candidate
expert-domain pair, to build a model of the users as a
CE × D matrix:
$$s(d, ce) = \frac{\sum_{r \in S(R, ce)} \mathrm{weight}(r, ce) \cdot s(d, r)}{\sum_{r \in S(R, ce)} \mathrm{weight}(r, ce)}$$
                    Sport  Music  TV    Education  Movies  ...
Candidate Expert 1  .033   .012   .068  .037       .034    ...
...                 ...    ...    ...   ...        ...     ...
For each query, s(d, q) is computed in the same way as for resources
Looking at the matrix of expertise, the score for a user is
computed as:
$$\mathrm{score}(q, ce) = \mathrm{expertise}(q) \cdot \mathrm{expertise}(ce) = \sum_{d \in D(q)} s(d, q) \cdot s(d, ce)$$
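
A minimal sketch of the two formulas above, with made-up numbers: one row of the user/domain matrix is built as a weighted average of resource scores, and the query is then matched against it with a dot product over the query's domains. The function names and the input layout are assumptions made for the example.

```python
def candidate_domain_scores(resources):
    """s(d, ce): weighted average of s(d, r) over the candidate's resources.

    resources is a list of (weight(r, ce), {domain: s(d, r)}) pairs.
    """
    total_weight = sum(w for w, _ in resources)
    if total_weight == 0:
        return {}
    acc = {}
    for w, domain_scores in resources:
        for d, s in domain_scores.items():
            acc[d] = acc.get(d, 0.0) + w * s
    return {d: value / total_weight for d, value in acc.items()}


def expertise_match(query_domains, candidate_domains):
    """score(q, ce): dot product restricted to the domains D(q) of the query."""
    return sum(s_dq * candidate_domains.get(d, 0.0) for d, s_dq in query_domains.items())


# Hypothetical candidate with one full-weight and one low-weight resource.
row = candidate_domain_scores([
    (1.0, {"sport": 0.4, "music": 0.2}),
    (0.2, {"sport": 1.0}),
])
match = expertise_match({"sport": 1.0}, row)
print(row, match)
```

Because the rows can be computed ahead of time, answering a query only requires the dot product.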


Experimental Setup
Dataset built through a recruitment campaign:
                    Facebook  Twitter  LinkedIn
#Users                    39       23        28
#English Resources   107,956   33,022    11,486
#Italian Resources   124,537   14,038     4,133
#Total Resources     232,493   47,060    15,619
Test suite of 30 information needs, or queries, involving
various domains:
Which php function can I use to obtain the length of a string?
Can you list some restaurant in Milan?
Ground truth: graded relevance judgments of the users' expertise
were obtained from the users themselves through an online
questionnaire

Tests - Resource based configurations comparison
Model: Resource Based
level  matching     MAP    MRR    NDCG   NDCG@10
0      text only    .2034  .6264  .2963  .3183
0      entity only  .0454  .2500  .0731  .0821
0      mixed        .2026  .6014  .2832  .3020
1      text only    .3330  .8048  .4348  .4542
1      entity only  .2767  .8050  .3807  .4059
1      mixed        .3150  .8000  .4272  .4335
2      text only    .2932  .8111  .4338  .4448
2      entity only  .3363  .8122  .4485  .4292
2      mixed        .3245  .8444  .4454  .4581
The data shown in the table were obtained considering:
English resources
as relevant users, the ones above the average, for each query
eConf(e, r) = 1 + tagMeScore(e, r)
top 50 resources
for the mixed matching method: α = 1, β = 2
weight(r, ce) = 1 ∀r ∈ Lv0, Lv1; weight(r, ce) = 0.2 ∀r ∈ Lv2
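
For readers unfamiliar with the metrics in the table, the sketch below computes them for a single query. It rests on assumptions that the slides do not spell out: "above the average" is read as a grade above the mean grade for that query, and NDCG uses the common grade / log2(rank + 1) discount. MAP and MRR are then the means of the per-query values over all queries. The grades and the ranking are made up.

```python
import math


def relevant_above_average(grades):
    """Users whose graded relevance is above the mean grade for this query."""
    mean = sum(grades.values()) / len(grades)
    return {user for user, g in grades.items() if g > mean}


def average_precision(ranked, relevant):
    """Mean of precision@k taken at the rank of each relevant user."""
    hits, total = 0, 0.0
    for k, user in enumerate(ranked, start=1):
        if user in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant user, or 0 if none is retrieved."""
    for k, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1.0 / k
    return 0.0


def ndcg(ranked, grades, k=None):
    """Normalized DCG with gain = grade and discount = log2(rank + 1)."""
    cut = ranked[:k] if k else ranked
    dcg = sum(grades.get(u, 0) / math.log2(i + 1) for i, u in enumerate(cut, start=1))
    ideal = sorted(grades.values(), reverse=True)
    ideal = ideal[:k] if k else ideal
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0


grades = {"alice": 3, "bob": 1, "carol": 0, "dave": 2}   # hypothetical judgments
ranking = ["bob", "alice", "dave", "carol"]               # hypothetical system output
relevant = relevant_above_average(grades)
print(average_precision(ranking, relevant), reciprocal_rank(ranking, relevant))
print(ndcg(ranking, grades), ndcg(ranking, grades, k=10))
```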

Tests - Resources window
Another experiment was made by varying the number of resources
considered in the score. We call this number the window size.
For simplicity, we only considered the Lv2-Mixed and Lv1-TextOnly
configurations
Considering more resources improves system quality up to the 3-4% mark. Beyond that,
the curves stabilize: increasing the window size does not lead to
significantly better results

Tests - User based
Model: User Based
level  MAP    MRR    NDCG   NDCG@10
0      .3685  .7603  .4907  .4332
1      .3546  .7306  .4990  .4526
2      .3424  .8178  .4770  .4288
Table: Overall-comparison-User-Based
The data shown in the table were obtained considering:
English resources
as relevant users, the ones above the average, for each query
top 20 users
weight(r, ce) = 1 ∀r ∈ Lv0, Lv1; weight(r, ce) = 0.2 ∀r ∈ Lv2

Tests - User/Resource based models comparison
The two models presented are evaluated in terms of result
quality and performance.
We considered the best configuration for both: Lv2-Mixed for
the resource based model and Lv1 for the user based one
The index size is shown on a logarithmic scale: indexing expertise
as a pre-built user-domain matrix provides evident advantages
For the resource based model, the query time is linear in the
window size, while it is constant for the user based one.

Tests - Verticalization
A further experiment considers only the resources of a single
domain and channel
For simplicity, we only considered the Lv2-Mixed configuration,
with the window size fixed to 50.
Domain         Facebook  Twitter  LinkedIn
computer eng.  .2112     .5858    .4472
location       .1852     .3549    .2033
movies & tv    .2794     .4296    .1578
music          .2868     .4229    .2672
science        .1827     .4260    .3827
sport          .2856     .4225    .1933
tech. & games  .2297     .4186    .2052
All domains    .2526     .4296    .2670
Table: MAP
Domain         Facebook  Twitter  LinkedIn
computer eng.  .5038     .7014    .4904
location       .4423     .4172    .3517
movies & tv    .4460     .4960    .2028
music          .3957     .4631    .4226
science        .3004     .4366    .4977
sport          .5497     .4092    .3298
tech. & games  .3641     .4545    .2352
All domains    .4415     .4791    .3473
Table: NDCG@10


Conclusions
We classified resources into two main classes: static resources
and dynamic resources
We adopted and extended two expert finding models
The analysis of social activities can help to better characterize
the expertise of users
The adoption of multiple social networks can greatly improve
the representation of a user for expert finding purposes, but,
for specific domains, it is better to focus on single platforms.

Open questions
Exploiting the social graph to
improve expert retrieval
Domain-specific queries
require a less general
approach
Example: geolocated
queries!

Questions & Answers
