The document proposes a query routing model that ranks expert candidates on Twitter to answer questions. It combines knowledge, trust, and activity criteria to determine the best person to direct a question to. An evaluation on 160 questions showed that the model achieved a mean nDCG above 0.9 when predicting the ideal expert ranking, outperforming each criterion used individually. This suggests the model is effective at routing questions on Twitter to suitable answerers.
A Query Routing Model to Rank Expert Candidates on Twitter
1. A Query Routing Model to Rank
Expert Candidates on Twitter
Cleyton Souza, Jonathas Magalhães, Evandro Costa and
Joseana Fechine
LIA - Laboratory of Artificial Intelligence
UFCG - Federal University of Campina Grande
Campina Grande - Brazil
2. Introduction
• What is Social Query?
– It is the process of asking questions through social
media (e.g., Twitter, Facebook) [Morris et al.]
– The common strategy is sharing the question with
everyone, but this way there is no guarantee that you
will receive a good and quick answer
• Directing questions to someone is more efficient.
• What is Query Routing?
– It is the process of directing questions to appropriate
answerers (people able to help)!
cleyton.caetano.souza@copin.ufcg.edu.br
3. Introduction
• What are we proposing?
– A Query Routing Model: a technique that finds
the most suitable person to help you based on
knowledge, trust and activity.
– We are focusing on the Twitter context!
5. Related Work (1/2)
• How does our proposal differ from previous work?
– Context – We focus on a Social Network
context;
• While previous work focused on the Community Question
Answering context…
• Why did we choose Twitter?
– It is one of the most popular Online Social Networks;
– Fewer than 18% of questions asked on Twitter are
answered [Paul et al.];
– [Nichols and Kang] confirmed that directing questions
significantly improves the response rate;
6. Related Work (2/2)
• How does our proposal differ from previous work?
– Problem – We treat the Query Routing
problem as a Multi-criteria Decision Making
problem (Weighted Product Model, WPM);
• While previous work mainly applied probabilistic
models…
• Why did we choose WPM?
– [Triantaphyllou and Mann] confirmed that for problems that
depend on up to three variables, WPM achieves the best
performance
7. Proposal
• A Twitter user has a question
• Our model analyzes the question and ranks the user's
followers based on three criteria (further details in
[Souza et al.])
– Knowledge (K) – using a bag-of-words strategy;
– Trust (T) – a combination of similarity and
conversation rate;
– Activity (A) – mean latency between
consecutive messages;
• What do we want?
– We want to find the best combination of K, T and A!
8. Knowledge
• We want to ask someone who knows about the
topic of the question
• We used the Vector Space Model
– Users and questions are each represented by a vector of
terms
– We match users and questions using cosine
similarity between these vectors
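The matching step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the raw term frequencies, and the sample texts are all assumptions made for the example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector (term -> raw frequency).

    Lowercasing and whitespace tokenization are assumptions; the slides
    do not say how terms are extracted or weighted.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical question and user texts (e.g. a user's concatenated tweets).
question = term_vector("how much does a trip to disneyland cost")
user_a = term_vector("just back from disneyland the trip was great")
user_b = term_vector("new phone review battery and camera")

k_a = cosine_similarity(question, user_a)  # shares "trip" and "disneyland"
k_b = cosine_similarity(question, user_b)  # shares no terms
print(k_a, k_b)
```

Under these assumptions, the user whose vocabulary overlaps with the question receives the higher Knowledge score.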
9. Trust/Closeness
• Sometimes, we want to receive answers from
people close to us
• How do we automatically discover these people?
– We analyze the conversation rate between the
questioner and each follower
– We analyze the follower-set similarity between
the questioner and each follower
– We define trust as the product of the
conversation rate and the follower-set similarity
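The trust definition above can be sketched as follows. The slides do not specify how either factor is computed, so two assumptions are made here: follower-set similarity is taken to be the Jaccard coefficient, and conversation rate is taken to be the share of the questioner's messages addressed to the follower. All function names are hypothetical.

```python
def follower_set_similarity(followers_a, followers_b):
    """Jaccard coefficient of two follower sets (an assumed similarity measure)."""
    a, b = set(followers_a), set(followers_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def conversation_rate(messages_to_follower, total_messages):
    """Share of the questioner's messages addressed to one follower (assumed definition)."""
    return messages_to_follower / total_messages if total_messages else 0.0


def trust(questioner_followers, follower_followers,
          messages_to_follower, total_messages):
    """Trust as the product of conversation rate and follower-set similarity."""
    return (conversation_rate(messages_to_follower, total_messages)
            * follower_set_similarity(questioner_followers, follower_followers))


# Hypothetical data: two of four distinct followers are shared, and
# 5 of the questioner's 20 messages were addressed to this follower.
t = trust({"ann", "bob", "cid"}, {"bob", "cid", "dan"}, 5, 20)
print(t)
```

Because the two factors are multiplied, a follower needs both frequent conversation and an overlapping audience to score high; either factor being zero yields zero trust.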
10. Activity
• Sometimes, we prefer a quick, lower-quality answer
to a slow, high-quality one
• Our assumption is that people who produce
a lot of content in a short time will provide
quick answers
• Activity is measured as the mean latency between
consecutive posts
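The activity measure above can be sketched as follows. The timestamps are made up, and returning `None` for users with fewer than two posts is an assumption, since the slides do not cover that case.

```python
from datetime import datetime


def mean_latency_seconds(timestamps):
    """Mean gap, in seconds, between consecutive posts.

    Returns None when fewer than two posts are available (an assumed
    convention; the slides do not address this case).
    """
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical posting times: gaps of 10 and 30 minutes.
posts = [
    datetime(2013, 1, 1, 12, 0),
    datetime(2013, 1, 1, 12, 10),
    datetime(2013, 1, 1, 12, 40),
]
latency = mean_latency_seconds(posts)
print(latency)
```

A lower mean latency indicates a more active user, so this raw value would need to be inverted or otherwise mapped before it can be used as a criterion where higher is better.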
11. Proposal
• How do we compare the criteria values of
the followers?
– We use the Weighted Product Model: we compare two
users using the following function:
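$$\mathrm{comp}(u, v) = \left(\frac{\mathrm{map}(K_u)}{\mathrm{map}(K_v)}\right)^{w_k} \cdot \left(\frac{\mathrm{map}(T_u)}{\mathrm{map}(T_v)}\right)^{w_t} \cdot \left(\frac{\mathrm{map}(A_u)}{\mathrm{map}(A_v)}\right)^{w_a}$$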
– The result of the comparison tells us which of the two
users is better!
– We sum the victories of each user and rank the users
by their total number of victories!
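The pairwise comparison and victory count can be sketched as follows. Several assumptions are made: the criterion values are already mapped to strictly positive numbers where higher is better (the slides do not define the map function), a result above 1 counts as a victory for the first user (the usual Weighted Product Model convention), ties award no victory, and the weights and candidate values are made up.

```python
from itertools import combinations


def comp(u, v, weights):
    """Weighted Product Model comparison of two users.

    u and v map criterion names to already-mapped, strictly positive
    values (an assumption about the undefined map function).
    """
    result = 1.0
    for criterion, weight in weights.items():
        result *= (u[criterion] / v[criterion]) ** weight
    return result


def rank_by_victories(candidates, weights):
    """Rank users by the number of pairwise comparisons they win."""
    victories = {name: 0 for name in candidates}
    for a, b in combinations(candidates, 2):
        score = comp(candidates[a], candidates[b], weights)
        if score > 1:
            victories[a] += 1
        elif score < 1:
            victories[b] += 1
        # score == 1 is a tie: neither user gets a victory (assumed).
    return sorted(victories, key=victories.get, reverse=True)


# Hypothetical weights and mapped criterion values for three followers.
weights = {"K": 0.4, "T": 0.3, "A": 0.3}
candidates = {
    "alice": {"K": 0.8, "T": 0.5, "A": 0.6},
    "bob": {"K": 0.4, "T": 0.9, "A": 0.3},
    "carol": {"K": 0.2, "T": 0.2, "A": 0.9},
}
ranking = rank_by_victories(candidates, weights)
print(ranking)
```

Because the model multiplies ratios, the criteria do not need a common unit; only the relative size of each pair of values matters.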
12. Evaluation
• We used a Quantitative Approach!
• Methodology
1. We selected 160 questions and their answers
published on Twitter
2. We manually ranked the answers of each
question based on their utility
13. Evaluation
• We manually ranked the answers of each
question based on their utility
Question: How much does it cost to go to Disneyland?

Answer                           | Answer Type                       | Utility
I don’t know                     | An unhelpful answer               | 1
I think @someone knows           | Indicating someone or some source | 2
Between $1000 and $2000          | An uncertain answer               | 3
I went last year and spent $700  | A direct answer                   | 4

• We used the order in which the answers were given
as a tie-breaker
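The manual ordering, including the tie-breaker, can be sketched as a sort on two keys. The answer types and utilities follow the example table; the arrival positions and the second "uncertain" answer are hypothetical, added only to show a tie being broken.

```python
# Each answer: type label, utility (1-4), and arrival position.
# Arrival positions and the second "uncertain" answer are hypothetical.
answers = [
    {"type": "unhelpful", "utility": 1, "arrival": 1},
    {"type": "uncertain", "utility": 3, "arrival": 2},
    {"type": "pointer", "utility": 2, "arrival": 3},
    {"type": "direct", "utility": 4, "arrival": 4},
    {"type": "uncertain", "utility": 3, "arrival": 5},
]

# Higher utility first; among equal utilities, the earlier answer first.
ideal_ranking = sorted(answers, key=lambda a: (-a["utility"], a["arrival"]))

for answer in ideal_ranking:
    print(answer["type"], answer["utility"], answer["arrival"])
```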
14. Evaluation
• Methodology
4. We crawled information about their questioners and
answerers (user profile, followers set, following set, tweets);
5. We ranked the answerers using our proposal
6. We compared both ranks using nDCG
• Our aim is to answer the following questions
– Does our model perform well at predicting the utility of
the answers?
– Does WPM perform better than using each
criterion individually?
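The nDCG comparison used in the methodology can be sketched as follows. The slides do not say which nDCG variant was used, so the linear gain and the log2 position discount here are assumptions, as are the example utility lists.

```python
import math


def dcg(utilities):
    """Discounted cumulative gain with linear gain and log2 discount (assumed variant)."""
    return sum(u / math.log2(position + 2)
               for position, u in enumerate(utilities))


def ndcg(predicted_utilities):
    """nDCG of a predicted ranking.

    predicted_utilities lists the manually assigned utility of each
    answerer, in the order produced by the model. The ideal ranking is
    the same list sorted by decreasing utility.
    """
    ideal_dcg = dcg(sorted(predicted_utilities, reverse=True))
    return dcg(predicted_utilities) / ideal_dcg if ideal_dcg else 0.0


# Hypothetical rankings of four answerers with utilities 1 to 4.
perfect = ndcg([4, 3, 2, 1])   # model order matches the ideal order
swapped = ndcg([3, 4, 2, 1])   # top two answerers swapped
reversed_order = ndcg([1, 2, 3, 4])
print(perfect, swapped, reversed_order)
```

A value of 1 means the model reproduced the ideal order exactly; mistakes near the top of the ranking are penalized more than mistakes near the bottom.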
15. Results
Question Type [Morris et al.] | Amount of Questions | Mean of nDCG
Recommendation                | 56                  | 0.92 ± 0.23
Opinion                       | 17                  | 0.83 ± 0.31
Factual Knowledge             | 40                  | 0.91 ± 0.26
Rhetorical                    | 15                  | 0.90 ± 0.25
Invitation                    | 3                   | 0.99 ± 0.01
Favor                         | 8                   | 1.00 ± 0.00
Social connection             | 12                  | 0.87 ± 0.28
Offer                         | 9                   | 0.84 ± 0.31
Total / Mean                  | 160                 | 0.90
16. Does our model perform well at predicting the
aptitude of the expert candidates?
• Promising results
– We reached a mean nDCG greater than 0.9;
– A one-tailed binomial test statistically confirmed that the
QR model predicted the ideal rank in more than
64% of cases (p-value = 0.03219, α = 5%);
• An improvement in comparison with [Souza et al. 2012]
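The one-tailed binomial test mentioned above can be sketched with an exact upper-tail sum. The 160 trials and the 0.64 null proportion come from the slides; the number of perfectly ranked questions is not reported, so the value of `successes` below is purely hypothetical and the resulting p-value is not meant to reproduce the reported one.

```python
from math import comb


def binomial_upper_tail(successes, trials, p0):
    """Exact P(X >= successes) for X ~ Binomial(trials, p0).

    This is the p-value of a one-tailed test of
    H0: proportion <= p0 against H1: proportion > p0.
    """
    return sum(comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
               for k in range(successes, trials + 1))


trials = 160      # number of evaluated questions (from the slides)
p0 = 0.64         # proportion under the null hypothesis (from the slides)
successes = 115   # HYPOTHETICAL count of questions ranked ideally

p_value = binomial_upper_tail(successes, trials, p0)
alpha = 0.05
print(p_value, p_value < alpha)
```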
17. Does WPM perform better than
using each criterion individually?
Figure 1: Boxplot comparing WPM with each individual criterion
18. Does WPM perform better than
using each criterion individually?
• We performed a pairwise comparison using the
Wilcoxon Signed Rank Test (α = 5%)

Hypothesis                                          | P-value   | Conclusion
WPM has a better nDCG distribution than Knowledge   | 1.357e-15 | True
WPM has a better nDCG distribution than Activity    | 6.701e-16 | True
WPM has a better nDCG distribution than Trust       | 4.025e-16 | True
19. Threats to Validity
• Evaluation methodology
• Few questions
• Manually ordered answers
20. Conclusion & Future Work
• We proposed a QR Model for Twitter
– We achieved promising results in a young field
– We confirmed that WPM outperforms each criterion used individually
– We created a public dataset for future research in the
area
• Future Work
– Is directing questions to experts more effective than
sharing questions?
– How do the weights given to the criteria relate to
the qualities (truth, intimacy, speed) of
the received answer?
21. References
• M. Morris, J. Teevan, and K. Panovich, “What do people ask their social networks, and why?: a survey study of status message q&a behavior”, Proceedings of the 28th ACM International Conference on Human Factors in Computing Systems, 2010, pp. 1739–1748.
• J. Nichols and J. Kang, “Asking questions of targeted strangers on social networks”, Proceedings of the ACM Conference on Computer Supported Cooperative Work, 2012, pp. 999–1002.
• S. Paul, L. Hong, and E. Chi, “Is Twitter a good place for asking questions? a characterization study”, Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, 2011, pp. 578–581.
• C. Souza, J. Magalhães, and E. Costa, “A Formal Model to the Routing Questions Problem in the Context of Twitter”, Proceedings of the IADIS International Conference WWW/Internet, 2011.
• C. Souza, J. Magalhães, E. Costa, and J. Fechine, “Predicting Potential Responders in Twitter: A Query Routing Algorithm”, Proceedings of the 12th International Conference on Computational Science and Its Applications, 2012, pp. 714–729.
• E. Triantaphyllou and S. Mann, “An examination of the effectiveness of multidimensional decision-making methods: A decision-making paradox”, Decision Support Systems, vol. 5, 1989, pp. 303–312.