ECIR 2019 - Information Retrieval Models for Contact Recommendation in Social Networks
Apr 25, 2019
Slides for our paper "Information Retrieval Models for Contact Recommendation in Social Networks" at the 41st European Conference on Information Retrieval (ECIR 2019, Cologne, Germany, 13-18 April 2019)
IRG IR Group @UAM
Information Retrieval Models for Contact Recommendation in Social Networks
41st European Conference on Information Retrieval (ECIR 2019)
Cologne, Germany, 15 April 2019
Information Retrieval Models for Contact
Recommendation in Social Networks
41st European Conference on Information Retrieval (ECIR 2019)
Javier Sanz-Cruzado and Pablo Castells
@JavierSanzCruza, @pcastells
Universidad Autónoma de Madrid
http://ir.ii.uam.es
Cologne, Germany, April 15th 2019
Motivation
Online social networks:
Appeared in the early 2000s
Among the most used web applications
New challenges for IR
Social IR: search and recommendation
Recommendation: content recommendation and contact recommendation
Item recommendation
[Figure: users × items rating matrix; cells hold ratings (1 to 5) and "?" marks the unknown ratings to predict]
Contact recommendation
[Figure: users × users matrix; cells hold link weights, "-" marks the diagonal and "?" marks the unknown links to predict]
Rating matrix = adjacency matrix
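A minimal sketch of the idea that the rating matrix is the adjacency matrix of the network. The toy user names, the binary weights and the helper names `adjacency_matrix` and `candidates` are assumptions made for illustration; they do not come from the slides.

```python
# Hypothetical toy network: each pair (u, v) is a directed link u -> v.
toy_edges = [("ana", "bob"), ("ana", "cruz"), ("bob", "cruz"),
             ("cruz", "ana"), ("dani", "ana")]


def adjacency_matrix(edges):
    """Return the sorted user list and a users x users 0/1 matrix."""
    users = sorted({user for edge in edges for user in edge})
    index = {user: i for i, user in enumerate(users)}
    matrix = [[0] * len(users) for _ in users]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
    return users, matrix


def candidates(user, users, matrix):
    """Users that `user` is not linked to yet (the '?' cells of its row)."""
    row = matrix[users.index(user)]
    return [v for j, v in enumerate(users) if v != user and row[j] == 0]


users, matrix = adjacency_matrix(toy_edges)
print(candidates("ana", users, matrix))
```

Rows and columns are indexed by the same set of users, which is the main difference from the item recommendation case.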
Contact recommendation approaches
[Diagram: sources of contact recommendation algorithms]
Machine learning
Specific models
Recommender systems
Information Retrieval?
Adaptations
Relation between tasks
IR task: query, document, term, relevant result
Collaborative filtering: target user, candidate item, neighbor user, relevant item
Contact recommendation: target user, candidate user, relevant link
[Diagram: the open correspondences between tasks are marked with "?"]
Relation between tasks
IR task: query, document, term, relevant result
Collaborative filtering: target user, candidate item, neighbor user, relevant item
Contact recommendation: target user, candidate user, neighbor user, relevant link
[Diagram: the remaining open correspondence is marked with "?"]
Relation between tasks
Contact recommendation: target user, candidate user, neighbor user, relevant link
Und
In
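The mapping can be sketched in code: the target user plays the role of the query, the candidate user plays the role of the document, and the neighbors they share play the role of the terms. This sketch assumes that the slide labels "Und" and "In" refer to undirected and incoming neighborhoods, which the slide text does not confirm; the `"out"` option, the function names and the toy edges are likewise illustrative assumptions.

```python
from collections import defaultdict


def build_neighborhoods(edges):
    """Index a directed edge list into outgoing and incoming neighbor sets."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nb[u].add(v)
        in_nb[v].add(u)
    return out_nb, in_nb


def neighborhood(user, out_nb, in_nb, orientation):
    """Neighbors of `user` under the chosen orientation (assumed options)."""
    if orientation == "out":
        return set(out_nb.get(user, ()))
    if orientation == "in":
        return set(in_nb.get(user, ()))
    if orientation == "und":
        return set(out_nb.get(user, ())) | set(in_nb.get(user, ()))
    raise ValueError(f"unknown orientation: {orientation}")


def shared_terms(target, candidate, out_nb, in_nb, q="und", d="und"):
    """'Query terms' found in the 'document': neighbors common to both users."""
    query = neighborhood(target, out_nb, in_nb, q)
    document = neighborhood(candidate, out_nb, in_nb, d)
    return query & document


toy_edges = [("a", "b"), ("a", "c"), ("d", "b"), ("d", "c"), ("c", "a")]
out_nb, in_nb = build_neighborhoods(toy_edges)
print(shared_terms("a", "d", out_nb, in_nb, q="out", d="out"))
```

Choosing a different orientation for the query side and the document side changes which users count as terms, so the same IR model can yield several contact recommendation variants.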
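An example: BM25
Original formulation:
$$f_q(d) = \sum_{t \in d \cap q} \frac{(k+1)\,\mathrm{freq}(t,d)}{k\left(1 - b + \frac{b\,|d|}{\mathrm{avg}_{d'}|d'|}\right) + \mathrm{freq}(t,d)}\,\mathrm{RSJ}(t)$$
$$\mathrm{RSJ}(t) = \log\frac{|D| - |D_t| + 0.5}{|D_t| + 0.5}$$
Where (IR element → contact recommendation counterpart):
$d$: document → $\Gamma^d(v)$: candidate user
$q$: query → $\Gamma^q(u)$: target user
$t$: term → $t$: neighbor user
$D$: set of all documents → $\mathcal{U}$: all users
$D_t$: documents containing $t$ → $\Gamma_{inv}^d(t)$: users $v$ containing $t$ in $\Gamma^d(v)$
$\mathrm{freq}(t,d)$: frequency of $t$ in $d$ → $w^d(t,v)$: edge weight
$|d|$: length of document $d$ → $\mathrm{len}^l(v) = \sum_{x \in \Gamma^l(v)} w^l(x,v)$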
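An example: BM25
Adapted formulation:
$$f_u(v) = \sum_{t \in \Gamma^q(u) \cap \Gamma^d(v)} \frac{(k+1)\,w^d(t,v)}{k\left(1 - b + \frac{b \cdot \mathrm{len}^l(v)}{\mathrm{avg}_{v'}|\Gamma^l(v')|}\right) + w^d(t,v)}\,\mathrm{RSJ}(t)$$
$$\mathrm{RSJ}(t) = \log\frac{|\mathcal{U}| - |\Gamma_{inv}^d(t)| + 0.5}{|\Gamma_{inv}^d(t)| + 0.5}$$
Where (IR element → contact recommendation counterpart):
$d$: document → $\Gamma^d(v)$: candidate user
$q$: query → $\Gamma^q(u)$: target user
$t$: term → $t$: neighbor user
$D$: set of all documents → $\mathcal{U}$: all users
$D_t$: documents containing $t$ → $\Gamma_{inv}^d(t)$: users $v$ containing $t$ in $\Gamma^d(v)$
$\mathrm{freq}(t,d)$: frequency of $t$ in $d$ → $w^d(t,v)$: edge weight
$|d|$: length of document $d$ → $\mathrm{len}^l(v) = \sum_{x \in \Gamma^l(v)} w^l(x,v)$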
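A minimal sketch of how the adapted BM25 score could be computed. It is not the authors' implementation. It assumes a single undirected weighted graph, so that the query, document and length neighborhoods all coincide; the default values of `k` and `b`, the helper names and the toy graph are hypothetical.

```python
import math
from collections import defaultdict


def undirected_weights(edges):
    """Build {user: {neighbor: weight}} from (u, v, weight) triples."""
    weights = defaultdict(dict)
    for u, v, w in edges:
        weights[u][v] = w
        weights[v][u] = w
    return dict(weights)


def bm25_contact_scores(weights, target, k=1.0, b=0.5):
    """Score every non-contact of `target` that shares a neighbor with it."""
    n_users = len(weights)
    # len(v): sum of the weights of the edges around v.
    length = {v: sum(nbrs.values()) for v, nbrs in weights.items()}
    # avg_{v'} |Gamma(v')|: average neighborhood size.
    avg_size = sum(len(nbrs) for nbrs in weights.values()) / n_users
    scores = {}
    for v in weights:
        if v == target or v in weights[target]:
            continue  # skip the target and its existing contacts
        shared = weights[target].keys() & weights[v].keys()
        if not shared:
            continue
        norm = k * (1 - b + b * length[v] / avg_size)
        score = 0.0
        for t in shared:
            n_t = len(weights[t])  # users whose neighborhood contains t
            rsj = math.log((n_users - n_t + 0.5) / (n_t + 0.5))
            w = weights[v][t]  # weight of the edge between t and v
            score += (k + 1) * w / (norm + w) * rsj
        scores[v] = score
    return scores


toy_edges = [("a", "b", 1.0), ("a", "c", 1.0), ("d", "b", 1.0), ("d", "c", 1.0),
             ("e", "b", 1.0), ("f", "g", 1.0), ("g", "h", 1.0)]
toy_weights = undirected_weights(toy_edges)
toy_scores = bm25_contact_scores(toy_weights, "a")
ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
print(ranking)
```

As in text retrieval, the RSJ factor becomes negative for a neighbor that is connected to more than half of the users, so very popular shared neighbors can lower a candidate's score.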
Experiments
Offline evaluation
Data extracted from Twitter (REST API)
Interaction graphs: (u, v) ∈ E ⟺ u retweets or mentions v
Snowball sampling
Two samples:
1. 1 month: All tweets between 19th June and 19th July 2015
2. 200 tweets: 200 last tweets by each user before 2nd August 2015
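A sketch of how an interaction graph of this kind could be assembled from tweet records. The record layout (the `author`, `mentions` and `retweet_of` keys) is a hypothetical format chosen for illustration, not the Twitter API schema, and using interaction counts as edge weights is an assumption.

```python
from collections import Counter


def interaction_edges(tweets):
    """Count directed interactions u -> v, where u retweets or mentions v."""
    counts = Counter()
    for tweet in tweets:
        author = tweet["author"]
        targets = list(tweet.get("mentions", []))
        if tweet.get("retweet_of"):
            targets.append(tweet["retweet_of"])
        for target in targets:
            if target != author:  # ignore self-interactions
                counts[(author, target)] += 1
    return counts


toy_tweets = [
    {"author": "u", "mentions": ["v", "w"]},
    {"author": "u", "retweet_of": "v"},
    {"author": "v", "mentions": ["v"]},
]
edge_counts = interaction_edges(toy_tweets)
print(dict(edge_counts))
```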
Evaluation methodology
Networks are split into training/test sets (temporal split)
– 1 month: Interactions before July 12th / after July 12th
– 200 tweets: Interactions in first 80% of tweets / remaining 20%
Recommendations are computed from the training data
– Reciprocal links are not recommended
Evaluated using IR metrics on the test set: P@10, R@10, nDCG@10
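The methodology can be sketched as follows, assuming interactions are available as `(u, v, timestamp)` triples and relevance is binary. The function names are hypothetical, and dropping test links whose reverse already appears in training is one possible reading of "reciprocal links are not recommended".

```python
import math


def temporal_split(interactions, cutoff):
    """Split (u, v, timestamp) triples into training and test link sets."""
    train = {(u, v) for u, v, ts in interactions if ts < cutoff}
    later = {(u, v) for u, v, ts in interactions if ts >= cutoff}
    # Assumption: keep only new test links that are not reciprocal
    # to a training link.
    test = {(u, v) for u, v in later
            if (u, v) not in train and (v, u) not in train}
    return train, test


def precision_at_k(ranking, relevant, k=10):
    return sum(1 for v in ranking[:k] if v in relevant) / k


def recall_at_k(ranking, relevant, k=10):
    if not relevant:
        return 0.0
    return sum(1 for v in ranking[:k] if v in relevant) / len(relevant)


def ndcg_at_k(ranking, relevant, k=10):
    """nDCG with binary gains and a log2 rank discount."""
    dcg = sum(1 / math.log2(i + 2)
              for i, v in enumerate(ranking[:k]) if v in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


toy_interactions = [("a", "b", 1), ("b", "a", 5), ("a", "c", 6), ("a", "b", 7)]
train_links, test_links = temporal_split(toy_interactions, cutoff=5)
toy_ranking = ["b", "c", "d"]
toy_relevant = {"c", "x"}
print(precision_at_k(toy_ranking, toy_relevant),
      recall_at_k(toy_ranking, toy_relevant))
```

Per-user values would then be averaged over the users that have at least one test link.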
Algorithms
IR models
– Vector space model (VSM)
– BIR
– BM25
– Query Likelihood (QL) - Jelinek-Mercer, Dirichlet and Laplace smoothing
General collaborative filtering
– User-based / Item-based kNN
– Implicit matrix factorization (iMF)
Specific approaches
– Adamic-Adar
– Most common neighbors (MCN)
– Personalized PageRank
– Jaccard similarity
– Money
Sanity-check: Random and most popular
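For reference, three of the specific baselines listed above can be sketched with their usual textbook definitions over undirected neighbor sets. The configurations used in the experiments (for example, neighborhood orientation) are not given in the slides, so the code and the toy graph are illustrative assumptions only.

```python
import math


def most_common_neighbors(nb, u, v):
    """MCN: number of neighbors shared by u and v."""
    return len(nb[u] & nb[v])


def jaccard(nb, u, v):
    """Shared neighbors divided by the size of the joint neighborhood."""
    union = nb[u] | nb[v]
    return len(nb[u] & nb[v]) / len(union) if union else 0.0


def adamic_adar(nb, u, v):
    """Shared neighbors weighted by the inverse log of their degree."""
    return sum(1 / math.log(len(nb[t]))
               for t in nb[u] & nb[v] if len(nb[t]) > 1)


# Hypothetical undirected toy graph as {user: set of neighbors}.
toy_nb = {
    "a": {"b", "c"},
    "b": {"a", "d", "e"},
    "c": {"a", "d"},
    "d": {"b", "c"},
    "e": {"b"},
}
print(most_common_neighbors(toy_nb, "a", "d"), jaccard(toy_nb, "a", "e"))
```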
Results (P@10)
Algorithm        1 month   200 tweets
BM25             0.0691    0.0572
BIR              0.0675    0.0534
QLL              0.0609    0.0490
QLJM             0.0580    0.0492
QLD              0.0441    0.0482
VSM              0.0191    0.0268
Money            0.0772    0.0476
Adamic-Adar      0.0676    0.0532
MCN              0.0631    0.0501
PageRank Pers.   0.0598    0.0336
Jaccard          0.0226    0.0304
iMF              0.0834    0.0541
User-based kNN   0.0805    0.0479
Item-based kNN   0.0739    0.0360
Popularity       0.0255    0.0225
Random           0.0009    0.0003
IR models are effective
– Probabilistic models among top 5
– BM25 best in “200 tweets”
– VSM lowest performing IR model
Rest of algorithms:
– Implicit MF is best.
– Adamic-Adar and MCN are competitive.
– Jaccard is not very competitive.
– Rest seem very graph-dependent.
Efficiency
BM25 is much faster than MF
Conclusions
IR models can be applied to contact recommendation
They provide an effective and efficient solution
Future work:
– More IR models
– Effects of sampling methods on accuracy
– Beyond accuracy: Novelty, diversity and effects on the network
Thank you for your attention!
Questions?
Editor's Notes
Presentation time: 15 minutes + 5 minutes (questions)
Scheme:
Presentation
Motivation
Unified formulation for IR models for CRec.
An example: BM25
Experiments: Accuracy
Experiments: Efficiency
Conclusions
Slide text:
Hello, my name is Javier Sanz-Cruzado, and I am going to present the work I have developed together with Pablo Castells, titled Information Retrieval Models for Contact Recommendation in Social Networks. In this work, we explore the adaptation of several Information Retrieval models, originally designed for text search, as algorithms for recommending people to people in social network environments.
Since their appearance at the beginning of the century, online social networks have become some of the most used and important applications on the Web. The availability of new kinds of information, such as the relations between users or the different contents they generate, has given rise to new challenges for information retrieval, in both search and recommendation tasks. In our work, we focus on recommender systems. In that field, we can distinguish two tasks, according to what we want to recommend. First, content recommendation, for finding relevant posts on Facebook, tweets on Twitter, etc. Second, recommending other people in the network who you might be interested in connecting with: the so-called contact recommendation. We focus on this second perspective, but why?
The reason is the following. Although the contact recommendation task might seem similar to other recommendation scenarios (and, indeed, could be treated like any other), it has two particularities that make it special. First, in classical recommendation scenarios, users and items have traditionally been separate sets. In contact recommendation this no longer holds: the set of items is drawn directly from the set of users of the system. Second, in addition to the “rating matrix”, we have further information about how the different users are related to each other: the structure of the network.
Although simple, these particularities have given rise to multiple algorithms specific to this task, originating in very different fields, such as link prediction, classical recommendation or machine learning. Given that variety of algorithms, we ask the following: is it possible to apply IR models to recommend people in social networks?
To answer that question, we first need to establish equivalences between the elements of the contact recommendation task (users and the interactions between them) and the spaces involved in text search (queries, documents and terms). In the slide, we can see the relationships between the elements of the different tasks. As we can see, the scheme of both tasks is quite similar: in the IR task, given a query, we need to obtain relevant documents using the textual representation of both elements; in contact recommendation, given a target user, we need to identify suitable people using the connections that both users have in common in the graph. The idea is then to fold the three IR spaces into just one: the users in the network. How?
To illustrate all of the above, let’s see an example, by adapting one of the most well-known and effective state-of-the-art IR models: BM25. In the slide, we can see the original formulation. To be able to use it, we need to map all the elements of that formula (documents, query, terms, etc.) to the space of users.
Once we have seen that those algorithms can be adapted, we should check whether they are effective as contact recommendation algorithms. To test that, we extracted two interaction graphs from Twitter, using the snowball sampling method. For the first sample, we obtained all the interactions between 10,000 users over the course of a month, and, for the second, we extracted the interactions in the last 200 tweets published by a set of 10,000 users.
Both networks were divided into training and test sets using a temporal split. Recommendations were computed from the training data and evaluated on the test set using IR metrics such as precision, recall and nDCG. To prevent users without test links from distorting the evaluation, we only generated recommendations for users with links in the test set.
We applied several IR algorithms, such as the vector space model, BM25 variants and query likelihood, as well as classical recommendation approaches and specific state-of-the-art algorithms.
In the table shown on the screen, we can see the results of our experiments in terms of P@10. We highlight two algorithms, implicit matrix factorization and BM25, since each of them was the best on one of the tested graphs.
Do IR models have any additional advantage over other state-of-the-art algorithms, such as matrix factorization or nearest-neighbor approaches?
In order to check that, we (brief description of the experiment…) and we saw that IR models.
RECOMMENDATION TIME: how much time is spent generating the recommendations?
UPDATE TIME: Given new data, how much time (in seconds) is needed to re-train the algorithm?
This is supported by a theoretical analysis, which is included in the paper (as well as the backup slides).
Hopefully intuitive
Explains why and what kind of popularity
Advantages: algorithmic design justification, deeper understanding of behavior, enabling extensions & variations
Item based better, particularly when observation bias is removed