I gave this talk at the Workshop on Recommender Engines @ TUG (http://bit.ly/yuxrAM) on 2012/12/19.
It presents a selection of algorithms commonly used for recommending scientific literature, together with experimental data. Real-world results from Mendeley's article recommendation system are also presented.
The work presented here has been partially funded by the European Commission as part of the TEAM IAPP project (grant no. 251514) within the FP7 People Programme (Marie Curie).
2. Summary
➔ 2 recommendation use cases
➔ literature search with Mendeley
➔ use case 1: related research
➔ use case 2: personalised recommendations
3. Use Cases
Two types of recommendation use cases:
1) Related Research
● given 1 research article
● find other related articles
2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find articles of interest to them
6. Use Cases
My secondment (Dec-Feb):
1) Related Research
● given 1 research article
● find other related articles
2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find articles of interest to them
7. Literature Search
Using Mendeley
Challenge!
● Use only Mendeley to perform literature search for:
● Related research
● Personalised recommendations
Eating your own dog food...
8. Found: 0
Queries: “content similarity”, “semantic similarity”, “semantic relatedness”, “PubMed related articles”, “Google Scholar related articles”
9. Found: 1
Queries: “content similarity”, “semantic similarity”, “semantic relatedness”, “PubMed related articles”, “Google Scholar related articles”
14. Literature Search Using Mendeley
Summary of Results

Strategy              | Num Docs Found | Comment
Catalogue Search      | 19             | 9 from “Related Research”
Group Search          | 0              | Needs work
Perso Recommendations | 45             | Led to a group with 37 docs!

Total found: 64
15. Literature Search Using Mendeley
Summary of Results

Strategy              | Num Docs Found | Comment
Catalogue Search      | 19             | 9 from “Related Research”
Group Search          | 0              | Needs work
Perso Recommendations | 45             | Led to a group with 37 docs!

Total found: 64
Eating your own dog food... Tastes good!
16. 64 => 31 docs, read 14 so far,
so what do they say...?
17. Use Cases
1) Related Research
● given 1 research article
● find other related articles
18. Use Case 1: Related Research
7 highly relevant papers (related research for scientific articles)
Q1/4: How are the systems evaluated?
User study (e.g. Likert scale to rate relatedness between documents) (Beel & Gipp, 2010)
TREC collections with hand-classified 'related articles' (e.g. TREC 2005 genomics track) (Lin & Wilbur, 2007)
Try to reconstruct a document's reference list (Pohl, Radlinski, & Joachims, 2007; Vellino, 2009)
19. Use Case 1: Related Research
7 highly relevant papers (related research for scientific articles)
Q2/4: How are the systems trained?
Paper reference lists (Pohl et al., 2007; Vellino, 2009)
Usage data (e.g. PubMed, arXiv) (Lin & Wilbur, 2007)
Document content (e.g. metadata, co-citation, bibliographic coupling) (Gipp, Beel, & Hentschel, 2009)
Collocation in mind maps (Beel & Gipp, 2010)
20. Use Case 1: Related Research
7 highly relevant papers (related research for scientific articles)
Q3/4: Which techniques are applied?
BM25 (Lin & Wilbur, 2007); see the scoring sketch below
Topic modelling (Lin & Wilbur, 2007)
Collaborative filtering (Pohl et al., 2007)
Bespoke heuristics for feature extraction (e.g. in-text citation metrics for same sentence, paragraph) (Pohl et al., 2007; Gipp et al., 2009)
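Since BM25 recurs throughout these results, here is a minimal scoring sketch, assuming the usual Okapi/Robertson-style formulation; the function name, its arguments, and the default parameters (k1=1.2, b=0.75) are illustrative and not taken from Lin & Wilbur's setup.

```python
# Minimal BM25 scoring sketch (illustrative only; not the cited implementation).
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document for a query.
    doc_freqs maps a term to the number of documents containing it."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        # Okapi idf, with +1 inside the log to keep weights non-negative
        idf = math.log(1 + (num_docs - doc_freqs[term] + 0.5) /
                       (doc_freqs[term] + 0.5))
        # term-frequency component with document-length normalisation
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len))
        score += idf * norm
    return score
```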
21. Use Case 1: Related Research
7 highly relevant papers (related research for scientific articles)
Q4/4: Which techniques have most success?
Topic modelling slightly improves on BM25 (MEDLINE abstracts) (Lin & Wilbur, 2007):
- BM25 = 0.383 precision @ 5
- PMRA = 0.399 precision @ 5
Seeding CF with usage data from arXiv won out over using citation lists (Pohl et al., 2007)
We have not yet found significant results showing whether content-based or CF methods are better for this task
22. Use Case 1: Related Research
Progress so far...
Q1/2 How do we evaluate our system?
Construct a non-complex data set of related research:
● include groups with 10-20 documents (i.e. topics)
● no overlaps between groups (i.e. no documents in common)
● only take documents that are recognised as being in English
● document metadata must be 'complete' (i.e. has title, year, author, published in, abstract, filehash, tags/keywords/MeSH terms)
→ 4,382 groups
→ mean size = 14
→ 60,715 individual documents
Given a doc, aim to retrieve the other docs from its group (see the sketch below):
● tf-idf with the Lucene implementation
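As a concrete illustration of this evaluation, here is a small sketch that ranks documents by tf-idf cosine similarity over their metadata text and scores the top 5 against group co-membership. It is a sketch only: the helper names and the use of scikit-learn (standing in for the Lucene tf-idf implementation actually used) are assumptions, not the production code.

```python
# Sketch of the "retrieve the other docs from its group" evaluation with
# precision @ 5; scikit-learn's tf-idf stands in for Lucene here.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mean_precision_at_k(doc_ids, group_of, text_of, k=5):
    """doc_ids: list of document ids; group_of: dict doc id -> group id;
    text_of: function from doc id to concatenated metadata text."""
    tfidf = TfidfVectorizer(stop_words="english")
    matrix = tfidf.fit_transform([text_of(d) for d in doc_ids])
    sims = cosine_similarity(matrix)        # pairwise document similarities
    np.fill_diagonal(sims, -1.0)            # never return the query doc itself
    precisions = []
    for i, query in enumerate(doc_ids):
        top_k = np.argsort(-sims[i])[:k]    # indices of the k nearest docs
        hits = sum(group_of[doc_ids[j]] == group_of[query] for j in top_k)
        precisions.append(hits / k)
    return float(np.mean(precisions))
```

The same loop can be rerun with text_of returning a single metadata field at a time, which is how the per-field results on the following slides can be read.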
24. Use Case 1: Related Research
Progress so far...
Q1/2 How do we evaluate our system?
(Data set construction as on slide 22.)
[Chart: Metadata Presence in Documents: % of documents that each metadata field (title, year, author, publishedIn, fileHash, abstract, generalKeyword, meshTerms, keywords, tags) appears in, compared for the evaluation data set, groups, and the catalogue]
25. Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
[Chart: tf-idf Precision per Field for Complete Data Set: y-axis precision @ 5 (0 to 0.3); x-axis metadata field (title, abstract, mesh-term, generalKeyword, keyword, author, tag)]
26. Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
[Chart: tf-idf Precision per Field when Field is Available: y-axis precision @ 5 (0 to 0.5); x-axis metadata field (tag, abstract, mesh-term, title, general-keyword, author, keyword)]
27. Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
[Chart: tf-idf Precision for Field Combos for Complete Data Set: y-axis precision @ 5 (0 to 0.4); x-axis metadata field(s) (bestCombo, abstract, title, generalKeyword, mesh-term, author, keyword, tag)]
BestCombo = abstract + author + general-keyword + tag + title
28. Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
[Chart: tf-idf Precision for Field Combos when Field is Available: y-axis precision @ 5 (0 to 0.5); x-axis metadata field(s) (bestCombo, tag, abstract, mesh-term, title, general-keyword, author, keyword)]
BestCombo = abstract + author + general-keyword + tag + title (see the sketch below)
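For illustration, the field combination above can be realised by concatenating the chosen metadata fields into a single tf-idf document before indexing; the field names and helper below are hypothetical and only show one plausible way to do it.

```python
# Hypothetical helper: build the "bestCombo" text for one document by
# concatenating the best-performing metadata fields before tf-idf indexing.
BEST_COMBO_FIELDS = ["abstract", "author", "generalKeyword", "tag", "title"]

def best_combo_text(doc):
    """doc: dict mapping metadata field name to a string (fields may be missing)."""
    return " ".join(doc.get(field, "") for field in BEST_COMBO_FIELDS)
```

The resulting text could then be passed to the same mean_precision_at_k sketch shown earlier, in place of a single field.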
29. Use Case 1: Related Research
Future directions...?
Evaluate multiple techniques on the same data set
Construct a public data set:
● similar to the current one but with data from only public groups
● analyse the composition of the data set in detail
Train:
● content-based filtering
● collaborative filtering
● hybrid
Evaluate the different systems on the same data set
...and let's brainstorm!
30. Use Cases
2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find articles of interest to them
31. Use Case 2: Perso Recommendations
7 highly relevant papers (perso recs for scientific articles)
Q1/4: How are the systems evaluated?
Cross validation on user libraries (Bogers & van Den Bosch, 2009; Wang & Blei, 2011)
User studies (McNee, Kapoor, & Konstan, 2006; Parra-Santander & Brusilovsky, 2009)
32. Use Case 2: Perso Recommendations
7 highly relevant papers (perso recs for scientific articles)
Q2/4: How are the systems trained?
CiteULike libraries (Bogers & van Den Bosch, 2009; Parra-Santander & Brusilovsky, 2009; Wang & Blei, 2011)
Documents represent users, and their citations represent documents of interest (McNee et al., 2006)
User search history (Kapoor et al., 2007)
33. Use Case 2: Perso Recommendations
7 highly relevant papers (perso recs for scientific articles)
Q3/4: Which techniques are applied?
CF (Parra-Santander & Brusilovsky, 2009; Wang & Blei, 2011)
LDA (Wang & Blei, 2011)
Hybrid of CF + LDA (Wang & Blei, 2011)
BM25 over tags to form a user neighbourhood (Parra-Santander & Brusilovsky, 2009)
Item-based and content-based CF (Bogers & van Den Bosch, 2009)
User-based CF, Naïve Bayes classifier, Probabilistic Latent Semantic Indexing, textual TF-IDF-based algorithm (uses document abstracts) (McNee et al., 2006)
34. Use Case 2: Perso Recommendations
7 highly relevant papers (perso recs for scientific articles)
Q4/4: Which techniques have most success?
CF is much better than topic modelling (Wang & Blei, 2011)
A CF-topic modelling hybrid slightly outperforms CF alone (Wang & Blei, 2011)
Content-based filtering performed slightly better than item-based filtering on a test set with 1,322 CiteULike users (Bogers & van Den Bosch, 2009)
User-based CF and tf-idf significantly outperformed Naïve Bayes and Probabilistic Latent Semantic Indexing (McNee et al., 2006)
BM25 gave better results than CF, but the study covered just 7 CiteULike users, so it was small scale (Parra-Santander & Brusilovsky, 2009)
35. Use Case 2: Perso Recommendations
7 highly relevant papers (perso recs for scientific articles)
Q4/4: Which techniques have most success?
Content-based
  Advantages: human-readable form of the user profile; quickly absorbs new content without the need for ratings
  Disadvantage: tends to over-specialise
CF
  Advantages: works on an abstract item-user level, so you don't need to 'understand' the content; tends to give more novel and creative recommendations
  Disadvantage: requires a lot of data
36. Use Case 2: Perso Recommendations
Our progress so far...
Q1/2 How do we evaluate our system?
Construct an evaluation data set from user libraries:
● 50,000 user libraries
● 10-fold cross validation
● libraries vary from 20-500 documents
● preference values are binary (in library = 1; 0 otherwise)
Train:
● item-based collaborative filtering recommender (see the sketch below)
Evaluate:
● train the recommender and test how well it can reconstruct the users' hidden testing libraries
● multiple similarity metrics (e.g. co-occurrence, log-likelihood)
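The following sketch shows one way an item-based CF recommender with a log-likelihood similarity could look over this kind of binary library data (in the spirit of Mahout's LogLikelihoodSimilarity). Everything here is illustrative: the function names, the brute-force pairwise loop, and the scoring rule are assumptions, not the production recommender.

```python
# Illustrative item-based CF over binary library data, scored with a
# log-likelihood-ratio (LLR) similarity on item co-occurrence.
import math
from collections import defaultdict
from itertools import combinations

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 co-occurrence contingency table."""
    def entropy(*counts):
        total = sum(counts)
        return -sum(c * math.log(c / total) for c in counts if c > 0)
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

def item_similarities(libraries, num_users):
    """libraries: dict user_id -> set of doc_ids. Returns {(a, b): LLR score}."""
    item_users = defaultdict(set)
    for user, docs in libraries.items():
        for doc in docs:
            item_users[doc].add(user)
    sims = {}
    for a, b in combinations(item_users, 2):
        both = len(item_users[a] & item_users[b])
        only_a = len(item_users[a]) - both
        only_b = len(item_users[b]) - both
        neither = num_users - both - only_a - only_b
        if both > 0:
            sims[(a, b)] = sims[(b, a)] = llr(both, only_a, only_b, neither)
    return sims

def recommend(user_docs, sims, k=10):
    """Score candidate docs by summed similarity to the user's library."""
    scores = defaultdict(float)
    for doc in user_docs:
        for (a, b), s in sims.items():
            if a == doc and b not in user_docs:
                scores[b] += s
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Swapping llr for a plain co-occurrence count gives the other similarity metric mentioned above.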
37. Use Case 2: Perso Recommendations
Our progress so far...
Q2/2 What are our results?
Cross validation (evaluation sketch below):
● 0.1 precision @ 10 articles
Usage logs:
● 0.4 precision @ 10 articles
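To make the cross-validation number concrete, here is a hypothetical sketch of the held-out evaluation, reusing the illustrative item-based CF functions from the previous sketch; the split logic and names are assumptions rather than the actual pipeline (which uses 10-fold cross validation rather than the single hold-out split shown).

```python
# Hypothetical held-out evaluation: hide part of each user's library,
# recommend 10 articles from the rest, and count how many hidden articles
# come back (precision @ 10).
import random

def precision_at_10(libraries, num_users, holdout_fraction=0.1, seed=42):
    rng = random.Random(seed)
    train, hidden = {}, {}
    for user, docs in libraries.items():
        docs = list(docs)
        rng.shuffle(docs)
        cut = max(1, int(len(docs) * holdout_fraction))
        hidden[user] = set(docs[:cut])          # held-out test items
        train[user] = set(docs[cut:])           # items the model may see
    sims = item_similarities(train, num_users)  # from the sketch above
    precisions = []
    for user, docs in train.items():
        recs = recommend(docs, sims, k=10)      # from the sketch above
        precisions.append(len(set(recs) & hidden[user]) / 10)
    return sum(precisions) / len(precisions)
```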
38. Use Case 2: Perso Recommendations
Our progress so far...
Q2/2 What are our results?
39. Use Case 2: Perso Recommendations
Our progress so far...
Q2/2 What are our results?
[Chart: precision at 10 articles vs. number of articles in the user library]
40. Use Case 2: Perso Recommendations
Future directions...?
Evaluate multiple techniques on the same data set
Construct a data set:
● similar to the current one but with more up-to-date data
● analyse the composition of the data set in detail
Train:
● content-based filtering
● collaborative filtering (user-based and item-based)
● hybrid
Evaluate the different systems on the same data set
...and let's brainstorm!
41. Conclusion
➔ 2 recommendation use cases
➔ similar problems and techniques
➔ good results so far
➔ combining CF with content would likely improve both
43. References
Beel, J., & Gipp, B. (2010). Link Analysis in Mind Maps: A New Approach to Determining Document Relatedness.
Bogers, T., & van Den Bosch, A. (2009). Collaborative and Content-based Filtering for Item Recommendation on Social Bookmarking Websites. ACM RecSys '09 Workshop on Recommender Systems and the Social Web, New York, USA. Retrieved from http://ceur-ws.org/Vol-532/paper2.pdf
Gipp, B., Beel, J., & Hentschel, C. (2009). Scienstein: A research paper recommender system. Proceedings of the International Conference on Emerging Trends in Computing (ICETiC'09) (pp. 309-315). Retrieved from http://www.sciplore.org/publications/2009-Scienstein_-_A_Research_Paper_Recommender_System.pdf
Kapoor, N., Chen, J., Butler, J. T., Fouty, G. C., Stemper, J. A., Riedl, J., & Konstan, J. A. (2007). TechLens: a researcher's desktop. Proceedings of the 2007 ACM Conference on Recommender Systems (pp. 183-184). ACM. doi:10.1145/1297231.1297268
Lin, J., & Wilbur, W. J. (2007). PubMed related articles: a probabilistic topic-based model for content similarity. BMC Bioinformatics, 8(1), 423. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/17971238
McNee, S. M., Kapoor, N., & Konstan, J. A. (2006). Don't look stupid: avoiding pitfalls when recommending research papers. Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work (p. 180). ACM. Retrieved from http://portal.acm.org/citation.cfm?id=1180875.1180903
Parra-Santander, D., & Brusilovsky, P. (2009). Evaluation of Collaborative Filtering Algorithms for Recommending Articles. Web 3.0: Merging Semantic Web and Social Web at HyperText '09 (pp. 3-6). Torino, Italy. Retrieved from http://ceur-ws.org/Vol-467/paper5.pdf
Pohl, S., Radlinski, F., & Joachims, T. (2007). Recommending related papers based on digital library access records. Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 418-419). ACM. Retrieved from http://portal.acm.org/citation.cfm?id=1255175.1255260
Vellino, A. (2009). The Effect of PageRank on the Collaborative Filtering Recommendation of Journal Articles. Retrieved from http://cuvier.cisti.nrc.ca/~vellino/documents/PageRankRecommender-Vellino2008.pdf