2. 2
Outline
Recommender Systems and Information Extraction for researchers
• What is Data Science
• What is Mendeley
• Recommender Systems at Mendeley
• Information Extraction at Mendeley
06/11/2015
3. 3
What is Data Science?
4. 4
Why a data scientist?
5. 5
Who wants a data scientist?
6. 6
Who wants a data scientist? [2]
7. 7
Two main types
https://www.quora.com/What-is-data-science/answer/Michael-Hochster
8. 8
Two main types [2]
https://www.quora.com/What-is-data-science/answer/Michael-Hochster
9. 9
Skills for Data Science
http://businessoverbroadway.com/investigating-data-scientists-their-skills-and-team-makeup
11. 11
Mendeley builds tools to help researchers … [2]
• Read & Organize
• Search & Discover
• Collaborate & Network
• Experiment & Synthesize
12. 12
Read & Organize
• Reference management
• Cite-as-you-write
• Full-text article search
• Digitalised annotations
13. 13
Search & Discover
• Mendeley Suggest
• Literature Search
• Related Documents
14. 14
Collaborate & Network
• Research network
• Professional research groups
18. 18
What is a Recommender System?
Recommender systems are a subclass of information filtering system that seek to
predict the 'rating' or 'preference' that a user would give to an item. [Wikipedia]
19. 19
Why Recommender Systems at Mendeley?
Vision:
“To build a personalised research advisor that helps
you to organise your work, contextualise it within the
global body of research, and connect you with
relevant researchers and artifacts.”
21. 21
Recommender Systems at Mendeley – Mendeley Suggest
https://www.mendeley.com/suggest/
22. 22
Recommender System Components
• Data Sources
• Algorithms
• Business Logic & Analytics
• User Interface / User Experience
23. 23
Data Sources
• Mendeley
– User Libraries
• What users have in their libraries (what they read, annotate, and
highlight, what folders they have, etc.)
– Article metadata (title, authors, abstract, keywords, tags, etc.)
– Groups
• Scopus
– Citation network
• ScienceDirect
– Logs
• …
24. 24
Algorithms
1. Collaborative filtering
User-based
If Alice read X, Y, Z and Bob read X, Y, Z and W, we recommend W to
Alice
+ Works well for us because we have far fewer users than items (users << items)
- Only for users with enough articles in the library
Item-based
Users who read X also read Y
+ The item-item similarity matrix is useful for modelling the last n articles read
- Expensive in our setting (millions of items)
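The two neighbourhood approaches above can be sketched on toy data as follows. The `libraries` dictionary, the user names, and the use of cosine similarity over binary read/not-read vectors are assumptions for illustration; the slides do not say which similarity measure Mendeley uses.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy libraries: user -> set of article IDs they have read.
libraries = {
    "alice": {"X", "Y", "Z"},
    "bob": {"X", "Y", "Z", "W"},
    "carol": {"X", "Q"},
}

def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend_user_based(user, libraries, top_n=5):
    """User-based CF: score unseen articles by the similarity of their readers."""
    own = libraries[user]
    scores = defaultdict(float)
    for other, items in libraries.items():
        if other == user:
            continue
        sim = cosine(own, items)
        for item in items - own:
            scores[item] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]

def item_similarities(libraries):
    """Item-based CF: item-item cosine similarity from shared readers."""
    readers = defaultdict(set)
    for user, items in libraries.items():
        for item in items:
            readers[item].add(user)
    sims = {}
    for i in readers:
        for j in readers:
            if i < j:
                s = cosine(readers[i], readers[j])
                if s > 0:
                    sims[(i, j)] = sims[(j, i)] = s
    return sims

print(recommend_user_based("alice", libraries))
print(item_similarities(libraries)[("Y", "Z")])
```

The nested loop in `item_similarities` is quadratic in the number of items, which illustrates why the item-based variant becomes expensive with millions of items.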
25. 25
Algorithms [2]
1. Collaborative filtering (continued)
Matrix factorization
+ Best-performing CF model in the literature
- Generating recommendations over a catalog of millions of items is too
slow
[Figure: sparse binary user-item matrix X (n x m), with unknown entries marked "?", approximated as X ≈ U V, where U is n x k and V is k x m]
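As a rough illustration of X ≈ U V, here is a minimal sketch that fits the two factors by stochastic gradient descent on a toy binary matrix. Treating missing entries as zeros, the squared-error loss, and every hyperparameter (`k`, `epochs`, `lr`, `reg`) are assumptions made for the example, not the model described in the talk.

```python
import random

def factorize(X, k=2, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Fit X (n x m) ~ U (n x k) times V (k x m) by SGD on squared error."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(0, 0.1) for _ in range(m)] for _ in range(k)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                pred = sum(U[i][f] * V[f][j] for f in range(k))
                err = X[i][j] - pred
                for f in range(k):
                    u, v = U[i][f], V[f][j]
                    U[i][f] += lr * (err * v - reg * u)
                    V[f][j] += lr * (err * u - reg * v)
    return U, V

def predict(U, V, i, j):
    """Predicted affinity of user i for item j."""
    return sum(U[i][f] * V[f][j] for f in range(len(V)))

def squared_error(X, U, V):
    return sum(
        (X[i][j] - predict(U, V, i, j)) ** 2
        for i in range(len(X))
        for j in range(len(X[0]))
    )

# Hypothetical toy matrix: rows are users, columns are articles, 1 = in library.
X = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
U, V = factorize(X)
print(round(squared_error(X, U, V), 3))
```

Scoring one user against the whole catalog means evaluating `predict` for every column, which is the cost the slide flags as too slow at catalog scale.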
26. 26
Algorithms [3]
2. Content-based
I read articles about text mining, show me other stuff about text mining
+ Good for cold users (users without data)
- Overspecialisation: items recommended are too similar
3. Popularity/Trending
I work in Computer Science, show me popular/trending articles in
Computer Science
+ Perfect for cold users
- Not personalised; a discipline is too broad
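The content-based idea can be sketched with TF-IDF vectors and cosine similarity. The toy `catalog`, the whitespace tokenizer, and the choice of TF-IDF are assumptions for illustration; the slides do not name the text representation used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy catalog: article ID -> text (e.g. title plus abstract).
catalog = {
    "a1": "text mining for scientific articles",
    "a2": "text mining and information extraction",
    "a3": "protein folding simulation",
}

def tfidf_vectors(docs):
    """Sparse TF-IDF vector (term -> weight) for each document."""
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
    n_docs = len(docs)
    return {
        doc_id: {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in Counter(toks).items()
        }
        for doc_id, toks in tokens.items()
    }

def cosine(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend_content_based(read_ids, docs, top_n=5):
    """Rank unread articles by similarity to the sum of the read articles' vectors."""
    vectors = tfidf_vectors(docs)
    profile = defaultdict(float)
    for doc_id in read_ids:
        for term, weight in vectors[doc_id].items():
            profile[term] += weight
    scores = {
        doc_id: cosine(profile, vec)
        for doc_id, vec in vectors.items()
        if doc_id not in read_ids
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, score in ranked[:top_n] if score > 0]

print(recommend_content_based({"a1"}, catalog))
```

Because the profile is built only from what the user has already read, the top results tend to stay close to those articles, which is the overspecialisation drawback noted above.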
27. 27
Algorithms [4]
4. Citation Network
– Articles similar to articles I cited
– Articles that cite me
– Articles from my co-authors
+ Good for some kinds of users
- Young researchers do not have (enough) publications
28. 28
Evaluation
• Offline evaluation of 100+ algorithm variations on a
historical dataset
• Split data into training and testing based on timestamps: train until day
X, try to predict what users will add in the next day/week/month
• Computed different metrics to measure different dimensions:
• Accuracy (precision, recall, f-score, nDCG, MAP)
• Diversity
• Recency
• Popularity
• Consistency
• Coverage
• Online evaluation: computing click-through rate (CTR) from log data
• Do offline and online correlate?
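A minimal sketch of the offline protocol and a few of the accuracy metrics listed above. The event format `(user, item, day)`, the integer day timestamps, and all function names are assumptions for illustration.

```python
import math

def temporal_split(events, cutoff):
    """Train on events up to and including `cutoff`, test on later ones."""
    train = [e for e in events if e[2] <= cutoff]
    test = [e for e in events if e[2] > cutoff]
    return train, test

def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def ndcg_at_k(recommended, relevant, k):
    """Normalised discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

def ctr(clicks, impressions):
    """Click-through rate for the online evaluation."""
    return clicks / impressions if impressions else 0.0

# Hypothetical library additions: (user, article, day added).
events = [("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 3), ("u2", "a", 2), ("u2", "d", 4)]
train, test = temporal_split(events, cutoff=2)
relevant_u1 = {item for user, item, _ in test if user == "u1"}
print(precision_recall_at_k(["c", "x"], relevant_u1, k=2))
print(ndcg_at_k(["x", "c"], relevant_u1, k=2))
```

Splitting by timestamp rather than at random keeps the test set strictly in the future of the training set, which matches the "train until day X" setup.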
29. 29
Business Logic / Analytics
• The business imposes constraints that can affect the
recommendation experience
– Don’t show articles outside the user’s discipline
– Show only articles with a minimum readership
– Show only recommendations that you can explain (especially for people
recommendations, a different matter)
• Analytics
– Dashboard on the recommender statistics:
• Number of recommendations served
• Number of users with recommendations
• …
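The business constraints above amount to a filter applied after the algorithms have scored their candidates. A minimal sketch follows; the `Candidate` fields, the `min_readership` default, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    article_id: str
    discipline: str
    readership: int          # number of readers of the article
    score: float             # score assigned by the recommender algorithm
    explanation: Optional[str] = None  # why this article is being recommended

def apply_business_rules(candidates, user_discipline, min_readership=10):
    """Keep explainable, in-discipline candidates with enough readers, best first."""
    kept = [
        c
        for c in candidates
        if c.discipline == user_discipline
        and c.readership >= min_readership
        and c.explanation
    ]
    return sorted(kept, key=lambda c: c.score, reverse=True)

candidates = [
    Candidate("a1", "Computer Science", 120, 0.9, "Similar to articles in your library"),
    Candidate("a2", "Biology", 300, 0.95, "Popular in Biology"),
    Candidate("a3", "Computer Science", 3, 0.8, "Cited by articles you read"),
    Candidate("a4", "Computer Science", 50, 0.7, None),
    Candidate("a5", "Computer Science", 40, 0.6, "Trending in your discipline"),
]
print([c.article_id for c in apply_business_rules(candidates, "Computer Science")])
```

Note that the highest-scoring candidate (`a2`) is dropped by the discipline rule, which is how such constraints can change the recommendation experience.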
30. 30
User Interface
• Original idea: one list fits all
Create a single list with the best recommendations for the user: use advanced methods to take into account every signal and provide what is best for you!
31. 31
User Interface [2]
• However…
– Different kinds of users can have different information
needs!
– The same user in different contexts can have different
information needs!
32. 32
User Interface [3]
• Solution: different lists!
• Provide multiple lists that satisfy different information needs
• Users are more likely to find something they are interested in
33. 33
Lessons learned
• It’s not about the best algorithm, it’s about the entire
user experience!
• It is easier (if you can) to put together different lists that
serve different information needs than to satisfy
every user with a single list
35. 35
Lots of content in an article
36. 36
Metadata Extraction
• Metadata extraction from PDFs was one of the first features
of Mendeley
• It makes it easy to organize your articles
• It powers the Mendeley catalog
37. 37
Citation Extraction
• Citation extraction from any source, linked to the
Mendeley catalog
• It extracts citable references and a narrative path in
the Mendeley environment
38. 38
Machine learning for extraction
• Conditional Random Fields (CRF) [1]
• We label sequences of tokens y_t given feature functions f_k(y_t, x_t)
• E.g. ‘y_t is AUTHOR and x_{t-1} is bold’ and ‘y_t is AUTHOR and y_{t-1} is TITLE’
[1] J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: probabilistic models for segmenting and
labeling sequence data. In ICML, 2001
[Figure: CRF graph showing observations and states; Fig. 2.4 in Sutton & McCallum 2011]
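The two example feature functions can be written out as a toy linear-chain CRF scorer. Since the examples refer to the previous label and the previous token, this sketch passes the whole token sequence and the position to each feature. The label set, the weights, and the token format are assumptions; the probabilities are computed by brute-force enumeration, which only works for very short sequences (practical CRF implementations use dynamic programming instead).

```python
import math
from itertools import product

LABELS = ("TITLE", "AUTHOR", "OTHER")  # assumed label set

def f_prev_token_bold(y_prev, y_t, tokens, t):
    """'y_t is AUTHOR and x_{t-1} is bold'"""
    return 1.0 if y_t == "AUTHOR" and t > 0 and tokens[t - 1]["bold"] else 0.0

def f_author_after_title(y_prev, y_t, tokens, t):
    """'y_t is AUTHOR and y_{t-1} is TITLE'"""
    return 1.0 if y_t == "AUTHOR" and y_prev == "TITLE" else 0.0

# Hypothetical weights; in a real CRF these are learned from labelled data.
FEATURES = [(f_prev_token_bold, 1.5), (f_author_after_title, 2.0)]

def score(labels, tokens):
    """Unnormalised log-score: weighted sum of features over all positions."""
    total = 0.0
    for t, y_t in enumerate(labels):
        y_prev = labels[t - 1] if t > 0 else None
        total += sum(w * f(y_prev, y_t, tokens, t) for f, w in FEATURES)
    return total

def sequence_probabilities(tokens):
    """p(y | x) for every label sequence, normalised by brute force."""
    scores = {seq: score(seq, tokens) for seq in product(LABELS, repeat=len(tokens))}
    z = sum(math.exp(s) for s in scores.values())
    return {seq: math.exp(s) / z for seq, s in scores.items()}

tokens = [{"text": "Conditional", "bold": True}, {"text": "Lafferty", "bold": False}]
probs = sequence_probabilities(tokens)
best = max(probs, key=probs.get)
print(best)
```

With these weights the most probable labelling of the two tokens is TITLE followed by AUTHOR, because both features fire on the second token.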
39. 39
What cites this work
40. 40
What cites this work [2]
41. 41
Mendeley Research Maps
https://marcorossettiblog.wordpress.com/2015/07/05/mendeley-research-maps/