Recommender Systems and Information Extraction for researchers
Marco Rossetti (@ross85)
6/11/2015, Mendeley, London
Outline
• What is Data Science
• What is Mendeley
• Recommender Systems at Mendeley
• Information Extraction at Mendeley
What is Data Science?
Why a data scientist?
Who wants a data scientist?
Who wants a data scientist? [2]
Two main types
https://www.quora.com/What-is-data-science/answer/Michael-Hochster
Two main types [2]
https://www.quora.com/What-is-data-science/answer/Michael-Hochster
Skills for Data Science
http://businessoverbroadway.com/investigating-data-scientists-their-skills-and-team-makeup
What is Mendeley?
Mendeley builds tools to help researchers … [2]
• Read & Organize
• Search & Discover
• Collaborate & Network
• Experiment & Synthesize
Read & Organize
• Reference management
• Cite-as-you-write
• Full-text article search
• Digitalised annotations
Search & Discover
• Mendeley Suggest
• Literature Search
• Related Documents
Collaborate & Network
• Research network
• Professional research groups
Mendeley & Elsevier
Elsevier Products
Recommender Systems
What is a Recommender System?
Recommender systems are a subclass of information filtering system that seek to
predict the 'rating' or 'preference' that a user would give to an item. [Wikipedia]
Why Recommender Systems at Mendeley?
Vision:
“To build a personalised research advisor that helps
you to organise your work, contextualise it within the
global body of research, and connect you with
relevant researchers and artifacts.”
Recommender Systems at Mendeley – Related Documents
Recommender Systems at Mendeley – Mendeley Suggest
https://www.mendeley.com/suggest/
Recommender System Components
• Data Sources
• Algorithms
• Business Logic & Analytics
• User Interface / User Experience
Data Sources
• Mendeley
– User libraries
• What users have in their libraries (what they read, annotate and highlight, which folders they have, etc.)
– Article metadata (title, authors, abstract, keywords, tags, etc.)
– Groups
• Scopus
– Citation network
• ScienceDirect
– Logs
• …
Algorithms
1. Collaborative filtering
User-based
If Alice read X, Y and Z, and Bob read X, Y, Z and W, we recommend W to Alice
+ Works well for us because users << items (a user-user similarity matrix is far smaller than an item-item one)
- Only works for users with enough articles in their library
Item-based
Users who read X also read Y
+ The item-item similarity matrix is useful for modelling the last n articles a user read
- Expensive in our setting (millions of items)
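Both collaborative-filtering variants can be sketched on toy data. This assumes each library is simply a set of article ids; the Jaccard similarity (user-based), the cosine score over reader sets (item-based), and every name below are illustrative choices, not Mendeley's actual implementation.

```python
from collections import Counter
from math import sqrt

# Toy libraries (hypothetical data mirroring the slide's Alice/Bob example).
libraries = {
    "alice": {"X", "Y", "Z"},
    "bob": {"X", "Y", "Z", "W"},
    "carol": {"Y", "V"},
}

def jaccard(a, b):
    """Library overlap |A & B| / |A | B| (0 for two empty libraries)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def user_based(user, libraries, n_neighbours=2, top_n=5):
    """Score unseen articles by summing the similarity of the neighbours who have them."""
    mine = libraries[user]
    neighbours = sorted(
        ((jaccard(mine, lib), other) for other, lib in libraries.items() if other != user),
        reverse=True,
    )[:n_neighbours]
    scores = Counter()
    for sim, other in neighbours:
        for item in libraries[other] - mine:
            scores[item] += sim
    return [item for item, _ in scores.most_common(top_n)]

def item_based(recent, libraries, top_n=5):
    """'Users who read X also read Y': cosine similarity of reader sets, summed over
    the user's last n articles (`recent`), computed on the fly for those seeds only."""
    readers = {}
    for user, lib in libraries.items():
        for item in lib:
            readers.setdefault(item, set()).add(user)
    scores = Counter()
    for seed in recent:
        for cand, cand_readers in readers.items():
            if cand in recent:
                continue
            co = len(readers.get(seed, set()) & cand_readers)
            if co:
                scores[cand] += co / sqrt(len(readers[seed]) * len(cand_readers))
    return [item for item, _ in scores.most_common(top_n)]

print(user_based("alice", libraries))  # ['W', 'V']
print(item_based(["X"], libraries))    # ['Z', 'Y', 'W']
```

The item-based function only scores candidates against the seed articles; materialising the full item-item matrix is what becomes expensive with millions of items.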
Algorithms [2]
1. Collaborative filtering (continued)
Matrix factorization
+ Best-performing CF model in the literature
- Generating recommendations over a catalogue of millions of items is too slow
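[Figure: a sparse user–article matrix X whose entries are 1 (observed interaction) or ? (unknown), approximated by the product of two low-rank matrices]
$$X_{n \times m} \approx U_{n \times k} \, V_{k \times m}$$
(n users, m articles, k latent factors)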
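The factorisation can be sketched with plain stochastic gradient descent. This is a toy under stated assumptions: unknown cells (?) are encoded as 0, as is common for implicit feedback, and the learning rate, regularisation and k are arbitrary. Scoring one user requires one length-k dot product per article, i.e. m of them, which is the step the slide flags as too slow for millions of items.

```python
import random

def factorise(X, k=2, steps=20000, lr=0.05, reg=0.01, seed=0):
    """Approximate the n x m matrix X by U (n x k) times V (k x m) with SGD.

    Unknown cells are assumed to be encoded as 0 (implicit-feedback simplification).
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(m)] for _ in range(k)]
    cells = [(i, j) for i in range(n) for j in range(m)]
    for _ in range(steps):
        i, j = rng.choice(cells)
        err = X[i][j] - score(U, V, i, j)
        for f in range(k):
            u, v = U[i][f], V[f][j]
            # Gradient step on 0.5 * err**2 + 0.5 * reg * (u**2 + v**2)
            U[i][f] += lr * (err * v - reg * u)
            V[f][j] += lr * (err * u - reg * v)
    return U, V

def score(U, V, i, j):
    """Predicted preference of user i for article j: row i of U times column j of V."""
    return sum(U[i][f] * V[f][j] for f in range(len(V)))

# Toy 4-user x 6-article matrix (hypothetical); user 1 has not seen article 2.
X = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
U, V = factorise(X)
print(sorted(range(6), key=lambda j: score(U, V, 1, j), reverse=True))
```

Among the articles user 1 has not seen, article 2 (read by the similar user 0) should score higher than articles 3 to 5, which belong to the other group of users.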
Algorithms [3]
2. Content-based
I read articles about text mining, show me other articles about text mining
+ Good for cold users (users without data)
- Overspecialisation: recommended items are too similar to what the user already has
3. Popularity/Trending
I work in Computer Science, show me popular/trending articles in Computer Science
+ Perfect for cold users
- Not personalised; a whole discipline is too broad
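Both approaches on this slide can be sketched in a few lines. The content-based part assumes whitespace-tokenised text and TF-IDF cosine similarity (a common choice, not necessarily Mendeley's); the popularity part ranks by a hypothetical per-article readership count. All names and the toy data are invented. A trending variant could apply the same ranking to readership gained in a recent time window.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each article id to a sparse {term: tf-idf weight} vector (whitespace tokens)."""
    tokens = {d: text.lower().split() for d, text in docs.items()}
    df = Counter(t for toks in tokens.values() for t in set(toks))
    n = len(docs)
    return {d: {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
            for d, toks in tokens.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def content_based(read, docs, top_n=3):
    """Rank unread articles by similarity to a profile summed from the read ones."""
    vecs = tfidf_vectors(docs)
    profile = Counter()
    for d in read:
        for t, w in vecs[d].items():
            profile[t] += w
    scores = {d: cosine(profile, v) for d, v in vecs.items() if d not in read}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [d for d in ranked if scores[d] > 0][:top_n]

def popular_in_discipline(discipline, readership, disciplines, top_n=3):
    """Most-read articles in the user's discipline (same list for everyone in it)."""
    pool = [d for d in readership if disciplines[d] == discipline]
    return sorted(pool, key=readership.get, reverse=True)[:top_n]

# Hypothetical toy catalogue.
docs = {
    "a1": "text mining of scientific articles",
    "a2": "text mining with topic models",
    "a3": "protein folding simulation",
    "a4": "deep learning for protein structure",
}
readership = {"a1": 120, "a2": 40, "a3": 300, "a4": 15}
disciplines = {"a1": "Computer Science", "a2": "Computer Science",
               "a3": "Biology", "a4": "Computer Science"}

print(content_based(["a1"], docs))                                          # ['a2']
print(popular_in_discipline("Computer Science", readership, disciplines))  # ['a1', 'a2', 'a4']
```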
Algorithms [4]
4. Citation Network
– Articles similar to articles I cited
– Articles that cite my work
– Articles by my co-authors
+ Good for some kinds of users
- Young researchers do not have (enough) publications
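A sketch of the three citation-network candidate sources, assuming the citation graph is available as plain dicts (in practice it would come from a source such as Scopus). "Articles similar to articles I cited" is approximated here by bibliographic coupling, i.e. articles that share a reference with my own articles; that choice, the data and all names are assumptions.

```python
def citation_candidates(me, authors, cites):
    """Candidate articles for researcher `me` from the citation network.

    authors: article id -> set of author names
    cites:   article id -> set of article ids it cites
    """
    mine = {a for a, names in authors.items() if me in names}
    my_refs = set().union(*(cites.get(a, set()) for a in mine))
    coauthors = set().union(*(authors[a] for a in mine)) - {me}
    return {
        # Articles that cite my work
        "cite_me": {a for a, refs in cites.items() if refs & mine} - mine,
        # Articles written by my co-authors
        "coauthors": {a for a, names in authors.items() if names & coauthors} - mine,
        # Articles sharing at least one reference with mine (bibliographic coupling)
        "similar_to_cited": {a for a, refs in cites.items() if refs & my_refs} - mine - my_refs,
    }

# Hypothetical toy graph.
authors = {"p1": {"Marco", "Ann"}, "p2": {"Ann", "Bob"}, "p3": {"Carl"},
           "p4": {"Dana"}, "r1": {"Eve"}, "r2": {"Fay"}}
cites = {"p1": {"r1", "r2"}, "p3": {"p1"}, "p4": {"r2"}}

print(citation_candidates("Marco", authors, cites))
print(citation_candidates("Zoe", authors, cites))  # no publications: every list is empty
```

The second call shows the weakness noted on the slide: a researcher with no publications gets no candidates at all.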
Evaluation
• Offline evaluation of 100+ algorithm variations on a historical dataset
• Split data into training and test sets by timestamp: train on data up to day X, then try to predict what users will add in the next day/week/month
• Computed different metrics to measure different dimensions:
• Accuracy (precision, recall, F-score, nDCG, MAP)
• Diversity
• Recency
• Popularity
• Consistency
• Coverage
• Online evaluation: click-through rate (CTR) computed from log data
• Do offline and online results correlate?
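The timestamp split and the accuracy metrics can be sketched as follows. Function names and toy data are hypothetical; relevance is binary (an article is relevant if the user added it in the test window), MAP is the mean of `average_precision` over test users, and the diversity, recency, popularity, consistency and coverage metrics are not shown.

```python
import math
from datetime import date

def temporal_split(events, cutoff, horizon_days):
    """Train on (user, item) events up to `cutoff`; test on the following `horizon_days`."""
    train, test = [], []
    for user, item, day in events:
        if day <= cutoff:
            train.append((user, item))
        elif (day - cutoff).days <= horizon_days:
            test.append((user, item))
    return train, test

def precision_recall_f1(ranked, relevant, k):
    hits = len(set(ranked[:k]) & relevant)
    p = hits / k
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def ndcg(ranked, relevant, k):
    dcg = sum(1 / math.log2(i + 2) for i, a in enumerate(ranked[:k]) if a in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

def average_precision(ranked, relevant, k):
    """AP@k for one user; MAP is the mean of this value over all test users."""
    hits, total = 0, 0.0
    for i, a in enumerate(ranked[:k]):
        if a in relevant:
            hits += 1
            total += hits / (i + 1)
    return total / min(len(relevant), k) if relevant else 0.0

def ctr(clicks, impressions):
    """Online metric: click-through rate computed from logs."""
    return clicks / impressions if impressions else 0.0

events = [("u", "a", date(2015, 1, 1)), ("u", "b", date(2015, 1, 5)), ("u", "c", date(2015, 3, 1))]
print(temporal_split(events, date(2015, 1, 2), horizon_days=7))
print(precision_recall_f1(["a", "b", "c", "d"], {"a", "c"}, k=4))
```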
Business Logic / Analytics
• The business imposes some constraints that can affect the recommendation experience
– Don’t show articles outside the user’s discipline
– Show only articles with a minimum readership
– Show only recommendations that can be explained (especially for people recommendations, which are a different matter)
• Analytics
– Dashboard of recommender statistics:
• Number of recommendations served
• Number of users with recommendations
• …
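A minimal sketch of how these constraints could be applied as a post-filter on candidate recommendations, plus two of the dashboard counters. The `Candidate` fields, the `min_readership=10` threshold and the explanation strings are hypothetical; the slide gives no concrete values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Candidate:
    article_id: str
    discipline: str
    readership: int
    explanation: Optional[str] = None  # e.g. "Because you read X"; None = cannot explain

def apply_business_rules(candidates: List[Candidate], user_discipline: str,
                         min_readership: int = 10) -> List[Candidate]:
    """Drop candidates that violate any of the three constraints."""
    return [
        c for c in candidates
        if c.discipline == user_discipline   # stay inside the user's discipline
        and c.readership >= min_readership   # minimum readership
        and c.explanation is not None        # only explainable recommendations
    ]

def dashboard(served: Dict[str, List[str]]) -> Dict[str, int]:
    """Two dashboard statistics, from a map of user id -> recommendations served."""
    return {
        "recommendations_served": sum(len(recs) for recs in served.values()),
        "users_with_recommendations": sum(1 for recs in served.values() if recs),
    }

cands = [
    Candidate("a1", "Computer Science", 50, "Because you read a0"),
    Candidate("a2", "Biology", 80, "Because you read a0"),
    Candidate("a3", "Computer Science", 3, "Because you read a0"),
    Candidate("a4", "Computer Science", 40),
]
print([c.article_id for c in apply_business_rules(cands, "Computer Science")])  # ['a1']
```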
User Interface
• Original idea: one list fits all
Create a single list with the best recommendations for the user: use advanced methods to take every signal into account and provide what is best for you!
User Interface [2]
• However…
– Different kinds of users can have different information needs!
– The same user can have different information needs in different contexts!
User Interface [3]
• Solution: different lists!
• Provide multiple lists that satisfy different information needs
• Users are more likely to find something they are interested in
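One way to serve several lists is to run one recommender per information need and keep each list short. In this hypothetical sketch each recommender is any function from a user id to ranked article ids, and skipping articles that already appeared in an earlier list is an assumed design choice, not something the slides specify.

```python
from typing import Callable, Dict, List, Set

# A recommender maps a user id to a ranked list of article ids (hypothetical interface).
Recommender = Callable[[str], List[str]]

def build_lists(user: str, library: Set[str], recommenders: Dict[str, Recommender],
                per_list: int = 3) -> Dict[str, List[str]]:
    """One short list per information need, skipping articles the user already has
    and articles already shown in an earlier list."""
    shown: Set[str] = set(library)
    lists: Dict[str, List[str]] = {}
    for name, recommend in recommenders.items():
        picks = [a for a in recommend(user) if a not in shown][:per_list]
        shown.update(picks)
        lists[name] = picks
    return lists

recs = {
    "Based on your library": lambda u: ["a", "b", "c", "d"],
    "Popular in your discipline": lambda u: ["b", "e", "f"],
}
print(build_lists("u", {"a"}, recs, per_list=2))
```

Deduplicating across lists keeps each list focused on its own information need instead of repeating the strongest candidates everywhere.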
Lessons learned
• It’s not about the best algorithm, it’s about the entire user experience!
• It is easier (where possible) to put together different lists that serve different information needs than to try to satisfy every user with a single list
Information Extraction
Lots of content in an article
Metadata Extraction
• Metadata extraction from PDFs was one of the first features of Mendeley
• It makes it easy to organize your articles
• It powers the Mendeley catalog
Citation Extraction
• Citation extraction from any source, with links to the Mendeley catalog
• It extracts citable references and a narrative path within the Mendeley environment
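The slide does not say how extracted references are matched to catalog entries. As a purely hypothetical sketch, a reference string could be linked to the catalog title whose tokens it covers best, with a threshold (0.8 here, an arbitrary value) to reject weak matches; a real linker would likely also compare authors, year and venue.

```python
import re
from typing import Dict, Optional

def tokens(text: str) -> set:
    """Lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def link_reference(reference: str, catalog: Dict[str, str],
                   threshold: float = 0.8) -> Optional[str]:
    """Return the id of the catalog title best covered by the reference string,
    or None if even the best match covers less than `threshold` of its tokens."""
    ref = tokens(reference)
    best_id, best = None, 0.0
    for doc_id, title in catalog.items():
        t = tokens(title)
        coverage = len(ref & t) / len(t) if t else 0.0
        if coverage > best:
            best_id, best = doc_id, coverage
    return best_id if best >= threshold else None

# Hypothetical two-entry catalog.
catalog = {
    "c1": "Conditional random fields: probabilistic models for segmenting and labeling sequence data",
    "c2": "An introduction to conditional random fields",
}
ref = ("J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: probabilistic "
       "models for segmenting and labeling sequence data. In ICML, 2001")
print(link_reference(ref, catalog))  # c1
```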
Machine learning for extraction
• Conditional Random Fields (CRF) [1]
• We assign a label $y_t$ to each token $x_t$ of a sequence, using feature functions $f_k(y_t, y_{t-1}, \mathbf{x}_t)$, where $\mathbf{x}_t$ can include neighbouring tokens
• E.g. ‘$y_t$ is AUTHOR and $x_{t-1}$ is bold’ and ‘$y_t$ is AUTHOR and $y_{t-1}$ is TITLE’
[1] J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In ICML, 2001.
[Figure: Fig. 2.4 in Sutton & McCallum 2011, showing observations and states]
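To make the feature functions concrete, here is a toy linear-chain CRF decoder. The labels, the four features (two of them are the examples above), their weights and the three-line header are invented for illustration; a real extractor learns the weights from labelled documents and uses many more features. Decoding uses the Viterbi algorithm, which finds the label sequence with the highest summed weighted features without computing the normaliser $Z(\mathbf{x})$.

```python
LABELS = ["TITLE", "AUTHOR", "OTHER"]

# Hand-set weights lambda_k for a few binary feature functions (hypothetical;
# a trained CRF would learn these from labelled data).
WEIGHTS = {
    "title_bold_first": 2.0,    # y_t is TITLE, x_t is bold and t == 0
    "author_after_bold": 1.5,   # y_t is AUTHOR and x_{t-1} is bold
    "author_after_title": 1.0,  # y_t is AUTHOR and y_{t-1} is TITLE
    "other_bias": 0.5,          # y_t is OTHER
}

def active_features(prev, cur, x, t):
    """Names of the feature functions f_k(y_t, y_{t-1}, x, t) that fire (value 1)."""
    bold = x[t][1]
    feats = []
    if cur == "TITLE" and bold and t == 0:
        feats.append("title_bold_first")
    if cur == "AUTHOR" and t > 0 and x[t - 1][1]:
        feats.append("author_after_bold")
    if cur == "AUTHOR" and prev == "TITLE":
        feats.append("author_after_title")
    if cur == "OTHER":
        feats.append("other_bias")
    return feats

def local_score(prev, cur, x, t):
    return sum(WEIGHTS[f] for f in active_features(prev, cur, x, t))

def sequence_score(labels, x):
    """Unnormalised score sum_t sum_k lambda_k f_k; p(y|x) = exp(score) / Z(x)."""
    return sum(local_score(labels[t - 1] if t else None, labels[t], x, t) for t in range(len(x)))

def viterbi(x):
    """Highest-scoring label sequence by dynamic programming over positions."""
    best = [{y: local_score(None, y, x, 0) for y in LABELS}]
    back = [{}]
    for t in range(1, len(x)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[t - 1][p] + local_score(p, y, x, t))
            best[t][y] = best[t - 1][prev] + local_score(prev, y, x, t)
            back[t][y] = prev
    y = max(LABELS, key=lambda label: best[-1][label])
    path = [y]
    for t in range(len(x) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]

# Tokens are (text, is_bold) pairs from a hypothetical article header.
header = [("Recommender systems for researchers", True),
          ("Marco Rossetti", False),
          ("Mendeley, London", False)]
print(viterbi(header))  # ['TITLE', 'AUTHOR', 'OTHER']
```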
What cites this work
What cites this work [2]
Mendeley Research Maps
https://marcorossettiblog.wordpress.com/2015/07/05/mendeley-research-maps/
Thank you