Recommendation Engines for Scientific Literature

Kris Jack, PhD
Data Mining Team Lead
Summary

➔ 2 recommendation use cases
➔ literature search with Mendeley
➔ use case 1: related research
➔ use case 2: personalised recommendations
Use Cases

Two types of recommendation use cases:

1) Related Research
● given 1 research article
● find other related articles

2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find articles of interest to them
Use Cases

My secondment (Dec-Feb):

1) Related Research
● given 1 research article
● find other related articles

2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find articles of interest to them
Literature Search Using Mendeley

Challenge!
● Use only Mendeley to perform literature search for:
  ● Related research
  ● Personalised recommendations

Eating your own dog food...
Queries: “content similarity”, “semantic similarity”, “semantic relatedness”, “PubMed related articles”, “Google Scholar related articles”

[Screenshots of the searches in Mendeley; the running count of relevant documents found grows 0 → 1 → 1 → 2 → 4 → 4.]
Literature Search Using Mendeley

Summary of Results

Strategy                  Num Docs Found   Comment
Catalogue Search          19               9 from “Related Research”
Group Search              0                Needs work
Perso Recommendations     45               Led to a group with 37 docs!

Found: 64

Eating your own dog food... Tastes good!
64 => 31 docs, read 14 so far,
   so what do they say...?
Use Cases

1) Related Research
● given 1 research article
● find other related articles
Use Case 1: Related Research

 7 highly relevant papers (related research for scientific articles)

 Q1/4: How are the systems evaluated?


User study (e.g. Likert scale to rate relatedness between documents) (Beel & Gipp, 2010)

TREC collections with hand-classified 'related articles' (e.g. TREC 2005 genomics track) (Lin & Wilbur, 2007)

Try to reconstruct a document's reference list (Pohl, Radlinski, & Joachims, 2007; Vellino, 2009)
Use Case 1: Related Research

 7 highly relevant papers (related research for scientific articles)

 Q2/4: How are the systems trained?


Paper reference lists (Pohl et al., 2007; Vellino, 2009)

Usage data (e.g. PubMed, arXiv) (Lin & Wilbur, 2007)

Document content (e.g. metadata, co-citation, bibliographic coupling) (Gipp, Beel, & Hentschel, 2009)

Collocation in mind maps (Beel & Gipp, 2010)
Use Case 1: Related Research

 7 highly relevant papers (related research for scientific articles)

 Q3/4: Which techniques are applied?


BM25 (Lin & Wilbur, 2007) (a minimal scoring sketch follows this slide)

Topic modelling (Lin & Wilbur, 2007)

Collaborative filtering (Pohl et al., 2007)

Bespoke heuristics for feature extraction (e.g. in-text citation metrics for same sentence, paragraph) (Pohl et al., 2007; Gipp et al., 2009)
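BM25, listed above from Lin & Wilbur (2007), is a bag-of-words relevance score. Below is a minimal sketch of BM25 scoring in Python; it is not the PubMed implementation, and the toy documents, tokenisation and k1/b values are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document in docs against query_terms with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n           # average document length
    df = Counter()                                  # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# toy usage: treat one abstract's terms as the "query" and score the collection
docs = [["topic", "model", "abstract"], ["related", "article", "abstract"], ["citation", "graph"]]
print(bm25_scores(["related", "abstract"], docs))
```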
Use Case 1: Related Research

 7 highly relevant papers (related research for scientific articles)

 Q4/4: Which techniques have most success?


Topic modelling slightly improves on BM25 (MEDLINE abstracts) (Lin & Wilbur, 2007):
- BM25 = 0.383 precision @ 5
- PMRA = 0.399 precision @ 5
(precision @ k is sketched after this slide)

Seeding CF with usage data from arXiv won out over using citation lists (Pohl et al., 2007)

Not yet found significant results showing whether content-based or CF methods are better for this task
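The precision @ 5 figures quoted above are the fraction of the top 5 returned articles that are judged relevant. A small helper for precision @ k, assuming relevance is a simple set-membership test (the ids are made up):

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved items that are in the relevant set."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k

# e.g. 2 of the top 5 suggestions are in the target group
print(precision_at_k(["d7", "d3", "d9", "d1", "d4"], {"d3", "d4", "d8"}))  # 0.4
```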
Use Case 1: Related Research

Progress so far...

Q1/2 How do we evaluate our system?

Construct a non-complex data set of related research:
● include groups with 10-20 documents (i.e. topics)
● no overlaps between groups (i.e. no documents in common)
● only take documents that are recognised as being in English
● document metadata must be 'complete' (i.e. has title, year, author, published in, abstract, filehash, tags/keywords/MeSH terms)

→ 4,382 groups
→ mean size = 14
→ 60,715 individual documents

Given a doc, aim to retrieve the other docs from its group
● tf-idf with Lucene implementation (an illustrative sketch follows this slide)
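The experiments use a Lucene tf-idf index; as an illustrative stand-in only, here is a sketch of the same retrieval idea with scikit-learn (the toy metadata strings are assumptions, not Mendeley data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# toy stand-ins for document metadata; in the evaluation each doc belongs to one group
docs = [
    "collaborative filtering for research article recommendation",
    "item-based collaborative filtering on user libraries",
    "mind map link analysis for document relatedness",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)   # rows are documents, columns are terms

# given doc 0, rank the other docs by cosine similarity of their tf-idf vectors
sims = cosine_similarity(tfidf[0], tfidf).ravel()
ranking = sims.argsort()[::-1][1:]       # drop the query doc itself
print(ranking)                           # ideally docs from the same group rank first
```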
Use Case 1: Related Research

Progress so far...

Q1/2 How do we evaluate our system?

[Chart: Metadata Presence in Documents — % of documents that each metadata field (title, year, author, publishedIn, fileHash, abstract, generalKeyword, meshTerms, keywords, tags) appears in, comparing the evaluation data set against group and catalogue documents.]
Use Case 1: Related Research

Progress so far...

Q2/2 What are our results?

[Chart: tf-idf Precision per Field for Complete Data Set — precision @ 5 (y-axis 0 to 0.3) by metadata field: abstract, title, generalKeyword, mesh-term, author, keyword, tag.]
Use Case 1: Related Research

Progress so far...

Q2/2 What are our results?

[Chart: tf-idf Precision per Field when Field is Available — precision @ 5 (y-axis 0 to 0.5) by metadata field: tag, abstract, mesh-term, title, general-keyword, author, keyword.]
Use Case 1: Related Research

Progress so far...

Q2/2 What are our results?

[Chart: tf-idf Precision for Field Combos for Complete Data Set — precision @ 5 (y-axis 0 to 0.4) for bestCombo and the individual fields abstract, title, generalKeyword, mesh-term, author, keyword, tag.]

BestCombo = abstract+author+general-keyword+tag+title
Use Case 1: Related Research

Progress so far...

Q2/2 What are our results?

[Chart: tf-idf Precision for Field Combos when Field is Available — precision @ 5 (y-axis 0 to 0.5) for bestCombo and the individual fields tag, abstract, mesh-term, title, general-keyword, author, keyword.]

BestCombo = abstract+author+general-keyword+tag+title
Use Case 1: Related Research

Future directions...?

Evaluate multiple techniques on same data set

Construct public data set
● similar to current one but with data from only public groups
● analyse composition of data set in detail

Train:
● content-based filtering
● collaborative filtering
● hybrid (see the sketch after this slide)

Evaluate the different systems on same data set

...and let's brainstorm!
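One simple form the hybrid above could take (an assumption for illustration, not the planned design) is a weighted blend of a content-based score and a collaborative-filtering score per candidate document:

```python
def hybrid_score(content_score, cf_score, alpha=0.5):
    """Blend a content-based score and a CF score; alpha is the content weight."""
    return alpha * content_score + (1 - alpha) * cf_score

# hypothetical (content, CF) scores for three candidate documents
candidates = {"docA": (0.8, 0.1), "docB": (0.4, 0.7), "docC": (0.2, 0.2)}
ranked = sorted(candidates, key=lambda d: hybrid_score(*candidates[d]), reverse=True)
print(ranked)  # ['docB', 'docA', 'docC'] with alpha=0.5
```

The weight alpha would itself have to be tuned on the evaluation data set.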
Use Cases

2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find articles of interest to them
Use Case 2: Perso Recommendations

 7 highly relevant papers (perso recs for scientific articles)

 Q1/4: How are the systems evaluated?


Cross validation on user libraries (Bogers & van Den Bosch, 2009; Wang & Blei, 2011)

User studies (McNee, Kapoor, & Konstan, 2006; Parra-Santander & Brusilovsky, 2009)
Use Case 2: Perso Recommendations

 7 highly relevant papers (perso recs for scientific articles)

 Q2/4: How are the systems trained?


CiteULike libraries (Bogers & van Den Bosch, 2009; Parra-Santander & Brusilovsky, 2009; Wang & Blei, 2011)

Documents represent users, and their citations represent documents of interest (McNee et al., 2006)

User search history (Kapoor et al., 2007)
Use Case 2: Perso Recommendations

 7 highly relevant papers (perso recs for scientific articles)

 Q3/4: Which techniques are applied?


CF (Parra-Santander & Brusilovsky, 2009; Wang & Blei, 2011)

LDA (Wang & Blei, 2011)

Hybrid of CF + LDA (Wang & Blei, 2011) (a rough topic-vector sketch follows this slide)

BM25 over tags to form user neighbourhood (Parra-Santander & Brusilovsky, 2009)

Item-based and content-based CF (Bogers & van Den Bosch, 2009)

User-based CF, Naïve Bayes classifier, Probabilistic Latent Semantic Indexing, textual TF-IDF-based algorithm (uses document abstracts) (McNee et al., 2006)
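As a rough illustration of the LDA side of the approaches above (this is not Wang & Blei's collaborative topic regression; the toy abstracts, the topic count and the averaging of a user's library into a profile are all assumptions), topic vectors can be fitted over abstracts and matched against a user profile:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

abstracts = [
    "collaborative filtering recommends articles from user libraries",
    "latent dirichlet allocation finds topics in scientific abstracts",
    "citation analysis measures document relatedness",
    "hybrid recommenders combine content and usage signals",
]

counts = CountVectorizer(stop_words="english").fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)               # one topic mixture per abstract
doc_topics /= doc_topics.sum(axis=1, keepdims=True)  # ensure rows are distributions

# a user profile as the mean topic mixture of the articles in their library (docs 0 and 3)
user_profile = doc_topics[[0, 3]].mean(axis=0, keepdims=True)
scores = cosine_similarity(user_profile, doc_topics).ravel()
print(scores.argsort()[::-1])                        # candidate abstracts, most similar first
```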
Use Case 2: Perso Recommendations

 7 highly relevant papers (perso recs for scientific articles)

 Q4/4: Which techniques have most success?


CF is much better than topic modelling (Wang & Blei, 2011)

A CF-topic modelling hybrid slightly outperforms CF alone (Wang & Blei, 2011)

Content-based filtering performed slightly better than item-based filtering on a test set with 1,322 CiteULike users (Bogers & van Den Bosch, 2009)

User-based CF and tf-idf significantly outperformed Naïve Bayes and Probabilistic Latent Semantic Indexing (McNee et al., 2006)

BM25 gave better results than CF, but the study involved just 7 CiteULike users, so it was small scale (Parra-Santander & Brusilovsky, 2009)
Use Case 2: Perso Recommendations

7 highly relevant papers (perso recs for scientific articles)

Q4/4: Which techniques have most success?

Content-based
● Advantage: human-readable form of the user's profile; quickly absorbs new content without need for ratings
● Disadvantage: tends to over-specialise

CF
● Advantage: works on an abstract item-user level so you don't need to 'understand' the content; tends to give more novel and creative recommendations
● Disadvantage: requires a lot of data
Use Case 2: Perso Recommendations

Our progress so far...

Q1/2 How do we evaluate our system?

Construct an evaluation data set from user libraries
● 50,000 user libraries
● 10-fold cross validation
● libraries vary from 20-500 documents
● preference values are binary (in library = 1; 0 otherwise)

Train:
● item-based collaborative filtering recommender (a minimal sketch follows this slide)

Evaluate:
● train recommender and test how well it can reconstruct the users' hidden testing libraries
● multiple similarity metrics (e.g. cooccurrence, loglikelihood)
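A minimal sketch of the kind of item-based recommender being evaluated, using plain co-occurrence counting over binary user libraries (the libraries here are toy assumptions, and the log-likelihood similarity alternative is not shown):

```python
from collections import Counter, defaultdict
from itertools import combinations

# toy user libraries: user -> set of article ids (preference is binary, as in the evaluation)
libraries = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"b", "c", "d"},
}

# item-item co-occurrence counts across libraries (one simple similarity metric)
cooc = defaultdict(Counter)
for items in libraries.values():
    for i, j in combinations(sorted(items), 2):
        cooc[i][j] += 1
        cooc[j][i] += 1

def recommend(user, n=10):
    """Score unseen articles by summed co-occurrence with the user's library."""
    owned = libraries[user]
    scores = Counter()
    for item in owned:
        for other, count in cooc[item].items():
            if other not in owned:
                scores[other] += count
    return [article for article, _ in scores.most_common(n)]

print(recommend("u1"))  # ['d'] for this toy data
```

In the cross-validation, part of each user's library is hidden and precision @ 10 measures how much of it the recommender recovers.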
Use Case 2: Perso Recommendations

Our progress so far...

Q2/2 What are our results?

Cross validation:
● 0.1 precision @ 10 articles

Usage logs:
● 0.4 precision @ 10 articles
Use Case 2: Perso Recommendations

Our progress so far...

Q2/2 What are our results?

[Chart: precision at 10 articles vs. number of articles in user library.]
Use Case 2: Perso Recommendations

Future directions...?

Evaluate multiple techniques on same data set

Construct data set
● similar to current one but with more up-to-date data
● analyse composition of data set in detail

Train:
● content-based filtering
● collaborative filtering (user-based and item-based)
● hybrid

Evaluate the different systems on same data set

...and let's brainstorm!
Conclusion

➔ 2 recommendation use cases
➔ similar problems and techniques
➔ good results so far
➔ combining CF with content would likely improve both
www.mendeley.com
References

Beel, J., & Gipp, B. (2010). Link Analysis in Mind Maps: A New Approach to Determining Document Relatedness. Mind, (January). Citeseer. Retrieved from http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Link+Analysis+in+Mind+Maps+:+A+New+Approach+to+Determining+Document+Relatedness#0

Bogers, T., & van Den Bosch, A. (2009). Collaborative and Content-based Filtering for Item Recommendation on Social Bookmarking Websites. ACM RecSys ’09 Workshop on Recommender Systems and the Social Web. New York, USA. Retrieved from http://ceur-ws.org/Vol-532/paper2.pdf

Gipp, B., Beel, J., & Hentschel, C. (2009). Scienstein: A research paper recommender system. Proceedings of the International Conference on Emerging Trends in Computing (ICETiC’09) (pp. 309–315). Retrieved from http://www.sciplore.org/publications/2009-Scienstein_-_A_Research_Paper_Recommender_System.pdf

Kapoor, N., Chen, J., Butler, J. T., Fouty, G. C., Stemper, J. A., Riedl, J., & Konstan, J. A. (2007). Techlens: a researcher’s desktop. Proceedings of the 2007 ACM conference on Recommender systems (pp. 183-184). ACM. doi:10.1145/1297231.1297268

Lin, J., & Wilbur, W. J. (2007). PubMed related articles: a probabilistic topic-based model for content similarity. BMC Bioinformatics, 8(1), 423. BioMed Central. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/17971238

McNee, S. M., Kapoor, N., & Konstan, J. A. (2006). Don’t look stupid: avoiding pitfalls when recommending research papers. Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (p. 180). ACM. Retrieved from http://portal.acm.org/citation.cfm?id=1180875.1180903

Parra-Santander, D., & Brusilovsky, P. (2009). Evaluation of Collaborative Filtering Algorithms for Recommending Articles. Web 3.0: Merging Semantic Web and Social Web at HyperText ’09 (pp. 3-6). Torino, Italy. Retrieved from http://ceur-ws.org/Vol-467/paper5.pdf

Pohl, S., Radlinski, F., & Joachims, T. (2007). Recommending related papers based on digital library access records. Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries (pp. 418-419). ACM. Retrieved from http://portal.acm.org/citation.cfm?id=1255175.1255260

Vellino, A. (2009). The Effect of PageRank on the Collaborative Filtering Recommendation of Journal Articles. Retrieved from http://cuvier.cisti.nrc.ca/~vellino/documents/PageRankRecommender-Vellino2008.pdf

Wang, C., & Blei, D. M. (2011). Collaborative topic modeling for recommending scientific articles. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 448–456). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=2020480
