Discovering the World’s Research


                    Ron Snyder
 Director of Advanced Technology, ITHAKA/JSTOR
         NASIG Annual Conference - 2012
                   June 9, 2012
Who we are



  ITHAKA is a not-for-profit organization that helps the academic
  community use digital technologies to preserve the scholarly
  record and to advance research and teaching in sustainable ways.

  We pursue this mission by providing innovative services that aid in
  the adoption of these technologies and that create lasting impact.



        JSTOR is a research platform that enables
        discovery, access, and preservation of scholarly
        content.
JSTOR Factoids


  •   Started in 1997
  •   Journals online: 1,604
  •   Articles online: 7.5 million
  •   Disciplines covered: 60
  •   Participating institutions: 7,800
  •   Countries with participating institutions: 167
JSTOR site activity

                      User Sessions (visits)
                      » New Sessions (per hour):
                           70k peak, 38k average
                      » Simultaneous Sessions:
                           44k peak, 21k average
                      Page Views
                      » 3.5M per day, 6.7M peak
                      Content Accesses
                      » 430k per day, 850k peak
                      Searches
                      » 456k per day, 1.13M peak
ITHAKA/JSTOR Discovery Initiatives


  • Overhaul of JSTOR Search Infrastructure
     • Coming Soon (Summer 2012), watch for it…
  • Analytics and data warehouse
     • Ingesting, organizing, and analyzing billions of usage
       events since JSTOR inception
  • Improved external discoverability
     • Various SEO, Google/GS, MS-Academic projects
  • Local Discovery Integration (LDI) Pilot
  • Machine-based document classification
Local Discovery Integration Pilot
         JSTOR and Summon
Problem Statement:

 » Research has shown time and again that both students and faculty are
   beginning their research at places other than the library OPAC, most
   notably Google/Google Scholar and discipline-specific electronic
   databases, and that the trend is continuing
 [Bar chart: Starting point for research, identified by faculty in 2003, 2006, and 2009 (2009 Faculty Study, ITHAKA). Vertical axis: 0% to 100%. Series: 2003, 2006, 2009. Categories: The library building; Your online library catalog; A general-purpose search engine; A specific electronic research resource]
Where is discovery happening?




  Where JSTOR ‘sessions’ originated | Jan 2011 – Dec 2011
Problem Statement:

 » As web-scale discovery services are being purchased and
   implemented by institutions, the value of those implementations
   is somewhat limited because they are (for the most part) only
   addressing the limited population of researchers who begin at
   a library-designated starting point (e.g. OPAC)

 [Pie chart: JSTOR usage | Australia | 2010 Nov. Legend: JSTOR, Google/Google Scholar, Known Linking Partner, Library. Slice labels as extracted: 16%, 6%, 9%, 76%, 2%]
Research Behavior: Students



            What is the easiest place to start research
                     according to students?

 [Bar chart comparing Library Databases and Google; horizontal axis: 0 to 70]
      Source: ProQuest survey of student research habits, 2007
Research Behavior: Faculty

    Starting Point for Research, identified by faculty in 2003, 2006, and 2009

 [Bar chart. Vertical axis: 0% to 100%. Series: 2003, 2006, 2009. Categories: The library building; Your online library catalog; A general-purpose search engine; A specific electronic research resource]
       Source: ITHAKA 2009 Faculty Survey, 2010
Concept:

 » If we can more effectively reach the users at the place(s) where
   they normally begin their research, then we can begin to more
   effectively build their awareness of the resources that the
   institution has licensed/purchased for their purposes
 » The local discovery integration (LDI) pilot study will attempt to
   measure changes in the student/faculty research experience by
   ‘embedding’ the institution’s selected web-scale discovery
   service in strategically-selected places in the JSTOR interface
   where – we believe – the user would naturally want to ‘cast a
   wider net’ for discovery
        2010 JSTOR Usage Highlights
        Total Significant Accesses             594,888,001
        Articles Downloaded                    74,901,344
        Articles Viewed                        112,751,906
        Searches Performed                     168,720,887
        Inbound Links from Licensed Partners   13,013,904
        Inbound Links from Google/Scholar      157,903,053
How it works


     Links Out

     • Search Results
             Advanced Search Page
             Search Results View
     •   3rd Page “Lightbox” pop-up
     •   Article View - Incoming from Google
     •   Article View - All other non-Google
     •   Zero Results Page




We placed links at various places along the research workflow in
JSTOR to allow students and researchers to “Cast a wider net”
Search results page

 » JSTOR may not be the most appropriate starting place in every
   instance, but it is a trusted and familiar interface. This will allow
   the user to ‘flowback’ to another starting place (e.g. the library)


                                                          • Uses the familiar
                                                          university logo to grab
                                                          attention

                                                          • Inserts search terms into
                                                          link text to notify user of
                                                          customized behavior

                                                          • Positioned proximate to
                                                          search results; relevant
                                                          during the search result
                                                          evaluation phase
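
As a rough illustration of the link described above, the following sketch builds a link whose visible text and target both carry the user's JSTOR search terms. The base URL, the `q` parameter name, and the function name are hypothetical; the slides do not say how the pilot constructed its links.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical discovery-service endpoint; not taken from the slides.
DISCOVERY_BASE_URL = "https://discovery.example.edu/search"


def build_ldi_link(search_terms: str, institution: str) -> str:
    """Return an HTML anchor that repeats the user's search in the discovery service."""
    # Assumed query parameter name "q"; a real service may use something else.
    target = DISCOVERY_BASE_URL + "?" + urlencode({"q": search_terms})
    # Put the search terms into the visible link text so the user can see
    # that the link is customized to the current query.
    text = f'Search {institution} library resources for "{search_terms}"'
    return f'<a href="{escape(target, quote=True)}">{escape(text)}</a>'


print(build_ldi_link("coral reef bleaching", "Example University"))
```

Escaping both the URL and the link text matters here because the search terms come straight from user input.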
Empty results page

 » In this instance, the user has found nothing and the most typical web
   response is to hit the ‘Back’ button. If we allow the user – at this point –
   to execute a search in the local discovery interface, we might improve
   the user experience
                                                         • One of the key places
                                                         where a user is likely to
                                                         want to try a different,
                                                         broader search

                                                         • Larger placement takes
                                                         advantage of available real
                                                         estate and cognitive space

                                                         • Users typically do not
                                                         spend time on this page so
                                                         it is important to increase
                                                         notice-ability and self-
                                                         explanation
Article page after Google search

 » In 2010, over 32M Google/Google Scholar searches brought users
   directly to an article page. They may or may not have found what they
   really wanted, so we’d like to give them an alternative discovery choice

                                                       • Visible when coming
                                                       from a Google or Google
                                                       Scholar search

                                                       • Captures basic search
                                                       terms from the search

                                                       • Provides an opportunity
                                                       to convert a user from a
                                                       Google/Google Scholar
                                                       user to a Summon user
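
One way to capture the search terms from an incoming Google or Google Scholar visit is to read them from the referrer URL. The sketch below assumes the referrer still carries the query in a `q` parameter and that a host label of `google` identifies the search engine; both are assumptions for illustration, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def search_terms_from_referrer(referrer: str) -> Optional[str]:
    """Return the query terms if the referrer looks like a Google or Google Scholar search."""
    parts = urlparse(referrer)
    host = (parts.hostname or "").lower()
    # Accept hosts such as www.google.com or scholar.google.co.uk (assumed rule).
    if "google" not in host.split("."):
        return None
    # Assumed parameter name "q"; parse_qs also decodes "+" and percent-escapes.
    terms = parse_qs(parts.query).get("q")
    return terms[0] if terms else None


print(search_terms_from_referrer(
    "https://scholar.google.com/scholar?q=latent+dirichlet+allocation&hl=en"
))
```

When the function returns `None`, the page would simply omit the customized link rather than guess at the user's query.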
Article page after JSTOR search

 » In 2010, almost 113M articles were viewed in JSTOR. Again, they may
   not have found what they really wanted, so we’d like to give them an
   alternative discovery choice

                                                     • Visible when coming
                                                     from a JSTOR search

                                                     • Raises visibility of the
                                                     feature by exposing it to a
                                                     large number of users

                                                     • Inserts search terms
                                                     into link text to notify user
                                                     of customized behavior
Results View: All Pages



Link out from the
bottom of all pages of
the search results view.
This will allow more
opportunities to link out
for students/researchers
combing through large
sets of results.
Results View: 3rd Page




Pop-up on the third page of search results
Prompts the student/researcher to indicate whether they wish to link out through the LDI. This
will enable us to measure whether students wish to “cast a wider net” or not. In the other link
scenarios we have no baseline for how many students do not notice the link vs. choose not
to use it.
Link out to Discovery Platform
Results Overview




             » Highest usage occurred in Zero Results scenario


Data shown is for all institutions participating in Summon LDI
Date range: July 2011 – February 2012
Machine-Based Article Classifier
    Assigning Articles to Disciplines
The Problem


 JSTOR Corpus
 • 60 disciplines
 • 1,600 journals
 • Nearly 8 million articles

 • Disciplines are associated at the journal level
 • All articles in a journal inherit the journal-assigned
   disciplines
 • Using this approach, many articles have incomplete
   and/or incorrect discipline tagging, which hinders discovery

 • How to assign disciplines to articles?
Topic Models


  • Human classification and tagging is not feasible
     • A machine-based classification process is desired


  • Topic models are a way of finding structure in a set of
    documents
  • They allow us to find “latent” themes
  • A topic model is not a topic map
  • Some topic modeling approaches include
     • Latent Semantic Analysis (LSI/LSA) (Deerwester 1990)
     • Probabilistic LSA (Hofmann 1999)
     • Latent Dirichlet Allocation (LDA) (Blei 2003)
Topic Modeling – our approach


  LDA – Latent Dirichlet Allocation
  • A generative probabilistic model for analyzing
    collections of documents
  • A Bayesian model where each document is modeled
    as a mixture of topics (disciplines)
  • Models semantic relationships between documents
    based on word co-occurrences
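
The "generative" and "mixture of topics" ideas above can be made concrete with a toy sketch of the LDA generative story: draw a topic mixture for the document from a Dirichlet distribution, then draw each word by first picking a topic and then picking a word from that topic. The discipline names, vocabularies, and probabilities below are made up for illustration, and the topic-word distributions are held fixed rather than drawn from their own Dirichlet prior as in full LDA.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, length, rng):
    """Generate one document following the LDA generative story."""
    # 1. Draw the document's mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    topics = list(topic_word)
    words = []
    for _ in range(length):
        # 2. Pick a topic for this word position according to the mixture.
        topic = rng.choices(topics, weights=theta)[0]
        # 3. Pick a word from that topic's word distribution.
        vocab = list(topic_word[topic])
        probs = list(topic_word[topic].values())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Toy topic-word distributions; names and numbers are illustrative only.
TOPIC_WORD = {
    "economics": {"market": 0.5, "price": 0.3, "model": 0.2},
    "biology": {"cell": 0.5, "species": 0.3, "model": 0.2},
}

rng = random.Random(0)
theta, words = generate_document(TOPIC_WORD, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta, words)
```

Fitting LDA runs this story in reverse: given only the words, it infers the topic mixtures and topic-word distributions that most plausibly produced them.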
The Process


  • We select the most representative documents from
    each JSTOR discipline to build a topic model (from
    the vocabulary of the document sample)
     • This sampling and vocabulary modeling is the most important part
       of the process!
     • We’re still experimenting with this, but find the citation network
       provides a good means for identifying core documents in a
       discipline
     • Also considering whether usage data might be leveraged here
  • Each document in the corpus is then analyzed and
    compared to the topic model to determine how well it
    matches each topic
     • A probability distribution is generated providing discipline weights
     • The top weighted discipline(s) are associated with each article
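
The last two steps can be sketched as follows. This is not the inference procedure used in the project (the slides do not describe it); it is a simpler maximum-likelihood stand-in that holds per-discipline word distributions fixed and uses EM to estimate one document's discipline weights, then keeps the disciplines above a cutoff. The toy distributions, the function names, and the 0.3 threshold are assumptions for illustration.

```python
def discipline_weights(doc_words, topic_word, iterations=50, floor=1e-9):
    """Estimate a document's mixture over disciplines with word distributions held fixed."""
    if not doc_words:
        raise ValueError("document has no words")
    topics = list(topic_word)
    weights = {t: 1.0 / len(topics) for t in topics}  # start from a uniform mixture
    for _ in range(iterations):
        totals = {t: 0.0 for t in topics}
        for word in doc_words:
            # E-step: responsibility of each discipline for this word.
            # Unseen words get a tiny floor probability instead of zero.
            scores = {t: weights[t] * topic_word[t].get(word, floor) for t in topics}
            norm = sum(scores.values())
            for t in topics:
                totals[t] += scores[t] / norm
        # M-step: new weights are the average responsibilities.
        weights = {t: totals[t] / len(doc_words) for t in topics}
    return weights


def top_disciplines(weights, threshold=0.3):
    """Return disciplines whose weight meets the threshold, highest first."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, weight in ranked if weight >= threshold]


# Toy per-discipline word distributions; illustrative values only.
TOPIC_WORD = {
    "economics": {"market": 0.5, "price": 0.3, "model": 0.2},
    "biology": {"cell": 0.5, "species": 0.3, "model": 0.2},
}

article = ["market", "price", "market", "model", "cell"]
weights = discipline_weights(article, TOPIC_WORD)
print(weights, top_disciplines(weights))
```

Using a threshold rather than a single top pick allows an article to carry more than one discipline, which matches the "discipline(s)" wording above.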
Application


   • On-site discovery
      • Will be a key element of our overhauled search
        infrastructure, tentatively scheduled for beta release mid-summer
   • Use in article-level discipline/subject/topic mappings
     for better integration with aggregated indexes
      • Will support a richer data feed for Summon, for instance

More Related Content

Similar to NASIG 2012 - Discovering the World's Research (ITHAKA portion)

Web-Scale Discovery: Post Implementation
Web-Scale Discovery: Post ImplementationWeb-Scale Discovery: Post Implementation
Web-Scale Discovery: Post ImplementationRachel Vacek
 
Webscale Discovery and Information Literacy
Webscale Discovery and Information LiteracyWebscale Discovery and Information Literacy
Webscale Discovery and Information LiteracyCharleston Conference
 
Webscale discovery and information literacy
Webscale discovery and information literacyWebscale discovery and information literacy
Webscale discovery and information literacyli1smc
 
Corrall & Dove - Web scale discovery and information literacy: competing visi...
Corrall & Dove - Web scale discovery and information literacy: competing visi...Corrall & Dove - Web scale discovery and information literacy: competing visi...
Corrall & Dove - Web scale discovery and information literacy: competing visi...IL Group (CILIP Information Literacy Group)
 
[Jaalouk, Vivas-Thomas] SR15 Poster
[Jaalouk, Vivas-Thomas] SR15 Poster[Jaalouk, Vivas-Thomas] SR15 Poster
[Jaalouk, Vivas-Thomas] SR15 PosterLuciana Jaalouk
 
Role of libraries in research and scholarly communication
Role of libraries in research and scholarly communicationRole of libraries in research and scholarly communication
Role of libraries in research and scholarly communicationNikesh Narayanan
 
2011.11.03.charleston.ldi
2011.11.03.charleston.ldi2011.11.03.charleston.ldi
2011.11.03.charleston.ldiBruce Heterick
 
2011.11.03.charleston.ldi
2011.11.03.charleston.ldi2011.11.03.charleston.ldi
2011.11.03.charleston.ldiBruce Heterick
 
Crushing, Blending, and Stretching Data
Crushing, Blending, and Stretching DataCrushing, Blending, and Stretching Data
Crushing, Blending, and Stretching DataRay Schwartz
 
Researching Researchers - Designing the User Experience at ProQuest
Researching Researchers - Designing the User Experience at ProQuestResearching Researchers - Designing the User Experience at ProQuest
Researching Researchers - Designing the User Experience at ProQuestMathias Burton
 
Ux Research Portfolio
Ux Research PortfolioUx Research Portfolio
Ux Research PortfolioTao Zhang
 
Exposing Library Content with the NISO Metasearch XML Gateway Protocol
Exposing Library Content with the NISO Metasearch XML Gateway ProtocolExposing Library Content with the NISO Metasearch XML Gateway Protocol
Exposing Library Content with the NISO Metasearch XML Gateway ProtocolElectronic Resources & Libraries
 
The Utrecht Approach - JIBS User group July 22 2014
The Utrecht Approach - JIBS User group July 22 2014The Utrecht Approach - JIBS User group July 22 2014
The Utrecht Approach - JIBS User group July 22 2014Bianca Kramer
 
Academic User Experience: Students, Faculty, and Libraries
Academic User Experience: Students, Faculty, and LibrariesAcademic User Experience: Students, Faculty, and Libraries
Academic User Experience: Students, Faculty, and LibrariesDerek Poppink CXA CUA
 
LIBRARY ASSESSMENT
LIBRARY ASSESSMENTLIBRARY ASSESSMENT
LIBRARY ASSESSMENTJen Rutner
 
Owning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsOwning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsRobert H. McDonald
 

Similar to NASIG 2012 - Discovering the World's Research (ITHAKA portion) (20)

Web-Scale Discovery: Post Implementation
Web-Scale Discovery: Post ImplementationWeb-Scale Discovery: Post Implementation
Web-Scale Discovery: Post Implementation
 
Webscale Discovery and Information Literacy
Webscale Discovery and Information LiteracyWebscale Discovery and Information Literacy
Webscale Discovery and Information Literacy
 
Webscale discovery and information literacy
Webscale discovery and information literacyWebscale discovery and information literacy
Webscale discovery and information literacy
 
Corrall & Dove - Web scale discovery and information literacy: competing visi...
Corrall & Dove - Web scale discovery and information literacy: competing visi...Corrall & Dove - Web scale discovery and information literacy: competing visi...
Corrall & Dove - Web scale discovery and information literacy: competing visi...
 
[Jaalouk, Vivas-Thomas] SR15 Poster
[Jaalouk, Vivas-Thomas] SR15 Poster[Jaalouk, Vivas-Thomas] SR15 Poster
[Jaalouk, Vivas-Thomas] SR15 Poster
 
Role of libraries in research and scholarly communication
Role of libraries in research and scholarly communicationRole of libraries in research and scholarly communication
Role of libraries in research and scholarly communication
 
2011.11.03.charleston.ldi
2011.11.03.charleston.ldi2011.11.03.charleston.ldi
2011.11.03.charleston.ldi
 
2011.11.03.charleston.ldi
2011.11.03.charleston.ldi2011.11.03.charleston.ldi
2011.11.03.charleston.ldi
 
Crushing, Blending, and Stretching Data
Crushing, Blending, and Stretching DataCrushing, Blending, and Stretching Data
Crushing, Blending, and Stretching Data
 
Researching Researchers - Designing the User Experience at ProQuest
Researching Researchers - Designing the User Experience at ProQuestResearching Researchers - Designing the User Experience at ProQuest
Researching Researchers - Designing the User Experience at ProQuest
 
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
 
Ux Research Portfolio
Ux Research PortfolioUx Research Portfolio
Ux Research Portfolio
 
Exposing Library Content with the NISO Metasearch XML Gateway Protocol
Exposing Library Content with the NISO Metasearch XML Gateway ProtocolExposing Library Content with the NISO Metasearch XML Gateway Protocol
Exposing Library Content with the NISO Metasearch XML Gateway Protocol
 
The Utrecht Approach - JIBS User group July 22 2014
The Utrecht Approach - JIBS User group July 22 2014The Utrecht Approach - JIBS User group July 22 2014
The Utrecht Approach - JIBS User group July 22 2014
 
Academic User Experience: Students, Faculty, and Libraries
Academic User Experience: Students, Faculty, and LibrariesAcademic User Experience: Students, Faculty, and Libraries
Academic User Experience: Students, Faculty, and Libraries
 
LIBRARY ASSESSMENT
LIBRARY ASSESSMENTLIBRARY ASSESSMENT
LIBRARY ASSESSMENT
 
Open Discovery Initiative Successes - January 28, 2015
Open Discovery Initiative Successes - January 28, 2015Open Discovery Initiative Successes - January 28, 2015
Open Discovery Initiative Successes - January 28, 2015
 
Cil06giltrud(1)
Cil06giltrud(1)Cil06giltrud(1)
Cil06giltrud(1)
 
October 28, 2015 NISO Virtual Conference Interacting with Content: Improving ...
October 28, 2015 NISO Virtual Conference Interacting with Content: Improving ...October 28, 2015 NISO Virtual Conference Interacting with Content: Improving ...
October 28, 2015 NISO Virtual Conference Interacting with Content: Improving ...
 
Owning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsOwning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your Patrons
 

Recently uploaded

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

NASIG 2012 - Discovering the World's Research (ITHAKA portion)

  • 1. Discovering the World’s Research Ron Snyder Director of Advanced Technology, ITHAKA/JSTOR NASIG Annual Conference - 2012 June 9, 2012
  • 2. Who we are ITHAKA is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. We pursue this mission by providing innovative services that aid in the adoption of these technologies and that create lasting impact. JSTOR is a research platform that enables discovery, access, and preservation of scholarly content.
  • 3. JSTOR Factoids • Started in 1997 • Journals online: 1,604 • Articles online: 7.5 million • Disciplines covered: 60 • Participating institutions: 7,800 • Countries with participating institutions: 167
  • 4. JSTOR site activity User Sessions (visits) » New Sessions (per hour):  70k peak, 38k average » Simultaneous Sessions:  44k peak, 21k average Page Views » 3.5M per day, 6.7M peak Content Accesses » 430k per day, 850K peak Searches » 456k per day, 1.13M peak
  • 5. ITHAKA/JSTOR Discovery Initiatives • Overhaul of JSTOR Search Infrastructure • Coming Soon (Summer 2012), watch for it… • Analytics and data warehouse • Ingesting, organizing, and analyzing billions of usage events since JSTOR inception • Improved external discoverability • Various SEO, Google/GS, MS-Academic projects • Local Discovery Integration (LDI) Pilot • Machine-based document classification
  • 6. Local Discovery Integration Pilot JSTOR and Summon
  • 7. Problem Statement: » Research has shown time and again that both students and faculty are beginning their research at places other than the library OPAC, most notably Google/Google Scholar and discipline-specific electronic databases, and that the trend is continuing [Chart: Starting point for research, identified by faculty in 2003, 2006, and 2009 (2009 Faculty Study, ITHAKA); categories: The library building, Your online library catalog, A general-purpose search engine, A specific electronic research resource]
  • 8. Where is discovery happening? Where JSTOR ‘sessions’ originated | Jan 2011 – Dec 2011
  • 9. Problem Statement: » As web-scale discovery services are being purchased and implemented by institutions, the value of those implementations is somewhat limited because they are (for the most part) only addressing that limited population of researchers who begin at a library-designated starting point (e.g. OPAC) [Chart: JSTOR usage, Australia, Nov. 2010, by origin: JSTOR, Google/Google Scholar, Known Linking Partner, Library; values shown: 16%, 6%, 9%, 76%, 2%]
  • 10. Research Behavior: Students What is the easiest place to start research according to students? [Chart: Library Databases vs. Google; axis 0 to 70] Source: ProQuest survey of student research habits, 2007
  • 11. Research Behavior: Faculty Starting Point for Research, identified by faculty in 2003, 2006, and 2009 [Chart categories: The library building, Your online library catalog, A general-purpose search engine, A specific electronic research resource] Source: ITHAKA 2009 Faculty Survey, 2010
  • 12. Concept: » If we can more effectively reach the users at the place(s) where they normally begin their research, then we can begin to more effectively build their awareness of the resources that the institution has licensed/purchased for their purposes » The local discovery integration (LDI) pilot study will attempt to measure changes in the student/faculty research experience by ‘embedding’ the institution’s selected web-scale discovery service in strategically-selected places in the JSTOR interface where – we believe – the user would naturally want to ‘cast a wider net’ for discovery 2010 JSTOR Usage Highlights: Total Significant Accesses: 594,888,001; Articles Downloaded: 74,901,344; Articles Viewed: 112,751,906; Searches Performed: 168,720,887; Inbound Links from Licensed Partners: 13,013,904; Inbound Links from Google/Scholar: 157,903,053
  • 13. How it works Links Out • Search Results » Advanced Search Page » Search Results View • 3rd Page “Lightbox” pop-up • Article View - Incoming from Google • Article View - All other non-Google • Zero Results Page We placed links at various places along the research workflow in JSTOR to allow students and researchers to “Cast a wider net”
  • 14. Search results page » JSTOR may not be the most appropriate starting place in every instance, but it is a trusted and familiar interface. This will allow the user to ‘flowback’ to another starting place (e.g. the library) • Uses the familiar university logo to grab attention • Inserts search terms into link text to notify user of customized behavior • Positioned proximate to search results; relevant during the search result evaluation phase
  • 15. Empty results page » In this instance, the user has found nothing and the most typical web response is to hit the ‘Back’ button. If we allow the user – at this point – to execute a search in the local discovery interface, we might improve the user experience • One of the key places where a user is likely to want to try a different, broader search • Larger placement takes advantage of available real estate and cognitive space • Users typically do not spend time on this page so it is important to increase noticeability and self-explanation
  • 16. Article page after Google search » In 2010, over 32M Google/Google Scholar searches brought users directly to an article page. They may or may not have found what they really wanted, so we’d like to give them an alternative discovery choice • Visible when coming from a Google or Google Scholar search • Captures basic search terms from the search • Provides an opportunity to convert a user from a Google/Google Scholar user to a Summon user
  • 17. Article page after JSTOR search » In 2010, almost 113M articles were viewed in JSTOR. Again, they may not have found what they really wanted, so we’d like to give them an alternative discovery choice • Visible when coming from a JSTOR search • Raises visibility of the feature by exposing it to a large number of users • Inserts search terms into link text to notify user of customized behavior
  • 18. Results View: All Pages Link out from the bottom of all pages of the search results view. This will allow more opportunities to link out for students/researchers combing through large sets of results.
  • 19. Results View: 3rd Page Pop-up on the third page of search results. Prompts the student/researcher to indicate whether they wish to link out through the LDI. This will enable us to measure whether students wish to “cast a wider net” or not. In the other link scenarios we don’t have a baseline of how many students do not notice the link vs. choose not to use it
  • 20. Link out to Discovery Platform
  • 21. Results Overview » Highest usage occurred in Zero Results scenario Data shown is for all institutions participating in Summon LDI Date range: July 2011 – February 2012
  • 22. Machine-Based Article Classifier Assigning Articles to Disciplines
  • 23. The Problem JSTOR Corpus • 60 disciplines • 1,600 journals • Nearly 8 million articles • Disciplines are associated at the Journal level • All articles in a Journal inherit the Journal-assigned disciplines • Using this approach, many articles have incomplete and/or incorrect discipline tagging, hindering discovery • How to assign disciplines to articles?
  • 24. Topic Models • Human classification and tagging is not feasible • A machine-based classification process is desired • Topic models are a way of finding structure in a set of documents • They allow us to find “latent” themes • A topic model is not a topic map • Some topic modeling approaches include • Latent Semantic Analysis (LSI/LSA) (Deerwester 1990) • Probabilistic LSA (Hofmann 1999) • Latent Dirichlet Allocation (LDA) (Blei 2003)
  • 25. Topic Modeling – our approach LDA – Latent Dirichlet Allocation • A generative probabilistic model for analyzing collections of documents • A Bayesian model where each document is modeled as a mixture of topics (disciplines) • Models semantic relationships between documents based on word co-occurrences
  • 26. The Process • We select the most representative documents from each JSTOR discipline to build a topic model (from the vocabulary of the document sample) • This sampling and vocabulary modeling is the most important part of the process! • We’re still experimenting with this, but find the citation network provides a good means for identifying core documents in a discipline • Also considering whether usage data might be leveraged here • Each document in the corpus is then analyzed and compared to the topic model to determine how well it matches each topic • A probability distribution is generated providing discipline weights • The top weighted discipline(s) are associated with each article
  • 27. Application • On-site discovery • Will be a key element of our overhauled search infrastructure, tentatively scheduled for beta release mid-summer • Use in article-level discipline/subject/topic mappings for better integration with aggregated indexes • Will support a richer data feed for Summon, for instance
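
The classification process outlined in slides 25 and 26 (build a model from representative documents, score each article against it, keep the top-weighted disciplines) can be illustrated with a minimal sketch. This is a simplified stand-in, not LDA and not JSTOR's implementation: each discipline's "topic" is assumed to be a fixed, smoothed word-frequency distribution taken from sample documents, and a document's discipline weights are estimated with expectation-maximization over those fixed distributions, without a Dirichlet prior. All function names, the sample texts, and the 0.3 threshold are hypothetical.

```python
from collections import Counter


def build_topic_model(samples, smoothing=1.0):
    """Estimate one smoothed word distribution per discipline.

    `samples` maps a discipline name to a list of representative texts.
    Add-one style smoothing keeps every vocabulary word at non-zero probability.
    """
    vocab = sorted({w for docs in samples.values() for doc in docs for w in doc.split()})
    model = {}
    for discipline, docs in samples.items():
        counts = Counter(w for doc in docs for w in doc.split())
        total = sum(counts.values()) + smoothing * len(vocab)
        model[discipline] = {w: (counts[w] + smoothing) / total for w in vocab}
    return model


def discipline_weights(text, model, iterations=50):
    """EM estimate of a document's mixture weights over the fixed topics."""
    disciplines = list(model)
    vocab = next(iter(model.values()))
    # Words outside the model vocabulary carry no information and are skipped.
    counts = Counter(w for w in text.split() if w in vocab)
    n = sum(counts.values())
    weights = {d: 1.0 / len(disciplines) for d in disciplines}
    if n == 0:
        return weights  # nothing to learn from: stay uniform
    for _ in range(iterations):
        new = dict.fromkeys(disciplines, 0.0)
        for word, c in counts.items():
            # E-step: share of this word attributed to each discipline.
            joint = {d: weights[d] * model[d][word] for d in disciplines}
            z = sum(joint.values())
            for d in disciplines:
                new[d] += c * joint[d] / z
        # M-step: renormalize into a probability distribution.
        weights = {d: new[d] / n for d in disciplines}
    return weights


def top_disciplines(weights, threshold=0.3):
    """Return disciplines whose weight meets the (hypothetical) threshold."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [d for d, w in ranked if w >= threshold]


samples = {
    "economics": ["market price inflation trade", "labor market wage price"],
    "biology": ["cell gene protein species", "species gene evolution cell"],
}
model = build_topic_model(samples)
weights = discipline_weights("gene expression in cell and protein", model)
print(top_disciplines(weights))
```

As slide 26 stresses, the choice of sample documents dominates the outcome: in this sketch the `samples` dictionary alone determines the vocabulary and the word distributions that every article is scored against.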

Editor's Notes

  1. So, how do we take a good idea (web-scale discovery) and make it better? How do we take the basic principle – which is good and valuable – and use it in such a way so that it achieves a broader impact?
  2. Replace w data for all JSTOR (requested)
  3. For several years, the anecdotal evidence that librarians had been witnessing first-hand was beginning to be verified by user studies published by OCLC and others, as well as student surveys reported on by ProQuest and others. The evidence was overwhelming: the gateway function that libraries had played for so long – and valued so much – being THE gateway to academic research – was quickly being overtaken by web-based search engines like Google. It was one thing when undergraduates started to migrate away from the library …
  4. A number of organizations had been following this trend closely - including my own (ITHAKA … which is the organizational umbrella under which JSTOR and Portico reside). We were taking a longitudinal look at faculty views about the library – and other pertinent scholarly communications issues – and comparing those views with similar survey data from librarians. One noticeable disconnect in these surveys – as you might imagine – was the perception of the “library as gateway”. Librarians believe it to be hugely important and faculty less so (science faculty much less so than humanities faculty). And students? Even less than that. Yet, the dollars being spent on access services in libraries – both software and people – were (and continue to be) tremendous. Are those expenditures aligned properly with the expectations of the users, and if they are, then how do we more effectively leverage those investments to reach a broader audience?
  5. Search Result Page: Design Notes -- Link is proximate to the first search result so that it is part of the evaluation workflow (e.g. user looks at first result, decides it is no good, sees the link) -- Uses branding element that the user is familiar with … should be something that all students / users will recognize -- Customized text in the link indicates to the user that this is session-specific and workflow-specific behavior -- Uses a link which is unobtrusive and not confusing, yet noticeable (vis-à-vis a search box, which we already have too many of) Requirements Used -- Uses 16x45px max logo
  6. No results page: design notes -- Mimics the JSTOR search box above, clearly indicating behavior to the user -- Button size is near the maximum size it could be and still look like a clickable button -- Key place where users are likely to “cast a wider net” (exhausted all JSTOR search results) -- Takes advantage of larger real estate and simpler page design to drive users toward this feature Requirements Used -- Uses canonical name for button text (25 characters max) -- Uses larger logo, 250x50px max
  7. Article Page – JSTOR search: Design notes -- Customized text in the link indicates to the user that this is session-specific and workflow-specific behavior -- Article page is already very stuffed, esp. with CSP/Publisher stuff, so we were forced to go with something more minimalist here -- Not necessarily a core workflow for users, searching from the article page, but gives us the opportunity to expose the feature to a wide audience -- Only difference from JSTOR article page is the missing “Back To Search Results” link Requirements Used -- Uses canonical name for link text (25 characters max)
  8. Article Page – JSTOR search: Design notes -- Customized text in the link indicates to the user that this is session-specific and workflow-specific behavior -- Article page is already very stuffed, esp. with CSP/Publisher stuff, so we were forced to go with something more minimalist here -- Not necessarily a core workflow for users, searching from the article page, but gives us the opportunity to expose the feature to a wide audience Requirements Used -- Uses canonical name for link text (25 characters max)
  9. Article Page – JSTOR search: Design notes -- Customized text in the link indicates to the user that this is session-specific and workflow-specific behavior -- Article page is already very stuffed, esp. with CSP/Publisher stuff, so we were forced to go with something more minimalist here -- Not necessarily a core workflow for users, searching from the article page, but gives us the opportunity to expose the feature to a wide audience Requirements Used -- Uses canonical name for link text (25 characters max)
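
The "customized text in the link" behavior in these notes, combined with capturing search terms from an incoming Google search, might look roughly like the sketch below. It is illustrative only: it assumes the referrer URL carries the query in a `q` parameter, and the discovery base URL, its `q` parameter, the link wording, and both function names are hypothetical rather than any real JSTOR or Summon interface. The 25-character limit on the canonical name is taken from the requirements in the notes.

```python
from urllib.parse import parse_qs, urlencode, urlparse

MAX_NAME_LENGTH = 25  # "canonical name ... (25 characters max)" from the notes


def search_terms_from_referrer(referrer):
    """Return the search terms from a Google/Google Scholar referrer, else None."""
    parsed = urlparse(referrer)
    host = parsed.hostname or ""
    if not host.startswith(("www.google.", "google.", "scholar.google.")):
        return None
    # Assumption: the query is carried in the `q` parameter of the referrer.
    terms = parse_qs(parsed.query).get("q")
    return terms[0] if terms else None


def discovery_link(base_url, terms, service_name):
    """Build a link-out URL and link text that echo the user's search terms."""
    if len(service_name) > MAX_NAME_LENGTH:
        raise ValueError("canonical name exceeds 25 characters")
    url = base_url + "?" + urlencode({"q": terms})  # hypothetical parameter name
    text = f'Search {service_name} for "{terms}"'
    return url, text


terms = search_terms_from_referrer("http://www.google.com/search?q=topic+models&hl=en")
if terms is not None:
    url, text = discovery_link("https://discovery.example.edu/search", terms, "Library Search")
    print(text)
    print(url)
```

Echoing the terms in the link text is what signals to the user that the link is specific to their session, which is the design goal the notes describe.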