JSTOR
    Advanced Technology Research



    Denver
    25th January 2008
    John Burns
    Clare Llewellyn




Today we will introduce a public beta of our Data for
      Research service and show you some of the other
      services that JSTOR’s advanced technology group
      is working on.

    Mission: Working with other researchers on large-scale text and
      data mining initiatives with an eye toward beneficial
      applications for scholars and students.




What is Data Mining?

“Data mining is the process of extracting hidden patterns from data”
                                                  Lyman and Varian 2003

“As data sets and the information extracted from them have grown in
   size and complexity, direct hands-on data analysis has increasingly
   been supplemented and augmented with indirect, automatic data
   processing using more complex and sophisticated tools, methods and
   models”
                                                        Kantardzic 2002

Example:
  Data mining is using consumer purchasing patterns to predict which
  products are bought together (gas and flights)
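
As a rough illustration of the example above, the following Python sketch counts how often pairs of products turn up in the same basket. The baskets, the product names, and the function name `pair_counts` are invented for this illustration; production systems typically use fuller association-rule methods rather than raw pair counts.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical purchase data, invented for this sketch
baskets = [
    ["gas", "flights", "hotel"],
    ["gas", "flights"],
    ["gas", "snacks"],
    ["hotel", "snacks"],
]

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

In this toy data the pair bought together most often is gas and flights, which is the kind of pattern the slide refers to.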




What is Text Mining?


“In text mining the patterns are extracted from natural language text
    rather than from structured databases of facts”
                                                      Marti Hearst 2003

“Text mining attempts to discover new, previously unknown
   information by applying techniques from information retrieval,
   natural language processing and data mining”
                                      National Text Mining Center, UK

Example:
  Looking at which words co-occur in articles in order to predict
  interactions (magnesium and migraines)
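
A minimal sketch of the co-occurrence idea in the example above, assuming each article is available as a plain string. The article texts and the helper names `tokens` and `cooccurrence` are hypothetical; real text mining would add steps such as stemming and statistical significance testing.

```python
import re


def tokens(text):
    """Lowercase a text and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cooccurrence(docs, term_a, term_b):
    """Return (docs containing a, docs containing b, docs containing both)."""
    with_a = with_b = with_both = 0
    for doc in docs:
        words = tokens(doc)
        has_a, has_b = term_a in words, term_b in words
        with_a += has_a
        with_b += has_b
        with_both += has_a and has_b
    return with_a, with_b, with_both


# Hypothetical article snippets, invented for this sketch
articles = [
    "Magnesium levels were low in patients reporting migraines.",
    "Dietary magnesium and sleep quality.",
    "Migraines and magnesium supplementation: a small trial.",
    "Crop yields in the nineteenth century.",
]

n_a, n_b, n_both = cooccurrence(articles, "magnesium", "migraines")
print(n_a, n_b, n_both)
```

Two terms that appear together in far more articles than their individual counts would suggest are candidates for a closer look by a researcher.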




Advanced Technology at JSTOR




    •  Why are we here
    •  Who we are
    •  What we are doing




Why are we releasing our system here?

Librarians are the point from which innovation is spread throughout the
academy

“New roles and functions for librarians include:
    •  information consultants and producers
    •  information gatekeepers and intermediators
    •  end-user educators
    •  managers and leaders
    •  data analysts in data administration centers
    •  preservers of knowledge
    •  information equalizers”
                                                               Park 1987

A Data Support Role: “Helping students get their hands dirty with the
data”
                                                        Robin Rice 2008
                     2nd DCC / RIN Research Data Management Forum


Who we are - Advanced Technology Research

•  A formal commitment by JSTOR to a pro-active role in technology
   innovation to face new challenges and opportunities
•  Our MO is to collaborate with and aid the scholarly community
•  We are a team of world-class scientists and technologists with a proven
   track record of innovation

Mission Statement

    “The Advanced Technology Research Group is dedicated to creating,
    discovering and using relevant technologies in support of JSTOR and the
    broader scholarly community.”




ATR - Collaborations with the academic community.

For other researchers we provide
•  Access to large well-curated data sets
•  An exposure channel on JSTOR for research results
•  Facilities on JSTOR to expose tools and techniques to users
•  Collaboration opportunities

For JSTOR we
•  Evaluate novel techniques
•  Present rapid prototypes to users
•  Develop peer relationships with research institutions
•  Bring new forms of traffic to the JSTOR data
•  Reuse JSTOR data in new and exciting ways




What we are doing - Projects and Partners


    •  University of Washington – Citation Network Analysis
    •  Princeton University – Topic Analysis
    •  UIUC – Software Environment for the Advancement of Scholarly
       Research (SEASR)
    •  University of Michigan – Linguistic tools
    •  Tufts – Classics Studies
    •  University of Liverpool – OAI-ORE, Text Mining, Data Analysis
    •  University of Queensland – Annotations
    •  Los Alamos National Labs – Annotation Management
    •  DFKI (German Artificial Intelligence Centre) – Document capture
       and reconstruction / remastering
    •  XRCE (EuroPARC, France) – Scanned Document Analysis
    •  …


Advanced Technology Research - Showcase



    Showcase provides a preview of interesting and useful
      technologies. It allows our research partners to demonstrate
      their tools and gain feedback and it allows JSTOR to assess
      candidate technologies before committing them to the product
      roadmap.




Advanced Technology Research - Showcase


    A place to expose JSTOR data and tools and to encourage new
       research

       •  Access to JSTOR datasets
       •  A facility to expose and use tools created by researchers from
          JSTOR and elsewhere
       •  Explanations of ongoing research
       •  A forum to facilitate connections between groups working with
          JSTOR data

       URL: http://showcase.jstor.org




Data for Research



    •  DFR is a set of web tools designed to allow for the visual
       exploration of large-scale data sets and the download of word
       frequencies in JSTOR articles

    •  Beta Version launched 01/23/09

    •  URL: http://dfr.jstor.org
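
To make "word frequencies" concrete, here is a small Python sketch that counts the words in one text and writes them out as a two-column table. It is only an illustration: the sample sentence, the function names, and the `word,count` CSV layout are assumptions made for this sketch, not a description of the files DFR actually delivers.

```python
import csv
import io
import re
from collections import Counter


def word_frequencies(text):
    """Count occurrences of each lowercase word in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def to_csv(freqs):
    """Render counts as CSV text, most frequent first, ties in alphabetical order."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "count"])  # assumed header, for illustration only
    for word, count in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([word, count])
    return buf.getvalue()


# Invented sample text standing in for one article
sample = "The chymist hath written of chymistry, and of the art thereof."
freqs = word_frequencies(sample)
table = to_csv(freqs)
print(table)
```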




Why Word Frequencies

    Data Requested from JSTOR users in 2008




    [Chart legend: OCR Data, Citation Data, Usage Data, Word Frequency]




What can you do with word counts?


    Real life requests:

    “I would like to request time and word distribution frequencies in
        linguistics (specific movement removed). These sorts of
        frequencies could potentially allow me to better understand and
        delimit the formation of groups, and the underlying impetus
        behind these groups as expressed in linguistic form.”

    “I would like to create subject headings for material, using word
        frequency as a guide to selecting the appropriate terms for the
        headings.”
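
The first request above asks for word frequencies over time. Assuming per-article word counts and publication years are to hand, a sketch of that aggregation might look like the following. The records and the function name `counts_by_year` are hypothetical and invented for illustration.

```python
from collections import defaultdict


def counts_by_year(records, word):
    """Sum occurrences of `word` per publication year, in year order."""
    totals = defaultdict(int)
    for year, freqs in records:
        totals[year] += freqs.get(word, 0)
    return dict(sorted(totals.items()))


# Hypothetical (year, word counts) pairs, one per article, invented for this sketch
records = [
    (1669, {"chymistry": 3, "hath": 5}),
    (1669, {"chymistry": 1}),
    (1783, {"chymistry": 2, "chemistry": 1}),
    (1907, {"chemistry": 7}),
]

series = counts_by_year(records, "chymistry")
print(series)
```

Raw counts like these are sensitive to how many articles exist for each year, so a real analysis would probably normalise by the total number of words or articles per year.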




DFR – DEMO!



                  http://dfr.jstor.org




DFR – Front Page




Thefe




Hath – pre-1900




Hath – post-1900




Chymistry




Download Page




Files Downloaded




Chart to show the use of the word Chymistry (years 1666 to 2005, scale 0 to 8)




3 Journals from 1957




The Annals of Mathematics   American Journal of Nursing   Agricultural History




Any questions / feedback?

      Please take a look at the site and tell us what you think.
                       Email: dfr@jstor.org

    Contact details
    Email: clare.llewellyn@jstor.org
    Phone: 609-986-2282



