Content Provision
Lucas Anastasiou
The Open University, Knowledge Media Institute
18-06-2015
Rhodes, Greece, 2015
A simple text mining exercise
Average length of dissertations and theses (doctoral and master's) by major
3,037 records from the University of Minnesota
https://beckmw.wordpress.com/2014/07/15/average-dissertation-and-thesis-length-take-two/
• R script to scrape PDFs from the institutional repository
• Extract text
• Parse text
• Plot the data
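The final aggregation step of such an exercise can be sketched in Python. This is only an illustration, not the R script used in the original exercise: the records are made-up placeholder values, `mean_pages_by_major` is a hypothetical helper name, and the scraping, text-extraction and plotting steps are left out.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder records: (major, page count) pairs of the kind
# the scrape/extract/parse steps would produce. These are not real data.
records = [
    ("History", 300),
    ("History", 280),
    ("Statistics", 120),
    ("Statistics", 100),
]


def mean_pages_by_major(rows):
    """Group page counts by major and return the mean of each group."""
    pages = defaultdict(list)
    for major, page_count in rows:
        pages[major].append(page_count)
    return {major: mean(counts) for major, counts in pages.items()}


summary = mean_pages_by_major(records)
for major, avg in sorted(summary.items()):
    print(f"{major}: {avg:.1f} pages")
```

Even in this toy form, the analysis is a few lines; the effort sits in producing the `records` list in the first place.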
The challenge of TDM
• 90% of the time in a TDM project [1] is spent on
– Collecting data
– Harmonising
– Pre-processing data
• Magnitude of data
• Heterogeneity of data
[1] Jisc Open Mirror Report Oct 2013
The case of scientific literature
• Identifying levels of access
– Transactional information access
– Analytical information access
– Raw data (programmatic) access
• Google Scholar's index is estimated at 100 million papers
3 levels of access
How are the “big” players doing?
• Transactional access
– Google Scholar: browser interface (portal)
– MS Academic Research: browser interface (portal)
• Analytical access
– Google Scholar: citation analysis, researcher profiles
– MS Academic Research: visualisations, citation analysis, author connections
• Raw data access
– Google Scholar: no API; scraping possible (violation of terms and conditions)
– MS Academic Research: limited API; downloading the full corpus is explicitly forbidden
Other scholarly data sources
Name of service Transactional Analytical Raw
CiteseerX ✓ ✓
PubMed ✓
arXiv ✓ ✓
Scopus ✓
Web of Science ✓
SpringerLink ✓ ☐ ✓
Elsevier ✓ ✓
The case of Elsevier API
• Sufficient for some tasks but not for all; no access to the full corpus
• Usability of the API is restricted and access is controlled
• Potential loss of information (what you see in the portal differs from what the API returns)
• (May) require special dispensation from authors
“It takes a lot of time and a lot of energy and
doesn’t scale at all”
Heather Piwowar
Is this enough for the TDM community?
• A TDM practitioner needs truly unrestricted, programmatic access
• APIs
– Offer programmatic access to individual articles
– Offered in XML/JSON
– Lack a standard schema; providers use proprietary formats
– (May be) sufficient for some TDM tasks (e.g. information extraction)
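The lack of a standard schema means each provider needs its own mapping onto a shared record. A minimal sketch of that harmonisation step follows; both payload layouts and the function names `from_provider_a` and `from_provider_b` are invented for illustration and do not correspond to any real provider's API.

```python
import json
import xml.etree.ElementTree as ET

# Two made-up payloads describing the same article in invented formats.
json_payload = '{"doc": {"headline": "On text mining", "yr": "2015"}}'
xml_payload = "<article><title>On text mining</title><year>2015</year></article>"


def from_provider_a(raw):
    """Map the invented JSON layout onto a shared {title, year} record."""
    doc = json.loads(raw)["doc"]
    return {"title": doc["headline"], "year": int(doc["yr"])}


def from_provider_b(raw):
    """Map the invented XML layout onto the same shared record."""
    root = ET.fromstring(raw)
    return {"title": root.findtext("title"), "year": int(root.findtext("year"))}


record_a = from_provider_a(json_payload)
record_b = from_provider_b(xml_payload)
print(record_a == record_b)
```

Every additional provider adds another mapping function of this kind, which is part of why harmonisation takes up so much project time.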
APIs not enough
• Other families of TDM tasks require access to the full corpus
– E.g. recommender systems, text summarisation
• These need access to the complete collection of articles
• Data dumps
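One way to see why per-article access falls short: corpus-level statistics such as inverse document frequency, a common ingredient of recommender and summarisation pipelines, can only be computed over the whole collection. The sketch below uses a made-up three-document corpus and the plain log(N / df) form of IDF; it assumes every queried term occurs in at least one document.

```python
import math
from collections import Counter

# Toy stand-in for a full article collection (made-up texts).
corpus = {
    "a1": "text mining of scientific literature",
    "a2": "mining citation data",
    "a3": "scientific data repositories",
}


def document_frequency(docs):
    """Count in how many documents each term appears."""
    df = Counter()
    for text in docs.values():
        df.update(set(text.split()))
    return df


def idf(term, df, n_docs):
    """Plain inverse document frequency, log(N / df).

    Assumes the term occurs in at least one document.
    """
    return math.log(n_docs / df[term])


df = document_frequency(corpus)
for term in ("text", "mining"):
    print(term, round(idf(term, df, len(corpus)), 3))
```

Neither `df` nor the resulting weights can be derived from a single article fetched through an API call.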
Data dumps not enough
• Even if you can access the whole corpus, you would need special hardware resources
• arXiv.org data dump, compressed: 300 GB
– How do you get it?
– Where to store it? (*)
– How to analyse it?
– How to disseminate your findings?
– In what format?
– How can someone else verify your findings?
Research should be reproducible!
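A small step towards verifiable findings is to publish a checksum of the exact data that was analysed, so that someone else can confirm they hold the same bytes. The sketch below hashes a stream in fixed-size chunks so that a large dump never has to fit in memory; the in-memory `dump` object stands in for a real file, and `sha256_of_stream` is a hypothetical helper name.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Placeholder bytes standing in for a corpus dump opened with open(path, "rb").
dump = io.BytesIO(b"placeholder corpus bytes")
checksum = sha256_of_stream(dump)
print(checksum)
```

A checksum only establishes that two parties hold identical input; it does not address the storage, format or dissemination questions raised above.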
Non-technological barriers
• Legal uncertainty
• Copyright, database rights, licensing
• (Some) publishers require special dispensation
• Skills gap
• Researchers' lack of awareness of TDM's potential
Summary
• Collecting data is a tedious and time-consuming task, perhaps impossible
• Scientific literature lacks programmatic-level access
• APIs and data dumps (though nice) are not enough
• Other barriers
OpenMinTeD vision
• Data and algorithms in one place
• Interoperable framework
• “Safe” environment (legal status)
• Trusted environment
Thank you!
Q&A
TDM-ing today
…

Open minted content_provision

  • 1. Content Provision. Lucas Anastasiou, The Open University, Knowledge Media Institute. 18-06-2015, Rhodes, Greece, 2015
  • 2. A simple text mining exercise: average length of dissertations (doctoral and master's theses) by major. 3037 records from the University of Minnesota. Source: https://beckmw.wordpress.com/2014/07/15/average-dissertation-and-thesis-length-take-two/
  • 3. A simple text mining exercise
      • R script to scrape PDFs from the institutional repository
      • Extract text
      • Parse text
      • Plot the data
  • 4. The challenge of TDM
      • 90% of a TDM project [1] is spent on
          – Collecting data
          – Harmonising
          – Pre-processing data
      • Magnitude of data
      • Heterogeneity of data
      [1] Jisc Open Mirror Report, Oct 2013
  • 5. The case of scientific literature
      • Identifying levels of access
          – Transactional information access
          – Analytical information access
          – Raw data (programmatic) access
      • Google Scholar estimated at 100 million papers
  • 6. 3 levels of access: how are the “big” guys doing?
      • Transactional
          – Google Scholar: browser interface (portal)
          – MS Academic Research: browser interface (portal)
      • Analytical access
          – Google Scholar: citation analysis, researcher profile
          – MS Academic Research: visualisations, citation analysis, author connections
      • Raw data access
          – Google Scholar: no API, scraping possible (violation of ToC)
          – MS Academic Research: limited API, explicitly forbidden to download the full corpus
  • 7. Other scholarly data sources Name of service Transactional Analytical Raw CiteseerX   PubMed  Arxiv   Scopus  Web of Science  SpringerLink  ☐  Elsevier  
  • 8. The case of Elsevier API
      • Sufficient for some tasks but not for all, no access to the full corpus
      • Restricting the usability of the API, controlling the access
      • Potential loss of information (what you see in the portal is different from the API)
      • (may) require special dispensation from authors
      “It takes a lot of time and a lot of energy and doesn’t scale at all” (Heather Piwowar)
  • 9. Is this enough for the TDM community?
      • If you are a TDM-er you need true, unrestricted, programmatic access
      • APIs
          – Offer programmatic access to individual articles
          – Offered in XML/JSON
          – Lack of a standard schema, providers use proprietary formats
          – (may be) sufficient for some TDM tasks (e.g. information extraction)
  • 10. APIs not enough
      • Another family of TDM tasks requires access to the full corpus
          – E.g. recommender systems, text summarisation
      • Need to have access to the complete collection of articles
      • Data dumps
  • 11. Data dumps not enough
      • Even if you can access the whole corpus you would need special hardware resources
      • Arxiv.org data dump, compressed: 300 GB
          – How do you get it?
          – Where to store it? (*)
          – How to analyse it?
          – How to disseminate your findings?
          – In what format?
          – How can someone else verify your findings?
      Research should be reproducible!
  • 12. Non-technological barriers
      • Legal uncertainty
      • Copyright, database rights, licensing
      • (some) publishers require special dispensation
      • Skills gap
      • Researchers' lack of awareness of TDM potential
  • 13. Summary
      • Collecting data is a tedious and time-consuming task, perhaps impossible
      • Scientific literature lacks a programmatic level of access
      • APIs and data dumps (though nice) are not enough
      • Other barriers
  • 14. OpenMinTeD vision
      • Data and algorithms in one place
      • Interoperable framework
      • “Safe” environment (legal status)
      • Trusted environment
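
The "harmonising" step mentioned in the slides (APIs return XML or JSON, each in a proprietary schema) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two provider responses, their field names, and the target schema (`title`, `authors`, `year`) are hypothetical and do not correspond to any real publisher API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical responses from two providers; neither is a real API format.
PROVIDER_A_JSON = (
    '{"doc": {"heading": "Mining text at scale", '
    '"writers": ["A. Author", "B. Author"], "yr": "2014"}}'
)
PROVIDER_B_XML = """<record>
  <title>Open access corpora</title>
  <creator>C. Author</creator>
  <date>2015-06-18</date>
</record>"""


def from_provider_a(raw):
    """Map provider A's (hypothetical) JSON layout onto the common schema."""
    doc = json.loads(raw)["doc"]
    return {
        "title": doc["heading"],
        "authors": list(doc["writers"]),
        "year": int(doc["yr"]),
    }


def from_provider_b(raw):
    """Map provider B's (hypothetical) XML layout onto the common schema."""
    root = ET.fromstring(raw)
    return {
        "title": root.findtext("title"),
        "authors": [creator.text for creator in root.findall("creator")],
        "year": int(root.findtext("date")[:4]),  # keep only the year part
    }


# After harmonisation, downstream code sees one uniform record shape.
records = [from_provider_a(PROVIDER_A_JSON), from_provider_b(PROVIDER_B_XML)]
for record in records:
    print(record["year"], record["title"], ", ".join(record["authors"]))
```

One adapter function is needed per provider, which is part of why collecting and harmonising data takes up so much of a TDM project.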

Editor's Notes

  1. Show an example of a simple, minimalistic (perhaps useless) TDM task: work out the length (a very simple metric) of academic dissertations and classify them by research area (major)
  2. Transactional -> through portal -> researchers, students; Analytical -> funders, government, business intelligence; Raw -> developers, digital libraries, companies
  3. MS-AR: representations of the data, but not the data itself
  4. Either aggregators, ✓✗ Verify whether the ticks are correct
  5. Some people even suggest that the API loses information
  6. * Though it certainly fits on a laptop disk nowadays, it is not sustainable(?) to expect everyone to get a copy of this data and run their own processing locally; there is a need for something more central and reproducible
  7. Legal uncertainty: e.g. scraping is a violation of the ToC of most providers, but under UK law if you are allowed to view content through a browser then you are allowed to crawl it (provided that you are not compromising the provider’s infrastructure).
     Copyright: e.g. arXiv provides as a dump only those articles with a default arXiv license, with a vague description of what you can do with the processed information.
     Special dispensation: e.g. Elsevier.
     Skills gap: it would be hard to expect a (traditional) historian to write their own R scripts to dig through historical text and extract perhaps important historical information, BUT there is HUGE potential if they decide to use TDM.
     Researchers' lack of awareness of TDM potential -> it is our responsibility as the TDM community to demonstrate this potential to the rest of the scientific community.
     Skills gap (2): even for sciences close to TDM (computer scientists, mathematicians, statisticians), TDM has a high barrier that you need to overcome if you want to do something useful.
  8. Each data source has its own means of access. No legal guarantees (you may be doing something illegal)
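
The final step of the dissertation-length exercise described in note 1 (grouping lengths by major and averaging them) might look like the sketch below. The original exercise used an R script; this is a Python illustration only, and the records are made-up values, not the University of Minnesota data.

```python
from collections import defaultdict
from statistics import mean

# Made-up (major, page count) pairs standing in for parsed repository records.
records = [
    ("History", 300),
    ("History", 280),
    ("Mathematics", 90),
    ("Mathematics", 110),
    ("Biology", 150),
]


def average_length_by_major(rows):
    """Group page counts by major and return the mean length per major."""
    pages = defaultdict(list)
    for major, page_count in rows:
        pages[major].append(page_count)
    return {major: mean(counts) for major, counts in pages.items()}


averages = average_length_by_major(records)

# List majors from longest to shortest average dissertation.
for major, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{major}: {avg:.0f} pages")
```

The metric itself is trivial; as the slides argue, the hard part is obtaining and pre-processing the documents that produce such records.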