Defrosting the Digital Library: A survey of bibliographic tools for the next generation Web. Duncan Hull. Faculty of Life Sciences (1992-6): BSc. Computer Science (2002-2007): MSc, PhD. Chemistry (2008-date): Postdoc
It’s all Casey’s fault! Dr. Casey Bergman, Lecturer  Faculty of Life Sciences I  s  Citeulike.org! http://ukpmc.ac.uk/
Defrosting the Digital Library (in one slide)
Metawhat? getMetadata getData Title: Defrosting the digital library; Authors: Duncan Hull, Steve Pettifer and Douglas Kell; Published: 2008; Journal: PLoS Computational Biology. Tell me more? What is it about? Where did it come from?
Metadata in: Chemistry (Science of Matter) Biology (Science of Life) Informatics (Science of  Information) Cheminformatics Biochemistry Bioinformatics Science! www.mib.ac.uk nactem.ac.uk/refine www.citeulike.org
REFINE: Representing Evidence For Interacting Network Elements. www.sbml.org from www.biomodels.net database at the EBI.ac.uk
Example from glycolysis in yeast (diagram labels: reactant, reactant, product, product, modifier). This is just one reaction; there are at least 1,700 more in yeast
Synonyms from Pedro Mendes' B-Net Database http://www.comp-sys-bio.org/yeastnet/
Name: Synonyms
D-Glucose: dextrose; D-Glucose; D-(+)-glucose; D(+)-glucose; grape sugar; Traubenzucker
ATP: Adenosine 5'-triphosphate; Adenosine triphosphate; H4atp
Hexokinase: Hexokinase-1; Hexokinase-A; Hexokinase PI; YFR053C
ADP: 5'-adenylphosphoric acid; Adenosine 5'-diphosphate; H3adp
Glucose-6-phosphate: Robison ester; D-Glucose 6-phosphate
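The synonym list shows why names alone make poor identifiers: one compound or enzyme turns up under many labels. A minimal sketch, assuming a plain in-memory dictionary (the names `SYNONYMS`, `LOOKUP` and `canonical_name` are illustrative and are not taken from the B-Net database or its software), of mapping any synonym back to a single name:

```python
# Synonyms copied from the slide; the data structure itself is an assumption.
SYNONYMS = {
    "D-Glucose": ["dextrose", "D-Glucose", "D-(+)-glucose", "D(+)-glucose",
                  "grape sugar", "Traubenzucker"],
    "ATP": ["Adenosine 5'-triphosphate", "Adenosine triphosphate", "H4atp"],
    "Hexokinase": ["Hexokinase-1", "Hexokinase-A", "Hexokinase PI", "YFR053C"],
    "ADP": ["5'-adenylphosphoric acid", "Adenosine 5'-diphosphate", "H3adp"],
    "Glucose-6-phosphate": ["Robison ester", "D-Glucose 6-phosphate"],
}

# Invert the table once: lower-cased label -> canonical name.
LOOKUP = {}
for name, synonyms in SYNONYMS.items():
    LOOKUP[name.lower()] = name
    for synonym in synonyms:
        LOOKUP[synonym.lower()] = name


def canonical_name(label):
    """Return the canonical name for a label, or None if it is unknown."""
    return LOOKUP.get(label.strip().lower())


print(canonical_name("Traubenzucker"))
```

A lookup like this only works once somebody has curated the synonyms, which is the metadata problem the talk goes on to describe.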
Chemistry Biology Informatics Cheminformatics Biochemistry Bioinformatics
For more info. www.nactem.ac.uk/refine   One of the biggest challenges is getting hold of accurate metadata from libraries and databases
But first…
getMetadata getData 6 million+ “units” sold worldwide to date: America, Europe, Middle East, Africa, Australasia. Lots of data, metadata and money! Owner’s handbook. Tell me more? What is it about?
Final solution: Web XSLT Print
Summary: Lessons from Ford DATA METADATA
 
BBC Spooks? Keeping an eye on people around the world since 1939. Winston Churchill “Big British Castle” (BBC)
I  hate powerpoint Radio MS Word TV
How do they stay in business? Broadcasting House, London Foreign governments, e.g. U.S.A. etc
Word:  Not  the best way to manage data and metadata
Getting Rid of Word database XML schema Web &  Intranet Printed documents XSLT
A solution that worked! getMetadata getData Who is Thabo Mbeki? These documents are all about  Thabo Mbeki Thabo Mbeki
Summary: Lessons from the BBC
How have libraries managed metadata? Image via http://en.wikipedia.org/wiki/Library_of_Alexandria
From  ~1824  until ~1989 Photos via dpicker  http://www.flickr.com/photos/dpicker/3107856991/  and pit yacker  http://www.flickr.com/photos/78825653@N00/131611136   JRULM (Main Library) Joule  Library Mostly “private” only available to an elite (e.g. University of Manchester Students and Staff)
Data Tightly bound (literally) Rarely separated First published 1687, over 300 years old
Data and metadata were like this for centuries!
+ Tim Berners-Lee 1989
Timeline: Unchanged for centuries but… 20 years  ÷   2309 years  = <1%
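The slide's ratio can be checked directly. A small sketch that takes both figures from the slide as given (20 years of Web set against 2309 years of libraries); the variable names are illustrative:

```python
web_years = 20        # from the slide: years of the Web
library_years = 2309  # from the slide: years of library history

share = web_years / library_years
# Share of library history in which the Web has existed, as a percentage.
print(f"{share:.2%}")
```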
Everything’s Gone Digital! www.scopus.com www.pubmed.gov http://ukpmc.ac.uk www.isiknowledge.com scholar.google.com
Digital Utopia? Alexander Griekspoor www.mekentosj.com
Welcome to Digital Dystopia
Isolated publication silos Chemistry Informatics Biology impersonal, isolated, unsociable, Generally rubbish
Identity Crisis part 1: Which publication?
Identity crisis part 2: Who are you? Who, who … who, who? Neil Smalheiser and Vetle Torvik Typo Attribution would seem to be a simple process and yet it represents a major, unsolved problem for information science. http://tinyurl.com/authorid
Identity crisis part 3: Mistaken Identity Dr. Duncan Hull Humble Postdoc Article about Authored-by Authored-by Wrong! “DNA mania” title http://tinyurl.com/mistakenid
Can’t get metadata (decoupled from data): PDF getMetadata getData Title: defrosting the digital library Authors: Duncan Hull, Steve Pettifer and Douglas Kell Published: 2008 Tell me more Don’t know, Try google Don’t know,  Title might be  “ defrosting…” Where did this  come from?
Can’t get metadata (decoupled from data): PDF Why can't I manage academic papers like MP3s? http://tinyurl.com/mp3vpdf James Howison, Carnegie Mellon University Data is tightly coupled to its metadata getMetadata getData Artist: The Who Title: Who Are You? Recorded: 1978 Album: Who Are You
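The MP3 comparison can be made concrete. A minimal sketch, assuming a hypothetical `Recording` class (not a real MP3 or PDF library) whose `get_metadata` and `get_data` methods mirror the getMetadata and getData labels on the slide: when the metadata travels inside the same object as the data, answering "what is this?" needs no external lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    """Hypothetical container that keeps data and metadata together."""
    data: bytes
    metadata: dict = field(default_factory=dict)

    def get_metadata(self):
        # Metadata is embedded, so no search engine or database is needed.
        # A copy is returned so callers cannot alter the stored record.
        return dict(self.metadata)

    def get_data(self):
        return self.data


track = Recording(
    data=b"...audio bytes...",  # placeholder, not real audio
    metadata={"artist": "The Who", "title": "Who Are You?",
              "recorded": 1978, "album": "Who Are You"},
)
print(track.get_metadata()["title"])
```

A PDF handled as an opaque file is the opposite case: `get_data` works, but the equivalent of `get_metadata` has to guess or ask elsewhere.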
Can’t get metadata (decoupled from data): PDF Peter Murray-Rust Hamburger (unstructured data) PDF is a hamburger,  and we're trying to turn it  back into a cow.   http://tinyurl.com/pdfhamburger   Cow (structured data) publishing text-mining
Can’t get metadata (decoupled from data): HTTP
Can’t get metadata (decoupled from data): HTTP Tim Bray, Sun Microsystems: “One of the Web's distinguishing features is that there's a big gaping hole where the metadata ought to be.” http://tinyurl.com/nometadata
I’ll stop moaning now
www.citeulike.org   Richard Cameron Kevin Emamy Picture from  http://network.nature.com/people/mfenner/blog/2009/01/30/interview-with-kevin-emamy  and  http://www.citeulike.org/faq/faq.adp   The reason I wrote the site [citeulike.org] was, after recently coming back to academia,  I was slightly shocked by the quality of some of the tools available to help academics  do their job. I found it preferable to start writing proper tools for my own use than to use existing software.
Why should you care about citeulike?
All references in one place
Click Post to Citeulike
Tag it (optional)
Citeulike: Recoupling data and metadata
Citegeist = Citeulike + Zeitgeist
[Figure labels: allegedly 2,243,177; ~2,000/day; variable; 674,076; 2,880/day; 2 papers/min; linear growth; ~500,000]
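The per-minute figure follows from the daily one. A quick check, assuming the slide's 2,880 papers per day is spread evenly over 24 hours:

```python
papers_per_day = 2880          # figure from the slide
minutes_per_day = 24 * 60      # 1440 minutes in a day

papers_per_minute = papers_per_day / minutes_per_day
print(papers_per_minute)
```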
Where will citeulike break?
Why should you bother with citeulike?
Casey Bergman story: I was importing papers on Solexa and 454 genome assembly and came across the following paper: http://www.citeulike.org/user/cisevol/article/1465689 which was a real find in terms of convincing me that light shotgun sequence data is worth analysing. I nicked this from a PhD student's library in Brazil http://www.citeulike.org/profile/GustavoLacerda Wouldn’t have found this any other way (e.g. keyword searching or following citation trails)
Many  different  solutions e.g.  Papyro:  Steve  Pettifer http://utopia.cs.manchester.ac.uk/
And the rest… www.mendeley.com www.zotero.org www.connotea.org www.mekentosj.com www.hubmed.org Re-couple metadata that has been de-coupled from data www.2collab.com www.refworks.com “iTunes for PDF files”
There is still lots  more metadata How many times  has  http://pubmed.gov/19060304  been cited? Who has cited  http://pubmed.gov/19060304   ?  Give me all the references that cite this one Give me all the references cited by  http://pubmed.gov/19060304   Who the hell is Doug Kell? Steve Pettifer? Duncan Hull? What is Doug Kell’s h-index? Remember: Machines ask these questions, not just humans Notify me whenever Steve Pettifer publishes a paper Notify me whenever someone cites http://pubmed.gov/19060304   Impact factor?
Digital Identity would solve  some  of these problems Give yourself a URI,  you deserve it! Tim Berners-Lee  http://www.w3.org/People/Berners-Lee/card#i see  http://dig.csail.mit.edu/breadcrumbs/node/71
URIs for Douglas Kell www.myopenid.com www.openid.net (Also note ResearcherID from Thomson)
Phil Bourne
Science is public knowledge http://tinyurl.com/publicknowledge
Conclusions: What hasn’t changed
Conclusions: Publication metadata matters
Conclusions: Scientists are too blasé about metadata!
Conclusions: Do us a favour!
Acknowledgements

Mais conteúdo relacionado

Mais procurados

Modern Tools & Rationales for 21st Century Research
Modern Tools & Rationales  for 21st Century ResearchModern Tools & Rationales  for 21st Century Research
Modern Tools & Rationales for 21st Century ResearchRoss Mounce
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themRoss Mounce
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Ross Mounce
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureRoss Mounce
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData TheContentMine
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 
Data, data, data
Data, data, dataData, data, data
Data, data, dataandrewxhill
 
Best Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open DataBest Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open DataJose Emilio Labra Gayo
 
Introduction to bibframe
Introduction to bibframeIntroduction to bibframe
Introduction to bibframeKai Li
 
ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesespetermurrayrust
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)net2-project
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsLinked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsMark Matienzo
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic WebMark Matienzo
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 
Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)Paul Bradshaw
 
Transcript - Provenance and Social Science data
Transcript  - Provenance and Social Science dataTranscript  - Provenance and Social Science data
Transcript - Provenance and Social Science dataARDC
 
YQL:: Select * from Internet
YQL:: Select * from InternetYQL:: Select * from Internet
YQL:: Select * from Internetdrgath
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for LibrariesLukas Koster
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neurosciencepetermurrayrust
 

Mais procurados (20)

Modern Tools & Rationales for 21st Century Research
Modern Tools & Rationales  for 21st Century ResearchModern Tools & Rationales  for 21st Century Research
Modern Tools & Rationales for 21st Century Research
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on them
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | Future
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 
Data, data, data
Data, data, dataData, data, data
Data, data, data
 
Best Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open DataBest Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open Data
 
Introduction to bibframe
Introduction to bibframeIntroduction to bibframe
Introduction to bibframe
 
ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and theses
 
Unknown Unknowns
Unknown UnknownsUnknown Unknowns
Unknown Unknowns
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsLinked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic Web
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 
Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)
 
Transcript - Provenance and Social Science data
Transcript  - Provenance and Social Science dataTranscript  - Provenance and Social Science data
Transcript - Provenance and Social Science data
 
YQL:: Select * from Internet
YQL:: Select * from InternetYQL:: Select * from Internet
YQL:: Select * from Internet
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for Libraries
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 

Semelhante a Defrosting the Digital Library: A survey of bibliographic tools for the next generation web

Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?Dorothea Salo
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDan Brickley
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960mare34
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsDuncan Hull
 
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...Boris Adryan
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future? ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future? Dr. Haxel Consult
 
myExperiment @ Nettab
myExperiment @ NettabmyExperiment @ Nettab
myExperiment @ NettabDuncan Hull
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingGigaScience, BGI Hong Kong
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machinespetermurrayrust
 
Module 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptxModule 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptxesta2310819
 
Introduction to the Venice Time Machine
Introduction to the Venice Time MachineIntroduction to the Venice Time Machine
Introduction to the Venice Time MachineGiovanni Colavizza
 
Blogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsBlogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsJeremy Frey
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysisLuke Czarnecki
 
HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7Scott Edmunds
 
Data Integration Lecture
Data Integration LectureData Integration Lecture
Data Integration LectureSUNY Oneonta
 

Semelhante a Defrosting the Digital Library: A survey of bibliographic tools for the next generation web (20)

Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classification
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
 
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future? ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
 
A biologist in e-Science
A biologist in e-ScienceA biologist in e-Science
A biologist in e-Science
 
myExperiment @ Nettab
myExperiment @ NettabmyExperiment @ Nettab
myExperiment @ Nettab
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machines
 
Module 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptxModule 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptx
 
Introduction to the Venice Time Machine
Introduction to the Venice Time MachineIntroduction to the Venice Time Machine
Introduction to the Venice Time Machine
 
Blogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsBlogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart Labs
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysis
 
HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7
 
Web3uploaded
Web3uploadedWeb3uploaded
Web3uploaded
 
Data Integration Lecture
Data Integration LectureData Integration Lecture
Data Integration Lecture
 

Mais de Duncan Hull

Why study plants?
Why study plants?Why study plants?
Why study plants?Duncan Hull
 
Embedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculumEmbedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculumDuncan Hull
 
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the UglyWikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the UglyDuncan Hull
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Duncan Hull
 
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusBibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusDuncan Hull
 
Accessing small molecule data using ChEBI
Accessing small molecule data using ChEBIAccessing small molecule data using ChEBI
Accessing small molecule data using ChEBIDuncan Hull
 
OWL-XML-Summer-School-09
OWL-XML-Summer-School-09OWL-XML-Summer-School-09
OWL-XML-Summer-School-09Duncan Hull
 
Authenticating Scientists with OpenID
Authenticating Scientists with OpenIDAuthenticating Scientists with OpenID
Authenticating Scientists with OpenIDDuncan Hull
 
The Invisible Scientist
The Invisible ScientistThe Invisible Scientist
The Invisible ScientistDuncan Hull
 
The Year of Blogging Dangerously
The Year of Blogging DangerouslyThe Year of Blogging Dangerously
The Year of Blogging DangerouslyDuncan Hull
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)Duncan Hull
 
Chemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-upChemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-upDuncan Hull
 
Chemoinformatics and information management
Chemoinformatics and information managementChemoinformatics and information management
Chemoinformatics and information managementDuncan Hull
 
Text mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literatureText mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literatureDuncan Hull
 
Issues for metabolomics and
Issues for metabolomics and Issues for metabolomics and
Issues for metabolomics and Duncan Hull
 
Adding Meaning To Your Data
Adding Meaning To Your DataAdding Meaning To Your Data
Adding Meaning To Your DataDuncan Hull
 
Web of Science: REST or SOAP?
Web of Science: REST or SOAP?Web of Science: REST or SOAP?
Web of Science: REST or SOAP?Duncan Hull
 

Mais de Duncan Hull (20)

Why study plants?
Why study plants?Why study plants?
Why study plants?
 
Embedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculumEmbedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculum
 
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the UglyWikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia
 
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusBibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
 
OWL and OBO
OWL and OBOOWL and OBO
OWL and OBO
 
Accessing small molecule data using ChEBI
Accessing small molecule data using ChEBIAccessing small molecule data using ChEBI
Accessing small molecule data using ChEBI
 
How to Blog
How to BlogHow to Blog
How to Blog
 
OWL-XML-Summer-School-09
OWL-XML-Summer-School-09OWL-XML-Summer-School-09
OWL-XML-Summer-School-09
 
Authenticating Scientists with OpenID
Authenticating Scientists with OpenIDAuthenticating Scientists with OpenID
Authenticating Scientists with OpenID
 
The Invisible Scientist
The Invisible ScientistThe Invisible Scientist
The Invisible Scientist
 
The Year of Blogging Dangerously
The Year of Blogging DangerouslyThe Year of Blogging Dangerously
The Year of Blogging Dangerously
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Chemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-upChemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-up
 
Chemoinformatics and information management
Chemoinformatics and information managementChemoinformatics and information management
Chemoinformatics and information management
 
Text mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literatureText mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literature
 
Issues for metabolomics and
Issues for metabolomics and Issues for metabolomics and
Issues for metabolomics and
 
Adding Meaning To Your Data
Adding Meaning To Your DataAdding Meaning To Your Data
Adding Meaning To Your Data
 
Web of Science: REST or SOAP?
Web of Science: REST or SOAP?Web of Science: REST or SOAP?
Web of Science: REST or SOAP?
 

Defrosting the Digital Library: A survey of bibliographic tools for the next generation web

  • 1. Defrosting the Digital Library: A survey of bibliographic tools for the next generation Web. Duncan Hull. Faculty of Life Sciences (1992-6): BSc. Computer Science (2002-2007): MSc, PhD. Chemistry (2008-date): Postdoc.
  • 2. It’s all Casey’s fault! Dr. Casey Bergman, Lecturer Faculty of Life Sciences I s Citeulike.org! http://ukpmc.ac.uk/
  • 6. Metadata in: Chemistry (Science of Matter) Biology (Science of Life) Informatics (Science of Information) Cheminformatics Biochemistry Bioinformatics Science! www.mib.ac.uk nactem.ac.uk/refine www.citeulike.org
  • 7. REFINE: Representing Evidence For Interacting Network Elements. www.sbml.org from www.biomodels.net database at the EBI.ac.uk
  • 8. Example from Glycolysis in Yeast reactant reactant product product modifier This is just one reaction, there are at least another 1700+ in Yeast
  • 9. Synonyms from Pedro Mendes B-Net Database http://www.comp-sys-bio.org/yeastnet/
      Name: Synonyms
      Glucose-6-phosphate: Robison ester; D-Glucose 6-phosphate
      ADP: 5'-adenylphosphoric acid; Adenosine 5'-diphosphate; H3adp
      Hexokinase: Hexokinase-1; Hexokinase-A; Hexokinase PI; YFR053C
      ATP: Adenosine 5'-triphosphate; Adenosine triphosphate; H4atp
      D-Glucose: dextrose; D-Glucose; D-(+)-glucose; D(+)-glucose; grape sugar; Traubenzucker
  • 10. Chemistry Biology Informatics Cheminformatics Biochemistry Bioinformatics
  • 11. For more info. www.nactem.ac.uk/refine One of the biggest challenges is getting hold of accurate metadata from libraries and databases
  • 14. Final solution: Web XSLT Print
  • 18. I hate powerpoint Radio MS Word TV
  • 19. How do they stay in business? Broadcasting House, London Foreign governments, e.g. U.S.A. etc
  • 20. Word: Not the best way to manage data and metadata
  • 21. Getting Rid of Word database XML schema Web & Intranet Printed documents XSLT
  • 22. A solution that worked! getMetadata getData Who is Thabo Mbeki? These documents are all about Thabo Mbeki Thabo Mbeki
  • 25. From ~1824 until ~1989 Photos via dpicker http://www.flickr.com/photos/dpicker/3107856991/ and pit yacker http://www.flickr.com/photos/78825653@N00/131611136 JRULM (Main Library) Joule Library Mostly “private” only available to an elite (e.g. University of Manchester Students and Staff)
  • 29. Timeline: Unchanged for centuries but… 20 years ÷ 2309 years = <1%
  • 30. Everything’s Gone Digital! www.scopus.com www.pubmed.gov http://ukpmc.ac.uk www.isiknowledge.com scholar.google.com
  • 33. Isolated publication silos Chemistry Informatics Biology impersonal, isolated, unsociable, Generally rubbish
  • 37. Can’t get metadata (decoupled from data): PDF getMetadata getData Title: defrosting the digital library Authors: Duncan Hull, Steve Pettifer and Douglas Kell Published: 2008 Tell me more Don’t know, try Google Don’t know, title might be “defrosting…” Where did this come from?
  • 39. Can’t get metadata (decoupled from data): PDF Peter Murray-Rust Hamburger (unstructured data) PDF is a hamburger, and we're trying to turn it back into a cow. http://tinyurl.com/pdfhamburger Cow (structured data) publishing text-mining
  • 43. www.citeulike.org Richard Cameron Kevin Emamy Picture from http://network.nature.com/people/mfenner/blog/2009/01/30/interview-with-kevin-emamy and http://www.citeulike.org/faq/faq.adp The reason I wrote the site [citeulike.org] was, after recently coming back to academia, I was slightly shocked by the quality of some of the tools available to help academics do their job. I found it preferable to start writing proper tools for my own use than to use existing software.
  • 45. All references in one place
  • 46. Click Post to Citeulike
  • 49. Citegeist = Citeulike + Zeitgeist
  • 50. allegedly 2,243,177 ~2,000 /day variable 674,076 2,880 /day 2 papers / min Linear growth ~500,000
  • 53. Casey Bergman story: I was importing papers on solexa and 454 genome assembly and came across the following paper: http://www.citeulike.org/user/cisevol/article/1465689 which was a real find in terms of convincing me that light shotgun sequence data is worth analysing. I nicked this from a PhD student's library in Brazil http://www.citeulike.org/profile/GustavoLacerda Wouldn’t have found this any other way (e.g. keyword searching or following citation trails)
  • 54. Many different solutions e.g. Papyro: Steve Pettifer http://utopia.cs.manchester.ac.uk/
  • 55. And the rest… www.mendeley.com www.zotero.org www.connotea.org www.mekentosj.com www.hubmed.org Re-couple metadata that has been de-coupled from data www.2collab.com www.refworks.com “iTunes for PDF files”
  • 56. There is still lots more metadata How many times has http://pubmed.gov/19060304 been cited? Who has cited http://pubmed.gov/19060304 ? Give me all the references that cite this one Give me all the references cited by http://pubmed.gov/19060304 Who the hell is Doug Kell? Steve Pettifer? Duncan Hull? What is Doug Kell’s h-index? Remember: Machines ask these questions, not just humans Notify me whenever Steve Pettifer publishes a paper Notify me whenever someone cites http://pubmed.gov/19060304 Impact factor?
  • 57. Digital Identity would solve some of these problems Give yourself a URI, you deserve it! Tim Berners-Lee http://www.w3.org/People/Berners-Lee/card#i see http://dig.csail.mit.edu/breadcrumbs/node/71
  • 64. Conclusions: Do us a favour!
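
Slide 9 shows why metadata about metabolites and enzymes is hard to match up: the same entity appears under many names. The sketch below is one possible way to resolve a synonym to its canonical name, using only the rows transcribed from that slide. The names `SYNONYMS`, `build_index` and `canonical_name` are hypothetical and are not part of B-Net, REFINE or any tool mentioned in the talk.

```python
# Synonym table transcribed from slide 9 (Pedro Mendes' B-Net / yeastnet).
# The structure and function names here are illustrative assumptions only.
SYNONYMS = {
    "D-Glucose": ["dextrose", "D-(+)-glucose", "D(+)-glucose",
                  "grape sugar", "Traubenzucker"],
    "ATP": ["Adenosine 5'-triphosphate", "Adenosine triphosphate", "H4atp"],
    "Hexokinase": ["Hexokinase-1", "Hexokinase-A", "Hexokinase PI", "YFR053C"],
    "ADP": ["5'-adenylphosphoric acid", "Adenosine 5'-diphosphate", "H3adp"],
    "Glucose-6-phosphate": ["Robison ester", "D-Glucose 6-phosphate"],
}


def build_index(table):
    """Map every name and synonym (case-insensitively) to its canonical name."""
    index = {}
    for name, synonyms in table.items():
        for term in [name, *synonyms]:
            index[term.casefold()] = name
    return index


INDEX = build_index(SYNONYMS)


def canonical_name(term):
    """Return the canonical name for a term, or None if it is not in the table."""
    return INDEX.get(term.strip().casefold())


if __name__ == "__main__":
    for term in ["Traubenzucker", "H4atp", "YFR053C", "Robison ester"]:
        print(term, "->", canonical_name(term))
```

A plain dictionary lookup like this only handles exact (case-insensitive) matches; spelling variants that are not listed in the table would still need text mining of the kind the talk points to.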
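
Slide 56 asks "What is Doug Kell’s h-index?" and stresses that machines ask such questions too. As a minimal sketch, assuming the per-paper citation counts have already been obtained from some bibliographic source, the standard h-index definition (the largest h such that h papers have at least h citations each) can be computed as follows. The function name `h_index` and the example counts are hypothetical and do not describe any real author.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only get smaller from here on
    return h


if __name__ == "__main__":
    # Made-up citation counts, for illustration only.
    print(h_index([10, 8, 5, 4, 3]))
```

The hard part in practice is not this calculation but getting reliable citation counts and knowing which papers belong to which author, which is the digital identity problem raised on slide 57.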