THE CHALLENGE OF DEEPER KNOWLEDGE GRAPHS FOR SCIENCE
PAUL GROTH | @PGROTH | PGROTH.COM
CONTRIBUTIONS: RON DANIEL, MICHAEL LAURUHN & @ELSEVIERLABS TEAM
OUTLINE
▸Research Performance
▸Knowledge Graphs
▸Research as a low resource domain
▸Quality
Bloom, N., Jones, C. I., Van Reenen, J., & Webb,
M. (2017). Are ideas getting harder to find? (No.
w23782). National Bureau of Economic
Research.
Slides: https://web.stanford.edu/~chadj/slides-ideas.pdf
WHY?
INFORMATION OVERLOAD
WHY?
IN PRACTICE
Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017).
Searching Data: A Review of Observational Data Retrieval Practices.
arXiv preprint arXiv:1707.06937.
Some observations from @gregory_km survey & interviews:
• The needs and behaviors of specific user groups (e.g.
early career researchers, policy makers, students) are
not well documented.
• Participants require details about data collection and
handling
• Reconstructing data tables from journal articles,
using general search engines, and making direct
data requests are common.
Gregory, K., Cousijn, H., Groth, P., Scharnhorst, A., & Wyatt, S. (2018).
Understanding Data Retrieval Practices: A Social Informatics Perspective.
arXiv preprint arXiv:1801.04971.
THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER
ANSWERS ARE ABOUT THINGS, NOT JUST WORKS
Why shouldn’t a search on an author return information
about the author, including the author’s works? Where
was the author born, when did she live, what is she
known for? … All of this is possible, but only if we can
make some fundamental changes in our approach to
bibliographic description. ... The challenge for us lies in
transforming what we can of our data into
interrelated “things” without overindulging that
metaphor.
Coyle, K. (2016). FRBR, before and after: a look at our bibliographical
models. Chicago: ALA Editions.
ENTER
KNOWLEDGE
GRAPHS
ERNST, PATRICK, ET AL. "DEEPLIFE: AN ENTITY-
AWARE SEARCH, ANALYTICS AND EXPLORATION
PLATFORM FOR HEALTH AND LIFE SCIENCES."
PROCEEDINGS OF ACL-2016 SYSTEM
DEMONSTRATIONS (2016): 19-24.
Knowledge Graphs: The
Science System
Knowledge Graphs:
Curated Databases
From: Wikidata as a semantic framework for the Gene Wiki initiative
Database (Oxford). 2016;2016. doi:10.1093/database/baw015
RESEARCH IS
DIVERSE
http://knowescape.org/map-of-science-an-update/
Augenstein, Isabelle, et al. "SemEval 2017 Task 10:
ScienceIE-Extracting Keyphrases and Relations from
Scientific Publications." Proceedings of the 11th
International Workshop on Semantic Evaluation
(SemEval-2017). 2017.
SCIENTIFIC TEXT IS CHALLENGING
UNSUPERVISED & DISTANT SUPERVISION
EXAMPLE: UNIVERSAL SCHEMAS AND REVERB
Groth et al., Applying Universal Schemas for Domain Specific Ontology Expansion http://www.akbc.ws/2016/papers/3_Paper.pdf
• Successful in predicting new triples (F1 ≈ 0.7)
• ReVerb’s relations very interesting,
but recall very low
• Was not domain independent
• Matched arguments against a
medical ontology to improve
precision
• Predicted relations were restricted
to relation types from the same
ontology
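The F1 figure and the ontology-matching step above can be illustrated with a minimal Python sketch. It assumes triples are scored by exact match against a gold set, and that "matching arguments against an ontology" means keeping only triples whose subject and object both appear in a term list. The function names and toy data are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted triples against a gold set by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def filter_by_ontology(triples, ontology_terms):
    """Keep triples whose subject and object both name an ontology term."""
    terms = {term.lower() for term in ontology_terms}
    return [t for t in triples if t[0].lower() in terms and t[2].lower() in terms]


# Hypothetical toy data, for illustration only.
gold = {
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("metformin", "treats", "diabetes"),
}
predicted = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("the study", "shows", "results"),  # extraction with non-ontology arguments
]
ontology_terms = {"aspirin", "headache", "ibuprofen", "fever", "metformin", "diabetes"}

print(precision_recall_f1(predicted, gold))
filtered = filter_by_ontology(predicted, ontology_terms)
print(precision_recall_f1(filtered, gold))
```

In this toy case the filter removes the one off-ontology triple, so precision rises from 2/3 to 1.0 while recall stays at 2/3, which mirrors the precision-for-coverage trade-off the bullets describe.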
OPEN INFORMATION EXTRACTION IN SCIENCE IS HARD
Open Information Extraction on Scientific Text: An Evaluation.
Paul Groth, Mike Lauruhn, Antony Scerri and Ron Daniel, Jr. COLING 2018
Example:
“The patient was treated with Emtricitabine,
Etravirine, and Darunavir”
‣ (The patient :: was treated with :: Emtricitabine,
Etravirine, and Darunavir)
Another possible extraction is:
‣ (The patient :: was treated with :: Emtricitabine)
‣ (The patient :: was treated with :: Etravirine)
‣ (The patient :: was treated with :: Darunavir)
698 unique relation types – 400 relation types
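The difference between the two extraction styles in the example can be sketched in Python. This is a naive illustration that assumes a coordinated object is a plain list joined by commas and "and"; the `split_coordinated` helper is hypothetical and is not the method evaluated in the paper.

```python
import re

# Separators: a comma (optionally followed by "and"), or a bare "and".
_COORDINATION = re.compile(r",\s*(?:and\s+)?|\s+and\s+")


def split_coordinated(triple):
    """Expand one triple with a coordinated object into one triple per item."""
    subject, relation, obj = triple
    items = [item.strip() for item in _COORDINATION.split(obj) if item.strip()]
    return [(subject, relation, item) for item in items]


single = (
    "The patient",
    "was treated with",
    "Emtricitabine, Etravirine, and Darunavir",
)

for expanded in split_coordinated(single):
    print(" :: ".join(expanded))
```

A rule this simple would also split objects where "and" is part of a single term, which is one reason judging the correctness of open extractions on scientific text is not straightforward.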
CROWDS ARE NOT EXPERTS
Use of Internal Testing Data to Help Determine Compensation for
Crowdsourcing Tasks
Michael Lauruhn, Paul Groth, Corey Harper, Helena Deus. HUML 2018
TRANSFER LEARNING
Sujit Pal @ Elsevier Labs
TRANSFER LEARNING & MACHINE DEPENDENCIES
QUALITY IS
DEPENDENT
ON SOURCES
PROVENANCE
SOURCES AREN’T JUST DATA
Lauruhn, Michael, and Paul Groth. "Sources of
Change for Modern Knowledge Organization
Systems." Knowledge Organization 43, no. 8
(2016).
A MORE TRANSPARENT SUPPLY CHAIN
Groth, Paul, "Transparency and Reliability in the Data Supply Chain,"
Internet Computing, IEEE, vol.17, no.2, pp.69,71, March-April 2013
doi: 10.1109/MIC.2013.41
1) https://www.elsevier.com/connect/how-elsevier-is-breaking-down-barriers-to-reproducibility
REPRODUCIBILITY AS QUALITY?
QUALITY AS MORE AUTOMATION
“There are some catches too of course,
especially since it's very early in the
evolution of these tools. If it were the
internet it would be around 1994”
http://blog.booleanbiotech.com/genetic_engineering_pipeline_python.html
RESEARCH QUESTIONS
1. Does basic lab-based
biomedical research reuse
and assemble existing
methods, or is it primarily
focused on the development
of new techniques?
2. What existing methods are
covered by robotic labs?
RESULTS
DIRECTION: GROUNDING KNOWLEDGE GRAPHS IN ACTIONS
http://www.researchobject.org
https://smart-api.info
CONCLUSIONS
▸Knowledge Graphs are crucial for overcoming information overload in research
▸Research has less redundancy than other domains
▸fewer resources and high diversity
▸challenge: effectively use general knowledge in these domains
▸Quality is central
▸turn towards processes and reproducibility as foundations

 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

The Challenge of Deeper Knowledge Graphs for Science

  • 1. THE CHALLENGE OF DEEPER KNOWLEDGE GRAPHS FOR SCIENCE PAUL GROTH | @PGROTH | PGROTH.COM CONTRIBUTIONS: RON DANIEL, MICHAEL LAURUHN & @ELSEVIERLABS TEAM
  • 3-7. Bloom, N., Jones, C. I., Van Reenen, J., & Webb, M. (2017). Are ideas getting harder to find? (No. w23782). National Bureau of Economic Research. Slides: https://web.stanford.edu/~chadj/slides-ideas.pdf
  • 9. WHY? IN PRACTICE Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017). Searching Data: A Review of Observational Data Retrieval Practices. arXiv preprint arXiv:1707.06937. Some observations from the @gregory_km survey & interviews:
      • The needs and behaviors of specific user groups (e.g. early career researchers, policy makers, students) are not well documented.
      • Participants require details about data collection and handling.
      • Reconstructing data tables from journal articles, using general search engines, and making direct data requests are common.
    K Gregory, H Cousijn, P Groth, A Scharnhorst, S Wyatt (2018). Understanding Data Retrieval Practices: A Social Informatics Perspective. arXiv preprint arXiv:1801.04971
  • 10. THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER ANSWERS ARE ABOUT THINGS, NOT JUST WORKS Why shouldn’t a search on an author return information about the author, including the author’s works? Where was the author born, when did she live, what is she known for? … All of this is possible, but only if we can make some fundamental changes in our approach to bibliographic description. ... The challenge for us lies in transforming what we can of our data into interrelated “things” without overindulging that metaphor. Coyle, K. (2016). FRBR, before and after: a look at our bibliographical models. Chicago: ALA Editions.
  • 11. ENTER KNOWLEDGE GRAPHS ERNST, PATRICK, ET AL. "DEEPLIFE: AN ENTITY-AWARE SEARCH, ANALYTICS AND EXPLORATION PLATFORM FOR HEALTH AND LIFE SCIENCES." PROCEEDINGS OF ACL-2016 SYSTEM DEMONSTRATIONS (2016): 19-24.
  • 13. Knowledge Graphs: Curated Databases From: Wikidata as a semantic framework for the Gene Wiki initiative Database (Oxford). 2016;2016. doi:10.1093/database/baw015
  • 15. SCIENTIFIC TEXT IS CHALLENGING Augenstein, Isabelle, et al. "SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications." Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). 2017.
  • 16. UNSUPERVISED & DISTANT SUPERVISION EXAMPLE: UNIVERSAL SCHEMAS AND REVERB Groth et al., Applying Universal Schemas for Domain Specific Ontology Expansion http://www.akbc.ws/2016/papers/3_Paper.pdf
      • Successful in predicting new triples (F1 ≈ 0.7)
      • ReVerb’s relations very interesting, but recall very low
      • Was not domain independent
      • Matched arguments against a medical ontology to improve precision
      • Predicted relations were restricted to relation types from the same ontology
  • 17. OPEN INFORMATION EXTRACTION IN SCIENCE IS HARD Open Information Extraction on Scientific Text: An Evaluation. Paul Groth, Mike Lauruhn, Antony Scerri and Ron Daniel, Jr. COLING 2018
    Example: “The patient was treated with Emtricitabine, Etravirine, and Darunavir”
      ‣ (The patient :: was treated with :: Emtricitabine, Etravirine, and Darunavir)
    Another possible extraction is:
      ‣ (The patient :: was treated with :: Emtricitabine)
      ‣ (The patient :: was treated with :: Etravirine)
      ‣ (The patient :: was treated with :: Darunavir)
    698 unique relation types – 400 relation types
  • 18. CROWDS ARE NOT EXPERTS Use of Internal Testing Data to Help Determine Compensation for Crowdsourcing Tasks Michael Lauruhn, Paul Groth, Corey Harper, Helena Deus. HUML 2018
  • 19. TRANSFER LEARNING Sujit Pal @ Elsevier Labs
  • 20. TRANSFER LEARNING & MACHINE DEPENDENCIES
  • 23. SOURCES AREN’T JUST DATA Lauruhn, Michael, and Paul Groth. "Sources of Change for Modern Knowledge Organization Systems." Knowledge Organization 43, no. 8 (2016).
  • 24. A MORE TRANSPARENT SUPPLY CHAIN Groth, Paul, "Transparency and Reliability in the Data Supply Chain," Internet Computing, IEEE, vol. 17, no. 2, pp. 69-71, March-April 2013. doi: 10.1109/MIC.2013.41
  • 26. QUALITY AS MORE AUTOMATION
  • 31. http://blog.booleanbiotech.com/genetic_engineering_pipeline_python.html “There are some catches too of course, especially since it's very early in the evolution of these tools. If it were the internet it would be around 1994”
  • 33. RESEARCH QUESTIONS 1. Does basic lab-based biomedical research reuse and assemble existing methods, or is it primarily focused on the development of new techniques? 2. What existing methods are covered by robotic labs?
  • 35. DIRECTION: GROUNDING KNOWLEDGE GRAPHS IN ACTIONS http://www.researchobject.org https://smart-api.info
  • 36. CONCLUSIONS
      ▸ Knowledge Graphs are crucial for overcoming information overload in research
      ▸ Research has less redundancy than other domains
      ▸ fewer resources and high diversity
      ▸ challenge: effectively use general knowledge in these domains
      ▸ Quality is central
      ▸ turn towards processes and reproducibility as foundations
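
The Open IE example on slide 17 contrasts one triple with a coordinated object against three atomic triples. A minimal sketch of that split is given below. It is not the method evaluated in the COLING 2018 paper; the function name `split_coordinated_object` and the regular expression are illustrative assumptions, and the sketch only handles simple "A, B, and C" lists whose items contain no internal commas.

```python
import re
from typing import List, Tuple

# A triple is (subject, relation, object), mirroring the "::"-separated
# extractions shown on slide 17.
Triple = Tuple[str, str, str]

# Hypothetical separator pattern: a comma (optionally followed by "and"/"or"),
# or a bare "and"/"or" surrounded by whitespace.
_SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def split_coordinated_object(triple: Triple) -> List[Triple]:
    """Return one triple per item of a simple coordinated object list."""
    subject, relation, obj = triple
    items = [part.strip() for part in _SEPARATOR.split(obj) if part.strip()]
    return [(subject, relation, item) for item in items]


if __name__ == "__main__":
    extraction = (
        "The patient",
        "was treated with",
        "Emtricitabine, Etravirine, and Darunavir",
    )
    for atomic in split_coordinated_object(extraction):
        print(" :: ".join(atomic))
```

Whether the single coordinated triple or the three atomic triples counts as the correct extraction is exactly the kind of ambiguity that makes evaluating Open IE on scientific text hard, so a real system would likely need a syntactic parser rather than a pattern like this.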

Editor's Notes

  1. Work with DANS. Reviewed 400 papers; deep dive on 114.
  2. Cloud-based labs provide remote access to frequently used experimental equipment. They are able to support increasingly complex protocols (e.g. transcriptic.com, Emerald Cloud Lab).