Faculty of Science
DL4KG – ESWC 2019
June 2, 2019
End-to-End Learning for
Answering Structured Queries Directly over Text
Paul Groth (@pgroth), Antony Scerri, Ron Daniel, Jr., Bradley P. Allen
@INDE_LAB_AMS @ElsevierLabs
Information Needs
“An information need is the topic about which the user desires to know more” – Manning
Data as an information need
• Researchers across communities need a diversity of observational data, requiring data of different types, from different sources and disciplines, and often collected at different scales.
• Integrating diverse data is a challenge.
Gregory, K.; Cousijn, H.; Groth, P.; Scharnhorst, A.; Wyatt, S. (2019). Searching data: A review
of observational data retrieval practices in selected disciplines. Journal of the Association for
Information Science and Technology. https://doi.org/10.1002/asi.24165
Data search – is it just a regular search engine?
Survey of Research Challenges:
Adriane Chapman, Elena Simperl, Laura Koesten,
George Konstantinidis, Luis-Daniel Ibáñez-Gonzalez,
Emilia Kacprzak, Paul Groth (Jan 2019) "Dataset
search: a survey" https://arxiv.org/abs/1901.00735
Constructive Data Search
SmartTable: A Spreadsheet Program with Intelligent Assistance, S. Zhang,
V. A. Zada, and K. Balog. In: 41st International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR ’18), July 2018.
Integration of Data Into Workflows
Chichester, Christine, Daniela Digles, Ronald Siebes, Antonis Loizou, Paul Groth, and
Lee Harland. "Drug discovery FAQs: workflows for answering multidomain drug
discovery questions." Drug discovery today 20, no. 4 (2015): 399-405.
Run structured queries
https://kgtutorial.github.io
FIRST: BUILD A KNOWLEDGE GRAPH
[Figure: pipeline from content to a knowledge graph.
Processing steps shown: Open Information Extraction, Triple Extraction, Entity Resolution, Concept Resolution, Matrix Construction, Matrix Factorization, Matrix Completion, Curation.
Inputs and outputs shown: Content, Taxonomy, Surface form relations, Structured relations, Universal schema, Factorization model, Predicted relations, Knowledge graph.
Figures shown: 14M SD articles; 475 M triples; 3.3 million relations; 49 M relations; ~15k -> 1M entries.]
Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel
“Applying Universal Schemas for Domain Specific Ontology Expansion”
5th Workshop on Automated Knowledge Base Construction (AKBC) 2016
Michael Lauruhn, and Paul Groth. "Sources of Change for Modern
Knowledge Organization Systems." Knowledge Organization 43, no. 8
(2016).
Text Databases
Schneider, Rudolf, et al. "Interactive Relation Extraction in Main Memory Database
Systems." Proceedings of COLING 2016, the 26th International Conference on
Computational Linguistics: System Demonstrations. 2016.
Can you skip all that?
Machine Comprehension + Question Answering Tasks
https://nlp.stanford.edu/software/sempre/wikitable/
What if we have a parallel corpus?
Triple Pattern Fragments
http://linkeddatafragments.org/concept/
Now we only need to answer slot filling queries
WikiReading: A Novel Large-scale
Language Understanding Task over
Wikipedia, Hewlett, et al, ACL 2016
Constructing Datasets for Multi-hop Reading Comprehension
Across Documents, Johannes Welbl, Pontus
Stenetorp, Sebastian Riedel, Transactions of the Association
for Computational Linguistics 2018
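A rough sketch of the idea on this slide, not the authors' code: a triple pattern with a bound predicate and exactly one unbound subject or object can be read as a slot-filling query. The class and function names below are hypothetical, and the parser only handles the simple whitespace-separated form used on the slides.

```python
from typing import NamedTuple, Optional


class TriplePattern(NamedTuple):
    """A single triple pattern; terms starting with '?' are variables."""
    subject: str
    predicate: str
    obj: str


def is_variable(term: str) -> bool:
    return term.startswith("?")


def parse_triple_pattern(text: str) -> TriplePattern:
    """Parse 'subject predicate object' separated by whitespace (simplified)."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"expected three terms, got {len(parts)}: {text!r}")
    return TriplePattern(*parts)


def open_slot(pattern: TriplePattern) -> Optional[str]:
    """Return 'subject' or 'object' when exactly one of them is a variable
    and the predicate is bound; otherwise return None."""
    if is_variable(pattern.predicate):
        return None
    subject_open = is_variable(pattern.subject)
    object_open = is_variable(pattern.obj)
    if subject_open == object_open:
        # Both bound (nothing to fill) or both open (not a single slot).
        return None
    return "subject" if subject_open else "object"


# The pattern from the training data slide: which ?city is located in wd:Q55?
pattern = parse_triple_pattern("?city wdt:P131 wd:Q55")
slot = open_slot(pattern)
```

Under this reading, a larger structured query would be answered by handling each such pattern separately and combining the results, which is the role Triple Pattern Fragments play on the previous slide.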
Off the shelf QA architectures
Dirk Weissenborn, Georg Wiese, and Laura Seiffe. Making neural qa as simple as possible but
not simpler. In Proceedings of the 21st Conference on Computational Natural Language Learning
(CoNLL 2017), pages 271–280, 2017.
Dirk Weissenborn, Pasquale Minervini, Tim Dettmers, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Pontus Stenetorp, Sebastian Riedel. Jack the Reader – A Machine
Reading Framework. In Proceedings of the 56th Annual Meeting of the
Association for Computational Linguistics (ACL) System Demonstrations,
July 2018. URL https://arxiv.org/abs/1806.08727
Jack the Reader – framework for machine reading
https://github.com/uclmr/jack
FastQA – state of the art baseline neural architecture
JackQA – architecture from framework
Training data
Question:
lexicalize(?city wdt:P131 wd:Q55) =>
located in the administrative territorial entity of
Netherlands
Input Text
“Amsterdam is the capital city and most populous
municipality of the Netherlands. ….”
Answer span
Amsterdam [0,9]
1150 predicates in Wikidata that link entities
Filter
• Subject must have a Wikipedia page
• > 30 examples
• Answer must be in the text
572 predicates
~300 examples per predicate
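As an illustration of how one training example on this slide could be assembled (a sketch under assumptions, not the authors' pipeline): the question is a lexicalized triple pattern, and the answer span is the character range of the answer in the article text. The label tables and function names are hypothetical stand-ins for a Wikidata lookup, and the span is taken as start-inclusive, end-exclusive, which matches `Amsterdam [0,9]`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical label tables; a real system would look these up in Wikidata.
PREDICATE_LABELS = {"P131": "located in the administrative territorial entity"}
ENTITY_LABELS = {"Q55": "Netherlands"}


@dataclass
class Example:
    question: str
    text: str
    answer: str
    span: Tuple[int, int]  # (start, end) character offsets, end exclusive


def lexicalize(predicate_id: str, object_id: str) -> str:
    """Turn a (?x, predicate, object) pattern into a textual question."""
    return f"{PREDICATE_LABELS[predicate_id]} of {ENTITY_LABELS[object_id]}"


def make_example(predicate_id: str, object_id: str,
                 answer: str, text: str) -> Optional[Example]:
    """Build one example, or return None if the answer is not in the text
    (the 'Answer must be in the text' filter)."""
    start = text.find(answer)
    if start < 0:
        return None
    return Example(
        question=lexicalize(predicate_id, object_id),
        text=text,
        answer=answer,
        span=(start, start + len(answer)),
    )


text = ("Amsterdam is the capital city and most populous "
        "municipality of the Netherlands.")
example = make_example("P131", "Q55", "Amsterdam", text)
missing = make_example("P131", "Q55", "Rotterdam", text)
```

Using the first occurrence of the answer string is a simplification; an answer that appears several times in the text would need a rule for choosing which occurrence to label.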
Training
- Train a model per predicate
- 2/3 training, 1/3 test
- Windowing scheme over the text of articles
- EC2 p2.xlarge
- 1 virtual GPU (NVIDIA K80), 4 virtual CPUs, 61 GiB RAM
- FastQA – 23 hours training time
- JackQA – 81 hours training time
- Restarts to decrease batch sizes if model training failed
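The slides do not give the details of the split or the windowing scheme, so the following is only one plausible reading: a fixed 2/3 to 1/3 cut of each predicate's examples, and overlapping fixed-size token windows over an article. The window size, the stride, and both function names are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_two_thirds(examples: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Split examples into the first 2/3 (training) and the last 1/3 (test)."""
    cut = (2 * len(examples)) // 3
    return list(examples[:cut]), list(examples[cut:])


def sliding_windows(tokens: Sequence[str], size: int,
                    stride: int) -> List[Tuple[int, List[str]]]:
    """Return (start_index, window_tokens) pairs that cover every token.

    Consecutive windows overlap by size - stride tokens, so an answer that
    straddles one window boundary can still fall entirely inside a neighbour.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if stride > size:
        raise ValueError("stride larger than size would skip tokens")
    windows = []
    start = 0
    while True:
        windows.append((start, list(tokens[start:start + size])))
        if start + size >= len(tokens):
            break
        start += stride
    return windows


train, test = split_two_thirds(list(range(300)))  # ~300 examples per predicate
tokens = "Amsterdam is the capital city and most populous municipality of".split()
windows = sliding_windows(tokens, size=4, stride=2)
```

A shuffled split would normally be preferable to a fixed cut; the fixed cut is used here only to keep the sketch deterministic.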
Results
Training data size as a factor?
A Prototype
Where to go
- Joint model
- Model architecture tuned to the task
- Performance on complex queries
- Accuracy
- Speed
- Other datasets
- When to use what approach
- …
Conclusion
• Structured queries are important!
• Can we do it on text? Looks like it … kind of
• Text as the KB – McCallum
• Interested in this kind of stuff?
• We’re hiring!
Questions?
Paul Groth | @pgroth | pgroth.com
indelab.org

DMCC Future of Trade Web3 - Special Edition
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

End-to-End Learning for Answering Structured Queries Directly over Text

  • 1. Faculty of Science. DL4KG – ESWC 2019, June 2, 2019. End-to-End Learning for Answering Structured Queries Directly over Text. Paul Groth (@pgroth), Antony Scerri, Ron Daniel, Jr., Bradley P. Allen. @INDE_LAB_AMS @ElsevierLabs
  • 2. Information Needs: “An information need is the topic about which the user desires to know more” – Manning
  • 3. Data as an information need. Researchers across communities need a diversity of observational data, requiring data of different types, from different sources and disciplines, and often collected at different scales. Integrating diverse data is a challenge. Source: Gregory, K.; Cousijn, H.; Groth, P.; Scharnhorst, A.; Wyatt, S. (2019). Searching data: A review of observational data retrieval practices in selected disciplines. Journal of the Association for Information Science and Technology. https://doi.org/10.1002/asi.24165
  • 4. Data search – is it just a regular search engine? Survey of research challenges: Adriane Chapman, Elena Simperl, Laura Koesten, George Konstantinidis, Luis-Daniel Ibáñez-Gonzalez, Emilia Kacprzak, Paul Groth (Jan 2019). "Dataset search: a survey". https://arxiv.org/abs/1901.00735
  • 5. Constructive Data Search. SmartTable: A Spreadsheet Program with Intelligent Assistance, S. Zhang, V. A. Zada, and K. Balog. In: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’18), July 2018.
  • 6. Integration of Data Into Workflows. Chichester, Christine, Daniela Digles, Ronald Siebes, Antonis Loizou, Paul Groth, and Lee Harland. "Drug discovery FAQs: workflows for answering multidomain drug discovery questions." Drug Discovery Today 20, no. 4 (2015): 399-405.
  • 7. Run structured queries
  • 9. First: build a knowledge graph. [Pipeline diagram. Labels: Content; Universal schema; Surface form relations; Structured relations; Factorization model; Matrix Construction; Open Information Extraction; Entity Resolution; Matrix Factorization; Knowledge graph; Curation; Predicted relations; Matrix Completion; Taxonomy; Triple Extraction; Concept Resolution. Figures shown: 14M SD articles; 475M triples; 3.3 million relations; 49M relations; ~15k -> 1M entries.] References: Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel. “Applying Universal Schemas for Domain Specific Ontology Expansion.” 5th Workshop on Automated Knowledge Base Construction (AKBC), 2016. Michael Lauruhn and Paul Groth. "Sources of Change for Modern Knowledge Organization Systems." Knowledge Organization 43, no. 8 (2016).
  • 10. Text Databases. Schneider, Rudolf, et al. "Interactive Relation Extraction in Main Memory Database Systems." Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations. 2016.
  • 11. Can you skip all that?
  • 12. Machine Comprehension + Question Answering Tasks. https://nlp.stanford.edu/software/sempre/wikitable/
  • 13. What if we have a parallel corpus?
  • 14. Triple Pattern Fragments. http://linkeddatafragments.org/concept/
  • 15. Now we only need to answer slot-filling queries. References: Hewlett et al. "WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia." ACL 2016. Johannes Welbl, Pontus Stenetorp, Sebastian Riedel. "Constructing Datasets for Multi-hop Reading Comprehension Across Documents." Transactions of the Association for Computational Linguistics, 2018.
  • 16. Off-the-shelf QA architectures. FastQA: state-of-the-art baseline neural architecture. JackQA: architecture from the Jack the Reader framework for machine reading (https://github.com/uclmr/jack). References: Dirk Weissenborn, Georg Wiese, and Laura Seiffe. "Making Neural QA as Simple as Possible but not Simpler." In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 271–280, 2017. Dirk Weissenborn, Pasquale Minervini, Tim Dettmers, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Pontus Stenetorp, Sebastian Riedel. "Jack the Reader – A Machine Reading Framework." In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL) System Demonstrations, July 2018. https://arxiv.org/abs/1806.08727
  • 17. Training data. Question: lexicalize(?city wdt:P131 wd:Q55) => "located in the administrative territorial entity of Netherlands". Input text: “Amsterdam is the capital city and most populous municipality of the Netherlands. ….” Answer span: Amsterdam [0,9]. Wikidata has 1150 predicates that link entities. Filter: the subject must have a Wikipedia page; more than 30 examples; the answer must be in the text. Result: 572 predicates, ~300 examples per predicate.
  • 18. Training. Train a model per predicate; 2/3 training, 1/3 test; windowing scheme over the text of articles. Hardware: EC2 p2.xlarge (1 virtual GPU, NVIDIA K80; 4 virtual CPUs; 61 GiB RAM). FastQA: 23 hours training time. JackQA: 81 hours. Restarts to decrease batch sizes if model training failed.
  • 20. Training data size as a factor?
  • 23. A Prototype
  • 24. Where to go: joint model; model architecture tuned to the task; performance on complex queries; accuracy; speed; other datasets; when to use what approach; …
  • 25. Conclusion. Structured queries are important! Can we do it on text? Looks like it … kind of. Text as the KB – McCallum. Interested in this kind of stuff? We’re hiring! Questions? Paul Groth | @pgroth | pgroth.com | indelab.org
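
The training-data construction described on slide 17 can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' code: the function names (`lexicalize`, `find_span`, `make_example`, `filter_predicates`) and the two label tables are hypothetical, and the answer span is assumed to be character offsets with an exclusive end, which is consistent with the slide's `Amsterdam [0,9]`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label tables; in practice the labels would come from Wikidata.
PREDICATE_LABELS = {"wdt:P131": "located in the administrative territorial entity"}
ENTITY_LABELS = {"wd:Q55": "Netherlands"}


def lexicalize(predicate: str, obj: str) -> str:
    """Turn a triple pattern (?x predicate obj) into a textual question."""
    return f"{PREDICATE_LABELS[predicate]} of {ENTITY_LABELS[obj]}"


def find_span(text: str, answer: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the first match, end exclusive."""
    start = text.find(answer)
    if start == -1:
        return None
    return (start, start + len(answer))


def make_example(predicate: str, obj: str, answer: str, text: str) -> Optional[dict]:
    """Build one training example, or None when the answer is not in the text."""
    span = find_span(text, answer)
    if span is None:
        return None  # mirrors the "answer must be in the text" filter
    return {
        "question": lexicalize(predicate, obj),
        "text": text,
        "answer": answer,
        "span": span,
    }


def filter_predicates(examples: Dict[str, List[dict]], min_examples: int = 30) -> Dict[str, List[dict]]:
    """Keep only predicates with more than `min_examples` examples."""
    return {p: ex for p, ex in examples.items() if len(ex) > min_examples}


text = "Amsterdam is the capital city and most populous municipality of the Netherlands."
example = make_example("wdt:P131", "wd:Q55", "Amsterdam", text)
print(example["question"], example["span"])
```

Dropping examples whose answer string does not occur in the text keeps the task purely extractive, which is what span-prediction architectures such as FastQA expect.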

Editor's Notes

  1. Tons of challenges
  2. Why: because you want to be precise. Problem: information extraction.
  3. Learn this end-to-end
  4. Indrex / inderl / deep dive
  5. Good performance; it’s about the data, not the model. F1 is the overlap between the extracted tokens and the answers.
  6. Value constraints; classification systems; common words; syntactic patterns
  7. Single-character answers; fuzziness (administrative entity); very large sets
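
Note 5 describes F1 as the overlap between extracted tokens and answers. A minimal sketch of such a token-level F1, in the style commonly used for extractive QA evaluation, is shown below. The whitespace tokenization and lowercasing are assumptions; the exact normalization used in the talk is not stated, and `token_f1` is a hypothetical name.

```python
from collections import Counter


def token_f1(predicted: str, gold: str) -> float:
    """Token-overlap F1 between a predicted answer and a gold answer."""
    pred_tokens = predicted.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the city of Amsterdam", "Amsterdam"))
```

A partially correct span gets partial credit under this measure, unlike exact match, which is why very short or single-character answers (note 7) can skew the score.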