Extending DBpedia
with Wikipedia List Pages

10/22/13
Heiko Paulheim, Simone Paolo Ponzetto

Disclaimer
• This presentation shows an idea
– after all, it says “position paper”
– we don't know yet whether it works!
– (but we are quite confident)

Lists in Wikipedia
• Wikipedia loves lists

• As of June 2013, there are almost 600,000 list pages

• Lists organize Wikipedia pages
– that correspond to DBpedia instances

• Example:
– List of African-American writers

Lists in Wikipedia
• Different types of lists
– simple bullet point lists
– broken bullet point lists (i.e., split into different sections)
  • sometimes, the sections are semantically meaningful
– tables
– ...

[Chart: share of list pages by type. Categories: Simple Bullet List, Broken Bullet List, Table, Other]
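As an illustration of the simplest case, a minimal Python sketch for pulling linked pages out of a bullet point list could look as follows. It assumes the raw wikitext of the list page is available as a string and that each bullet starts with the link to the listed page; the function name and the regular expression are hypothetical and not taken from the slides. Tables and other list types would need separate handling.

```python
import re

# Matches a wikitext bullet line whose first element is an internal link,
# e.g. "* [[Some Page]] ..." or "** [[Target|label]] ...".
# Assumption: the listed instance is the first link on the bullet line.
BULLET_LINK = re.compile(r"^\*+\s*\[\[([^\]|#]+)")


def extract_bullet_instances(wikitext: str) -> list[str]:
    """Return the first link target of every bullet line, in page order."""
    instances = []
    for line in wikitext.splitlines():
        match = BULLET_LINK.match(line.strip())
        if match:
            # DBpedia resource names use underscores instead of spaces.
            instances.append(match.group(1).strip().replace(" ", "_"))
    return instances


# Hypothetical wikitext fragment, for illustration only.
sample = """== A ==
* [[Maya Angelou]], poet
== B ==
* [[James Baldwin|Baldwin, James]], novelist
Prose mentioning [[New York City]] is ignored.
* plain item without a link
"""
print(extract_bullet_instances(sample))
```

Section headings are skipped here; for broken bullet lists with meaningful sections, they could be kept as extra evidence.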

Lists in Wikipedia
• What information is in a list?
– the linked things have the same “type”

• The type can be a complex construct
– e.g., Writer ∩ ∀nationality.{United States} ∩ ∀ethnicity.{African American}

• Sometimes, a list carries additional pieces of information
– e.g., birth dates for persons

Extracting Information from Lists
• Goal:
– find the common characteristics of all things in the list

• Example: African-American writers
– all instances are writers (25% in DBpedia)
– all instances have nationality=United_States (12% in DBpedia)
– all instances have ethnicity=African_American (3% in DBpedia)

• Information in DBpedia is far from complete
– makes extraction difficult
– but: big potential to add information to DBpedia

Extracting Information from Lists
• Possible approach: finding characteristics with high TF-IDF
– TF: percentage of instances in the list that carry the characteristic
– IDF: 1 / (percentage of all DBpedia instances that carry the characteristic)

• Rationale: ranking by frequency alone would rate owl:Thing the highest

• Example: African-American writers
– type=Writer: 0.608 (maximal across all possible classes)
– nationality=United_States: 0.277
– ethnicity=African_American: 0.127

• But:
– deathPlace=New_York_City: 0.157 :-(
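The TF and IDF definitions above can be sketched in Python as follows. This is a minimal illustration that takes the two definitions literally (score = TF × 1/DF); the example scores on the slide evidently use some normalization that is not spelled out, so the sketch does not try to reproduce them. The function name, the data layout, and the toy data are assumptions.

```python
from collections import Counter


def tf_idf_scores(list_instances, all_instances):
    """Score each (property, value) characteristic found on a list page.

    Both arguments map an instance name to the set of characteristics
    asserted for it, e.g. {("type", "Writer")}.
    Assumption: every instance of the list also occurs in all_instances.
    """
    in_list = Counter(c for chars in list_instances.values() for c in chars)
    overall = Counter(c for chars in all_instances.values() for c in chars)
    scores = {}
    for characteristic, count in in_list.items():
        tf = count / len(list_instances)                   # share of list instances
        df = overall[characteristic] / len(all_instances)  # share of all instances
        if df > 0:
            scores[characteristic] = tf * (1.0 / df)       # IDF = 1 / df
    return scores


# Toy data, invented for illustration.
everything = {
    "a": {("type", "Thing"), ("type", "Writer")},
    "b": {("type", "Thing"), ("type", "Writer")},
    "c": {("type", "Thing")},
    "d": {("type", "Thing")},
}
writers_list = {name: everything[name] for name in ("a", "b")}
scores = tf_idf_scores(writers_list, everything)
print(scores)
```

In the toy data every instance is a Thing, so that type gets no boost from the IDF factor, while Writer, which is frequent in the list but rarer overall, scores higher.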

Extracting Information from Lists
• Example: African-American writers
– ethnicity=African_American: 0.127
– deathPlace=New_York_City: 0.157

• Exploit further information from the list page
– e.g., wiki:African_American is linked from the page, New_York_City is not
– e.g., analyze the list page title, for instance with DBpedia Spotlight
  • African_American is recognized as an entity
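One way such page evidence might be combined with the statistical score is sketched below. The slides do not say how the two signals should be combined; the multiplicative boost and its value are purely hypothetical, and the sets of linked and title entities are assumed to have been obtained beforehand (for example from the page's links and from an entity recognizer run on the title).

```python
def rerank_with_page_evidence(scores, linked_entities, title_entities, boost=2.0):
    """Boost characteristics whose value is mentioned on the list page.

    scores:          {(property, value): statistical score}
    linked_entities: entity names linked from the list page
    title_entities:  entity names recognized in the page title
    boost:           hypothetical multiplier, not a value from the slides
    """
    mentioned = set(linked_entities) | set(title_entities)
    reranked = {}
    for (prop, value), score in scores.items():
        reranked[(prop, value)] = score * boost if value in mentioned else score
    # Highest score first.
    return sorted(reranked.items(), key=lambda item: item[1], reverse=True)


# Scores from the slide; the evidence sets follow the slide's example.
slide_scores = {
    ("ethnicity", "African_American"): 0.127,
    ("deathPlace", "New_York_City"): 0.157,
}
ranking = rerank_with_page_evidence(
    slide_scores,
    linked_entities={"African_American"},
    title_entities={"African_American"},
)
print(ranking[0][0])
```

With this assumed boost, ethnicity=African_American moves ahead of deathPlace=New_York_City.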

Lists of Lists in Wikipedia
• Wikipedia also has ~600 lists of lists
– they organize lists
– they form a hierarchy

• E.g.:
– Lists of writers
– Lists of American writers
– List of African-American writers

From Lists of Lists to an Extended Ontology
• Idea:
– find corresponding “Lists of ...” pages for DBpedia classes
– extend the hierarchy
[Figure: the DBpedia ontology (owl:Thing, ..., Agent, ..., Person, ..., Artist, ..., Writer) is extended below Writer with new classes taken from the corresponding Wikipedia pages: Lists of Writers ↔ Writer, Lists of American Writers ↔ American Writer, List of African-American Writers ↔ African-American Writer.]
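The hierarchy extension can be illustrated with a small Python sketch that turns parent/child links between list pages into rdfs:subClassOf statements. It assumes that the list-of-lists hierarchy and a mapping from list pages to class names are already given; both dictionaries and the class names are hand-made examples, not output of an actual extraction.

```python
def subclass_triples(list_hierarchy, list_to_class):
    """Turn parent/child links between list pages into rdfs:subClassOf triples.

    list_hierarchy: {child list page: parent list page}
    list_to_class:  {list page: class name} (assumed, hand-made mapping)
    """
    triples = []
    for child_page, parent_page in list_hierarchy.items():
        child = list_to_class.get(child_page)
        parent = list_to_class.get(parent_page)
        if child and parent:  # skip pages without a known class
            triples.append((child, "rdfs:subClassOf", parent))
    return triples


# Example following the writer hierarchy from the slides.
hierarchy = {
    "Lists of American writers": "Lists of writers",
    "List of African-American writers": "Lists of American writers",
}
classes = {
    "Lists of writers": "Writer",
    "Lists of American writers": "AmericanWriter",
    "List of African-American writers": "AfricanAmericanWriter",
}
triples = subclass_triples(hierarchy, classes)
for triple in triples:
    print(triple)
```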
Potential of the Idea
• Assuming we extract everything correctly from List of African-American writers, we get
– 814 new type statements (DBpedia ontology only)
– 1409 new property assertions
– two entirely new instances

• ...and there are ~600,000 list pages
– extrapolation: we can roughly double the information in DBpedia

• Many list pages contain extra information
– e.g., birth places and birth dates of persons

Challenges
• Robust extraction of instances
– from different kinds of list pages
– e.g., picking the right column in a table
– tables and bullet point lists already account for 75%

• Picking good scoring functions
– TF-IDF looks reasonable at first glance

• Combining statistical and textual evidence

• Scalable implementation
– Advantage: perfectly parallelizable
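Because each list page can be processed without looking at any other list page, the work distributes naturally. The Python sketch below shows the pattern with a thread pool from the standard library; the per-page function is a placeholder that only returns the title length, and for real, CPU-bound extraction a process pool or a cluster framework would likely be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def process_list_page(page_title):
    """Placeholder for the per-page work (extraction and scoring).

    Returns a dummy result so that the sketch stays self-contained.
    """
    return page_title, len(page_title)


def process_all(page_titles, workers=4):
    """Process list pages independently; no page needs another page's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_list_page, page_titles))


results = process_all(["List of A", "List of BB"])
print(results)
```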


Time Series Foundation Models - current state and future directions
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

Extending DBpedia with Wikipedia List Pages

  • 1. Extending DBpedia with Wikipedia List Pages 10/22/13 Heiko Paulheim, Simone Paolo Ponzetto
  • 2. Disclaimer • This presentation shows an idea – after all, it says “position paper” – we don't know if it works! – (but we are quite confident)
  • 3. Lists in Wikipedia • Wikipedia loves lists • As of June 2013, there are almost 600,000 list pages • Lists organize Wikipedia pages – that correspond to DBpedia instances • Example: – List of African-American writers
  • 4. Lists in Wikipedia
  • 5. Lists in Wikipedia • Different types of lists – simple bullet point lists – broken bullet point lists (i.e., different sections) • sometimes, the sections are semantically meaningful – tables – ... • (Pie chart of list types: Simple Bullet List, Broken Bullet List, Table, Other)
  • 6. Lists in Wikipedia • What information is in a list? – the linked things have the same “type” • The type can be a complex construct – e.g., Writer ∩ ∀nationality.{United States} ∩ ∀ethnicity.{African American} • Sometimes, there is additional information – e.g., birth dates for persons
  • 7. Extracting Information from Lists • Goal: – find the common characteristics of all things in the list • Example: African-American writers – all instances are writers: 25% – all instances have nationality=United_States: 12% – all instances have ethnicity=African_American: 3% • Information in DBpedia is far from complete – makes extraction difficult – but: big potential to add information to DBpedia
  • 8. Extracting Information from Lists • Possible approach: finding characteristics with high TF-IDF – TF: percentage of instances in the list that carry the characteristic – IDF: 1 / (percentage of all DBpedia instances that carry the characteristic) • Rationale: only going by frequency would rate owl:Thing the highest • Example: African-American writers – type=Writer: 0.608 (maximal across all possible classes) – nationality=United_States: 0.277 – ethnicity=African_American: 0.127 • But: – deathPlace=New_York_City: 0.157 :-(
  • 9. Extracting Information from Lists • Example: African-American writers – ethnicity=African_American: 0.127 – deathPlace=New_York_City: 0.157 • Exploit further information from the list page – e.g., wiki:African_American is linked from the page, New_York_City is not – e.g., analyze the list page title, for instance using DBpedia Spotlight • African_American is recognized as an entity
  • 10. Lists of Lists in Wikipedia • Wikipedia also knows ~600 lists of lists – they organize lists – they form a hierarchy • E.g.: – Lists of Writers – Lists of American writers – List of African American writers
  • 11. From Lists of Lists to an Extended Ontology • Idea: – find corresponding lists of... pages for DBpedia classes – extend hierarchy • (Diagram) DBpedia Ontology: owl:Thing → ... → Agent → ... → Person → ... → Artist → ... → Writer • Extended Ontology: Writer → American Writer → African-American Writer • Corresponding Wikipedia pages: Lists of Writers (Writer), Lists of American Writers (American Writer), List of African-American Writers (African-American Writer)
  • 12. Potential of the Idea • Given that we extract everything correctly from List of African American writers, we get – 814 new type statements (only DBpedia ontology) – 1409 new property assertions – two entirely new instances • ...and there are ~600,000 list pages – extrapolation: we can roughly double the information in DBpedia • Many list pages contain extra information – e.g., birth places and birth dates of persons
  • 13. Challenges • Robust extraction of instances – from different kinds of list pages – e.g., picking the right column in a table – tables and bullet point lists already account for 75% • Picking good scoring functions – TF-IDF seems not bad at first glance • Combining statistical and textual evidence • Scalable implementation – Advantage: perfectly parallelizable
  • 14. Extending DBpedia with Wikipedia List Pages
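
The scoring idea from slide 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give the exact formula or normalization behind the reported scores (0.608, 0.277, ...), so the plain product TF × IDF used here will not reproduce those numbers. The function name `tf_idf_scores`, the dictionary layout, and the toy instances are all hypothetical and are not taken from DBpedia.

```python
from collections import Counter


def tf_idf_scores(list_instances, all_instances):
    """Score every (property, value) characteristic seen on one list page.

    Both arguments map an instance name to a set of (property, value) pairs.
    TF  = share of list instances carrying the characteristic
    IDF = 1 / (share of all instances carrying the characteristic)
    Assumption: the score is the plain product TF * IDF.
    """
    def share(instances):
        counts = Counter(c for chars in instances.values() for c in chars)
        return {c: n / len(instances) for c, n in counts.items()}

    tf = share(list_instances)
    df = share(all_instances)
    return {c: tf[c] * (1.0 / df[c]) for c in tf if c in df}


# Toy knowledge base (made-up instances, deliberately incomplete).
THING = ("type", "owl:Thing")
WRITER = ("type", "Writer")
AFRICAN_AMERICAN = ("ethnicity", "African_American")

all_instances = {
    "writer_a": {THING, WRITER, AFRICAN_AMERICAN},
    "writer_b": {THING, WRITER},
    "writer_c": {THING},            # on the list, but type info is missing
    "writer_d": {THING, WRITER},    # a writer that is not on the list
    "city_e": {THING},
    "city_f": {THING},
    "band_g": {THING},
    "band_h": {THING},
}

# The instances linked from the hypothetical list page.
list_instances = {k: all_instances[k] for k in ("writer_a", "writer_b", "writer_c")}

scores = tf_idf_scores(list_instances, all_instances)
ranked = sorted(scores, key=scores.get, reverse=True)

for characteristic in ranked:
    print(characteristic, round(scores[characteristic], 3))
```

On the toy data, owl:Thing is carried by every list instance but also by every instance overall, so it ends up with the lowest score, which is the rationale given on slide 8. A rare characteristic can still outrank a more plausible one, as with deathPlace=New_York_City on slide 9, which is why the slides propose combining the score with evidence from the list page itself.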