University of Economics, Prague                                        Czech Technical University in Prague



           Recognizing, Classifying and Linking
           Entities with Wikipedia and DBpedia

                                                  Milan Dojchinovski (1), Tomas Kliegr (2)
(1) Faculty of Information Technology, Czech Technical University in Prague
(2) Faculty of Informatics and Statistics, University of Economics, Prague


                                                                Milan Dojchinovski
                              milan.dojchinovski@fit.cvut.cz - @m1ci - http://dojchinovski.mk



                                            The 7th Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2012)
                                                                                        November 22-23, 2012, Smolenice, SK

 Except where otherwise noted, the content of this presentation is licensed under
 Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
Overview

 ‣   Introduction

 ‣   Entity Recognition, Classification and Publication

 ‣   Experiments

 ‣   Conclusion and Future Work




Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   2
Introduction

 ‣    Unsupervised and fully automated:
  -    entity recognition - rule-based lexico-syntactic patterns
  -    entity classification by extraction of hypernyms - targeted hypernym extraction
  -    entity linking to DBpedia concepts

 ‣    Publication as Linked Data
  -    results in NLP Interchange Format (NIF)




Tool Architecture

 ‣   Available as a Web 2.0 application at: http://ner.vse.cz/thd

 ‣   Web API available at: http://ner.vse.cz/thd/docs




                                                          Fig 1. Architecture overview




Entity Recognition and Classification

 ‣    Entity Recognition
  -    2 JAPE grammars: 1) NNP+ 2) JJ* NN+
  -    input: free text
  -    output: Named Entities (e.g., Diego Maradona) or Common Entities (e.g., hockey player)

 ‣    Entity Classification
  -    supported by the Targeted Hypernym Discovery algorithm
  -    lexico-syntactic patterns, e.g. _x_ is a _y_
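
The two recognition grammars and the "x is a y" pattern above can be sketched in plain Python. This is only an illustration, not the tool's JAPE implementation: it assumes the text is already tokenized and tagged with Penn Treebank POS tags, and the names `find_entities`, `find_hypernym` and `IS_A`, as well as the simplified regular expression, are hypothetical.

```python
import re

def find_entities(tagged):
    """Return (kind, phrase) pairs from a list of (token, POS tag) tuples.

    Mirrors the two grammars on the slide:
      NNP+     -> Named Entity candidate
      JJ* NN+  -> Common Entity candidate
    """
    entities = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # Grammar 1: one or more proper nouns (NNP+).
        while j < n and tagged[j][1] == "NNP":
            j += 1
        if j > i:
            entities.append(("named", " ".join(tok for tok, _ in tagged[i:j])))
            i = j
            continue
        # Grammar 2: optional adjectives, then one or more nouns (JJ* NN+).
        while j < n and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < n and tagged[k][1] == "NN":
            k += 1
        if k > j:
            entities.append(("common", " ".join(tok for tok, _ in tagged[i:k])))
            i = k
        else:
            i += 1  # no match starting here, move on
    return entities

# Simplified stand-in for an "x is a y" lexico-syntactic pattern (assumption).
IS_A = re.compile(
    r"^(?P<x>.+?) (?:is|was) an? (?P<y>[\w -]+?)"
    r"(?: (?:who|that|which|from|of|in)\b|[.,;]|$)"
)

def find_hypernym(sentence):
    """Return (entity, hypernym phrase) if the sentence matches, else None."""
    m = IS_A.search(sentence)
    return (m.group("x"), m.group("y")) if m else None

tagged = [("Diego", "NNP"), ("Maradona", "NNP"), ("is", "VBZ"), ("a", "DT"),
          ("hockey", "NN"), ("player", "NN"), (".", ".")]
print(find_entities(tagged))
print(find_hypernym("Prague is a city in the Czech Republic."))
```

The sketch returns the whole noun phrase after "is a"; a real hypernym extractor would typically narrow this down further, which is outside the scope of this illustration.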




Entity Linking and Publication

 ‣    Entity Linking
  -    links entities to concepts from DBpedia
  -    uses the Wikipedia Search API
  -    maps each Wikipedia article URL to its DBpedia representation

 ‣    Publication in NIF
  -    NLP Interchange Format (RDF-based representation)
  -    each processed document (context) has a unique identifier
  -    each entity and hypernym is represented as an offset-based string
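
The URL mapping and the offset-based strings above can be illustrated with a short Python sketch. It rests on stated assumptions: the mapping follows the English DBpedia convention of reusing the Wikipedia article title under `http://dbpedia.org/resource/`, the `#char=begin,end` fragment follows the NIF 2.0 convention (the URI scheme the tool used in 2012 may differ), and the function names and the `http://example.org/doc1` context URI are hypothetical.

```python
from urllib.parse import urlparse

def wikipedia_to_dbpedia(article_url):
    """Map an English Wikipedia article URL to a DBpedia resource URI."""
    parsed = urlparse(article_url)
    prefix = "/wiki/"
    if not parsed.path.startswith(prefix):
        raise ValueError(f"not a Wikipedia article URL: {article_url}")
    title = parsed.path[len(prefix):]
    return "http://dbpedia.org/resource/" + title

def offset_uri(context_uri, text, phrase):
    """Build an offset-based URI for the first occurrence of phrase in text."""
    begin = text.find(phrase)
    if begin < 0:
        raise ValueError(f"{phrase!r} not found in text")
    end = begin + len(phrase)
    # NIF 2.0 style fragment (assumption): character offsets, end exclusive.
    return f"{context_uri}#char={begin},{end}"

text = "Diego Maradona is a football player."
print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Diego_Maradona"))
print(offset_uri("http://example.org/doc1", text, "football player"))
```

Identifying strings by offsets rather than by their text keeps two occurrences of the same phrase in one document distinguishable.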




Experiments

 ‣   Question addressed
     -   How well does our tool recognize, classify and link Named and Common Entities?
 ‣   Experiment setup
     -   manually created dataset, Czech Traveler Dataset
     -   101 Named Entities, 85 Common Entities
     -   comparison with 3 other systems: DBpedia Spotlight, Open Calais, Alchemy API
 ‣   Results
     -   Named Entities
         •   f-score: recognition 0.66, classification 0.66, linking 0.58

     -   Common Entities
         •   f-score: recognition 0.60, classification 0.51, linking 0.61

     -   better results than the three compared systems in all tasks, with one exception
         •   outperformed only by DBpedia Spotlight in linking of Common Entities (f-score 0.69)
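
For reference, an f-score of the kind reported above can be computed from match counts as follows. This is a minimal sketch that assumes the balanced F1 measure (the slides do not state which variant was used); the function name and the counts in the usage line are made up for illustration and are not taken from the experiments.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Balanced F1: harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 8 correct entities, 2 spurious, 2 missed.
print(round(f_score(8, 2, 2), 2))
```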


Conclusion and Future Work

 ‣   Tool for Entity Recognition, Classification and Publication

 ‣   Future directions
     -   multilingual support: Dutch, German and Czech
     -   grammar improvements
     -   evaluation on a standard benchmark




Feedback




                                                               Thank you!
                                             Questions, comments, ideas?


                                          demo at: http://ner.vse.cz/thd

                            Milan Dojchinovski                                       @m1ci
                            milan.dojchinovski@fit.cvut.cz                            http://dojchinovski.mk

