Learning with the Web: Spotting Named Entities on the intersection of NERD and Machine Learning
Marieke van Erp, Giuseppe Rizzo, Raphaël Troncy
@giusepperizzo
May 13, 2013, Making Sense of Microposts (#MSM2013)
NERD-ML @ MSM'13
Preprocessing
➢ Dataset converted to CoNLL IOB format
➢ Applied 10-fold cross-validation
➢ Split the set of tweets into 50 KB chunks to comply with NERD file-size limitations
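A minimal sketch of two of the preprocessing steps above, under stated assumptions: the IOB2 variant (every entity starts with B-) is assumed, entity spans are assumed to be token offsets with an exclusive end, and the helper names to_iob and chunk_by_size are hypothetical, not taken from the NERD-ML code.

```python
def to_iob(tokens, entities):
    """Tag tokens in IOB2 style.

    tokens: list of str
    entities: list of (start, end, label) token spans, end exclusive (assumed layout)
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return list(zip(tokens, tags))


def chunk_by_size(tweets, max_bytes=50 * 1024):
    """Greedily pack whole tweets into chunks of at most max_bytes (UTF-8)."""
    chunks, current, size = [], [], 0
    for tweet in tweets:
        n = len(tweet.encode("utf-8")) + 1  # +1 for a newline separator
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(tweet)
        size += n
    if current:
        chunks.append(current)
    return chunks


tagged = to_iob(["Barack", "Obama", "visits", "Paris"],
                [(0, 2, "PER"), (3, 4, "LOC")])
parts = chunk_by_size(["aaaa", "bbbb", "cc"], max_bytes=10)
```

Packing whole tweets keeps every micropost inside a single chunk, so no entity is cut at a chunk boundary.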
NERD extractors
➢ Retrieves named entities from 10 extractors (Web APIs)
➢ Harmonizes the classification according to the NERD Ontology v0.5 (http://nerd.eurecom.fr/ontology)
➢ 75 entity classes mapped to the 4 MSM'13 classes
http://nerd.eurecom.fr
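The class harmonization can be pictured as a lookup table. The sketch below is illustrative only: the entries are a hypothetical excerpt rather than the actual 75-class mapping, the PER/LOC/ORG/MISC labels are assumed to be the four MSM'13 classes, and returning None for unmapped classes is an assumption about how such cases might be handled.

```python
# Hypothetical excerpt; the real mapping covers 75 NERD ontology classes.
NERD_TO_MSM = {
    "nerd:Person": "PER",
    "nerd:Location": "LOC",
    "nerd:Organization": "ORG",
    "nerd:Product": "MISC",  # assumed assignment, for illustration only
}


def to_msm_class(nerd_class):
    """Return the MSM'13 class for a NERD class, or None if it is unmapped."""
    return NERD_TO_MSM.get(nerd_class)
```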
Ritter et al. (2011)
➢ Off-the-shelf tool tailored to the Twitter stream, based on:
– LabelledLDA (+CRF)
– Textual features (POS, capitalization, suffix, etc.)
– Freebase gazetteers (names of PER, ORG, LOC)
➢ 10 entity classes mapped to 4 classes
Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental Study. In: Empirical Methods in Natural Language Processing (EMNLP'11) (2011)
Stanford CRF
➢ Re-trained on the MSM'13 corpora
➢ Parameters based on the english.conll.4class.distsim.crf.ser.gz properties file provided with the Stanford distribution
➢ Baseline of our approach
Finkel, J.R., Grenager, T., Manning, C.: Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. In: 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05) (2005)
Textual features
➢ POS
➢ Capitalisation information
– initial capital
– all capitalized
– proportion of capitals in the token
➢ Prefix (first three letters of the token)
➢ Suffix (last three letters of the token)
➢ Whether the token is at the beginning or at the end of the micropost
Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental
Study. In: Empirical Methods in Natural Language Processing (EMNLP’11) (2011)
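The textual features listed above could be computed per token roughly as follows. This is a sketch under assumptions: the function name and dictionary keys are hypothetical, POS tags are assumed to come from an external tagger, and the proportion of capitals is assumed to be taken over all characters of the token.

```python
def textual_features(tokens, i, pos_tags):
    """Build the feature dictionary for token i of one micropost."""
    tok = tokens[i]
    capitals = sum(1 for c in tok if c.isupper())
    return {
        "pos": pos_tags[i],                       # supplied by an external tagger
        "initial_capital": tok[:1].isupper(),
        "all_capitalized": tok.isupper(),
        "capital_proportion": capitals / len(tok) if tok else 0.0,
        "prefix": tok[:3],                        # first three letters
        "suffix": tok[-3:],                       # last three letters
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
    }


tokens = ["Obama", "visits", "PARIS"]
pos_tags = ["NNP", "VBZ", "NNP"]
first = textual_features(tokens, 0, pos_tags)
last = textual_features(tokens, 2, pos_tags)
```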
ML settings
Run01: 7 textual features (POS, initial capital, proportion of capitals, prefix, suffix, start/end token); 0 extractors; ML = k-NN, k = 1, Euclidean distance
Run02: 0 textual features; 12 extractors (AlchemyAPI, DBpedia Spotlight, Extractiv, Lupedia, OpenCalais, Saplo, Yahoo, TextRazor, Wikimeta, Zemanta, Stanford NER, Ritter et al.); ML = SVM, polynomial kernel, SMO
Run03: 4 textual features (POS, initial capital, suffix, proportion of capitals); 8 extractors (AlchemyAPI, DBpedia Spotlight, Extractiv, OpenCalais, TextRazor, Wikimeta, Stanford NER, Ritter et al.); ML = SVM, polynomial kernel, SMO
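To make the Run01 setting concrete, here is a plain-Python sketch of a 1-nearest-neighbour classifier with Euclidean distance. It is not the toolkit used for the submitted runs; it assumes the features have already been encoded as numeric vectors (categorical features such as POS would need an encoding step), and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(train, query):
    """Return the label of the training example closest to query.

    train: list of (feature_vector, label) pairs
    """
    _, label = min(train, key=lambda example: euclidean(example[0], query))
    return label


# Toy vectors: [initial_capital, capital_proportion]
train = [([1.0, 0.2], "B-PER"), ([0.0, 0.0], "O")]
```

With k = 1 there is no voting: each token simply inherits the label of its single closest training token.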
[Figure: Precision, MSM'13 training data, 10-fold cross-validation]
[Figure: Recall, MSM'13 training data, 10-fold cross-validation]
[Figure: F1, MSM'13 training data, 10-fold cross-validation]
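For reference, precision, recall and F1 can be computed from sets of gold and predicted entities as sketched below. The tuple layout (tweet id, start, end, class) and the exact-match criterion are assumptions for illustration, not necessarily the official MSM'13 scoring procedure.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match scores over two collections of entity tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Assumed layout: (tweet_id, start_token, end_token, class)
gold = {(0, 0, 2, "PER"), (0, 3, 4, "LOC"), (1, 1, 2, "ORG")}
pred = {(0, 0, 2, "PER"), (0, 3, 4, "ORG")}
p, r, f1 = precision_recall_f1(gold, pred)
```

In this toy case only one of the two predictions matches a gold entity, so precision is 1/2, recall is 1/3 and F1 is 0.4.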
Lessons learned
➢ The MISC class is ambiguously defined
➢ Only 8.1% of the named entities in the training data occur in the test data
➢ Best run is Run03: a subset of the extractors combined with some of the textual features
➢ For the next challenge, what about entity linking?
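An overlap figure like the one in the lessons above can be obtained by comparing the distinct entity mentions of the two sets. The sketch below assumes case-insensitive matching of surface forms, which may differ from how the reported figure was computed; the function name is hypothetical.

```python
def overlap_ratio(train_entities, test_entities):
    """Fraction of distinct training mentions that also appear in the test set."""
    train = {e.lower() for e in train_entities}
    test = {e.lower() for e in test_entities}
    return len(train & test) / len(train) if train else 0.0


ratio = overlap_ratio(["Paris", "Obama", "IBM", "Rome"], ["paris", "Berlin"])
```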
Thanks for your time and attention
http://www.slideshare.net/giusepperizzo
NERD-ML
http://github.com/giusepperizzo/nerdml

Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Learning with the Web: Spotting Named Entities on the intersection of NERD and Machine Learning

  • 1. Learning with the Web: Spotting Named Entities on the intersection of NERD and Machine Learning. Marieke van Erp, Giuseppe Rizzo, Raphaël Troncy (@giusepperizzo)
  • 2. May 13, 2013, Making Sense of Microposts (#MSM2013). NERD-ML @ MSM'13
  • 3. Preprocessing ➢ Dataset is converted to CoNLL IOB format ➢ Applied 10-fold cross-validation ➢ Chunked the set of tweets into 50KB parts in order to comply with NERD file size limitations
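
The chunking and cross-validation steps on the preprocessing slide could be sketched roughly as follows. This is only an illustration, not the authors' code: the function names, the reading of "50KB" as 50 * 1024 bytes of UTF-8 text, and the use of contiguous (unshuffled) folds are all assumptions.

```python
from typing import Iterator, List, Tuple

MAX_CHUNK_BYTES = 50 * 1024  # assumed interpretation of the "50KB" limit


def chunk_tweets(tweets: List[str], max_bytes: int = MAX_CHUNK_BYTES) -> Iterator[List[str]]:
    """Group tweets (one per line) into chunks of at most max_bytes of UTF-8 text.

    A single tweet larger than max_bytes is emitted as its own oversized chunk.
    """
    chunk: List[str] = []
    size = 0
    for tweet in tweets:
        tweet_bytes = len(tweet.encode("utf-8")) + 1  # +1 for the newline separator
        if chunk and size + tweet_bytes > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(tweet)
        size += tweet_bytes
    if chunk:
        yield chunk


def k_fold_indices(n_items: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds of near-equal size."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for fold_size in fold_sizes:
        test = list(range(start, start + fold_size))
        train = list(range(0, start)) + list(range(start + fold_size, n_items))
        yield train, test
        start += fold_size
```

Each chunk yielded by `chunk_tweets` could then be submitted separately to the extraction service, and `k_fold_indices(len(tweets))` gives ten train/test splits in which every tweet appears in exactly one test fold.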
  • 4. NERD extractors ➢ Retrieves named entities from 10 extractors (Web APIs) ➢ Harmonizes the classification according to the NERD Ontology v0.5 (http://nerd.eurecom.fr/ontology) ➢ 75 entity classes mapped to 4 MSM'13 classes http://nerd.eurecom.fr
  • 5. Ritter et al. (2011) ➢ Off-the-shelf tool tailored to a Twitter stream based on: – LabelledLDA (+CRF) – Textual features (POS, capitalization, suffix, etc.) – Freebase gazetteers (names of PER, ORG, LOC) ➢ 10 entity classes mapped to 4 classes. Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental Study. In: Empirical Methods in Natural Language Processing (EMNLP'11) (2011)
  • 6. Stanford CRF ➢ Re-trained on the MSM'13 corpora ➢ Parameters based on the english.conll.4class.distsim.crf.ser.gz properties file provided with the Stanford distribution ➢ Baseline of our approach. Finkel, J.R., Grenager, T., Manning, C.: Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. In: 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05) (2005)
  • 7. Textual features ➢ POS ➢ Capitalisation information – initial capital – all capitalized – proportion of token capitals ➢ Prefix (first three letters of the token) ➢ Suffix (last three letters of the token) ➢ Whether the token is at the beginning or at the end of the micropost. Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental Study. In: Empirical Methods in Natural Language Processing (EMNLP'11) (2011)
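
As a rough illustration of the textual features listed on this slide, the sketch below computes them for one token of an already tokenized micropost. The function name and the feature keys are hypothetical, and the POS feature is left out because it needs an external tagger.

```python
from typing import Dict, List


def token_features(tokens: List[str], index: int) -> Dict[str, object]:
    """Compute the slide's surface features for tokens[index] (POS tag omitted)."""
    token = tokens[index]
    capitals = sum(1 for ch in token if ch.isupper())
    return {
        "initial_capital": token[:1].isupper(),
        "all_capitalized": token.isupper(),
        # share of characters in the token that are uppercase
        "capital_proportion": capitals / len(token) if token else 0.0,
        "prefix": token[:3],   # first three characters
        "suffix": token[-3:],  # last three characters
        "is_first": index == 0,
        "is_last": index == len(tokens) - 1,
    }
```

For the tokens `["Obama", "visits", "NYC"]`, the first token gets an initial capital and the prefix `Oba`, while the last token is flagged as all capitalized and as ending the micropost.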
  • 8. ML settings. Run01: 7 textual features (POS, initial capital, proportion of capitals, prefix, suffix, end/start token); 0 extractors; ML=k-NN, k=1, Euclidean distance. Run02: 0 textual features; 12 extractors (AlchemyAPI, DBpedia Spotlight, Extractiv, Lupedia, OpenCalais, Saplo, Yahoo, Textrazor, Wikimeta, Zemanta, Stanford NER, Ritter et al.); ML=SVM, polynomial kernel, SMO. Run03: 4 textual features (POS, initial capital, suffix, proportion of capitals); 8 extractors (AlchemyAPI, DBpedia Spotlight, Extractiv, OpenCalais, Textrazor, Wikimeta, Stanford NER, Ritter et al.); ML=SVM, polynomial kernel, SMO
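
The Run01 classifier (k-NN with k=1 and Euclidean distance) can be illustrated with a minimal pure-Python sketch. It is not the toolkit used for the submitted runs; the function names and the toy two-dimensional feature vectors are assumptions, and categorical features such as prefixes would first need a numeric encoding.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (numeric feature vector, class label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(train: List[Example], query: Sequence[float]) -> str:
    """Return the label of the single closest training example (k = 1).

    Ties are broken in favour of the example that appears first in train.
    """
    _, label = min(train, key=lambda example: euclidean(example[0], query))
    return label
```

With toy vectors of the form (initial capital, proportion of capitals), a query such as `(1.0, 0.25)` is assigned the label of whichever training token lies nearest in that feature space.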
  • 9. Precision – MSM'13 training, 10-fold cross-validation
  • 10. Recall – MSM'13 training, 10-fold cross-validation
  • 11. F1 – MSM'13 training, 10-fold cross-validation
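
The three measures reported on these slides can be computed as in the sketch below. It assumes exact matching of (start, end, class) entity spans, which may differ from the challenge's official scoring; the function name and the span representation are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token, entity class)


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span-and-class matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Under 10-fold cross-validation, such a function would be applied to the held-out fold of each split and the per-fold scores averaged.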
  • 12. Lessons learned ➢ MISC class is ambiguously defined ➢ 8.1% of the named entities from the training data occur in the test data ➢ Best run is Run03: not all extractors, and only some textual features ➢ For the next challenge, what about entity linking?
  • 13. Thanks for your time and attention. http://www.slideshare.net/giusepperizzo NERD-ML: http://github.com/giusepperizzo/nerdml