Named Entity Recognition Tools Benchmarked and Evaluated

Giuseppe Rizzo <giuseppe.rizzo@eurecom.fr>

What is a Named Entity Recognition task?
- A task that aims to locate and classify, in a textual document,
the names of persons, organizations, locations, brands and
products, as well as numeric expressions such as times, dates,
monetary amounts and percentages
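
To make "locate and classify" concrete, here is a toy sketch in Python. It assumes a hypothetical hand-made gazetteer and a rough regular expression for percentages; none of the benchmarked tools is claimed to work this way. Each result carries the surface form, a type and the character span where it was found.

```python
import re

# Hypothetical gazetteer (illustrative only): surface form -> entity type.
GAZETTEER = {
    "Tim Berners-Lee": "Person",
    "W3C": "Organization",
    "Paris": "Location",
}

# Rough pattern for percentages, standing in for numeric expressions.
PERCENT = re.compile(r"\d+(?:\.\d+)?\s?%")


def recognize(text):
    """Return (surface form, type, start, end) tuples sorted by position."""
    found = []
    for name, entity_type in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.group(), entity_type, match.start(), match.end()))
    for match in PERCENT.finditer(text):
        found.append((match.group(), "Percent", match.start(), match.end()))
    return sorted(found, key=lambda item: item[2])


sample = "Tim Berners-Lee founded the W3C; 40% of the talk was given in Paris."
entities = recognize(sample)
for entity in entities:
    print(entity)
```

The end offset is exclusive, so `sample[start:end]` gives back the surface form.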

12 March 2012 - Multimedia Semantics and Interaction - Seminar @ Ecole Centrale, Paris

History of NER benchmarks
- CoNLL 2002 and CoNLL 2003
  - schema (4 types): person, organization, location and miscellaneous
  - language-independent task

- ACE 2004, ACE 2005 and ACE 2007
  - schema (7 types): person, organization, location, facility, weapon,
    vehicle and geo-political entity
  - recognition of entity mentions, not just names (e.g. descriptions, pronouns)
  - find relationships among the extracted entities

- TAC 2009 (Knowledge Base Track)
  - schema (3 types): person, organization and location
  - create a knowledge base from the extracted named entities

- ETAPE 2012 (Named Entity Task)
  - schema: Quaero (7 main types, 32 sub-types)

NER Tools

- Standalone software
  - GATE
  - Stanford CoreNLP
  - Temis

- Web APIs


Factual comparison of 10 Web NER tools

Tool               Granularity  Language                        Quota (calls/day)  Sample clients                             Content chunk
Alchemy            OEN          EN, FR, GR, IT, PT, RU, SP, SW  30000              C/C++, C#, Java, Perl, PHP5, Python, Ruby  150KB
DBpedia Spotlight  OEN          EN, GR*, PT*, SP*               unlimited          Java, JS, PHP                              452KB
Evri               OED          EN, IT                          3000               AS, Java, PHP                              8KB
Extractiv          OEN          EN                              3000               Java                                       32KB
Lupedia            OEN          EN, FR, IT                      unlimited          N/A                                        20KB
OpenCalais         OEN          EN, FR, SP                      50000              Java                                       8KB
Saplo              OED          EN, SW                          1333               Java, JS, PHP, Python                      26KB
Wikimeta           OEN          EN, FR, SP                      unlimited          Java, Perl                                 80KB
Yahoo              OEN          EN                              5000               JS, PHP                                    7769KB
Zemanta            OED          EN                              10000              C#, Java, JS, Perl, PHP, Python, Ruby      970KB

Factual comparison (II)

Tool               Response format               Entity type number  Entity position
Alchemy            JSON, Microformats, XML, RDF  324                 N/A
DBpedia Spotlight  HTML, JSON, RDF, XML          320                 char offset
Evri               HTML, JSON, RDF               5                   N/A
Extractiv          HTML, JSON, RDF, XML          34                  word offset
Lupedia            HTML, JSON, RDFa, XML         319                 range of chars
OpenCalais         JSON, Microformats            95                  char offset
Saplo              JSON                          5                   N/A
Wikimeta           JSON, XML                     7                   POS offset
Yahoo              JSON, XML                     13                  range of chars
Zemanta            XML, JSON, RDF                81                  N/A

Tool               Classification ontologies      Dereferenceable vocabularies
Alchemy            Alchemy                        DBpedia, FreeBase, USCensus, UMBEL, OpenCyc, YAGO, MusicBrainz, CrunchBase
DBpedia Spotlight  DBpedia, FreeBase, Schema.org  DBpedia
Evri               Evri                           Evri
Extractiv          DBpedia                        DBpedia
Lupedia            DBpedia, LinkedMDB             DBpedia, LinkedMDB
OpenCalais         OpenCalais                     OpenCalais
Saplo              N/A                            N/A
Wikimeta           ESTER                          DBpedia, Geonames, CIAFactbook, Wikicompanies
Yahoo              Yahoo                          Wikipedia
Zemanta            FreeBase                       Wikipedia, IMDB, MusicBrainz, Amazon, YouTube, TechCrunch
...


Human-made benchmarks

- We performed two evaluation experiments:
  - WEKEX 2011
  - ISWC 2011

t = (entity, type, URI, relevant)

- Each field has been rated with a Boolean value: true if
correct, false otherwise
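
One plausible way to summarise such Boolean ratings is the share of tuples rated true for each field. The sketch below uses made-up ratings for illustration only; the slides do not state how the ratings were aggregated, so the `field_precision` helper is an assumption, not the authors' procedure.

```python
from collections import namedtuple

# One rated tuple t = (entity, type, URI, relevant); each field is True/False.
Rating = namedtuple("Rating", ["entity", "type", "uri", "relevant"])

# Hypothetical ratings, invented for this example (not benchmark data).
ratings = [
    Rating(True, True, True, True),
    Rating(True, False, True, False),
    Rating(True, True, False, True),
    Rating(False, False, False, False),
]


def field_precision(rows):
    """Fraction of tuples rated True, computed separately for each field."""
    total = len(rows)
    return {
        field: sum(getattr(row, field) for row in rows) / total
        for field in Rating._fields
    }


precision = field_precision(ratings)
print(precision)
```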
Rizzo G., Troncy R. (2011), NERD: A Framework for Evaluating Named Entity Recognition Tools in the Web of Data.
In: International Semantic Web Conference 2011 (ISWC'11), Bonn, Germany.


WEKEX 2011 Benchmark

- Controlled experiment
  - 4 human raters
  - 10 English news articles (5 from BBC and 5 from The
    New York Times)
  - Each rater evaluated each article for 5 extractors
  - 200 total evaluations

- Fleiss's kappa score
  - moderate agreement among raters

Rizzo G., Troncy R. (2011), NERD: Evaluating Named Entity Recognition Tools in the Web of Data.
In: (ISWC'11) Workshop on Web Scale Knowledge Extraction (WEKEX'11), Bonn, Germany.
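
For readers unfamiliar with the agreement measure named above, the following is a minimal sketch of Fleiss's kappa in plain Python. The input counts are hypothetical (4 raters giving a true/false rating to 5 items) and are not the benchmark data. On the commonly used Landis and Koch scale, values between 0.41 and 0.60 are read as moderate agreement and values between 0.61 and 0.80 as substantial agreement.

```python
def fleiss_kappa(counts):
    """Fleiss's kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Overall proportion of assignments that fall in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Extent of agreement among raters on each item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    # Undefined (division by zero) when all ratings fall in a single category.
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical table: columns are [rated true, rated false], 4 raters per item.
example = [[4, 0], [3, 1], [0, 4], [2, 2], [4, 0]]
kappa = fleiss_kappa(example)
print(round(kappa, 3))
```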


Results

different behavior
for different sources


ISWC 2011 Benchmark

- Controlled experiment
  - 10 human raters
  - 2 English news articles from The New York Times
  - Each rater evaluated each article for 6 extractors
  - 120 total evaluations

- Fleiss's kappa score
  - substantial agreement among raters


Results


What is NERD?
ontology [1], REST API [2], UI [3]

The NERD ontology has been integrated in the NIF project, developed in
the context of LOD2 (Creating Knowledge out of Interlinked Data), an
EU FP7 project

[1] http://nerd.eurecom.fr/ontology
[2] http://nerd.eurecom.fr/api/application.wadl
[3] http://nerd.eurecom.fr/


NERD Ontology

- Align the taxonomies used by the extractors


Building the NERD ontology

NERD type      Occurrence
Person         10
Organization   10
Country        6
Company        6
Location       6
Continent      5
City           5
RadioStation   5
Album          5
Product        5
...            ...


NERD REST API

Resources: /document, /user, /annotation/{extractor}, /extraction, /evaluation
Methods: GET, POST, PUT, DELETE
Formats: JSON/RDF*

...
"entities" : [{
    "entity": "Tim Berners-Lee",
    "type": "Person",
    "uri": "http://dbpedia.org/resource/Tim_berners_lee",
    "nerdType": "http://nerd.eurecom.fr/ontology#Person",
    "startChar": 30,
    "endChar": 45,
    "confidence": 1,
    "relevance": 0.5
}]
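
As an illustration of how a client might consume such a response, the sketch below parses a payload modelled on the fragment above with Python's standard `json` module. The enclosing JSON object is an assumption (the slide only shows the "entities" array), the `summarize` helper is hypothetical, and no call to the NERD service is made.

```python
import json

# Payload modelled on the slide; the wrapping object is assumed.
payload = """
{
  "entities": [{
    "entity": "Tim Berners-Lee",
    "type": "Person",
    "uri": "http://dbpedia.org/resource/Tim_berners_lee",
    "nerdType": "http://nerd.eurecom.fr/ontology#Person",
    "startChar": 30,
    "endChar": 45,
    "confidence": 1,
    "relevance": 0.5
  }]
}
"""


def summarize(raw):
    """Return (surface form, NERD class, start offset, length) per entity."""
    data = json.loads(raw)
    rows = []
    for item in data["entities"]:
        # The NERD class is the fragment after '#' in the nerdType URI.
        nerd_class = item["nerdType"].rsplit("#", 1)[-1]
        length = item["endChar"] - item["startChar"]
        rows.append((item["entity"], nerd_class, item["startChar"], length))
    return rows


rows = summarize(payload)
print(rows)
```

In this sample the span length (45 - 30 = 15) equals the length of "Tim Berners-Lee", which suggests the end offset is exclusive.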

Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web Extraction
Tools. In: European chapter of the Association for Computational Linguistics (EACL'12), Avignon, France.


NIF: NLP Interchange Format Framework

- Different outputs for the NLP tools

OpenCalais
"_type": "Organization",
"name": "North Atlantic Treaty Organization",
"organizationtype": "governmental civilian",
"nationality": "N/A",
"_typeReference": "http://s.opencalais.com/1/type/em/e/Organization",
...

DBpedia Spotlight
"@URI": "http://dbpedia.org/resource/DBpedia",
"@types": "DBpedia:Software,DBpedia:Work",
"@surfaceForm": "dbpedia",
"@offset": "0",
"@support": "11",
"@similarityScore": "0.2387271374464035",
...

- Manual effort required for integration or reuse
  - time consuming
  - need to capture the definition of the attributes used in the
    response format

- NIF uses RDF for representing NER results as
Linked Data
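
The manual integration effort can be pictured with a small sketch: one hand-written adapter per tool, each mapping that tool's attribute names onto a shared record. The target fields (`label`, `types`, `uri`, `offset`) and both adapter functions are hypothetical and chosen only for this example; the input dictionaries reuse the attributes shown in the two fragments above.

```python
# Attributes copied from the two response fragments shown on the slide.
opencalais_item = {
    "_type": "Organization",
    "name": "North Atlantic Treaty Organization",
    "organizationtype": "governmental civilian",
    "nationality": "N/A",
    "_typeReference": "http://s.opencalais.com/1/type/em/e/Organization",
}
spotlight_item = {
    "@URI": "http://dbpedia.org/resource/DBpedia",
    "@types": "DBpedia:Software,DBpedia:Work",
    "@surfaceForm": "dbpedia",
    "@offset": "0",
    "@support": "11",
    "@similarityScore": "0.2387271374464035",
}


def from_opencalais(item):
    """Hypothetical adapter: the fragment carries no entity URI or offset."""
    return {"label": item["name"], "types": [item["_type"]], "uri": None, "offset": None}


def from_spotlight(item):
    """Hypothetical adapter: types arrive as one comma-separated string."""
    return {
        "label": item["@surfaceForm"],
        "types": item["@types"].split(","),
        "uri": item["@URI"],
        "offset": int(item["@offset"]),
    }


records = [from_opencalais(opencalais_item), from_spotlight(spotlight_item)]
for record in records:
    print(record)
```

Every new tool needs another adapter of this kind, which is the cost a shared RDF representation is meant to avoid.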


Named Entities as textual annotations

- Let's consider the document:
http://www.w3.org/DesignIssues/LinkedData.html

The Semantic Web isn't just about putting data on the web. It is about
making links, so that a person or machine can explore the web of data.
With linked data, when you have some of it, you can find other, related,
data. ...
All the above plus, Use open standards from W3C (RDF and SPARQL) to
identify things, so that people can point at your stuff
...

entities: {
…
[entity: W3C, startChar: 23107, endChar: 23110],
…
}

NERD meets NIF
Model documents through a set of strings dereferenceable on the Web

:offset_23107_23110 a str:String ;
    str:referenceContext :offset_0_26546 .

Map string to entity

:offset_23107_23110 sso:oen dbpedia:W3C .

Classification

dbpedia:W3C rdf:type nerd:Organization .
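
The three steps above can be sketched as a small Python function that turns one extracted entity into the corresponding statements. The `to_nif` helper and its parameters are hypothetical, prefix declarations are omitted, and the output is built with plain string formatting rather than an RDF library, so it is an illustration of the naming scheme, not a complete serializer.

```python
def to_nif(entity, doc_length, resource, nerd_type):
    """Build Turtle-style lines for one entity (prefix declarations omitted)."""
    start, end = entity["startChar"], entity["endChar"]
    string_uri = f":offset_{start}_{end}"       # the annotated string
    context_uri = f":offset_0_{doc_length}"     # the whole document
    return [
        f"{string_uri} a str:String ;",
        f"    str:referenceContext {context_uri} .",
        f"{string_uri} sso:oen {resource} .",    # map string to entity
        f"{resource} rdf:type {nerd_type} .",    # classification
    ]


w3c = {"entity": "W3C", "startChar": 23107, "endChar": 23110}
lines = to_nif(w3c, 26546, "dbpedia:W3C", "nerd:Organization")
print("\n".join(lines))
```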

Rizzo G., Troncy R., Hellmann S. and Bruemmer M. (2012), NERD meets NIF: Lifting NLP Extraction Results to the Linked
Data Cloud. In: (LDOW'12) Linked Data on the Web (WWW'12), Lyon, France.


NERD Demo


NERD Timeline and Future Work

beginning
Comparison of named entity extractors
NERD benchmarks
NERD REST API and NERD ontology
Lift NERD output results to the LOD cloud

today
NERD "smart" service: combining the best of all NER tools
Dashboard for improving the NERD user experience


http://nerd.eurecom.fr

@giusepperizzo @rtroncy #nerd

http://www.slideshare.net/giusepperizzo

