Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job

Marieke van Erp, Pablo Mendes, Heiko Paulheim, Filip Ilievski, Julien Plu, Giuseppe Rizzo and Joerg Waitelonis
Presented at LREC 2016:
http://www.lrec-conf.org/proceedings/lrec2016/pdf/926_Paper.pdf


1. Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
   Marieke van Erp, Pablo Mendes, Heiko Paulheim, Filip Ilievski, Julien Plu, Giuseppe Rizzo and Joerg Waitelonis
   https://github.com/dbpedia-spotlight/evaluation-datasets
2. Take home message
   • Existing entity linking datasets:
     • are not interoperable
     • do not cover many different domains
     • skew towards popular and frequent entities
   • We need to:
     • document & standardise
     • diversify to cover different domains and the long tail
3. Why
   • Named entity linking approaches achieve F1 scores of ~.80 on various benchmark datasets
   • Are we really testing our approaches on all aspects of the entity linking task?
   It's not just us: Maud Ehrmann, Damien Nouvel and Sophie Rosset. Named Entity Resources - Overview and Outlook. LREC 2016.
4. This work
   • Analysis of 7 entity linking benchmark datasets
   • Dataset characteristics (document type, domain, license, etc.)
   • Entity, surface form & mention characterisation (overlap between datasets, confusability, prominence, dominance, types, etc.)
   • Annotation characteristics (nested entities, redundancy, IAA, offsets)
   + Roadmap: how can we do better?
5. Entity Overlap
   • Number of entities present in one dataset that are also present in other datasets
   Entity counts per dataset: AIDA-YAGO2 (5,596), NEEL2014 (2,380), NEEL2015 (2,800), OKE2015 (531), RSS500 (849), WES2015 (7,309), Wikinews (279)
6. Datasets

   Dataset          Type           Domain   Doc length   Format    Encoding   License
   AIDA-YAGO2       news           general  medium       TSV       ASCII      Agreement
   2014/2015 NEEL   tweets         general  short        TSV       ASCII      Open
   OKE2015          encyclopaedia  general  long         NIF/RDF   UTF-8      Open
   RSS-500          news           general  medium       NIF/RDF   UTF-8      Open
   WES2015          blog           science  long         NIF/RDF   UTF-8      Open
   WikiNews         news           general  medium       XML       UTF-8      Open
7. Entity Overlap
   • Number of entities present in one dataset that are also present in other datasets, shown as a percentage of the row dataset's entities

                          AIDA-YAGO2  NEEL2014  NEEL2015  OKE2015  RSS500  WES2015  Wikinews
   AIDA-YAGO2 (5,596)         -         5.87%     8.06%    0.00%    1.26%    4.80%    1.16%
   NEEL2014 (2,380)         13.73%       -       68.49%    2.39%    2.56%   12.35%    2.82%
   NEEL2015 (2,800)         16.11%     58.21%      -       2.00%    2.54%    7.93%    2.57%
   OKE2015 (531)             0.00%     10.73%    10.55%     -       2.44%   28.06%    3.95%
   RSS500 (849)              8.24%      7.18%     8.36%    1.53%     -       3.18%    1.88%
   WES2015 (7,309)           3.68%      4.02%     3.04%    2.04%    0.16%     -       0.66%
   Wikinews (279)           23.30%     24.01%    25.81%    7.53%    5.73%   17.20%     -
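A minimal sketch of how these asymmetric overlap percentages can be computed, assuming each dataset has already been reduced to a set of entity URIs (the dataset names and URIs below are toy placeholders, not the real corpora):

```python
from typing import Dict, Set, Tuple

def overlap_matrix(datasets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Percentage of entities in dataset A that also occur in dataset B."""
    matrix = {}
    for a, entities_a in datasets.items():
        for b, entities_b in datasets.items():
            if a != b:
                # Normalised by the row dataset's size, hence asymmetric.
                matrix[(a, b)] = 100 * len(entities_a & entities_b) / len(entities_a)
    return matrix

# Toy example: because of the row-wise normalisation,
# overlap(A, B) and overlap(B, A) generally differ.
datasets = {
    "A": {"dbr:Amsterdam", "dbr:Barack_Obama"},
    "B": {"dbr:Amsterdam", "dbr:CERN", "dbr:LREC"},
}
for (a, b), pct in sorted(overlap_matrix(datasets).items()):
    print(f"{a} -> {b}: {pct:.2f}%")  # A -> B: 50.00%, B -> A: 33.33%
```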
11. Confusability
    • The number of meanings a surface form (mention) can have
12. Confusability

    Corpus       Average   Min   Max   Std. dev.
    AIDA-YAGO2     1.08     1     13     0.37
    2014 NEEL      1.02     1      3     0.16
    2015 NEEL      1.05     1      4     0.25
    OKE2015        1.11     1     25     1.22
    RSS500         1.02     1      3     0.16
    WES2015        1.06     1      6     0.30
    Wikinews       1.09     1     29     1.03
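A minimal sketch of the confusability measure as defined above, assuming the gold standard has been parsed into (surface form, entity URI) pairs; the example annotations are invented:

```python
from collections import defaultdict
from statistics import mean

def confusability_stats(annotations):
    """Average/min/max number of distinct entities per surface form."""
    entities_per_form = defaultdict(set)
    for surface_form, entity in annotations:
        entities_per_form[surface_form].add(entity)
    counts = [len(entities) for entities in entities_per_form.values()]
    return mean(counts), min(counts), max(counts)

annotations = [
    ("Paris", "dbr:Paris"),          # the city ...
    ("Paris", "dbr:Paris_Hilton"),   # ... and a person: confusability 2
    ("Amsterdam", "dbr:Amsterdam"),  # unambiguous: confusability 1
]
print(confusability_stats(annotations))  # (1.5, 1, 2)
```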
13. Dominance

    Corpus       Dominance   Min   Max   Std. dev.
    AIDA-YAGO2      .98       1    452     0.08
    2014 NEEL       .99       1     47     0.06
    2015 NEEL       .98       1     88     0.09
    OKE2015         .98       1      1     0.11
    RSS500          .99       1      1     0.07
    WES2015         .97       1      1     0.12
    Wikinews        .99       1     72     0.09
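A hedged sketch of dominance, reading it as the fraction of a surface form's mentions that link to its single most frequent entity, so values near 1 mean one reading dominates; the paper gives the exact definition, and the example pairs are invented:

```python
from collections import Counter, defaultdict
from statistics import mean

def average_dominance(annotations):
    """Mean share of each surface form's most frequent entity."""
    mentions = defaultdict(Counter)
    for surface_form, entity in annotations:
        mentions[surface_form][entity] += 1
    return mean(
        counter.most_common(1)[0][1] / sum(counter.values())
        for counter in mentions.values()
    )

annotations = [
    ("Paris", "dbr:Paris"),
    ("Paris", "dbr:Paris"),
    ("Paris", "dbr:Paris_Hilton"),
]
print(average_dominance(annotations))  # 0.666...: no single reading fully dominates
```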
14. Entity Types (figure)
15. Entity Types (figure)
16. Entity Prominence
    DBpedia PageRank datasets: http://people.aifb.kit.edu/ath/
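A minimal sketch of how entity prominence can be profiled against the DBpedia PageRank scores linked above, assuming they are available as a tab-separated file of entity/score lines; the file name and layout are assumptions:

```python
def load_pagerank(path: str) -> dict:
    """Load 'entity URI <TAB> score' lines into a lookup table."""
    scores = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            entity, score = line.rstrip("\n").split("\t")
            scores[entity] = float(score)
    return scores

def prominence_profile(gold_entities, scores):
    """PageRank scores of a dataset's gold entities, most prominent first."""
    found = sorted((scores[e] for e in gold_entities if e in scores), reverse=True)
    return found  # a head-heavy profile signals a skew towards popular entities

# scores = load_pagerank("pagerank.tsv")            # placeholder file name
# print(prominence_profile(dataset_entities, scores))
```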
17. How can we do better?
    • Document your dataset!
    • Use a standardised format (see the NIF sketch below)
    • Diversify both in domains and in entity distribution
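For the standardised-format point, a minimal hand-written NIF 2.0 sample (the format used by OKE2015, RSS-500 and WES2015 in the table above) encoding one linked mention; the document URI and sentence are invented for illustration:

```turtle
@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

# The document text serves as the reference context.
<http://example.org/doc1#char=0,44>
    a nif:String , nif:Context ;
    nif:isString "Amsterdam is the capital of the Netherlands."^^xsd:string ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "44"^^xsd:nonNegativeInteger .

# One mention, anchored by character offsets and linked to a DBpedia entity.
<http://example.org/doc1#char=0,9>
    a nif:String ;
    nif:anchorOf "Amsterdam"^^xsd:string ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "9"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://example.org/doc1#char=0,44> ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Amsterdam> .
```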
18. Work in Progress & Future Work
    • Analyse more datasets
    • Evaluate the temporal dimension of datasets (current work by Filip Ilievski & Marten Postma)
    • Integrate and build better datasets
19. Want to help?
    Scripts and data used here can be found at: https://github.com/dbpedia-spotlight/evaluation-datasets/
    Contact marieke.van.erp@vu.nl if you want to collaborate
20. Shameless Advertising
    NLP&DBpedia 2016, workshop at ISWC2016
    Submission deadline: 1 July
    https://nlpdbpedia2016.wordpress.com/
21. Acknowledgements
