Multilingual Named Entity Recognition
           using Wikipedia
    Laboratory for Knowledge Discovery in Databases
   Department of Computing and Information Sciences
                 Kansas State University
     http://www.kddresearch.org/tikiwiki/tiki-index.php




              Presenter: Svitlana O. Volkova
                 Instructor: William Hsu
AGENDA

I.     Project Overview
II.    Crawling Wikipedia
III.   Synonymy Discovery with Google Sets
IV.    Experiment Design
V.     Conclusions
PROJECT MILESTONES

Input: Crawler Functionality
CRAWLING WIKIPEDIA
Output: Set of Multilingual Gazetteers


      Input: Initial Gazetteer in one Language
      RELATIONSHIP DISCOVERY WITH GOOGLESETS
      Output: Extended Gazetteer with Synonyms


             Input: Extended Gazetteer with Synonyms + Content
             MULTILINGUAL NER TASK
             Output: Extracted Entities from the Content
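The second milestone above (initial gazetteer in, extended gazetteer with synonyms out) could look roughly like the sketch below. It does not reproduce the Google Sets interface: `related_terms` is a hypothetical callable standing in for any set-expansion service, and the toy synonym table is illustrative data, not output from the project.

```python
from typing import Callable, Dict, Iterable, Set


def extend_gazetteer(
    gazetteer: Dict[str, Set[str]],
    related_terms: Callable[[str], Iterable[str]],
) -> Dict[str, Set[str]]:
    """Return a new gazetteer with extra surface forms per canonical name.

    `gazetteer` maps a canonical disease name to its known surface forms.
    `related_terms` is a hypothetical stand-in for a set-expansion service.
    """
    extended: Dict[str, Set[str]] = {}
    for canonical, forms in gazetteer.items():
        merged = set(forms) | {canonical}
        # sorted() copies the set, so adding to `merged` inside the loop is safe.
        for seed in sorted(merged):
            for candidate in related_terms(seed):
                cleaned = " ".join(candidate.split())  # normalise whitespace
                if cleaned:
                    merged.add(cleaned)
        extended[canonical] = merged
    return extended


# Toy lookup table standing in for the expansion service (illustrative only).
TOY_EXPANSIONS = {
    "foot and mouth disease": ["FMD", "hoof-and-mouth  disease"],
}


def toy_related_terms(term: str) -> Iterable[str]:
    return TOY_EXPANSIONS.get(term.lower(), [])


initial = {
    "foot and mouth disease": {"foot and mouth disease"},
    "anthrax": {"anthrax"},
}
extended = extend_gazetteer(initial, toy_related_terms)
```

Keeping every synonym attached to its canonical name means the later extraction step can report the canonical disease name no matter which surface form was matched.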
KEY IDEA - WIKIPEDIA
• Apply Wikipedia knowledge representation for
  multilingual information extraction
             English Wiki Concepts of Interest
      …, anthrax, bovine virus, …, camelpox, surra, …

             http://wiki.digitalmethods.net/Dmi/WikipediaAnalysis

           Russian Wiki Concepts of Interest
 …, Зоонозы (zoonoses), Классическая чума свиней (classical swine fever), Лептоспироз (leptospirosis), …
CRAWLING WIKIPEDIA



Multilingual NER
(article + category
 +interwiki links)


                      Wiki Category Graph and Article Graph
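One way to read the slide above: walk the category graph from a root category, collect the articles found along the way, then follow interwiki links to get the same concepts in other languages. The sketch below works on small in-memory dictionaries rather than live Wikipedia; the function names, the category names, and the depth limit are assumptions made for illustration, not the project's crawler.

```python
from collections import deque
from typing import Dict, List, Set


def collect_articles(
    root: str,
    subcategories: Dict[str, List[str]],
    members: Dict[str, List[str]],
    max_depth: int = 2,
) -> Set[str]:
    """Breadth-first walk of the category graph, collecting article titles."""
    seen = {root}
    queue = deque([(root, 0)])
    articles: Set[str] = set()
    while queue:
        category, depth = queue.popleft()
        articles.update(members.get(category, []))
        if depth == max_depth:
            continue  # do not descend below the depth limit
        for child in subcategories.get(category, []):
            if child not in seen:  # guard against cycles in the category graph
                seen.add(child)
                queue.append((child, depth + 1))
    return articles


def build_gazetteers(
    articles: Set[str],
    interwiki: Dict[str, Dict[str, str]],
    source_lang: str = "en",
) -> Dict[str, Set[str]]:
    """Map each language code to the set of article titles in that language."""
    gazetteers: Dict[str, Set[str]] = {source_lang: set(articles)}
    for title in articles:
        for lang, translated in interwiki.get(title, {}).items():
            gazetteers.setdefault(lang, set()).add(translated)
    return gazetteers


# Toy graph; the category names are hypothetical.
subcategories = {"Animal diseases": ["Zoonoses"]}
members = {"Animal diseases": ["Anthrax"], "Zoonoses": ["Leptospirosis"]}
interwiki = {"Leptospirosis": {"ru": "Лептоспироз", "de": "Leptospirose"}}

articles = collect_articles("Animal diseases", subcategories, members)
gazetteers = build_gazetteers(articles, interwiki)
```

Articles without an interwiki link in a given language simply do not appear in that language's gazetteer, which is one plausible reason the per-language lists end up with different sizes.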
GAZETTEERS EXAMPLES IN DIFFERENT
           LANGUAGES
GAZETTEERS SIZE IN DIFFERENT
                 LANGUAGES

[Chart: gazetteer size by language. Legend: English, Japanese, German, Russian. Data labels: 19, 37, 86, 20.]

Decision: the dictionaries are too small, so we need a way to extend them.
GAZETTEERS EXAMPLES:
GERMAN GOOGLE SETS OUTPUT
EXPERIMENT SET UP
• Purpose: perform a named entity recognition task in a
  specific domain and report the extraction accuracy using
  a) Wiki knowledge
  b) Extended lists with synonyms from Google Sets


• Hypothesis: the synonym extraction phase is essential
  for increasing the accuracy of the information extraction task
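A comparison of the two configurations could be scored as in the sketch below. The slide speaks of "accuracy"; this example substitutes precision, recall, and F1 over matched character spans, a common choice for extraction tasks, and that substitution is an assumption. The spans are invented toy data, not experimental results.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # (index of first character, index of last character)


def precision_recall_f1(
    predicted: Iterable[Span], gold: Iterable[Span]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two sets of spans."""
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy annotations: one full disease name and one abbreviation (invented).
gold = {(0, 21), (40, 42)}
wiki_only = {(0, 21)}               # a) Wiki knowledge only
with_synonyms = {(0, 21), (40, 42)}  # b) list extended with synonyms

score_a = precision_recall_f1(wiki_only, gold)
score_b = precision_recall_f1(with_synonyms, gold)
```

Under the hypothesis above, the expected effect of synonym expansion is higher recall in configuration b); whether precision holds up depends on how noisy the added synonyms are.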
DISEASE EXTRACTOR MODULE
                 INPUT AND OUTPUT
Input: text from file
Module: Disease Extractor
Output:
• Index of the first character
• Index of the last character
• Length of the matched text
• Matched text
• Canonical disease name
Disease Extraction Task
• Disease recognition can be treated as an NER/information
    extraction (IE) task
• The main purpose is to retrieve tokens that match at least one term, including
    synonyms and abbreviations, from the list of animal disease names
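The input/output contract described above can be illustrated with a small dictionary matcher. This is a minimal sketch that assumes case-insensitive, whole-word, longest-match-first lookup; the slides do not say how the actual module matches text, and the names `Match`, `build_matcher`, and `extract_diseases` are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple, Set


class Match(NamedTuple):
    start: int      # index of the first character
    end: int        # index of the last character (inclusive)
    length: int     # length of the matched text
    text: str       # matched text as it appears in the input
    canonical: str  # canonical disease name


def build_matcher(gazetteer: Dict[str, Set[str]]):
    """Compile one regex over all surface forms and map each form to its canonical name."""
    lookup = {}
    for canonical, forms in gazetteer.items():
        for form in set(forms) | {canonical}:
            lookup[form.lower()] = canonical
    # Longest alternatives first so multi-word names win over shorter ones.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(a) for a in alternatives) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern, lookup


def extract_diseases(text: str, gazetteer: Dict[str, Set[str]]) -> List[Match]:
    if not gazetteer:
        return []  # an empty alternation would match the empty string everywhere
    pattern, lookup = build_matcher(gazetteer)
    results = []
    for m in pattern.finditer(text):
        surface = m.group(0)
        # Fall back to the surface form in rare Unicode case-mapping corner cases.
        canonical = lookup.get(surface.lower(), surface)
        results.append(Match(m.start(), m.end() - 1, len(surface), surface, canonical))
    return results


gazetteer = {
    "foot and mouth disease": {"FMD"},
    "leptospirosis": {"Лептоспироз", "Leptospirose"},
}
english = extract_diseases(
    "Foot and mouth disease is one of the most contagious diseases", gazetteer
)
russian = extract_diseases("Лептоспироз происходит во всех странах", gazetteer)
```

Because Python's `re` module treats `\w` and case-insensitivity as Unicode-aware for `str` patterns, the same matcher handles Latin and Cyrillic surface forms, which matters for the multilingual gazetteers.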
CONTEXT EXAMPLES IN DIFFERENT LANGUAGES
(English gloss of each passage: Leptospirosis occurs in all countries except the Arctic. The incidence is high. More than half of the cases are severe and require resuscitation.)
DUTCH
    • Leptospirose komt voor in alle landen, behalve het Noordpoolgebied. De incidentie is hoog.
      Meer dan de helft van de gevallen voordoet in ernstige en vereiste reanimatie.
CZECH
    • Leptospiróza se vyskytuje ve všech zemích s výjimkou Arktidy. Incidence je vysoká. Více než
      polovina případů se vyskytuje v těžké a vyžaduje resuscitaci.
GERMAN
    • Leptospirose tritt in allen Ländern, mit Ausnahme der Arktis. Die Inzidenz ist hoch. Mehr als
      die Hälfte der Fälle tritt in schweren und Reanimation erforderlich.
ITALIAN
    • Leptospirosi si verifica in tutti i paesi, tranne l'Artico. L'incidenza è alta. Più della metà dei
      casi si verifica in rianimazione grave e richiesti.
UKRAINIAN
    • Лептоспіроз відбувається в усіх країнах, за винятком Арктики. Захворюваність висока.
      Більше половини випадків відбувається в суворих і необхідність реанімації.
RUSSIAN
    • Лептоспироз происходит во всех странах, за исключением Арктики. Заболеваемость
      высокая. Более половины случаев происходит в суровых и необходимости реанимации.
DISEASE EXTRACTOR MODULE DEMO
http://fingolfin.user.cis.ksu.edu:8080/diseaseextractor/
RESULTS FOR DISEASE EXTRACTOR MODULE

       INPUT A                OUTPUT A
Foot and mouth disease is
one of the most contagious
diseases of cloven-hooved
mammals…

       INPUT B                OUTPUT B
Rift Valley Fever | CDC
Special Pathogens Branch
Mission Statement Disease …
CONCLUSIONS
• Applying Wikipedia knowledge to the multilingual NER task


• Phase 1: Crawling Wiki – completed
• Phase 2: Google Sets Expansion – completed
• Phase 3: Multilingual Disease Extraction – in progress


• Novelty: overcome Wiki limitations by applying the Google Sets
  expansion approach

• To estimate accuracy, we need annotated data in
  different languages
REFERENCES
• Zesch, T., & Gurevych, I. Analysis of the Wikipedia Category Graph for NLP
    Applications. In: Proceedings of the TextGraphs-2 Workshop (NAACL-HLT 2007),
    pp. 1-8, April 2007.
    http://elara.tk.informatik.tu-darmstadt.de/publications/2007/hlt-textgraphs.pdf

• Watanabe, Y., Asahara, M., & Matsumoto, Y. A Graph-Based Approach to Named
    Entity Categorization in Wikipedia Using Conditional Random Fields. In:
    Proceedings of the 2007 Joint Conference on Empirical Methods in Natural
    Language Processing and Computational Natural Language Learning (EMNLP-CoNLL),
    pp. 649-657. http://www.aclweb.org/anthology/D/D07/D07-1068

• Manning, C., & Schutze, H. Foundations of Statistical Natural Language Processing.
    Cambridge, MA: MIT Press, 1999.
ACKNOWLEDGEMENTS

• Dr. William Hsu for meaningful guidance

• John Drouhard for building the extraction architecture

• Landon Fowles for expanding gazetteers using Google Sets