Creating Knowledge out of Interlinked Data
BIS – 2012/03/01 Leipzig – Page 1                                        http://lod2.eu




         A Transparent Formalization of
               Text for Machines

                                         http://nlp2rdf.org



Start: Jan 2009
Tentative End: Summer 2012
                                                              Sebastian Hellmann
                                                              AKSW, Universität Leipzig




          Overview




Introduction to the areas touched upon
Scientific Core
Evaluation
Plan




The Semantic Gap




The Semantic Gap
Most problems occurred at the bottom
Data integration is difficult if the pivots are not well defined
Questions (in order):
      What structure to use?
      What URIs to use?
      What is a String?
      How can we teach machines to understand Strings (Knowledge Representation)?




         Main question




How can we formalize text in a way that is:
     Transparent for machines
     Efficient for NLP Use Cases
     Consistent with the Web architecture




Areas




           Preliminary definition


The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
    achieve interoperability between Natural Language Processing (NLP) tools,
    language resources and annotations.


     This definition is still limited to RDF and NLP and targets software
        integration via a common exchange format




Scientific core




Scientific core




Scientific core



                    Not transparent to machines




              Scientific core

     The universe of discourse is defined as the set of words over the alphabet of
     Unicode characters (Unicode Normalization Form C), often written Σ*

     URI: http://example.org/sample#offset_0_42
     String: “The city Berlin is the capital of Germany.”
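
The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the presentation: the helper name `offset_uri` is hypothetical, offsets are assumed to count Unicode code points of the NFC-normalized text, and the prefix and the `#offset_begin_end` pattern are taken from the slide.

```python
import unicodedata


def offset_uri(prefix: str, begin: int, end: int) -> str:
    """Build an offset-based fragment URI (hypothetical helper)."""
    return f"{prefix}#offset_{begin}_{end}"


# Normalization matters for offsets: the decomposed and the composed
# spelling of the same word differ in their number of code points.
decomposed = "Cafe\u0301"  # 'e' followed by a combining acute accent
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))

# Normalize first, so that offsets refer to the NFC form of the text.
text = unicodedata.normalize("NFC", "The city Berlin is the capital of Germany.")

# The whole text spans the code points from 0 to len(text).
uri = offset_uri("http://example.org/sample", 0, len(text))
print(uri)
```

Normalizing before counting keeps the offsets stable no matter how the input happened to encode accented characters.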




              Scientific core

     http://example.org/sample#offset_0_42    context, isString    “The city Berlin is the capital of Germany.”
     http://example.org/sample#offset_34_41   referenceContext     http://example.org/sample#offset_0_42
     http://example.org/sample#offset_34_41   isString             “Germany”
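
A minimal Python sketch of how such statements could be generated for a substring follows. It is an assumption-laden illustration rather than the NIF reference implementation: the function `annotate` is hypothetical, the property names are used as plain labels as on the slide, and only the first occurrence of the substring is annotated.

```python
from typing import List, Tuple

PREFIX = "http://example.org/sample"
CONTEXT = "The city Berlin is the capital of Germany."


def offset_uri(begin: int, end: int) -> str:
    """Build an offset-based fragment URI (hypothetical helper)."""
    return f"{PREFIX}#offset_{begin}_{end}"


def annotate(substring: str) -> List[Tuple[str, str, str]]:
    """Return (subject, property, value) statements for one substring."""
    begin = CONTEXT.index(substring)  # first occurrence only
    end = begin + len(substring)
    context_uri = offset_uri(0, len(CONTEXT))
    substring_uri = offset_uri(begin, end)
    return [
        (context_uri, "isString", CONTEXT),
        (substring_uri, "referenceContext", context_uri),
        (substring_uri, "isString", substring),
    ]


triples = annotate("Germany")
for triple in triples:
    print(triple)
```

Because the substring URI is derived from the offsets alone, any tool that processes the same context arrives at the same identifier for “Germany”.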




               Scientific core
Define the notion of “Context” and formalize it in OWL:
     Context is similar to the German word “Betrachtungshorizont” (roughly,
        “horizon of consideration”)
     In English perhaps “inside context”, i.e. the text itself, which serves as the
        reference context for all substrings it contains.
     Definitely disjoint from groupings such as “Document”, because such
        groupings need a “wider context”.

     An example follows.




Scientific core








              Scientific Core

The goal is to research some of the implications,
    although I might not be able to cover them completely.
In scope:
  The property “contextString” is inverse-functional, which means that machines can
      infer automatically that the same context occurs in different documents.
  Show consistency with ambiguity
  Define metrics that compare contexts
  Formalize the interpretation function
  Show interoperability with the internal models of all major NLP frameworks
  (Partial) compatibility with the WWW and the GGG (Giant Global Graph)
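
The inference enabled by an inverse-functional “contextString” can be sketched as follows. This is a plain-Python illustration under assumptions, not an OWL reasoner: the document URIs and the function `infer_same_context` are hypothetical, and string equality stands in for the reasoning step.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Tuple

# Hypothetical (context URI, contextString) statements from three documents.
statements = [
    ("http://example.org/doc1#offset_0_42", "The city Berlin is the capital of Germany."),
    ("http://example.com/doc2#offset_0_42", "The city Berlin is the capital of Germany."),
    ("http://example.org/doc3#offset_0_18", "Leipzig is a city."),
]


def infer_same_context(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair up URIs that share a value of the inverse-functional property."""
    by_string = defaultdict(set)
    for uri, string in pairs:
        by_string[string].add(uri)
    same = []
    for uris in by_string.values():
        # Every two subjects with an identical value denote the same context.
        same.extend(combinations(sorted(uris), 2))
    return same


same_contexts = infer_same_context(statements)
print(same_contexts)
```

Only the two URIs that carry the identical string are paired; the third context stays on its own.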




                Scientific Core

Out of scope:
  Transition between contexts: do statements from a smaller context hold in a
      broader context?
  Incorporating all layers of the NLP stack; the work is limited to POS tags and
      entity recognition
  Filling in all the question marks in the Venn diagram




Areas




Linguistic Linked Open Data Cloud




Developers study




Areas




           Evaluation


Compare to other models in NLP:
Size (RDF vs. XML), performance, expressivity
Is NIF easy to understand and implement?
Developers study: the release of the specification had quite an impact; people
   started to create extensions and use the format, and there are 50 people on
   the mailing list.
How can Web Service integration or consistency with the Web architecture be
   evaluated? If the way strings are represented is transparent and formalized,
   is an experimental evaluation needed to show the benefits?




Q&A




             Thank you for your attention


       Standing on the shoulders of giants
