BIS – 2013/04/15 – Page 1 http://lod2.eu
Creating Knowledge out of Interlinked Data
LOD2 Presentation . 02.09.2010 . Page http://lod2.eu
AKSW, Universität Leipzig
Sebastian Hellmann
PhD thesis intermediate report
NLP Interchange Format (NIF) 2.0
http://nlp2rdf.org
http://lod2.eu
http://slideshare.net/kurzum
DISCLAIMER:
this presentation is work in progress, example RDF is outdated
NLP Interchange Format 2.0
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
achieve interoperability between Natural Language Processing (NLP) tools,
language resources and annotations.
• NIF 2.0 will be published in 6-8 weeks
• Highly probable to become the de-facto standard for modelling RDF tool
output in the NLP domain
Introduction
Components have pre- and postconditions
Automatic configuration is theoretically possible, but in practice it takes a lot of manual work
Huge potential to save time and money at the interfaces
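The pre- and postcondition idea can be sketched in a few lines of Python. This is only an illustration and not part of NIF: the `Component` class, the `plan_pipeline` function and the annotation-layer names (`text`, `tokens`, `pos`, `parse`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    requires: frozenset  # precondition: annotation layers that must already exist
    provides: frozenset  # postcondition: annotation layers added by this component


def plan_pipeline(components, available, goal):
    """Greedily order components until the goal layer becomes available."""
    available = set(available)
    remaining = list(components)
    order = []
    while goal not in available:
        # A component is runnable once all of its preconditions are satisfied.
        runnable = [c for c in remaining if c.requires <= available]
        if not runnable:
            raise ValueError(f"cannot produce {goal!r} from {sorted(available)}")
        chosen = runnable[0]
        order.append(chosen.name)
        available |= chosen.provides
        remaining.remove(chosen)
    return order


# Hypothetical components, deliberately listed in the wrong order.
components = [
    Component("parser", frozenset({"tokens", "pos"}), frozenset({"parse"})),
    Component("tagger", frozenset({"tokens"}), frozenset({"pos"})),
    Component("tokenizer", frozenset({"text"}), frozenset({"tokens"})),
]
pipeline = plan_pipeline(components, {"text"}, "parse")
```

The greedy strategy may schedule components that are not strictly needed. The point is only that machine-readable conditions make the ordering computable instead of hand-wired.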
Core problems:
1. Too much heterogeneity
2. Almost no standards available
3. No open collaboration
4. Difficult and large domain
Problem analysis
Technical heterogeneity
• Technologies: XML, Relational Databases, CSV, DOC, PDF
• Similar to other domains
• Formats: Negra, CoNLL, GrAF, Paula, CAS (UIMA), Penn
• Virtually every tool implements readers for 5-6 of these formats plus its
own serialization
• Programming languages: Java, Python, ...
• Java has predominance
Problem analysis
Domain heterogeneity
• Multilingualism
• Over 100 part of speech tags (several for each language)
• No open mappings exist
• About 20 different tasks listed on:
http://en.wikipedia.org/wiki/Natural_language_processing#Major_tasks_in_NLP
• Natural language is a difficult topic:
• The roulette dealer siad: “Rien ne va plus!”
– 8 words, 4 French, 4 English, one spelling mistake, impossible to
decide the language of the whole.
• Ban on Nude Dancing on Governor's Desk
Problem analysis
Open collaboration
• LAF/GrAF is a recently released ISO standard
• But it is not open (60 Euros to view the document)
• Not in RDF (the main requirement for any Semantic Web tool)
• Large frameworks tend to only be “inward” compatible
• UIMA advocates say: “Why don't you just use UIMA?”
• GATE advocates: “Integrate it into GATE!”
• Generally, a large time investment and lock-in
Problem analysis
Summary:
Hardly any reusability
• Free software (as in free beer), but no open licenses
• No standards and no mappings
• Integration is hard-wired (you have to write software)
Problem analysis
• Definition for text normalization + URI Schemes (give URIs to Strings)
• NIF Core Ontology: default vocabulary for the most frequently used annotations
• Predefined modules for most use cases
• Infrastructure
• for open collaboration / discussion
• persistent hosting
• validation and demo services
• Reference implementation
• Data conversion
NIF Overview
Text Normalization + URI Schemes
NIF 1.0: http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729
NIF 2.0 uses RFC 5147 as base form:
http://www.w3.org/DesignIssues/LinkedData.html#char=717,729
User extensions possible:
http://www.w3.org/DesignIssues/LinkedData.html#your_own_scheme
(but you have to link to documentation on how it was created)
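The two URI schemes above differ only in the fragment syntax, so the conversion is mechanical. A minimal Python sketch, assuming offsets are zero-based positions between characters as in RFC 5147; the function names and the `http://example.org/doc` URI are hypothetical and not taken from the NIF reference implementation:

```python
import re
from typing import Tuple


def nif1_to_rfc5147(uri: str) -> str:
    """Rewrite a NIF 1.0 '#offset_B_E' fragment as an RFC 5147 '#char=B,E' fragment."""
    match = re.fullmatch(r"(.*)#offset_(\d+)_(\d+)", uri)
    if match is None:
        raise ValueError(f"not a NIF 1.0 offset URI: {uri}")
    document, begin, end = match.groups()
    return f"{document}#char={begin},{end}"


def parse_char_fragment(uri: str) -> Tuple[str, int, int]:
    """Split a '#char=B,E' URI into (document URI, begin, end)."""
    match = re.fullmatch(r"(.*)#char=(\d+),(\d+)", uri)
    if match is None:
        raise ValueError(f"not a char-range URI: {uri}")
    document, begin, end = match.groups()
    return document, int(begin), int(end)


def anchored_text(text: str, uri: str) -> str:
    """Return the substring of `text` that the URI points to."""
    _, begin, end = parse_char_fragment(uri)
    return text[begin:end]


converted = nif1_to_rfc5147(
    "http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729"
)
word = anchored_text("Hello world", "http://example.org/doc#char=6,11")
```

Because the fragment addresses character positions, the URI is only stable as long as the text is normalized the same way on both sides, which is why the normalization definition and the URI scheme are specified together.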
As a Web Service
curl \
  --data-urlencode prefix="http://prefix.given.by/theClient#" \
  --data-urlencode input="[...]" \
  --data-urlencode source="http://www.w3.org/DesignIssues/LinkedData.html" \
  http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
(the source parameter is optional)
The new namespace is http://persistence.uni-leipzig.org/nlp2rdf/nif-core#
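The same call can be prepared from Python with the standard library. This sketch only builds the request and does not send it, since the availability of the demo endpoint is not guaranteed; the helper name `build_nif_request` and the sample input text are assumptions, while the parameter names `prefix`, `input` and `source` are the ones shown in the curl call.

```python
from urllib.parse import urlencode
from urllib.request import Request

SERVICE = "http://nlp2rdf.lod2.eu/demo/NIFStanfordCore"


def build_nif_request(text, prefix, source=None):
    """Build a form-encoded POST request equivalent to the curl call."""
    params = {"prefix": prefix, "input": text}
    if source is not None:
        # Optional: the document the input text was taken from.
        params["source"] = source
    body = urlencode(params).encode("utf-8")
    return Request(SERVICE, data=body, method="POST")


request = build_nif_request("Hello world", "http://prefix.given.by/theClient#")
# To actually call the service (not done here):
#   from urllib.request import urlopen
#   rdf = urlopen(request).read().decode("utf-8")
```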
Ontologies:
• NIF Core Ontology (URI Scheme, String, Context, but also Token, Sentence,
lemma, stem, etc.) for frequently used annotations.
• Simple Error Ontology to describe errors (fatal, message, timestamp)
• Vocabulary Modules for each purpose or ontology or project
Overview of Ontologies
Each ontology consists of three sets of axioms:
- Terminology model (definitions)
- Inference model (especially transitivity)
- Validation model (consistency)
1) nif-core.ttl
2) nif-core-inf.ttl imports 1
3) nif-core-val.ttl imports 1 and 2
Logical Modularity
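The import chain of the three files determines what a client has to load for a given purpose. A small Python sketch that resolves it; the file names come from the slide, while the `IMPORTS` table and the `files_to_load` helper are illustrative assumptions rather than part of any NIF tooling:

```python
# Which file imports which, as listed on the slide.
IMPORTS = {
    "nif-core.ttl": [],
    "nif-core-inf.ttl": ["nif-core.ttl"],
    "nif-core-val.ttl": ["nif-core.ttl", "nif-core-inf.ttl"],
}


def files_to_load(entry):
    """Return the entry file and everything it imports, dependencies first."""
    ordered = []

    def visit(name):
        for dependency in IMPORTS[name]:
            visit(dependency)
        if name not in ordered:
            ordered.append(name)

    visit(entry)
    return ordered


terminology_only = files_to_load("nif-core.ttl")
for_validation = files_to_load("nif-core-val.ttl")
```

A client that only needs the definitions loads one file, while a validator pulls in all three.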
NIF simple:
• Only one truth
• Easy to understand and to query
• Least amount of triples
NIF + Stanbol (Apache Project)
• Several ranked alternatives
• Provenance of annotations
• In collaboration with Apache Stanbol
Open Annotation (W3C group)
• Rich model
• Not only text, but everything (images)
Granularity Modularity
Well-defined conversions between the different levels
Towards the richer models:
- more triples
- more complexity
- worse usability
- lossless conversion
Towards the simpler models:
- easier queries
- higher performance
- lossy conversion
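Why the conversion towards the simple level loses information can be shown with a toy example in Python. The data layout (a list of scored alternatives per string URI), the scores and the example URIs are invented for illustration and do not reflect the actual NIF or Stanbol vocabularies.

```python
def to_nif_simple(ranked):
    """Collapse several ranked alternatives per string into a single annotation.

    `ranked` maps a string URI to a list of (annotation, confidence) pairs.
    Only the highest-scoring annotation survives, so alternatives and scores
    are dropped: this direction of the conversion is lossy.
    """
    return {
        uri: max(alternatives, key=lambda pair: pair[1])[0]
        for uri, alternatives in ranked.items()
        if alternatives
    }


ranked = {
    "http://example.org/doc#char=0,6": [
        ("http://dbpedia.org/resource/Berlin", 0.9),
        ("http://dbpedia.org/resource/Berlin,_New_Hampshire", 0.1),
    ],
}
simple = to_nif_simple(ranked)
```

The reverse direction needs no such choice: a single annotation can always be wrapped as a list with one alternative.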
Structural Interoperability:
- URI schemes provide normalization
- RDF provides the graph data model
- OWL provides the logical model
Conceptual Interoperability:
- NIF Core Ontology and mappings to the most frequently used annotations, e.g. lemma,
stem
- Vocabulary Modules to include other terminologies and ontologies
Interoperability
• ITS 2.0
• FISE used in Apache Stanbol (IKS-EU Project)
• LAF/GrAF XML – ISO standard, recently published
• Fragment Identifiers by IETF and W3C
• Lemon ontology from Monnet EU Project
• NERD ontology from EURECOM and LinkedTV EU Project
• Xpointer/XPath URI scheme
• Open Annotation
• ISOCat
NIF 2.0 aims to be compatible with (Vocabulary Modules)
• Tibeto-Burman languages: http://purl.org/olia/tibet.owl#VNst
• Russian TreeTagger :
http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform
• German STTS: http://purl.org/olia/stts.owl#VAPP
• English Penn: http://purl.org/olia/penn.owl#VBG
→ all map to http://purl.org/olia/olia.owl#NonFiniteVerb
Ontologies of Linguistic Annotation (OLiA) contain mappings for over 50 tagsets (free
and open, CC-BY)
Vocabulary Module: OLiA
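The practical effect of such a mapping is that one query works across tagsets. A simplified Python sketch: the four tag URIs and the target class are the ones listed above, but the flat lookup table, the helper function and the sample tokens are assumptions (OLiA itself expresses these links as OWL axioms, not as a dictionary).

```python
OLIA_NON_FINITE_VERB = "http://purl.org/olia/olia.owl#NonFiniteVerb"

# Tagset-specific tags from the slide, all mapped to one OLiA reference class.
TAG_TO_OLIA = {
    "http://purl.org/olia/tibet.owl#VNst": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/stts.owl#VAPP": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/penn.owl#VBG": OLIA_NON_FINITE_VERB,
}


def select_by_olia_class(annotations, olia_class):
    """Return the tokens whose tagset-specific tag maps to the given OLiA class."""
    return [token for token, tag in annotations if TAG_TO_OLIA.get(tag) == olia_class]


# Hypothetical tokens tagged by an English and a German tool.
annotations = [
    ("running", "http://purl.org/olia/penn.owl#VBG"),
    ("gewesen", "http://purl.org/olia/stts.owl#VAPP"),
    ("dog", "http://purl.org/olia/penn.owl#NN"),  # not in the lookup table of this sketch
]
non_finite = select_by_olia_class(annotations, OLIA_NON_FINITE_VERB)
```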
NIF can be extended by Vocabulary Modules
OLiA
http://purl.org/olia
Conceptual Interoperability
• Java-Maven implementation
• PHP implementation
• Reference implementations: DBpedia Spotlight, Stanford Parser, Korean POS
tagger, Keyword Search
• Wiki: http://wiki.nlp2rdf.org
• Validators
• Code generators (convert vocabulary modules to code stubs)
• NIF is free and open (CC-0 / CC-BY / Apache)
• All ontologies will be hosted persistently by Universität Leipzig
• http://persistence.uni-leipzig.org/nlp2rdf/
NIF 2.0 Infrastructure for adoption
• Huge collection of use cases
• e.g. Ali wants to exchange different NLP services for RDFaCE
• LOD2 from Wolters Kluwer
• A selection will be implemented. Assumption:
• NIF is good, if it fulfills many use cases
Evaluation 1
• There are about 10 to 20 third party implementations
Evaluation 2
Analysis of existing frameworks and formats. Criteria:
• Convertibility (Adequacy)
• Do the graph models match?
• Coverage
• Quantitative analysis of used annotations
• Does NIF Core provide terms for the most common annotations, are
there any gaps?
Evaluation 3
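The coverage criterion amounts to a simple ratio over annotation counts. A Python sketch of how it might be computed; the counts and the `coreference` gap are made-up illustrative numbers, not results of the evaluation, and the set of core terms is limited to the ones named in this presentation.

```python
from collections import Counter


def coverage(annotation_counts, core_terms):
    """Return (share of annotation occurrences covered, sorted list of uncovered terms)."""
    total = sum(annotation_counts.values())
    covered = sum(n for term, n in annotation_counts.items() if term in core_terms)
    gaps = sorted(term for term in annotation_counts if term not in core_terms)
    return (covered / total if total else 0.0), gaps


# Invented counts of annotation types observed in some existing format.
observed = Counter({"Token": 50, "Sentence": 30, "lemma": 15, "coreference": 5})
core_terms = {"Token", "Sentence", "lemma", "stem"}
ratio, gaps = coverage(observed, core_terms)
```

Weighting by occurrence rather than by distinct term means that a rarely used annotation type counts as a small gap, which matches the focus on the most common annotations.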
Data Conversion
Data is available as
free, open, interoperable (FOI) language resources at
http://linguistics.okfn.org/resources/llod/
(work in progress)
Project has a very good impact:
• Many adopters
• Industrial uptake
• Inclusion in a W3C standard for ITS 2.0:
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html
• Several projects involved as stakeholders (LOD2, Monnet, ...)
• Several motivated open-source developers
• Funding is coming in
Critical judgement
Scientific merit?
• provides scientific infrastructure
• Easier to write and combine software
• Free, open, interoperable (FOI) language resources
• Free, open NLP test benchmarks (Future work)
• What part is scientific and what part is community work and negotiation?
• No progress in state of the art in NLP methods, yet
• Difficult to judge where to put the emphasis. Many “soft evaluation”
topics, no key performance indicators (KPIs).
Critical judgement
• 2011: Open Knowledge Conference
• 2012: Workshop and book “Linked Data in Linguistics”
• 2012: Linked Data Cup @ I-Semantics
• 2012: Web of Linked Entities @ ISWC
• 2012: MLODE @ SABRE
• 2013: Semantic Web Journal: Special Issue on Multilingual Linked Open Data
(MLOD)
• Future work: DBpedia & NLP @ ISWC 2013
Conference + Workshops + Proceedings
Thanks for your attention

Mais conteúdo relacionado

Mais procurados (10)

LOD2: State of Play WP3A - Knowledge Base Creation, Enrichment and Repair
LOD2: State of Play WP3A - Knowledge Base Creation, Enrichment and RepairLOD2: State of Play WP3A - Knowledge Base Creation, Enrichment and Repair
LOD2: State of Play WP3A - Knowledge Base Creation, Enrichment and Repair
 
LOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: 3rd relase of the StackLOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: 3rd relase of the Stack
 
LOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and Repair
LOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and RepairLOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and Repair
LOD2 Plenary Vienna 2012: WP3 - Knowledge Base Creation, Enrichment and Repair
 
LOD2 Webinar Series: CubeViz
LOD2 Webinar Series: CubeViz LOD2 Webinar Series: CubeViz
LOD2 Webinar Series: CubeViz
 
LOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and SparqlifyLOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series: D2R and Sparqlify
 
LOD2 Webinar Series FOX
LOD2 Webinar Series FOXLOD2 Webinar Series FOX
LOD2 Webinar Series FOX
 
Linked Data for Abbreviations and Segmentation
Linked Data for Abbreviations and SegmentationLinked Data for Abbreviations and Segmentation
Linked Data for Abbreviations and Segmentation
 
LOD2 Webinar: SIREn
LOD2 Webinar: SIREnLOD2 Webinar: SIREn
LOD2 Webinar: SIREn
 
Datalift lod2-paris-24032011
Datalift lod2-paris-24032011Datalift lod2-paris-24032011
Datalift lod2-paris-24032011
 
Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...
 

Destaque (6)

PhD Progress, July 5th 2012
PhD Progress, July 5th 2012PhD Progress, July 5th 2012
PhD Progress, July 5th 2012
 
Progression Points
Progression Points Progression Points
Progression Points
 
2nd year PHD Report
2nd year PHD Report2nd year PHD Report
2nd year PHD Report
 
My thesis progress presentation
My thesis progress presentationMy thesis progress presentation
My thesis progress presentation
 
1 Year PhD Presentation
1 Year PhD Presentation1 Year PhD Presentation
1 Year PhD Presentation
 
PhD Annual Report first page & detailed table of contents
PhD Annual Report first page & detailed table of contentsPhD Annual Report first page & detailed table of contents
PhD Annual Report first page & detailed table of contents
 

Semelhante a NIF 2.0 Phd thesis intermediate report

Integrating NLP using Linked Data
Integrating NLP using Linked DataIntegrating NLP using Linked Data
Integrating NLP using Linked DataSebastian Hellmann
 
NIF 2.0 Tutorial: Content Analysis and the Semantic Web
NIF 2.0 Tutorial: Content Analysis and the Semantic Web  NIF 2.0 Tutorial: Content Analysis and the Semantic Web
NIF 2.0 Tutorial: Content Analysis and the Semantic Web Sebastian Hellmann
 
Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)Sergio Fernández
 
NIF - Version 1.0 - 2011/10/23
NIF - Version 1.0 - 2011/10/23NIF - Version 1.0 - 2011/10/23
NIF - Version 1.0 - 2011/10/23Sebastian Hellmann
 
Oc wg-nif-20130711
Oc wg-nif-20130711Oc wg-nif-20130711
Oc wg-nif-20130711STIinnsbruck
 
F/LOSS in Norwegian libraries
F/LOSS in Norwegian librariesF/LOSS in Norwegian libraries
F/LOSS in Norwegian librariesLibriotech
 
Freme general-overview-version-june-2015
Freme general-overview-version-june-2015Freme general-overview-version-june-2015
Freme general-overview-version-june-2015FREMEProjectH2020
 
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.Matthias Arnold
 
Linked Data in Linguistics for NLP and Web Annotation
Linked Data in Linguistics for NLP and Web AnnotationLinked Data in Linguistics for NLP and Web Annotation
Linked Data in Linguistics for NLP and Web AnnotationSebastian Hellmann
 
Linked Open Citation Database (LOC-DB)
Linked Open Citation Database (LOC-DB)Linked Open Citation Database (LOC-DB)
Linked Open Citation Database (LOC-DB)Kai Eckert
 
Populating DBpedia FR and using it for Extracting Information
Populating DBpedia FR and using it for Extracting InformationPopulating DBpedia FR and using it for Extracting Information
Populating DBpedia FR and using it for Extracting InformationJulien PLU
 
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...semanticsconference
 
IPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishIPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishBruno Cornec
 
Briefing on OASIS XLIFF OMOS TC 20160121
Briefing on OASIS XLIFF OMOS TC 20160121Briefing on OASIS XLIFF OMOS TC 20160121
Briefing on OASIS XLIFF OMOS TC 20160121Jamie Clark
 
Semantic web-and-public-data - en
Semantic web-and-public-data - enSemantic web-and-public-data - en
Semantic web-and-public-data - enTenforce
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012François Belleau
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataSören Auer
 

Semelhante a NIF 2.0 Phd thesis intermediate report (20)

Integrating NLP using Linked Data
Integrating NLP using Linked DataIntegrating NLP using Linked Data
Integrating NLP using Linked Data
 
NIF 2.0 Tutorial: Content Analysis and the Semantic Web
NIF 2.0 Tutorial: Content Analysis and the Semantic Web  NIF 2.0 Tutorial: Content Analysis and the Semantic Web
NIF 2.0 Tutorial: Content Analysis and the Semantic Web
 
Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)
 
NIF - Version 1.0 - 2011/10/23
NIF - Version 1.0 - 2011/10/23NIF - Version 1.0 - 2011/10/23
NIF - Version 1.0 - 2011/10/23
 
Oc wg-nif-20130711
Oc wg-nif-20130711Oc wg-nif-20130711
Oc wg-nif-20130711
 
F/LOSS in Norwegian libraries
F/LOSS in Norwegian librariesF/LOSS in Norwegian libraries
F/LOSS in Norwegian libraries
 
VRA 2014 VRA Core Unbound, Arnold
VRA 2014 VRA Core Unbound, ArnoldVRA 2014 VRA Core Unbound, Arnold
VRA 2014 VRA Core Unbound, Arnold
 
Freme general-overview-version-june-2015
Freme general-overview-version-june-2015Freme general-overview-version-june-2015
Freme general-overview-version-june-2015
 
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.
 
Linked Data in Linguistics for NLP and Web Annotation
Linked Data in Linguistics for NLP and Web AnnotationLinked Data in Linguistics for NLP and Web Annotation
Linked Data in Linguistics for NLP and Web Annotation
 
Linked Open Data stuff
Linked Open Data stuffLinked Open Data stuff
Linked Open Data stuff
 
Linked Open Citation Database (LOC-DB)
Linked Open Citation Database (LOC-DB)Linked Open Citation Database (LOC-DB)
Linked Open Citation Database (LOC-DB)
 
Populating DBpedia FR and using it for Extracting Information
Populating DBpedia FR and using it for Extracting InformationPopulating DBpedia FR and using it for Extracting Information
Populating DBpedia FR and using it for Extracting Information
 
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
 
IPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishIPMI is dead, Long live Redfish
IPMI is dead, Long live Redfish
 
Lemon at-mlw3
Lemon at-mlw3Lemon at-mlw3
Lemon at-mlw3
 
Briefing on OASIS XLIFF OMOS TC 20160121
Briefing on OASIS XLIFF OMOS TC 20160121Briefing on OASIS XLIFF OMOS TC 20160121
Briefing on OASIS XLIFF OMOS TC 20160121
 
Semantic web-and-public-data - en
Semantic web-and-public-data - enSemantic web-and-public-data - en
Semantic web-and-public-data - en
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 

Mais de Sebastian Hellmann

Linguistic Linked Open Data, Challenges, Approaches, Future Work
Linguistic Linked Open Data, Challenges, Approaches, Future WorkLinguistic Linked Open Data, Challenges, Approaches, Future Work
Linguistic Linked Open Data, Challenges, Approaches, Future WorkSebastian Hellmann
 
DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016Sebastian Hellmann
 
Lider Reference Model ld4lt session March, 3rd, 2015
Lider Reference Model ld4lt session  March, 3rd, 2015Lider Reference Model ld4lt session  March, 3rd, 2015
Lider Reference Model ld4lt session March, 3rd, 2015Sebastian Hellmann
 
LD4LT Roadmap session 19_02_2015
LD4LT Roadmap session 19_02_2015LD4LT Roadmap session 19_02_2015
LD4LT Roadmap session 19_02_2015Sebastian Hellmann
 
DBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataDBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataSebastian Hellmann
 
Navigation-induced Knowledge Engineering by Example
 Navigation-induced Knowledge Engineering by Example Navigation-induced Knowledge Engineering by Example
Navigation-induced Knowledge Engineering by ExampleSebastian Hellmann
 
Improving the Performance of the DL-Learner SPARQL Component for Semantic We...
Improving the Performance of the  DL-Learner SPARQL Component for Semantic We...Improving the Performance of the  DL-Learner SPARQL Component for Semantic We...
Improving the Performance of the DL-Learner SPARQL Component for Semantic We...Sebastian Hellmann
 
NLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draftNLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draftSebastian Hellmann
 

Mais de Sebastian Hellmann (12)

KEDL DBpedia 2019
KEDL DBpedia  2019KEDL DBpedia  2019
KEDL DBpedia 2019
 
Linguistic Linked Open Data, Challenges, Approaches, Future Work
Linguistic Linked Open Data, Challenges, Approaches, Future WorkLinguistic Linked Open Data, Challenges, Approaches, Future Work
Linguistic Linked Open Data, Challenges, Approaches, Future Work
 
DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016
 
Lider Reference Model ld4lt session March, 3rd, 2015
Lider Reference Model ld4lt session  March, 3rd, 2015Lider Reference Model ld4lt session  March, 3rd, 2015
Lider Reference Model ld4lt session March, 3rd, 2015
 
LD4LT Roadmap session 19_02_2015
LD4LT Roadmap session 19_02_2015LD4LT Roadmap session 19_02_2015
LD4LT Roadmap session 19_02_2015
 
DBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataDBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of Data
 
Navigation-induced Knowledge Engineering by Example
 Navigation-induced Knowledge Engineering by Example Navigation-induced Knowledge Engineering by Example
Navigation-induced Knowledge Engineering by Example
 
Improving the Performance of the DL-Learner SPARQL Component for Semantic We...
Improving the Performance of the  DL-Learner SPARQL Component for Semantic We...Improving the Performance of the  DL-Learner SPARQL Component for Semantic We...
Improving the Performance of the DL-Learner SPARQL Component for Semantic We...
 
Introduction to LDL 2012
Introduction to LDL 2012Introduction to LDL 2012
Introduction to LDL 2012
 
Thesis presentation
Thesis presentationThesis presentation
Thesis presentation
 
Tool collection as linkeddata
Tool collection as linkeddataTool collection as linkeddata
Tool collection as linkeddata
 
NLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draftNLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draft
 

Último

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker

NIF 2.0 PhD thesis intermediate report

  • 1. BIS – 2013/04/15 – Page 1 http://lod2.eu
    Creating Knowledge out of Interlinked Data
    LOD2 Presentation . 02.09.2010 . Page http://lod2.eu
    AKSW, Universität Leipzig
    Sebastian Hellmann
    PhD thesis intermediate report
    NLP Interchange Format (NIF) 2.0
    http://nlp2rdf.org http://lod2.eu http://slideshare.net/kurzum
    DISCLAIMER: this presentation is work in progress; the example RDF is outdated
  • 2. NLP Interchange Format 2.0
  • 3. NLP Interchange Format 2.0
    The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
    • NIF 2.0 will be published in 6-8 weeks
    • Highly likely to become the de facto standard for modelling RDF tool output in the NLP domain
  • 4. Introduction
    Components have pre- and postconditions.
    Automatic configuration is theoretically possible, but in reality it takes a lot of manual work.
  • 5. Introduction
    Components have pre- and postconditions.
    Automatic configuration is theoretically possible, but in reality it takes a lot of manual work.
    Huge potential to save time and money at the interfaces.
  • 6. Problem analysis
    Core problems:
    1. Too much heterogeneity
    2. Almost no standards available
    3. No open collaboration
    4. Difficult and large domain
  • 7. Problem analysis
    Technical heterogeneity
    • Technologies: XML, relational databases, CSV, DOC, PDF
    • Similar to other domains
    • Formats: Negra, CoNLL, GrAF, Paula, CAS (UIMA), Penn
    • Virtually every tool has implemented readers for these 5-6 formats plus its own serialization
    • Programming languages: Java, Python, ...
    • Java predominates
  • 8. Problem analysis
    Domain heterogeneity
    • Multilingualism
    • Over 100 part-of-speech tags (several for each language)
    • No open mappings exist
    • About 20 different tasks listed on: http://en.wikipedia.org/wiki/Natural_language_processing#Major_tasks_in_NLP
    • Natural language is a difficult topic:
    • The roulette dealer siad: “Rien ne va plus!” – 8 words, 4 French, 4 English, one spelling mistake, impossible to decide the language of the whole.
    • Ban on Nude Dancing on Governor's Desk
  • 9. Problem analysis
  • 10. Problem analysis
    Open collaboration
    • LAF/GrAF is a recently released ISO standard
    • But it is not open (60 euros to view the document)
    • Not in RDF (the main requirement for any Semantic Web tool)
    • Large frameworks tend to be only “inward” compatible
    • UIMA advocates say: “Why don't you just use UIMA?”
    • GATE advocates: “Integrate it into GATE!”
    • Generally, a large time investment and lock-in
  • 11. Problem analysis
    Summary: Hardly any reusability
    • Free software (as in free beer), but no open licenses
    • No standards and no mappings
    • Integration is hard-wired (you have to write software)
  • 12. NIF Overview
    • Definition for text normalization + URI schemes (give URIs to strings)
    • NIF Core Ontology: default vocabulary for the most frequently used annotations
    • Predefined modules for most use cases
    • Infrastructure
        • for open collaboration / discussion
        • persistent hosting
        • validation and demo services
    • Reference implementation
    • Data conversion
  • 13. Text Normalization + URI Schemes
  • 14. Text Normalization + URI Schemes
    NIF 1.0: http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729
    NIF 2.0 uses RFC 5147 as base form: http://www.w3.org/DesignIssues/LinkedData.html#char=717,729
    User extensions possible: http://www.w3.org/DesignIssues/LinkedData.html#your_own_scheme (but you have to link to documentation on how it was created)
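The two URI schemes on this slide can be illustrated with a minimal Python sketch. The function names (`offset_uri`, `char_uri`, `parse_char_uri`, `select`) are hypothetical helpers introduced for illustration and are not part of any NIF library. The sketch assumes that begin and end are zero-based positions between characters, as RFC 5147 describes for `char=` ranges, so that a range maps directly onto a Python slice.

```python
import re
from typing import Tuple


def offset_uri(document_uri: str, begin: int, end: int) -> str:
    """NIF 1.0 style fragment as shown on the slide: #offset_<begin>_<end>."""
    return f"{document_uri}#offset_{begin}_{end}"


def char_uri(document_uri: str, begin: int, end: int) -> str:
    """NIF 2.0 style fragment based on RFC 5147: #char=<begin>,<end>."""
    if begin < 0 or end < begin:
        raise ValueError("expected 0 <= begin <= end")
    return f"{document_uri}#char={begin},{end}"


# Only the plain "char=<begin>,<end>" form is handled here (an assumption made
# to keep the sketch short); RFC 5147 defines further forms.
_CHAR_URI = re.compile(r"^(?P<doc>[^#]+)#char=(?P<begin>\d+),(?P<end>\d+)$")


def parse_char_uri(uri: str) -> Tuple[str, int, int]:
    """Split a char= URI back into (document URI, begin, end)."""
    match = _CHAR_URI.match(uri)
    if match is None:
        raise ValueError(f"not a char= range URI: {uri!r}")
    return match["doc"], int(match["begin"]), int(match["end"])


def select(text: str, begin: int, end: int) -> str:
    """Return the substring that a (begin, end) range addresses."""
    return text[begin:end]
```

For the slide's example, `char_uri("http://www.w3.org/DesignIssues/LinkedData.html", 717, 729)` yields the NIF 2.0 URI shown above, and the range covers 12 characters of the document text.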
  • 15. As a Web Service
    curl --data-urlencode prefix="http://prefix.given.by/theClient#" --data-urlencode input="[...]" (--data-urlencode source="http://www.w3.org/DesignIssues/LinkedData.html") http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
    The new namespace is http://persistence.uni-leipzig.org/nlp2rdf/nif-core#
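The curl call above can be mirrored in Python using only the standard library. This is a sketch under assumptions: `build_nif_request` is a hypothetical helper, the parameter names `prefix`, `input` and `source` are taken from the slide, and the demo endpoint may no longer be online, so the request is only constructed here and not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as given on the slide; it may no longer be reachable.
DEMO_ENDPOINT = "http://nlp2rdf.lod2.eu/demo/NIFStanfordCore"


def build_nif_request(text: str,
                      prefix: str,
                      source: Optional[str] = None,
                      endpoint: str = DEMO_ENDPOINT) -> Request:
    """Build (but do not send) a form-encoded POST like the curl example."""
    params = {"prefix": prefix, "input": text}
    if source is not None:
        # The slide shows source in parentheses, i.e. as an optional parameter.
        params["source"] = source
    body = urlencode(params).encode("ascii")
    return Request(
        endpoint,
        data=body,  # supplying data makes urllib issue a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the resulting object to `urllib.request.urlopen` would send it. The response format is not specified on the slide, so handling the returned RDF is left out.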
  • 16. Overview of Ontologies
    Ontologies:
    • NIF Core Ontology (URI Scheme, String, Context, but also Token, Sentence, lemma, stem, etc.) for frequently used annotations
    • Simple Error Ontology to describe errors (fatal, message, timestamp)
    • Vocabulary Modules for each purpose, ontology or project
  • 17. Logical Modularity
    Each ontology consists of three sets of axioms:
    - Terminology model (definitions)
    - Inference model (especially transitivity)
    - Validation model (consistency)
    1) nif-core.ttl
    2) nif-core-inf.ttl imports 1
    3) nif-core-val.ttl imports 1 and 2
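The import chain between the three files can be sketched as a small dependency walk. The `IMPORTS` table only restates the slide; `load_order` is a hypothetical helper, and the sketch assumes the import graph has no cycles, which holds for the three files listed.

```python
from typing import Dict, List

# Import relations exactly as listed on the slide.
IMPORTS: Dict[str, List[str]] = {
    "nif-core.ttl": [],
    "nif-core-inf.ttl": ["nif-core.ttl"],
    "nif-core-val.ttl": ["nif-core.ttl", "nif-core-inf.ttl"],
}


def load_order(target: str, imports: Dict[str, List[str]] = IMPORTS) -> List[str]:
    """Return the files needed for `target`, with dependencies listed first."""
    order: List[str] = []

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled
        for dependency in imports[name]:
            visit(dependency)
        order.append(name)

    visit(target)
    return order
```

A tool that only needs the definitions would load `nif-core.ttl` alone, while a validator would load all three in the order returned by `load_order("nif-core-val.ttl")`.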
  • 18. Granularity Modularity
    NIF simple:
    • Only one truth
    • Easy to understand and to query
    • Fewest triples
    NIF + Stanbol (Apache Project)
    • Several ranked alternatives
    • Provenance of annotations
    • In collaboration with Apache Stanbol
    Open Annotation (W3C group)
    • Rich model
    • Not only text, but everything (images)
    Well-defined conversions between the different levels:
    - Richer levels: more triples, more complexity, worse usability, lossless conversion
    - Simpler levels: easier queries, higher performance, lossy conversion
  • 19. Interoperability
    Structural Interoperability:
    - URI schemes provide normalization
    - RDF provides the graph data model
    - OWL provides the logical model
    Conceptual Interoperability:
    - NIF Core Ontology and mapping to the most frequently used annotations, e.g. lemma, stems
    - Vocabulary Modules to include other terminologies and ontologies
  • 20. NIF 2.0 tries to be compatible with (Vocabulary Module)
    • ITS 2.0
    • FISE used in Apache Stanbol (IKS-EU Project)
    • LAF/GrAF XML – ISO standard, recently published
    • Fragment Identifiers by IETF and W3C
    • Lemon ontology from Monnet EU Project
    • NERD ontology from EURECOM and LinkedTV EU Project
    • XPointer/XPath URI scheme
    • Open Annotation
    • ISOcat
  • 21. Vocabulary Module: OLiA
    • Tibeto-Burman languages: http://purl.org/olia/tibet.owl#VNst
    • Russian TreeTagger: http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform
    • German STTS: http://purl.org/olia/stts.owl#VAPP
    • English Penn: http://purl.org/olia/penn.owl#VBG
    → all map to http://purl.org/olia/olia.owl#NonFiniteVerb
    The Ontologies of Linguistic Annotation (OLiA) contain mappings for over 50 tagsets (free and open, CC-BY)
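The effect of such a mapping can be sketched with a plain lookup table. This is only an illustration: the table hard-codes the four tag URIs from the slide, whereas OLiA itself expresses the mappings as OWL ontologies, and `to_reference_concept` and `same_category` are hypothetical helper names.

```python
from typing import Optional

OLIA_NON_FINITE_VERB = "http://purl.org/olia/olia.owl#NonFiniteVerb"

# Tagset-specific URIs from the slide, each mapped to the shared OLiA concept.
TAG_TO_OLIA = {
    "http://purl.org/olia/tibet.owl#VNst": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/stts.owl#VAPP": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/penn.owl#VBG": OLIA_NON_FINITE_VERB,
}


def to_reference_concept(tag_uri: str) -> Optional[str]:
    """Look up the tagset-independent concept for a tagset-specific URI."""
    return TAG_TO_OLIA.get(tag_uri)


def same_category(tag_a: str, tag_b: str) -> bool:
    """True if both tags are known and map to the same reference concept."""
    concept_a = to_reference_concept(tag_a)
    return concept_a is not None and concept_a == to_reference_concept(tag_b)
```

With this table, a German STTS `VAPP` tag and an English Penn `VBG` tag compare as the same category, which is the kind of cross-tagset query the shared reference concept makes possible.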
  • 22. Conceptual Interoperability
    NIF can be extended by Vocabulary Modules
    OLiA http://purl.org/olia
  • 23. NIF 2.0 Infrastructure for adoption
    • Java-Maven implementation
    • PHP implementation
    • Reference implementations: DBpedia Spotlight, Stanford Parser, Korean POS tagger, Keyword Search
    • Wiki: http://wiki.nlp2rdf.org
    • Validators
    • Code generators (convert vocabulary modules to code stubs)
    • NIF is free and open (CC-0 / CC-BY / Apache)
    • All ontologies will be hosted persistently by Leipzig University
    • http://persistence.uni-leipzig.org/nlp2rdf/
  • 24. Evaluation 1
    • Huge collection of use cases
    • e.g. Ali wants to exchange different NLP services for RDFace
    • LOD2 from Wolters Kluwer
    • A selection will be implemented.
    Assumption:
    • NIF is good if it fulfills many use cases
  • 25. Evaluation 2
    • There are about 10 to 20 third-party implementations
  • 26. Evaluation 3
    Analysis of existing frameworks and formats. Criteria:
    • Convertibility (Adequacy)
    • Do the graph models match?
    • Coverage
    • Quantitative analysis of used annotations
    • Does NIF Core provide terms for the most common annotations? Are there any gaps?
  • 27. Data Conversion
  • 28. Data Conversion
  • 29. Data Conversion
    Data is available as free, open, interoperable (FOI) language resources at http://linguistics.okfn.org/resources/llod/ (work in progress)
  • 30. Critical judgement
    Project has a very good impact:
    • Many adopters
    • Industrial uptake
    • Inclusion in a W3C standard for ITS 2.0: http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html
    • Several projects involved as stakeholders (LOD2, Monnet, ...)
    • Several motivated open-source developers
    • Funding is coming in
  • 31. Critical judgement
    Scientific merit?
    • Provides scientific infrastructure
    • Easier to write and combine software
    • Free, open, interoperable (FOI) language resources
    • Free, open NLP test benchmarks (future work)
    • What part is scientific and what part is community work and negotiation?
    • No progress in the state of the art in NLP methods, yet
    • Difficult to judge where to put the emphasis. Lots of “soft evaluation” topics, no key performance indicators (KPIs).
  • 32. Conference + Workshops + Proceedings
    • 2011: Open Knowledge Conference
    • 2012: Workshop and book “Linked Data in Linguistics”
    • 2012: Linked Data Cup @ I-Semantics
    • 2012: Web of Linked Entities @ ISWC
    • 2012: MLODE @ Sabre
    • 2013: Semantic Web Journal: Special Issue on Multilingual Linked Open Data (MLOD)
    • Future work: DBpedia & NLP @ ISWC 2013
  • 33. Thanks for your attention