NIF 2.0 Phd thesis intermediate report
1. BIS – 2013/04/15 – Page 1 http://lod2.eu
Creating Knowledge out of Interlinked Data
AKSW, Universität Leipzig
Sebastian Hellmann
PhD thesis intermediate report
NLP Interchange Format (NIF) 2.0
http://nlp2rdf.org
http://lod2.eu
http://slideshare.net/kurzum
DISCLAIMER:
this presentation is work in progress, example RDF is outdated
2. BIS – 2013/04/15 – Page 2 http://lod2.eu
NLP Interchange Format 2.0
3. BIS – 2013/04/15 – Page 3 http://lod2.eu
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
achieve interoperability between Natural Language Processing (NLP) tools,
language resources and annotations.
• NIF 2.0 will be published in 6-8 weeks
• Highly likely to become the de facto standard for modelling RDF tool
output in the NLP domain
NLP Interchange Format 2.0
4. BIS – 2013/04/15 – Page 4 http://lod2.eu
Introduction
Components have pre- and postconditions
Automatic configuration is theoretically possible, but in practice it takes a lot of manual work
5. BIS – 2013/04/15 – Page 5 http://lod2.eu
Huge potential to save time and money at the interfaces
6. BIS – 2013/04/15 – Page 6 http://lod2.eu
Core problems:
1. Too much heterogeneity
2. Almost no standards available
3. No open collaboration
4. Difficult and large domain
Problem analysis
7. BIS – 2013/04/15 – Page 7 http://lod2.eu
Technical heterogeneity
• Technologies: XML, Relational Databases, CSV, DOC, PDF
• Similar to other domains
• Formats: Negra, CoNLL, GrAF, Paula, CAS (UIMA), Penn
• Virtually every tool implements readers for 5-6 of these formats plus its
own serialization
• Programming languages: Java, Python, ...
• Java predominates
Problem analysis
8. BIS – 2013/04/15 – Page 8 http://lod2.eu
Domain heterogeneity
• Multilingualism
• Over 100 part of speech tags (several for each language)
• No open mappings exist
• About 20 different tasks listed on:
http://en.wikipedia.org/wiki/Natural_language_processing#Major_tasks_in_NLP
• Natural language is a difficult topic:
• The roulette dealer siad: “Rien ne va plus!”
– 8 words, 4 French, 4 English, one spelling mistake, impossible to
decide the language of the whole.
• Ban on Nude Dancing on Governor's Desk
Problem analysis
10. BIS – 2013/04/15 – Page 10 http://lod2.eu
Open collaboration
• LAF/GrAF is a recently released ISO standard
• But it is not open (60 Euros to view the document)
• Not in RDF (the main requirement for any Semantic Web tool)
• Large frameworks tend to only be “inward” compatible
• UIMA advocates say: “Why don't you just use UIMA?”
• Gate advocates: “Integrate it into GATE!”
• Generally, a large time investment and lock-in
Problem analysis
11. BIS – 2013/04/15 – Page 11 http://lod2.eu
Summary:
Hardly any reusability
• Free software (as in free beer), but no open licenses
• No standards and no mappings
• Integration is hard-wired (you have to write software)
Problem analysis
12. BIS – 2013/04/15 – Page 12 http://lod2.eu
• Definition for text normalization + URI Schemes (give URIs to Strings)
• NIF Core Ontology: default vocabulary for most often used annotations
• Predefined modules for most use cases
• Infrastructure
• for open collaboration / discussion
• persistent hosting
• validation and demo services
• Reference implementation
• Data conversion
NIF Overview
13. BIS – 2013/04/15 – Page 13 http://lod2.eu
Text Normalization + URI Schemes
14. BIS – 2013/04/15 – Page 14 http://lod2.eu
Text Normalization + URI Schemes
NIF 1.0: http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729
NIF 2.0 uses RFC 5147 as base form:
http://www.w3.org/DesignIssues/LinkedData.html#char=717,729
User extensions possible:
http://www.w3.org/DesignIssues/LinkedData.html#your_own_scheme
(but you have to link to documentation on how it was created)
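The two URI schemes above differ only in the fragment syntax. A minimal sketch of how such URIs could be minted, assuming zero-based character offsets with an exclusive end (RFC 5147 counts positions between characters, so this matches Python slicing). The function names are hypothetical and not part of any NIF library.

```python
def check_offsets(begin: int, end: int) -> None:
    """Reject offsets that cannot describe a substring."""
    if begin < 0 or end < begin:
        raise ValueError(f"invalid offsets: {begin}, {end}")


def nif1_offset_uri(document_uri: str, begin: int, end: int) -> str:
    """NIF 1.0 style fragment, in the form shown on the slide."""
    check_offsets(begin, end)
    return f"{document_uri}#offset_{begin}_{end}"


def rfc5147_char_uri(document_uri: str, begin: int, end: int) -> str:
    """NIF 2.0 base form, using the RFC 5147 'char=' range syntax."""
    check_offsets(begin, end)
    return f"{document_uri}#char={begin},{end}"


def annotated_string(text: str, begin: int, end: int) -> str:
    """The substring a URI with these offsets refers to."""
    check_offsets(begin, end)
    return text[begin:end]


doc = "http://www.w3.org/DesignIssues/LinkedData.html"
print(nif1_offset_uri(doc, 717, 729))
print(rfc5147_char_uri(doc, 717, 729))
```

Both URIs identify the same string; only a client that knows the scheme can resolve the fragment back to offsets, which is why user-defined schemes have to link to their documentation.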
15. BIS – 2013/04/15 – Page 15 http://lod2.eu
As a Web Service
curl \
  --data-urlencode prefix="http://prefix.given.by/theClient#" \
  --data-urlencode input="[...]" \
  --data-urlencode source="http://www.w3.org/DesignIssues/LinkedData.html" \
  http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
(the source parameter is optional)
The new namespace is http://persistence.uni-leipzig.org/nlp2rdf/nif-core#
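The same call can be prepared from Python's standard library. This is a sketch only: it assumes the service accepts a form-encoded POST (which is what curl's --data-urlencode sends), the demo endpoint from the slide may no longer be online, and build_nif_request is a hypothetical helper name. The request is built but deliberately not sent.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Endpoint as given on the slide; availability is not guaranteed.
SERVICE_URL = "http://nlp2rdf.lod2.eu/demo/NIFStanfordCore"


def build_nif_request(text: str, prefix: str,
                      source: Optional[str] = None) -> Request:
    """Build a form-encoded POST with the parameters used in the curl call."""
    params = {"prefix": prefix, "input": text}
    if source is not None:  # 'source' is optional
        params["source"] = source
    body = urlencode(params).encode("utf-8")
    return Request(SERVICE_URL, data=body, method="POST")


request = build_nif_request(
    "Rien ne va plus!",
    prefix="http://prefix.given.by/theClient#",
)
# Decode the body again to show which parameters would be sent.
print(parse_qs(request.data.decode("utf-8")))
# urllib.request.urlopen(request) would send it; left out on purpose.
```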
16. BIS – 2013/04/15 – Page 16 http://lod2.eu
Ontologies:
• NIF Core Ontology (URI Scheme, String, Context, but also Token, Sentence,
lemma, stem, etc. ) for often used annotations.
• Simple Error Ontology to describe errors (fatal, message, timestamp)
• Vocabulary Modules for each purpose or ontology or project
Overview of Ontologies
17. BIS – 2013/04/15 – Page 17 http://lod2.eu
Each ontology consists of three sets of axioms:
- Terminology model (definitions)
- Inference model (especially transitivity)
- Validation model (consistency)
1) nif-core.ttl
2) nif-core-inf.ttl imports 1
3) nif-core-val.ttl imports 1 and 2
Logical Modularity
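The import chain above determines which files a client has to load for a given purpose. A small sketch that resolves the load order from the three files named on the slide, assuming the import graph is acyclic; the function name and the dictionary layout are illustrative only.

```python
# Import relations exactly as listed on the slide.
IMPORTS = {
    "nif-core.ttl": [],
    "nif-core-inf.ttl": ["nif-core.ttl"],
    "nif-core-val.ttl": ["nif-core.ttl", "nif-core-inf.ttl"],
}


def load_order(name: str, imports: dict = IMPORTS) -> list:
    """Return the file and everything it imports, dependencies first."""
    ordered = []

    def visit(current: str) -> None:
        for dependency in imports[current]:
            visit(dependency)
        if current not in ordered:
            ordered.append(current)

    visit(name)
    return ordered


# Terminology only, terminology plus inference, or all three for validation.
for target in IMPORTS:
    print(target, "->", load_order(target))
```

A client that only needs the definitions loads one file, while a validator pulls in all three.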
18. BIS – 2013/04/15 – Page 18 http://lod2.eu
NIF simple:
• Only one truth
• Easy to understand and to query
• Least amount of triples
NIF + Stanbol (Apache Project)
• Several ranked alternatives
• Provenance of annotations
• In collaboration with Apache Stanbol
Open Annotation (W3C group)
• Rich model
• Not only text, but everything (images)
Granularity Modularity
Well-defined conversions between the different levels:
- towards the richer models: more triples, more complexity, worse usability, lossless conversion
- towards the simpler models: easier queries, higher performance, lossy conversion
19. BIS – 2013/04/15 – Page 19 http://lod2.eu
Structural Interoperability:
- URI schemes provide normalization
- RDF provides the graph data model
- OWL provides the logical model
Conceptual Interoperability
- NIF Core Ontology and mapping to most often used annotations, e.g. lemma,
stems
- Vocabulary Module to include other terminologies and ontologies
Interoperability
20. BIS – 2013/04/15 – Page 20 http://lod2.eu
• ITS 2.0
• FISE used in Apache Stanbol (IKS-EU Project)
• LAF/GrAF XML – ISO standard, recently published
• Fragment Identifiers by IETF and W3C
• Lemon ontology from Monnet EU Project
• NERD ontology from EURECOM and LinkedTV EU Project
• Xpointer/XPath URI scheme
• Open Annotation
• ISOCat
NIF 2.0 tries to be compatible with (Vocabulary Modules)
21. BIS – 2013/04/15 – Page 21 http://lod2.eu
• Tibeto-Burman languages: http://purl.org/olia/tibet.owl#VNst
• Russian TreeTagger :
http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform
• German STTS: http://purl.org/olia/stts.owl#VAPP
• English Penn: http://purl.org/olia/penn.owl#VBG
→ all map to http://purl.org/olia/olia.owl#NonFiniteVerb
Ontologies of Linguistic Annotation (OLiA) contain mappings for over 50 tagsets (free
and open, CC-BY)
Vocabulary Module: OLiA
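The effect of these mappings is that tags from unrelated tagsets become comparable. A simplified sketch using only the four URIs from the slide: OLiA itself expresses the mapping through OWL models, so the flat dictionary below is a stand-in, and same_category is a hypothetical helper.

```python
OLIA_NON_FINITE_VERB = "http://purl.org/olia/olia.owl#NonFiniteVerb"

# Tagset-specific URIs from the slide, mapped to the shared OLiA category.
TAG_TO_OLIA = {
    "http://purl.org/olia/tibet.owl#VNst": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform":
        OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/stts.owl#VAPP": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/penn.owl#VBG": OLIA_NON_FINITE_VERB,
}


def same_category(tag_a: str, tag_b: str) -> bool:
    """True if both tags are known and map to the same OLiA category."""
    category = TAG_TO_OLIA.get(tag_a)
    return category is not None and category == TAG_TO_OLIA.get(tag_b)


# A German STTS tag and an English Penn tag can now be compared directly.
print(same_category(
    "http://purl.org/olia/stts.owl#VAPP",
    "http://purl.org/olia/penn.owl#VBG",
))
```

A query for non-finite verbs can then be written once against the OLiA category instead of once per tagset.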
22. BIS – 2013/04/15 – Page 22 http://lod2.eu
NIF can be extended by Vocabulary Modules
OLiA
http://purl.org/olia
Conceptual Interoperability
23. BIS – 2013/04/15 – Page 23 http://lod2.eu
• Java-Maven implementation
• PHP implementation
• Reference implementations: DBpedia Spotlight, Stanford Parser, Korean POS
tagger, Keyword Search
• Wiki: http://wiki.nlp2rdf.org
• Validators
• Code generators (convert vocabulary modules to code stubs)
• NIF is free and open (CC-0 / CC-BY / Apache)
• All ontologies will be hosted persistently by Leipzig University
• http://persistence.uni-leipzig.org/nlp2rdf/
NIF 2.0 Infrastructure for adoption
24. BIS – 2013/04/15 – Page 24 http://lod2.eu
• Huge collection of use cases
• e.g. Ali wants to swap between different NLP services for RDFace
• LOD2 from Wolters Kluwer
• A selection will be implemented. Assumption:
• NIF is good, if it fulfills many use cases
Evaluation 1
25. BIS – 2013/04/15 – Page 25 http://lod2.eu
• There are about 10 to 20 third party implementations
Evaluation 2
26. BIS – 2013/04/15 – Page 26 http://lod2.eu
Analysis of existing frameworks and formats. Criteria:
• Convertibility (Adequacy)
• Do the graph models match?
• Coverage
• Quantitative analysis of used annotations
• Does NIF Core provide terms for the most common annotations, are
there any gaps?
Evaluation 3
29. BIS – 2013/04/15 – Page 29 http://lod2.eu
Data Conversion
Data is available as
free, open, interoperable (FOI) language resources at
http://linguistics.okfn.org/resources/llod/
(work in progress)
30. BIS – 2013/04/15 – Page 30 http://lod2.eu
Project has a very good impact:
• Many adopters
• Industrial uptake
• Inclusion in a W3C standard for ITS 2.0:
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html
• Several projects involved as stakeholders (LOD2, Monnet, ...)
• Several motivated open-source developers
• Funding is coming in
Critical judgement
31. BIS – 2013/04/15 – Page 31 http://lod2.eu
Scientific merit?
• provides scientific infrastructure
• Easier to write and combine software
• Free, open, interoperable (FOI) language resources
• Free, open NLP test benchmarks (Future work)
• What part is scientific and what part is community work and negotiation?
• No progress in state of the art in NLP methods, yet
• Difficult to judge where to put the emphasis. Many “soft evaluation”
topics, no key performance indicators (KPIs).
Critical judgement
32. BIS – 2013/04/15 – Page 32 http://lod2.eu
• 2011: Open Knowledge Conference
• 2012: Workshop and book “Linked Data in Linguistics”
• 2012: Linked Data Cup @ I-Semantics
• 2012: Web of Linked Entities @ ISWC
• 2012: MLODE @ Sabre
• 2013: Semantic Web Journal: Special Issue on Multilingual Linked Open Data
(MLOD)
• Future work: DBpedia & NLP @ ISWC 2013
Conference + Workshops + Proceedings
33. BIS – 2013/04/15 – Page 33 http://lod2.eu
Thanks for your attention