Thesis presentation
1. Creating Knowledge out of Interlinked Data
BIS – 2012/03/01 Leipzig – Page 1
http://lod2.eu
A Transparent Formalization of
Text for Machines
http://nlp2rdf.org
Start: Jan 2009
Tentative End: Summer 2012
Sebastian Hellmann
AKSW, Universität Leipzig
LOD2 Presentation . 02.09.2010 . http://lod2.eu
2. Overview
Introduction to the areas touched
Scientific Core
Evaluation
Plan
3. The Semantic Gap
4. The Semantic Gap
Most problems occurred at the bottom
Data integration is difficult if the pivots are not well defined
Questions (in order):
What structure to use?
What URIs to use?
What is a String?
How can we teach machines to understand Strings (Knowledge Representation)?
5. Main question
How can we formalize text in a way that is:
Transparent for machines
Efficient for NLP Use Cases
Consistent with the Web architecture
6. Areas
7. Preliminary definition
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
achieve interoperability between Natural Language Processing (NLP) tools,
language resources and annotations.
This definition is still limited to RDF and NLP and targets software integration via a common exchange format.
8. Scientific core
9. Scientific core
10. Scientific core
Not transparent for machines
11. Scientific core
The universe of discourse is defined as the set of words over the alphabet of Unicode characters (Unicode Normalization Form C), often written Σ*
URI: http://example.org/sample#offset_0_42
String: “The city Berlin is the capital of Germany.”
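A minimal Python sketch of how such offset-based URIs could be minted and resolved against the text they point into. The helper names `offset_uri` and `resolve`, and the choice to count offsets in Unicode code points of the NFC-normalized text, are assumptions for illustration, not taken from the NIF specification.

```python
import re
import unicodedata

# Matches the "#offset_<begin>_<end>" fragment used in the slide's example URI.
OFFSET_FRAGMENT = re.compile(r"#offset_(\d+)_(\d+)$")


def offset_uri(base: str, begin: int, end: int) -> str:
    """Mint a URI for the substring [begin, end) of the text at `base` (hypothetical helper)."""
    return f"{base}#offset_{begin}_{end}"


def resolve(uri: str, text: str) -> str:
    """Return the string an offset URI denotes (hypothetical helper).

    Assumption: offsets count Unicode code points of the NFC-normalized text.
    """
    match = OFFSET_FRAGMENT.search(uri)
    if match is None:
        raise ValueError(f"not an offset URI: {uri}")
    begin, end = int(match.group(1)), int(match.group(2))
    normalized = unicodedata.normalize("NFC", text)
    if not 0 <= begin <= end <= len(normalized):
        raise ValueError(f"offsets {begin}-{end} out of range for length {len(normalized)}")
    return normalized[begin:end]


text = "The city Berlin is the capital of Germany."
sentence_uri = offset_uri("http://example.org/sample", 0, len(text))
print(sentence_uri)                 # http://example.org/sample#offset_0_42
print(resolve(sentence_uri, text))  # The city Berlin is the capital of Germany.
print(resolve(offset_uri("http://example.org/sample", 34, 41), text))  # Germany
```

Because the URI encodes only the base and two offsets, any machine holding the same normalized text can recover the exact substring without further agreement.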
12. Scientific core
http://example.org/sample#offset_0_42 (context) isString “The city Berlin is the capital of Germany.”
http://example.org/sample#offset_34_41 isString “Germany”
http://example.org/sample#offset_34_41 referenceContext http://example.org/sample#offset_0_42
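The statements in this diagram can be emitted as N-Triples with a short Python sketch. The namespace `http://example.org/nif#` is a placeholder (the slides do not give the vocabulary URI), typing the whole text as `Context` follows the "context" label in the diagram, and the function names `literal` and `annotate` are hypothetical.

```python
from __future__ import annotations

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NIF = "http://example.org/nif#"  # placeholder namespace, not an official NIF URI


def literal(value: str) -> str:
    """Serialize a plain N-Triples string literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def annotate(base: str, text: str, spans: list[tuple[int, int]]) -> list[str]:
    """Describe `text` as a context and each (begin, end) span as a substring of it."""
    context = f"{base}#offset_0_{len(text)}"
    triples = [
        f"<{context}> <{RDF_TYPE}> <{NIF}Context> .",
        f"<{context}> <{NIF}isString> {literal(text)} .",
    ]
    for begin, end in spans:
        substring = f"{base}#offset_{begin}_{end}"
        triples.append(f"<{substring}> <{NIF}isString> {literal(text[begin:end])} .")
        # Every substring points back to the context it was cut from.
        triples.append(f"<{substring}> <{NIF}referenceContext> <{context}> .")
    return triples


for line in annotate("http://example.org/sample",
                     "The city Berlin is the capital of Germany.", [(34, 41)]):
    print(line)
```

The `referenceContext` link is what lets a consumer check a substring's offsets against the full text instead of trusting the substring literal alone.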
13. Scientific core
Define the notion of “Context” and formalize it in OWL:
Context is similar to the German word “Betrachtungshorizont” (roughly, “horizon of consideration”)
In English perhaps “inside context”, i.e. the text itself, which serves as the reference context for all substrings it contains.
Definitely disjoint with groupings such as “Document”, because these require a “wider context”.
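The disjointness of “Context” and “Document” corresponds to an OWL `owl:disjointWith` axiom. A minimal Python sketch of what that axiom rules out, assuming only explicit type assertions are checked (no subclass reasoning) and using hypothetical resource URIs:

```python
from collections import defaultdict

# Pairs of classes declared disjoint; mirrors "Context owl:disjointWith Document".
DISJOINT_CLASSES = [("Context", "Document")]


def disjointness_violations(type_assertions):
    """Return (resource, class_a, class_b) for every resource typed with two disjoint classes.

    Only explicit type assertions are checked; a real OWL reasoner would also
    take subclass axioms and other entailments into account.
    """
    types = defaultdict(set)
    for resource, cls in type_assertions:
        types[resource].add(cls)
    violations = []
    for resource in sorted(types):
        for class_a, class_b in DISJOINT_CLASSES:
            if class_a in types[resource] and class_b in types[resource]:
                violations.append((resource, class_a, class_b))
    return violations


assertions = [
    ("http://example.org/sample#offset_0_42", "Context"),
    ("http://example.org/doc1", "Document"),
    ("http://example.org/doc1", "Context"),  # inconsistent: a document is not a context
]
print(disjointness_violations(assertions))
# [('http://example.org/doc1', 'Context', 'Document')]
```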
An example follows.
14. Scientific core
16. Scientific Core
The goal is to research some of the implications, but I might not be able to finish this completely.
In scope:
Property “contextString” is inverse-functional, which means that machines can
infer automatically that the same context occurs in different documents.
Show consistency with ambiguity
Define metrics that compare contexts
Formalize the interpretation function
Show interoperability with internal models of all major NLP frameworks
(Partial) compatibility with the World Wide Web (WWW) and the Giant Global Graph (GGG)
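The inference in the first in-scope item can be sketched in Python: if two resources share a value for an inverse-functional property, they denote the same individual (`owl:sameAs`). The property name `contextString` comes from the slide; the document URIs and the helper `infer_same_as` are hypothetical. If `contextString` is a datatype property, this assumes OWL Full semantics, since OWL DL only allows inverse-functional object properties (OWL 2 `HasKey` is the DL-compatible alternative).

```python
from collections import defaultdict
from itertools import combinations


def infer_same_as(triples, inverse_functional_property):
    """Apply the owl:InverseFunctionalProperty rule: subjects sharing a value are the same."""
    subjects_by_value = defaultdict(set)
    for subject, predicate, value in triples:
        if predicate == inverse_functional_property:
            subjects_by_value[value].add(subject)
    same_as = set()
    for subjects in subjects_by_value.values():
        # Every pair of subjects with the same value is entailed to be owl:sameAs.
        for a, b in combinations(sorted(subjects), 2):
            same_as.add((a, b))
    return same_as


sentence = "The city Berlin is the capital of Germany."
triples = [
    ("http://a.example/doc1#offset_0_42", "contextString", sentence),
    ("http://b.example/doc2#offset_0_42", "contextString", sentence),
    ("http://c.example/doc3#offset_0_11", "contextString", "Other text."),
]
print(infer_same_as(triples, "contextString"))
# {('http://a.example/doc1#offset_0_42', 'http://b.example/doc2#offset_0_42')}
```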
17. Scientific Core
Out of scope:
Transitions between contexts: do statements made in a smaller context hold in a broader one?
Covering all layers of the NLP stack; the work is limited to part-of-speech (POS) tags and entity recognition
Filling in all the question marks in the Venn diagram
18. Areas
19. Linguistic Linked Open Data Cloud
20. Developer study
21. Areas
22. Evaluation
Compare to other models in NLP:
Size (RDF vs. XML), performance, expressivity
Is NIF easy to understand and implement?
Developer study: the release of the specification had a noticeable impact; people started to create extensions and use the format. 50 people are on the mailing list.
How can Web Service integration or consistency with the Web architecture be evaluated? If the way strings are represented is transparent and formalized, is an experimental evaluation still needed to show the benefits?
23. Q&A
Thank you for your attention
Standing on the shoulders of giants