The document summarizes a tutorial on integrating natural language processing (NLP) with linked data and the Resource Description Framework (RDF) using the NLP Interchange Format (NIF). It introduces NIF and provides examples of using NLP wrappers and tools like Snowball Stemmer, OpenNLP, and Twinkle to annotate text, query annotated data, and query the Brown Corpus using SPARQL. The goal is to help attendees learn hands-on how to represent NLP objects and relations in RDF using the NIF format.
Integrating NLP with Linked Data using NIF format
1. Integrating NLP with Linked Data and RDF:
the NIF format (hands on)
Ciro Baron Neto
Ph.D. student at the University of Leipzig
Building the Multilingual Web of Data – ISWC
10/20/14 tutorial
2. Overview
• GitHub NLP2RDF web page overview and NIF online demos (Dashboard, Combinator...)
• Examples
–Example 1: How to annotate a string
• using Snowball Stemmer and OpenNLP
–Example 2:
• Query the generated NIF data and the Brown Corpus
3. NLP2RDF GitHub Website
• https://github.com/NLP2RDF/
• /home/ciro/websites/github/github.com/NLP2RDF/index.html
6. Example 1: Snowball Stemmer Wrapper
7. Snowball Stemmer Wrapper
• A stemming algorithm is a process for removing suffixes from words. For example, the following forms all reduce to the stem CONNECT:
–CONNECTED
–CONNECTION
–CONNECTING
–CONNECTIONS
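To make the idea of suffix removal concrete, here is a minimal sketch in Python. It is a toy suffix stripper written only for this illustration, not the Snowball algorithm itself; the suffix list and the name `toy_stem` are assumptions chosen to cover the CONNECT example.

```python
# Toy suffix stripper for illustration only; the real Snowball stemmer
# applies a much richer, language-specific set of rules.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")  # hypothetical list, longest first


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


for form in ("CONNECT", "CONNECTED", "CONNECTION", "CONNECTING", "CONNECTIONS"):
    print(form, "->", toy_stem(form))
```

All five forms map to the same stem, which is what lets a search or a query treat them as one term.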
8. Snowball Stemmer Wrapper
• 1. Open the USB stick folder
• 2. Go to “NIF_tutorial_hands_on_jars” folder
• 3. Open the “instructions.txt” file in a text editor
• 4. Open a terminal
• 5. Go to the “jar” folder
9. Snowball Stemmer Wrapper
• Copy the second command from instructions.txt:
java -jar snowball.jar -f text -i 'My favorite actress is Natalie Portman.'
• -f defines the format
• -i defines the input
• Paste it into the terminal
12. Snowball Stemmer Wrapper
[Figure callouts: NIF Standard Annotations, NIF Offset]
13. Snowball Stemmer Wrapper
[Figure callouts: NIF Standard Annotations, Snowball Stem, NIF Offset]
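As a rough illustration of offset-based NIF annotations, the sketch below computes character offsets for each word of the example sentence and prints them as Turtle. It is not the wrapper's actual output: the base URI `http://example.org/doc1`, the simple `\w+` tokenization, and the helper names are assumptions made for this example, and only core NIF terms (nif:Context, nif:Word, nif:isString, nif:referenceContext, nif:beginIndex, nif:endIndex, nif:anchorOf) are used.

```python
import re

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/doc1"  # hypothetical document URI


def word_offsets(text):
    """Return (begin, end, surface form) for each run of word characters."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\w+", text)]


def offset_uri(base, begin, end):
    """Build an offset-based URI in the '#char=begin,end' style."""
    return f"{base}#char={begin},{end}"


def literal(value):
    """Quote a string as a Turtle literal (escapes backslash and double quote only)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def index(value):
    """Format an offset as a typed Turtle literal."""
    return f'"{value}"^^xsd:nonNegativeInteger'


def to_turtle(text, base=BASE):
    """Serialize the context and one nif:Word per token as Turtle text."""
    context = offset_uri(base, 0, len(text))
    out = [f"@prefix nif: <{NIF}> .", f"@prefix xsd: <{XSD}> .", ""]
    out.append(f"<{context}> a nif:Context ;")
    out.append(f"    nif:isString {literal(text)} .")
    for begin, end, form in word_offsets(text):
        out.append("")
        out.append(f"<{offset_uri(base, begin, end)}> a nif:Word ;")
        out.append(f"    nif:referenceContext <{context}> ;")
        out.append(f"    nif:beginIndex {index(begin)} ;")
        out.append(f"    nif:endIndex {index(end)} ;")
        out.append(f"    nif:anchorOf {literal(form)} .")
    return "\n".join(out)


print(to_turtle("My favorite actress is Natalie Portman."))
```

Because every annotation is addressed by its begin and end offsets within the context string, annotations from different tools on the same text can refer to the same substring.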
14. OpenNLP Wrapper
• Go back to the terminal and use the first command from instructions.txt:
java -jar opennlp.jar -f text -i 'My favorite actress is Natalie Portman.' -modelFolder ../model/
• The -modelFolder parameter sets the folder that contains the trained OpenNLP models for tokenization and POS tagging.
• You can add the parameter “--outfile myAnnotatedFile.ttl” to store the triples in a file.
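If you prefer to script the wrapper calls instead of pasting them, the following sketch assembles the same command lines in Python. It uses only the flags shown in the slides (-f, -i, -modelFolder, --outfile); the function name `wrapper_command` and its parameters are assumptions for this example, and actually running the command requires Java and the jars from the tutorial materials.

```python
import shlex


def wrapper_command(jar, text, fmt="text", model_folder=None, outfile=None):
    """Build the argument list for one of the tutorial's wrapper jars."""
    cmd = ["java", "-jar", jar, "-f", fmt, "-i", text]
    if model_folder is not None:
        cmd += ["-modelFolder", model_folder]
    if outfile is not None:
        cmd += ["--outfile", outfile]
    return cmd


sentence = "My favorite actress is Natalie Portman."
cmd = wrapper_command(
    "opennlp.jar",
    sentence,
    model_folder="../model/",
    outfile="myAnnotatedFile.ttl",
)

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(cmd))

# To run it directly from the "jar" folder, uncomment:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list means the sentence needs no manual quoting, even if it contains apostrophes or other shell-special characters.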
15. Example 2: Query Brown Corpus
16. Querying with Twinkle
• Open the “/twinkle/example” folder
• Open the NIF_query_example file in a text editor and copy the query
• Open the “/twinkle” folder and run the command:
java -jar twinkle.jar
17-26. Querying Brown Corpus
27. Exercise 3: Querying your own NIF annotated string
28. Querying your own NIF annotated string
1. Annotate your string using one of the wrappers
2. Save your annotated sentence to a file (using “--outfile”)
3. Open Twinkle
4. Query your string using Twinkle
29. Query your annotated string:
– nif:Context
– nif:Sentence
– nif:anchorOf
– nif:oliaCategory
– nif:oliaLink
… or practice with the Brown Corpus!
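As a starting point for the exercise, the sketch below generates a SPARQL query that selects annotated strings by their OLiA category, ready to paste into Twinkle. It is one possible query, not the one from the NIF_query_example file; the function name `words_by_category` and the category URI `http://purl.org/olia/olia.owl#Noun` are assumptions, so check the URIs that actually appear in your own annotated file.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"


def words_by_category(category_uri, limit=20):
    """Return a SPARQL query listing strings annotated with the given OLiA category."""
    return f"""PREFIX nif: <{NIF}>
SELECT ?word ?text
WHERE {{
  ?word nif:anchorOf ?text ;
        nif:oliaCategory <{category_uri}> .
}}
LIMIT {limit}"""


# Assumed category URI; replace it with one found in your annotated data.
print(words_by_category("http://purl.org/olia/olia.owl#Noun"))
```

Swapping nif:oliaCategory for nif:oliaLink, or adding a pattern on nif:Context or nif:Sentence, gives variations to try against your own file or the Brown Corpus.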