3. Semantic Augmentation
• From:
(…) Upon their return, Lennon and McCartney
went to New York to announce the formation of
Apple Corps.
• To:
(…) Upon their return, Lennon and McCartney
went to New York to announce the formation of
Apple Corps.
http://dbpedia.org/Ontology/New_York_City
http://dbpedia.org/Ontology/Apple_Corps
4. Semantic Augmentation
• Semantic augmentation is a process
of attaching semantics to a selected
part of a text to assist automatic
interpretation of the meaning
conveyed by the text.
• Also called semantic annotation,
semantic tagging
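One common way to attach semantics to a selected part of a text is a standoff annotation: the text itself stays unchanged, and each annotation records character offsets plus an identifier for the concept. The following is a minimal sketch in Python, reusing the sentence and URIs from slide 3. The `Annotation` class and the `annotate` helper are illustrative names, not part of GATE or any other tool mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # offset of the first character of the mention
    end: int     # offset one past the last character of the mention
    uri: str     # identifier of the concept the mention refers to


def annotate(text: str, mention: str, uri: str) -> Annotation:
    """Annotate the first occurrence of `mention` in `text` (illustrative helper)."""
    start = text.find(mention)
    if start < 0:
        raise ValueError(f"{mention!r} not found in text")
    return Annotation(start, start + len(mention), uri)


TEXT = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")

ANNOTATIONS = [
    annotate(TEXT, "New York", "http://dbpedia.org/Ontology/New_York_City"),
    annotate(TEXT, "Apple Corps", "http://dbpedia.org/Ontology/Apple_Corps"),
]

for a in ANNOTATIONS:
    print(TEXT[a.start:a.end], "->", a.uri)
```

Keeping annotations separate from the text means several tools can add their own annotations to the same document without rewriting it.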
6. Why Semantic Augmentation?
• Links to complementary information
– “More about this”
• Show related or similar information
• Reasoning and inferencing offered by
semantics
• Semantic annotation is the glue that ties
ontologies into document spaces; remember that
the existing web is a web of documents
• Manual metadata production cost is too
high
7. GATE for Semantic
Augmentation
• GATE (General Architecture for Text
Engineering) – see gate.ac.uk
• GATE Developer is a development
environment that provides a rich set of
graphical interactive tools for the creation,
measurement and maintenance of software
components for processing human
language.
• See: http://gate.ac.uk/family/developer.html
8. Overview of GATE Developer
• GATE Developer
• Resources Pane
– applications: groups of processes to run on a
document or corpus
– language resources: corpus, ontologies, schemas
– processing resources: tools that operate on
unstructured text
– datastores: saved documents and resources
• Display Pane: whatever you’re currently working
with.
• See next slide
10. Processing Resources: ANNIE
• A family of Processing Resources for
language analysis included with GATE
• Stands for A Nearly-New Information
Extraction system.
• Using finite state techniques to implement
various tasks: tokenization, semantic
tagging, verb phrase chunking, and so on.
12. Some ANNIE Components
• Tokenizer
– word, number, symbol, punctuation, and spaceToken.
• Sentence Splitter
– Segments text into sentences
• Part of Speech Tagger
– produces a part-of-speech tag as an annotation on each word or
symbol – Nouns, verbs etc.
• GATE Morphological Analyser
– detecting morphemes in a piece of text (e.g. car,
caring)
• OntoGazetteer
– Semantic Tagging component – uses ontology
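To make the tokenizer's output concrete, here is a small sketch that splits a string into the five token kinds named above. It is plain Python, not ANNIE: the regular expressions are assumptions chosen for this example and do not reproduce ANNIE's actual tokeniser rules.

```python
import re

# Token kinds named on the slide; the regular expressions are assumptions
# made for this sketch, not ANNIE's actual tokeniser rules.
TOKEN_PATTERN = re.compile(
    r"(?P<word>[^\W\d_]+)"
    r"|(?P<number>\d+)"
    r"|(?P<SpaceToken>\s+)"
    r"|(?P<punctuation>[.,;:!?()\"'-])"
    r"|(?P<symbol>.)",
    re.DOTALL,
)


def tokenize(text):
    """Return (kind, string, start, end) tuples covering the whole text."""
    return [(m.lastgroup, m.group(), m.start(), m.end())
            for m in TOKEN_PATTERN.finditer(text)]


tokens = tokenize("Apple Corps paid $5.")
for kind, string, start, end in tokens:
    print(f"{start:2}-{end:2} {kind:12} {string!r}")
```

Every character of the input ends up in exactly one token, so later components (sentence splitter, tagger, gazetteer) can refer to tokens by offset.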
13. Demo:
• From:
(…) Upon their return, Lennon and McCartney
went to New York to announce the formation of
Apple Corps.
• To:
(…) Upon their return, Lennon and McCartney
went to New York to announce the formation of
Apple Corps.
http://dbpedia.org/Ontology/New_York_City
http://dbpedia.org/Ontology/Apple_Corps
14. Step: Download & Start the
GATE application
• Download GATE from:
http://gate.ac.uk/download/
• Note: the demonstration is using GATE 6.0
15. Step: From Language Resources
Select
• GATE Document -> make sure that “String
content” is selected in the last field (see
screenshot below). Name the file “Test”.
16. Paste following text…in the file
• Upon their return, Lennon and McCartney
went to New York to announce the
formation of Apple Corps.
17. Step: From Processing resources
select following resources
• ANNIE English Tokeniser
• ANNIE Sentence Splitter
• ANNIE POS Tagger
• GATE Morphological Analyser
• Note: For all of the above, leave the “Name”
field empty
19. Step: From Language Resources
Select
• OWLIM Ontology
– Specify the location of the ontology you would
like to use for semantic augmentation
– For example, we are using dbpedia ontology
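To give a feel for how an ontology-backed gazetteer can produce Lookup annotations, the sketch below matches known labels against the text and attaches the corresponding URI. The label table is a hand-written stand-in for labels loaded from an ontology (the URIs are the ones shown on the demo slide), and the longest-match strategy is an assumption of this sketch, not a description of how OntoGazetteer works internally.

```python
# Hand-written stand-in for labels loaded from an ontology.
LABEL_TO_URI = {
    "New York": "http://dbpedia.org/Ontology/New_York_City",
    "Apple Corps": "http://dbpedia.org/Ontology/Apple_Corps",
}


def lookup(text, label_to_uri):
    """Return (start, end, uri) for every label occurrence, longest label first."""
    found = []
    taken = set()  # character offsets already covered by a longer match
    for label in sorted(label_to_uri, key=len, reverse=True):
        start = text.find(label)
        while start != -1:
            span = range(start, start + len(label))
            if not taken.intersection(span):
                taken.update(span)
                found.append((start, start + len(label), label_to_uri[label]))
            start = text.find(label, start + 1)
    return sorted(found)


text = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")
for start, end, uri in lookup(text, LABEL_TO_URI):
    print(text[start:end], "->", uri)
```

Preferring the longest label keeps a mention such as "New York City" from also being tagged as "New York" when both labels are known.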
22. Final steps: Create Corpus
• Go to Language resources and click on GATE Corpus, and
add “Test” document created earlier
23. Final steps: Create Corpus
Pipeline
• From Applications, create a Corpus Pipeline
• Add the processing resources in the order shown below and
press “Run this Application”
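Conceptually, a corpus pipeline applies each processing resource, in order, to every document in a corpus. The sketch below shows why the order matters. The `Document` class and the two toy processing steps are hypothetical stand-ins written for this example; they are not the GATE API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    content: str
    annotations: dict = field(default_factory=dict)  # annotation type -> list


def tokenise(doc):
    # Toy stand-in for a tokeniser: split on whitespace.
    doc.annotations["Token"] = doc.content.split()


def count_tokens(doc):
    # Depends on the Token annotations, so it must run after tokenise.
    doc.annotations["TokenCount"] = [len(doc.annotations["Token"])]


def run_pipeline(corpus, processing_resources):
    """Apply every processing resource, in the given order, to every document."""
    for doc in corpus:
        for resource in processing_resources:
            resource(doc)
    return corpus


corpus = [Document("Test", "Upon their return, Lennon and McCartney went to "
                           "New York to announce the formation of Apple Corps.")]
run_pipeline(corpus, [tokenise, count_tokens])
print(corpus[0].annotations["TokenCount"])
```

Swapping the two steps raises a `KeyError`, because the second step reads annotations that the first one has not yet produced. The same dependency holds between a tokeniser and the components that consume its tokens.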
24. Results: Go to the file, click on Annotation
Sets, then Annotations List, and select Lookup
Semantic Augmentation
25. Other features
• JAPE
– a Java Annotation Patterns Engine, provides
regular-expression based pattern/action rules over
annotations.
– Grammar to detect entities, validate detected
entities, pre & post processing
– Example: “at the Carnegie Stadium”, “at the
Emirates Stadium”, “at the O2 Arena”
– See Tutorial: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
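The idea of a pattern/action rule can be illustrated with the venue example above. The sketch below is a plain-regex analogue written in Python, not JAPE syntax: real JAPE rules match over annotations (such as Token or Lookup) rather than raw characters. The rule name, the pattern, and the `Venue` annotation type are assumptions made for this example.

```python
import re

# Plain-regex analogue of a pattern/action rule. Real JAPE rules match over
# annotations (for example Token or Lookup), not over raw characters.
VENUE_RULE = re.compile(
    r"\bat the ((?:[A-Z0-9][\w']*\s)+(?:Stadium|Arena))\b"
)


def find_venues(text):
    """Action part: emit a (start, end, 'Venue') annotation for each match."""
    return [(m.start(1), m.end(1), "Venue") for m in VENUE_RULE.finditer(text)]


text = ("They played at the Emirates Stadium, then at the O2 Arena, "
        "and finally at the Carnegie Stadium.")
for start, end, kind in find_venues(text):
    print(kind, text[start:end])
```

The pattern side uses the context words "at the" to find the entity, while the action side annotates only the venue name itself.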
26. Some Links
• Home page is http://gate.ac.uk/
• Some good short tutorial videos for getting started:
http://gate.ac.uk/demos/developer-videos/ . Each is only
a few minutes long.
• User Guide: http://gate.ac.uk/sale/tao/index.html . This is
apparently for version 7.1, which is a development build,
but it seems to be fine.
• Lots of documentation :
http://gate.ac.uk/documentation.html
• The wiki: http://gate.ac.uk/wiki/
• JAPE grammar tutorial by Dhaval Thakker et al.:
http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
27. Challenge: Term Ambiguity
• ...this apple on the palm of my hand...
• ...Apple tried to acquire Palm Inc....
• ...eating an apple sitting by a palm tree...
• What do “apple” and “palm” mean in each case?
• Objective is to recognize entities and disambiguate
their meaning.
DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva,
and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
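One simple way to approach this kind of disambiguation is to compare the words around a term with words typically associated with each candidate meaning. The sketch below is a toy illustration of that idea only; it is not the method used by DBpedia Spotlight, and the sense names and context word lists are invented for this example.

```python
import re

# Hand-written candidate senses with a few context words each. The sense
# names and word lists are invented for this sketch.
SENSES = {
    "apple": {
        "Apple (fruit)": {"eat", "eating", "tree", "hand", "juice"},
        "Apple (company)": {"acquire", "inc", "company", "shares"},
    },
    "palm": {
        "Palm (hand)": {"hand", "finger", "my"},
        "Palm (company)": {"acquire", "inc", "company"},
        "Palm (tree)": {"tree", "beach", "coconut"},
    },
}


def disambiguate(term, sentence):
    """Pick the sense whose context words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower())) - {term}
    scores = {sense: len(words & context)
              for sense, context in SENSES[term].items()}
    return max(scores, key=scores.get)


for sentence in ("this apple on the palm of my hand",
                 "Apple tried to acquire Palm Inc.",
                 "eating an apple sitting by a palm tree"):
    print(disambiguate("apple", sentence), "|", disambiguate("palm", sentence))
```

A real system would learn the context words from data rather than listing them by hand, and would need a sensible fallback when no context word matches.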
31. DBpedia Spotlight
• DBpedia is a collection of entity descriptions
extracted from Wikipedia & shared as linked data
• DBpedia Spotlight uses data from DBpedia and text
from associated Wikipedia pages
• Learns how to recognize that a DBpedia resource
was mentioned
• Given plain text as input, generates annotated text
http://dbpedia-spotlight.github.com/demo/
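Besides the demo page, the "plain text in, annotated text out" workflow can also be driven from code. The sketch below builds an HTTP request for a Spotlight-style annotation service using only the Python standard library. The endpoint URL and the `text` and `confidence` parameter names are assumptions that do not come from these slides; check the project's current documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint of the DBpedia Spotlight web service; check the
# project's documentation before relying on it.
ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build (but do not send) a GET request asking for JSON annotations."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(f"{ENDPOINT}?{query}",
                                  headers={"Accept": "application/json"})


def annotate(text, confidence=0.5):
    """Send the request; needs network access and a reachable service."""
    with urllib.request.urlopen(build_request(text, confidence)) as response:
        return json.load(response)


request = build_request("Lennon and McCartney went to New York.")
print(request.full_url)
```

Building the request separately from sending it makes it easy to inspect the URL first; calling `annotate(...)` performs the actual network call.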
34. References
• DBpedia Spotlight: Shedding Light on the Web of
Documents. Pablo Mendes, Max Jakob, Andrés
García-Silva, and Christian Bizer. In:
Proceedings of the 7th International Conference on
Semantic Systems (I-Semantics), 2011.
• Introduction to GATE, Dr. Paula Matuszek
• Various resources from gate.ac.uk
Editor's Notes
It is not just tagging
Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.