COMP3725
Knowledge Enriched Information
           Systems


 Lecture 13: Semantic Augmentation

       Dhavalkumar Thakker (Dhaval)
   School of Computing, University of Leeds

Outline

• Semantic Augmentation
  – What
  – Why
  – How
• Existing systems & services for Semantic
  Augmentation
• Challenges



Semantic Augmentation

• From:
 (…) Upon their return, Lennon and McCartney
 went to New York to announce the formation of
 Apple Corps.


• To:
 (…) Upon their return, Lennon and McCartney
 went to New York to announce the formation of
 Apple Corps.
   – New York: http://dbpedia.org/Ontology/New_York_City
   – Apple Corps: http://dbpedia.org/Ontology/Apple_Corps
Semantic Augmentation

• Semantic augmentation is a process
  of attaching semantics to a selected
  part of a text to assist automatic
  interpretation of the meaning
  conveyed by the text.

• Also called semantic annotation,
  semantic tagging
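A minimal sketch of the idea, assuming a simple dictionary lookup rather than any particular tool: each annotation records which span of the text it covers and which identifier is attached to it. The `Annotation` class, the `LEXICON` table and the `augment` function are hypothetical names introduced for illustration; the two URIs are the ones shown in the From/To example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the annotated span begins
    end: int    # character offset just past the end of the span
    uri: str    # identifier of the concept the span refers to


# Hypothetical lookup table; the URIs are copied from the slides.
LEXICON = {
    "New York": "http://dbpedia.org/Ontology/New_York_City",
    "Apple Corps": "http://dbpedia.org/Ontology/Apple_Corps",
}


def augment(text, lexicon=LEXICON):
    """Return one Annotation for every occurrence of a lexicon entry in text."""
    annotations = []
    for surface, uri in lexicon.items():
        pos = text.find(surface)
        while pos != -1:
            annotations.append(Annotation(pos, pos + len(surface), uri))
            pos = text.find(surface, pos + len(surface))
    return sorted(annotations, key=lambda a: a.start)


TEXT = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")

for ann in augment(TEXT):
    print(TEXT[ann.start:ann.end], "->", ann.uri)
```

The text itself is left unchanged; the semantics live in separate annotations that point into it by offset.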
It provides additional information
   about an existing piece of data.




Why Semantic Augmentation?

• Links to complementary information
  – “More about this”
• Show related or similar information
• Reasoning and inferencing offered by
  semantics
• Semantic annotation is the glue that ties
  ontologies into document spaces –
  remember that the existing web is a web of documents
• The cost of producing metadata manually is too
  high
GATE for Semantic
             Augmentation
• GATE (General Architecture for Text
  Engineering) – see gate.ac.uk
• GATE Developer is a development
  environment that provides a rich set of
  graphical interactive tools for the creation,
  measurement and maintenance of software
  components for processing human
  language.
• See: http://gate.ac.uk/family/developer.html
Overview of GATE Developer

• GATE Developer
• Resources Pane
   – applications: groups of processes to run on a
     document or corpus
   – language resources: corpus, ontologies, schemas
   – processing resources: tools that operate on
     unstructured text
   – datastores: saved documents and resources
• Display Pane: whatever you’re currently working
  with.
• See next slide
GATE: Interface

[Screenshot: GATE Developer interface, with the Resources Pane and the Display Pane labelled]
Processing Resources: ANNIE

• A family of Processing Resources for
  language analysis included with GATE
• Stands for A Nearly-New Information
  Extraction system.
• Uses finite-state techniques to implement
  various tasks: tokenization, semantic
  tagging, verb phrase chunking, and so on.
ANNIE IE Modules

[Diagram of the ANNIE IE modules; see http://gate.ac.uk/sale/tao/splitch6.html#chap:annie]
Some ANNIE Components

• Tokenizer
   – Splits text into tokens: word, number, symbol, punctuation, and SpaceToken.
• Sentence Splitter
   – Segments text into sentences
• Part of Speech Tagger
    – Produces a part-of-speech tag as an annotation on each word or
      symbol (noun, verb, etc.)
• GATE Morphological Analyser
   – Detects morphemes in a piece of text (e.g. car,
     caring)
• OntoGazetteer
   – Semantic tagging component that uses an ontology
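To make the first two components concrete, here is a rough sketch of a tokenizer and a sentence splitter built from regular expressions, which are one kind of finite-state technique. It is not ANNIE's implementation: the token kinds follow the list above, while the exact patterns, `tokenize` and `split_sentences` are assumptions made for illustration.

```python
import re

# One named alternative per token kind listed on the slide.
TOKEN_RE = re.compile(r"""
    (?P<word>[A-Za-z]+)
  | (?P<number>\d+)
  | (?P<space>\s+)
  | (?P<punctuation>[.,;:!?()"'-])
  | (?P<symbol>.)
""", re.VERBOSE)

# A sentence runs from a non-space character to the first '.', '!' or '?'
# that is followed by whitespace or the end of the text.
SENTENCE_RE = re.compile(r"\S.*?[.!?](?=\s|$)", re.DOTALL)


def tokenize(text):
    """Yield (kind, start, end) triples that together cover the whole text."""
    for match in TOKEN_RE.finditer(text):
        yield match.lastgroup, match.start(), match.end()


def split_sentences(text):
    """Return (start, end) offsets for each sentence found in text."""
    return [(m.start(), m.end()) for m in SENTENCE_RE.finditer(text)]


if __name__ == "__main__":
    sample = "Lennon went to New York. He announced Apple Corps."
    for kind, start, end in tokenize(sample):
        if kind != "space":
            print(kind, repr(sample[start:end]))
    for start, end in split_sentences(sample):
        print("sentence:", sample[start:end])
```

The splitter is deliberately naive: an abbreviation such as "Mr." would end a sentence early, and trailing text without final punctuation is ignored.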
Demo:

• From:
 (…) Upon their return, Lennon and McCartney
 went to New York to announce the formation of
 Apple Corps.


• To:
 (…) Upon their return, Lennon and McCartney
 went to New York to announce the formation of
 Apple Corps.
   – New York: http://dbpedia.org/Ontology/New_York_City
   – Apple Corps: http://dbpedia.org/Ontology/Apple_Corps
Step : Download & Start the
        GATE application
• Download GATE from:
  http://gate.ac.uk/download/
• Note: this demonstration uses GATE 6.0
Step: From Language Resources
            Select
• GATE Document: make sure that String
  content is selected in the last field (see
  screenshot below). Name the document “Test”.
Paste the following text into the document

• Upon their return, Lennon and McCartney
  went to New York to announce the
  formation of Apple Corps.
Step: From Processing Resources
   select the following resources
• ANNIE English Tokeniser
• ANNIE Sentence Splitter
• ANNIE POS Tagger
• GATE Morphological Analyser
• Note: For all the above, leave the “Name”
  field empty
Step: From Language Resources
            Select
• OWLIM Ontology
  – Specify the location of the ontology you would
    like to use for semantic augmentation
  – For example, we are using the DBpedia ontology
[Screenshot: OWLIM Ontology window]
From Processing Resources
             Select
• Select Onto Root Gazetteer
• Specify its parameters as follows:

[Screenshot: Onto Root Gazetteer parameters]
Final steps: Create Corpus

• Go to Language Resources, click on GATE Corpus, and
  add the “Test” document created earlier
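Final steps: Create Corpus
               Pipeline
• From Applications, create a Corpus Pipeline

[Screenshot: creating the Corpus Pipeline]

• Add the processing resources in the order shown below and
  press “Run this Application”

[Screenshot: processing resources in pipeline order]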
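Why the order matters can be sketched as follows: each processing resource consumes annotations produced by earlier ones, so running them out of order fails. This is an illustrative model only. The `Document`, `Resource` and `run_pipeline` names are hypothetical, and the `requires` table is an assumption made for the example, not something taken from the GATE documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    annotations: dict = field(default_factory=dict)  # annotation type -> list


@dataclass
class Resource:
    name: str
    requires: tuple  # annotation types that must already exist (assumed)
    produces: str    # annotation type this resource adds

    def run(self, doc):
        missing = [t for t in self.requires if t not in doc.annotations]
        if missing:
            raise RuntimeError(f"{self.name} needs {missing} first")
        # A real resource would add annotations here; the sketch only
        # records that the annotation type is now available.
        doc.annotations[self.produces] = []


# Same order as the resources were selected in the earlier steps.
PIPELINE = [
    Resource("Tokeniser", (), "Token"),
    Resource("Sentence Splitter", ("Token",), "Sentence"),
    Resource("POS Tagger", ("Token", "Sentence"), "POS"),
    Resource("Morphological Analyser", ("Token", "POS"), "Root"),
    Resource("Onto Root Gazetteer", ("Token", "Root"), "Lookup"),
]


def run_pipeline(pipeline, corpus):
    """Apply every resource, in order, to every document in the corpus."""
    for doc in corpus:
        for resource in pipeline:
            resource.run(doc)
    return corpus


if __name__ == "__main__":
    corpus = run_pipeline(PIPELINE, [Document("Upon their return ...")])
    print(sorted(corpus[0].annotations))
```

A corpus pipeline is the same idea applied to a collection: one ordered list of resources, run over each document in turn.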
Results: Open the document, then click on Annotation
       Sets, Annotations List, Lookup

[Screenshot: Lookup annotations showing the semantic augmentation]
Other features

• JAPE
  – Java Annotation Patterns Engine: provides
    regular-expression-based pattern/action rules over
    annotations.
  – Grammars to detect entities, validate detected
    entities, and do pre- and post-processing
  – Example: “at the Carnegie Stadium”, “at the
    Emirates Stadium”, “at the O2 Arena”
  – See Tutorial: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
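The venue example can be approximated with an ordinary regular expression. Note the difference: JAPE rules match over annotations (tokens, lookups and their features), whereas this sketch matches over raw characters, so it only conveys the pattern/action idea. `VENUE_RE` and `find_venues` are hypothetical names, and the pattern is an assumption about what such a rule might look for.

```python
import re

# Pattern: the trigger "at the", then one or more capitalised (or
# digit-initial) words, then a venue keyword.
VENUE_RE = re.compile(r"\bat the ((?:[A-Z0-9][\w']*\s)+(?:Stadium|Arena))\b")


def find_venues(text):
    """Action: return (start, end, surface form) for every venue mention."""
    return [(m.start(1), m.end(1), m.group(1)) for m in VENUE_RE.finditer(text)]


if __name__ == "__main__":
    sample = "They played at the Emirates Stadium and later at the O2 Arena."
    for start, end, surface in find_venues(sample):
        print(start, end, surface)
```

The context words "at the" are part of the pattern but not of the resulting annotation, which is the same separation a grammar rule makes between what it matches and what it labels.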
Some Links
• Home page is http://gate.ac.uk/
• Some good short tutorial videos for getting started:
  http://gate.ac.uk/demos/developer-videos/ . These are only
  a few minutes each, so they are quick to watch
• User Guide: http://gate.ac.uk/sale/tao/index.html . This is
  apparently for version 7.1, which is a development build,
  but it seems to be fine.
• Lots of documentation:
  http://gate.ac.uk/documentation.html
• The wiki: http://gate.ac.uk/wiki/
• JAPE grammar tutorial by Dhaval Thakker et al.:
  http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
Challenge: Term Ambiguity

• ...this apple on the palm of my hand...
• ...Apple tried to acquire Palm Inc....
• ...eating an apple sitting by a palm tree...

• What do “apple” and “palm” mean in each case?

• The objective is to recognize entities and disambiguate
  their meaning.
  DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva,
  and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
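One simple way to approach the three sentences above is to compare the words around an ambiguous term with words typical of each candidate meaning and pick the best overlap. This is a simplified Lesk-style heuristic chosen for illustration, not the method used by DBpedia Spotlight. The `SENSES` inventory and its cue words are hand-made assumptions, and `disambiguate` is a hypothetical name.

```python
import re

# Hypothetical sense inventory: candidate meaning -> typical context words.
SENSES = {
    "apple": {
        "apple (fruit)": {"eat", "eating", "tree", "hand", "fruit"},
        "Apple (company)": {"acquire", "inc", "company", "tried"},
    },
    "palm": {
        "palm (hand)": {"hand", "my", "finger"},
        "Palm (company)": {"acquire", "inc", "company"},
        "palm (tree)": {"tree", "eating", "beach"},
    },
}


def disambiguate(term, sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower())) - {term}
    candidates = senses[term]
    # On a tie, the candidate listed first in the inventory wins.
    return max(candidates, key=lambda sense: len(candidates[sense] & context))


if __name__ == "__main__":
    for sentence in (
        "this apple on the palm of my hand",
        "Apple tried to acquire Palm Inc.",
        "eating an apple sitting by a palm tree",
    ):
        print(sentence, "->",
              disambiguate("apple", sentence), "/",
              disambiguate("palm", sentence))
```

Even this toy version shows why context is needed: the surface forms are identical in all three sentences, and only the surrounding words separate the meanings.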
Challenges

• Disambiguation
• Unknown entities
• Ontology learning
• Scale and speed
• Co-referencing
Existing Services for Semantic
Augmentation
DBpedia Spotlight
• DBpedia is a collection of entity descriptions
  extracted from Wikipedia & shared as linked data

• DBpedia Spotlight uses data from DBpedia and text
  from associated Wikipedia pages

• Learns how to recognize that a DBpedia resource
  was mentioned

• Given plain text as input, generates annotated text
   http://dbpedia-spotlight.github.com/demo/
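Besides the demo page, the "plain text in, annotated text out" step can be sketched as a web service call. Everything specific here is an assumption not given on the slides: the endpoint URL, the `text` and `confidence` parameters, and the JSON field names (`Resources`, `@surfaceForm`, `@offset`, `@URI`) should be checked against the current DBpedia Spotlight documentation before use. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint of the public web service; not given on the slides.
ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build a GET request asking for a JSON annotation of text."""
    query = urlencode({"text": text, "confidence": confidence})
    return Request(f"{ENDPOINT}?{query}", headers={"Accept": "application/json"})


def parse_resources(payload):
    """Extract (surface form, offset, URI) triples from a decoded response."""
    return [
        (r["@surfaceForm"], int(r["@offset"]), r["@URI"])
        for r in payload.get("Resources", [])
    ]


def annotate(text, confidence=0.5):
    """Send text to the service and return the recognised resources."""
    with urlopen(build_request(text, confidence)) as response:
        return parse_resources(json.load(response))


if __name__ == "__main__":
    # Hand-written payload in the assumed response shape, so that the
    # parsing step can be tried without network access.
    sample = {
        "Resources": [
            {"@surfaceForm": "New York", "@offset": "48",
             "@URI": "http://dbpedia.org/resource/New_York_City"},
        ]
    }
    print(parse_resources(sample))
```

Keeping request building and response parsing separate from the network call makes each part easy to check on its own; only `annotate` needs a live service.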
References

• DBpedia Spotlight: Shedding Light on the Web of
  Documents. Pablo Mendes, Max Jakob, Andrés
  García-Silva, and Christian Bizer. In:
  Proceedings of the 7th International Conference on
  Semantic Systems (I-Semantics), 2011.
• Introduction to GATE, Dr. Paula Matuszek
• Various resources from gate.ac.uk

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Lecture semantic augmentation

  • 1. COMP3725 Knowledge Enriched Information Systems Lecture 13: Semantic Augmentation Dhavalkumar Thakker (Dhaval) School of Computing, University of Leeds
  • 2. Outline • Semantic Augmentation – What – Why – How • Existing systems & services for Semantic Augmentation • Challenges
  • 3. Semantic Augmentation • From: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps. • To: the same sentence, with “New York” linked to http://dbpedia.org/Ontology/New_York_City and “Apple Corps” linked to http://dbpedia.org/Ontology/Apple_Corps
  • 4. Semantic Augmentation • Semantic augmentation is the process of attaching semantics to a selected part of a text to assist automatic interpretation of the meaning conveyed by the text. • Also called semantic annotation or semantic tagging
  • 5. It provides additional information about an existing piece of data.
  • 6. Why Semantic Augmentation? • Links to complementary information – “More about this” • Show related or similar information • Reasoning and inferencing offered by semantics • Semantic annotation is the glue that ties ontologies into document spaces – remember that the existing web is a web of documents • Manual metadata production cost is too high
  • 7. GATE for Semantic Augmentation • GATE (General Architecture for Text Engineering) – see gate.ac.uk • GATE Developer is a development environment that provides a rich set of graphical interactive tools for the creation, measurement and maintenance of software components for processing human language. • See: http://gate.ac.uk/family/developer.html
  • 8. Overview of GATE Developer • GATE Developer • Resources Pane – applications: groups of processes to run on a document or corpus – language resources: corpus, ontologies, schemas – processing resources: tools that operate on unstructured text – datastores: saved documents and resources • Display Pane: whatever you’re currently working with. • See next slide
  • 10. Processing Resources: ANNIE • A family of Processing Resources for language analysis included with GATE • Stands for A Nearly-New Information Extraction system. • Uses finite-state techniques to implement various tasks: tokenization, semantic tagging, verb phrase chunking, and so on.
  • 11. ANNIE IE Modules http://gate.ac.uk/sale/tao/splitch6.html#chap:annie
  • 12. Some ANNIE Components • Tokenizer – splits text into word, number, symbol, punctuation, and SpaceToken tokens • Sentence Splitter – segments text into sentences • Part of Speech Tagger – produces a part-of-speech tag (noun, verb, etc.) as an annotation on each word or symbol • GATE Morphological Analyser – detects morphemes in a piece of text (e.g. car, caring) • OntoGazetteer – semantic tagging component that uses an ontology
  • 13. Demo: • From: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps. • To: the same sentence, with “New York” linked to http://dbpedia.org/Ontology/New_York_City and “Apple Corps” linked to http://dbpedia.org/Ontology/Apple_Corps
  • 14. Step: Download & start the GATE application • Download GATE from: http://gate.ac.uk/download/ • Note: the demonstration uses GATE 6.0
  • 15. Step: From Language Resources select • GATE Document: make sure that String content is selected in the last field (shown in the slide's screenshot). Name the file “Test”
  • 16. Paste the following text into the file • Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
  • 17. Step: From Processing Resources select the following resources • ANNIE English Tokeniser • ANNIE Sentence Splitter • ANNIE POS Tagger • GATE Morphological Analyser • Note: For all of the above, leave the “Name” field empty
  • 18. Step: From Processing Resources select the following resources
  • 19. Step: From Language Resources select • OWLIM Ontology – Specify the location of the ontology you would like to use for semantic augmentation – For example, we are using the DBpedia ontology
  • 21. From Processing Resources select • Onto Root Gazetteer • and specify the parameters as shown on the slide
  • 22. Final steps: Create Corpus • Go to Language Resources, click on GATE Corpus, and add the “Test” document created earlier
  • 23. Final steps: Create Corpus Pipeline • From Applications, create a Corpus Pipeline • Add the processing resources in the order shown on the slide and press “Run this Application”
  • 24. Results: Go to the file, click on Annotation Set, then Annotation List, then Lookup
  • 25. Other features • JAPE – a Java Annotation Patterns Engine, provides regular-expression based pattern/action rules over annotations. – Grammars to detect entities, validate detected entities, and perform pre- and post-processing – Example: “at the Carnegie Stadium”, “at the Emirates Stadium”, “at the O2 Arena” – See Tutorial: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
  • 26. Some Links • Home page: http://gate.ac.uk/ • Some good short tutorial videos for getting started: http://gate.ac.uk/demos/developer-videos/ . These are only a few minutes each, so they are quick to watch. • User Guide: http://gate.ac.uk/sale/tao/index.html . This is apparently for version 7.1, which is a development build, but it seems to work fine for the version used here. • Lots of documentation: http://gate.ac.uk/documentation.html • The wiki: http://gate.ac.uk/wiki/ • JAPE grammar tutorial by Dhaval Thakker et al.: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
  • 27. Challenge: Term Ambiguity • ...this apple on the palm of my hand... • ...Apple tried to acquire Palm Inc.... • ...eating an apple sitting by a palm tree... • What do “apple” and “palm” mean in each case? • The objective is to recognize entities and disambiguate their meaning. Source: DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
  • 28. Challenges • Disambiguation • Unknown entities • Ontology learning • Scale and speed • Co-referencing
  • 29. Existing Services for Semantic Augmentation
  • 30. Existing Services for Semantic Augmentation
  • 31. DBpedia Spotlight • DBpedia is a collection of entity descriptions extracted from Wikipedia & shared as linked data • DBpedia Spotlight uses data from DBpedia and text from the associated Wikipedia pages • Learns how to recognize that a DBpedia resource was mentioned • Given plain text as input, generates annotated text http://dbpedia-spotlight.github.com/demo/
  • 34. References • DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011. • Introduction to GATE, Dr. Paula Matuszek • Various resources from gate.ac.uk
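
The From/To example in the slides can be approximated outside GATE with a plain dictionary lookup. The sketch below is only an illustration of the gazetteer idea, not the Onto Root Gazetteer itself: the `GAZETTEER` table, the `Annotation` type and the `annotate` function are hypothetical names introduced here, and the URIs are copied as written on the slides (live DBpedia resource identifiers normally take the form http://dbpedia.org/resource/...). It performs exact, case-sensitive matching and does no disambiguation, so it would not cope with the “apple”/“palm” cases in the Term Ambiguity slide.

```python
import re
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    """A span of text linked to a concept URI (hypothetical structure)."""
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    text: str    # the surface form found in the document
    uri: str     # the concept the mention is linked to


# Toy gazetteer: surface form -> URI. The URIs are taken verbatim from the slides.
GAZETTEER: Dict[str, str] = {
    "New York": "http://dbpedia.org/Ontology/New_York_City",
    "Apple Corps": "http://dbpedia.org/Ontology/Apple_Corps",
}


def annotate(text: str, gazetteer: Dict[str, str] = GAZETTEER) -> List[Annotation]:
    """Return one Annotation per exact, case-sensitive gazetteer match."""
    if not gazetteer:
        return []
    # Try longer labels first so that a longer label wins over a shorter
    # label that is a prefix of it.
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b"
    )
    return [
        Annotation(m.start(), m.end(), m.group(0), gazetteer[m.group(0)])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = (
        "Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps."
    )
    for annotation in annotate(sentence):
        print(annotation)
```

Keeping character offsets alongside each URI mirrors how annotations refer to a span of the document rather than altering its text, so the original sentence stays untouched and several annotation layers can coexist.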

Editor's Notes

  1. It is not just tagging
  2. Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.