COMP3725
Knowledge Enriched Information Systems


 Lecture 13: Semantic Augmentation

       Dhavalkumar Thakker (Dhaval)
   School of Computing, University of Leeds

Outline

• Semantic Augmentation
  – What
  – Why
  – How
• Existing systems & services for Semantic
  Augmentation
• Challenges



Semantic Augmentation

• From:
 (…) Upon their return, Lennon and McCartney
 went to New York to announce the formation of
 Apple Corps.


• To:
 (…) Upon their return, Lennon and McCartney
 went to New York to announce the formation of
 Apple Corps.
   New York → http://dbpedia.org/resource/New_York_City
   Apple Corps → http://dbpedia.org/resource/Apple_Corps
Semantic Augmentation

• Semantic augmentation is a process
  of attaching semantics to a selected
  part of a text to assist automatic
  interpretation of the meaning
  conveyed by the text.

• Also called semantic annotation,
  semantic tagging
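
A minimal sketch of this definition in Python: each annotation is a stand-off record that points at a span of the text and carries a URI. The `Annotation` class, the `annotate` helper and the mention table are hypothetical names used only for illustration, and exact string matching stands in for a real annotator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    uri: str    # identifier of the concept the mention refers to


def annotate(text, mention_to_uri):
    """Attach a URI to every exact occurrence of each known mention."""
    annotations = []
    for mention, uri in mention_to_uri.items():
        pos = text.find(mention)
        while pos != -1:
            annotations.append(Annotation(pos, pos + len(mention), uri))
            pos = text.find(mention, pos + 1)
    return sorted(annotations, key=lambda a: a.start)


TEXT = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")

# Hypothetical lookup table (assumption): mention text -> DBpedia resource URI.
MENTIONS = {
    "New York": "http://dbpedia.org/resource/New_York_City",
    "Apple Corps": "http://dbpedia.org/resource/Apple_Corps",
}

ANNOTATIONS = annotate(TEXT, MENTIONS)
for a in ANNOTATIONS:
    print(TEXT[a.start:a.end], "->", a.uri)
```

The text itself is left untouched; the semantics live in the offsets and URIs, which is also how GATE stores its annotations.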
It provides additional information
   about an existing piece of data.




Why Semantic Augmentation?

• Links to complementary information
  – “More about this”
• Show related or similar information
• Reasoning and inferencing offered by
  semantics
• Semantic annotation is the glue that ties
  ontologies into document spaces –
  remember that the existing web is a web of documents
• The cost of producing metadata manually is too
  high
GATE for Semantic Augmentation
• GATE (General Architecture for Text
  Engineering) – see gate.ac.uk
• GATE Developer is a development
  environment that provides a rich set of
  graphical interactive tools for the creation,
  measurement and maintenance of software
  components for processing human
  language.
• See: http://gate.ac.uk/family/developer.html
Overview of GATE Developer

• GATE Developer
• Resources Pane
   – applications: groups of processes to run on a
     document or corpus
   – language resources: corpus, ontologies, schemas
   – processing resources: tools that operate on
     unstructured text
   – datastores: saved documents and resources
• Display Pane: whatever you’re currently working
  with.
• See next slide
GATE: Interface

[Screenshot: the GATE Developer window, with the Resources Pane on the left and the Display Pane on the right]

Processing Resources: ANNIE

• A family of Processing Resources for
  language analysis included with GATE
• Stands for A Nearly-New Information
  Extraction system.
• Uses finite-state techniques to implement
  various tasks: tokenization, semantic
  tagging, verb phrase chunking, and so on.
ANNIE IE Modules

[Figure: the ANNIE information extraction modules; see
 http://gate.ac.uk/sale/tao/splitch6.html#chap:annie]

Some ANNIE Components

• Tokenizer
   – Splits text into tokens: word, number, symbol, punctuation, and SpaceToken.
• Sentence Splitter
   – Segments text into sentences
• Part of Speech Tagger
   – Produces a part-of-speech tag (noun, verb, etc.) as an annotation on each
     word or symbol
• GATE Morphological Analyser
   – Detects morphemes in a piece of text (e.g. car,
     caring)
• OntoGazetteer
   – Semantic tagging component; uses an ontology
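
To make the first two components concrete, here is a toy tokenizer and sentence splitter in Python. It is only a sketch of the idea, not ANNIE's implementation: the regular expressions and the function names `tokenize` and `split_sentences` are assumptions made for illustration, and the real components are rule-driven and handle many more cases (abbreviations, hyphenation, and so on).

```python
import re

# One alternative per token kind; the final catch-all classifies anything
# else as a symbol. The kind names follow the list on the slide.
TOKEN_RE = re.compile(
    r"(?P<word>[A-Za-z]+)"
    r"|(?P<number>\d+)"
    r"|(?P<SpaceToken>\s+)"
    r"|(?P<punctuation>[.,;:!?()\"'])"
    r"|(?P<symbol>.)"
)


def tokenize(text):
    """Return (kind, string, start, end) for every token in the text."""
    return [(m.lastgroup, m.group(), m.start(), m.end())
            for m in TOKEN_RE.finditer(text)]


def split_sentences(text):
    """Naive splitter: a sentence ends at '.', '!' or '?'."""
    parts = re.findall(r"[^.!?]+[.!?]*", text)
    return [p.strip() for p in parts if p.strip()]


TOKENS = tokenize("Apple Corps, 1968!")
SENTENCES = split_sentences(
    "Lennon went to New York. He announced Apple Corps."
)
print([kind for kind, *_ in TOKENS])
print(SENTENCES)
```

Later components (POS tagger, morphological analyser, gazetteer) consume these token and sentence annotations rather than the raw text.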
Demo:

• From:
 (…) Upon their return, Lennon and McCartney
 went to New York to announce the formation of
 Apple Corps.


• To:
 (…) Upon their return, Lennon and McCartney
 went to New York to announce the formation of
 Apple Corps.
   New York → http://dbpedia.org/resource/New_York_City
   Apple Corps → http://dbpedia.org/resource/Apple_Corps
Step: Download & Start the GATE application
• Download GATE from:
  http://gate.ac.uk/download/
• Note: the demonstration is using GATE 6.0




Step: From Language Resources, select
• GATE Document: make sure that String
  content is selected in the last field (see
  screenshot below). Name the file “Test”




Paste the following text into the file

• Upon their return, Lennon and McCartney
  went to New York to announce the
  formation of Apple Corps.




Step: From Processing Resources, select the following resources
•   ANNIE English Tokeniser
•   ANNIE Sentence Splitter
•   ANNIE POS Tagger
•   GATE Morphological Analyser
•   Note: For all the above, leave the “Name”
    field empty


Step: From Processing Resources, select the following resources (continued)

[Screenshot]

Step: From Language Resources, select
• OWLIM Ontology
  – Specify the location of the ontology you would
    like to use for semantic augmentation
  – For example, we are using the DBpedia ontology




OWLIM Ontology window

[Screenshot: OWLIM Ontology window]

From Processing Resources, select
• OntoRoot Gazetteer
• Specify its parameters as follows:

[Screenshot: OntoRoot Gazetteer parameters]

Final steps: Create Corpus

• Go to Language Resources, click GATE Corpus, and
  add the “Test” document created earlier




Final steps: Create Corpus Pipeline
• From Applications, create a Corpus Pipeline

[Screenshot]

• Add the processing resources in the order shown below and
  press “Run this Application”
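
Why the order matters can be sketched in a few lines of Python. This is a toy model, not GATE's API: a document is a dictionary, each "processing resource" is a function that adds annotations to it, and the gazetteer reads the Token annotations that the tokeniser wrote. All names (`tokeniser`, `gazetteer`, `run_pipeline`, `ONTOLOGY_LABELS`) are hypothetical.

```python
import re

# Hypothetical label-to-URI table standing in for an ontology loaded in GATE.
ONTOLOGY_LABELS = {
    "New York": "http://dbpedia.org/resource/New_York_City",
    "Apple Corps": "http://dbpedia.org/resource/Apple_Corps",
}


def tokeniser(doc):
    """Add Token annotations as (start, end) character offsets."""
    doc["Token"] = [(m.start(), m.end())
                    for m in re.finditer(r"\w+", doc["text"])]


def gazetteer(doc, max_len=3):
    """Add Lookup annotations for token spans that match an ontology label."""
    tokens = doc["Token"]  # KeyError here means the tokeniser has not run yet
    lookups = []
    for i in range(len(tokens)):
        # Try spans of 1 to max_len consecutive tokens starting at token i.
        for j in range(i, min(i + max_len, len(tokens))):
            start, end = tokens[i][0], tokens[j][1]
            uri = ONTOLOGY_LABELS.get(doc["text"][start:end])
            if uri:
                lookups.append((start, end, uri))
    doc["Lookup"] = lookups


def run_pipeline(corpus, resources):
    """Apply every resource, in order, to every document in the corpus."""
    for doc in corpus:
        for resource in resources:
            resource(doc)
    return corpus


CORPUS = [{
    "name": "Test",
    "text": ("Upon their return, Lennon and McCartney went to New York "
             "to announce the formation of Apple Corps."),
}]
run_pipeline(CORPUS, [tokeniser, gazetteer])
for start, end, uri in CORPUS[0]["Lookup"]:
    print(CORPUS[0]["text"][start:end], "->", uri)
```

Swapping the two resources makes the gazetteer fail because no Token annotations exist yet, which is the same reason the GATE pipeline must list the tokeniser before the gazetteer.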




Results: Open the document, then click Annotation Sets,
Annotations List, and Lookup

[Screenshot: the Lookup annotations, i.e. the semantic augmentation]

Other features

• JAPE
  – Java Annotation Patterns Engine: provides
    regular-expression-based pattern/action rules over
    annotations.
  – Grammars to detect entities, validate detected
    entities, and do pre- and post-processing
  – Example: “at the Carnegie Stadium”, “at the
    Emirates Stadium”, “at the O2 Arena”
  – See Tutorial:
    http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
Some Links
• Home page is http://gate.ac.uk/
• Some good short tutorial videos for getting started:
  http://gate.ac.uk/demos/developer-videos/ . These are only
  a few minutes each, so they’re fast
• User Guide: http://gate.ac.uk/sale/tao/index.html . This is
  apparently for version 7.1, which is a development build,
  but it seems to be fine.
• Lots of documentation:
  http://gate.ac.uk/documentation.html
• The wiki: http://gate.ac.uk/wiki/
• JAPE grammar tutorial by Dhaval Thakker et al.:
  http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
Challenge: Term Ambiguity

• ...this apple on the palm of my hand...
• ...Apple tried to acquire Palm Inc....
• ...eating an apple sitting by a palm tree...

• What do “apple” and “palm” mean in each case?

• Objective is to recognize entities and disambiguate
  their meaning.
  Source: DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva,
  and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
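
One simple way to approach the three example sentences is to score each candidate sense by how many of its typical context words appear in the sentence. The Python sketch below does exactly that; it is a toy word-overlap heuristic, not the method used by DBpedia Spotlight, and the sense labels and context-word sets are hand-picked assumptions.

```python
import re

# Hypothetical candidate senses, each with hand-picked context words.
CANDIDATES = {
    "apple": {
        "apple (fruit)": {"eat", "eating", "fruit", "tree", "hand"},
        "Apple (company)": {"acquire", "inc", "company"},
    },
    "palm": {
        "palm (hand)": {"hand", "finger"},
        "palm (tree)": {"tree", "beach", "coconut"},
        "Palm (company)": {"acquire", "inc", "company"},
    },
}


def disambiguate(term, sentence):
    """Pick the sense whose context words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower())) - {term}
    senses = CANDIDATES[term]
    return max(senses, key=lambda sense: len(senses[sense] & words))


EXAMPLES = [
    "this apple on the palm of my hand",
    "Apple tried to acquire Palm Inc.",
    "eating an apple sitting by a palm tree",
]
RESULTS = [(disambiguate("apple", s), disambiguate("palm", s)) for s in EXAMPLES]
for sentence, senses in zip(EXAMPLES, RESULTS):
    print(sentence, "->", senses)
```

Real systems replace the hand-picked word sets with statistics learned from a large corpus, but the principle of using surrounding context to choose among candidates is the same.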
Challenges

•   Disambiguation
•   Unknown entities
•   Ontology learning
•   Scale and speed
•   Co-referencing
Existing Services for Semantic Augmentation
DBpedia Spotlight
• DBpedia is a collection of entity descriptions
  extracted from Wikipedia & shared as linked data

• DBpedia Spotlight uses data from DBpedia and text
  from associated Wikipedia pages

• Learns how to recognize that a DBpedia resource
  was mentioned

• Given plain text as input, generates annotated text
   http://dbpedia-spotlight.github.com/demo/
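
Besides the demo page, Spotlight can be called as a web service. The Python sketch below builds such a request and parses a JSON reply. The endpoint URL, the `text` and `confidence` parameters, and the `Resources`, `@URI`, `@surfaceForm` and `@offset` field names are assumptions about the public service that should be checked against the current Spotlight documentation; the sample reply is hand-written for illustration, not real output.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; verify before relying on it.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build a GET request asking for JSON annotations of the text."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        SPOTLIGHT_URL + "?" + query,
        headers={"Accept": "application/json"},
    )


def parse_resources(payload):
    """Extract (surface form, offset, URI) triples from a JSON reply."""
    data = json.loads(payload)
    return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"])
            for r in data.get("Resources", [])]


def annotate(text, confidence=0.5):
    """Call the service (needs network access) and return parsed triples."""
    request = build_request(text, confidence)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_resources(response.read().decode("utf-8"))


# Hand-written sample reply in the assumed format (not real service output).
SAMPLE_REPLY = json.dumps({
    "Resources": [{
        "@URI": "http://dbpedia.org/resource/New_York_City",
        "@surfaceForm": "New York",
        "@offset": "48",
    }]
})
print(parse_resources(SAMPLE_REPLY))
```

Calling `annotate("...went to New York...")` would perform the actual request; it is left uncalled here so the sketch runs without network access.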
DBpedia Spotlight

[Screenshots]

References

• DBpedia Spotlight: Shedding Light on the Web of
  Documents. Pablo Mendes, Max Jakob, Andrés
  García-Silva, and Christian Bizer. In:
  Proceedings of the 7th International Conference on
  Semantic Systems (I-Semantics), 2011.
• Introduction to GATE, Dr. Paula Matuszek
• Various resources from gate.ac.uk




 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 

Lecture semantic augmentation

  • 1. COMP3725 Knowledge Enriched Information Systems Lecture 13: Semantic Augmentation Dhavalkumar Thakker (Dhaval) School of Computing, University of Leeds
  • 2. Outline • Semantic Augmentation – What – Why – How • Existing systems & services for Semantic Augmentation • Challenges
  • 3. Semantic Augmentation • From: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps. • To: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps. http://dbpedia.org/Ontology/New_York_City http://dbpedia.org/Ontology/Apple_Corps
  • 4. Semantic Augmentation • Semantic augmentation is a process of attaching semantics to a selected part of a text to assist automatic interpretation of the meaning conveyed by the text. • Also called semantic annotation, semantic tagging
  • 5. It provides additional information about an existing piece of data.
  • 6. Why Semantic Augmentation? • Links to complementary information – “More about this” • Show related or similar information • Reasoning and inferencing offered by semantics • Semantic annotation is the glue that ties ontologies into document spaces – remember that the existing web is a web of documents • Manual metadata production cost is too high
  • 7. GATE for Semantic Augmentation • GATE (General Architecture for Text Engineering) – see gate.ac.uk • GATE Developer is a development environment that provides a rich set of graphical interactive tools for the creation, measurement and maintenance of software components for processing human language. • See: http://gate.ac.uk/family/developer.html
  • 8. Overview of GATE Developer • GATE Developer • Resources Pane – applications: groups of processes to run on a document or corpus – language resources: corpus, ontologies, schemas – processing resources: tools that operate on unstructured text – datastores: saved documents and resources • Display Pane: whatever you’re currently working with. • See next slide
  • 10. Processing Resources: ANNIE • A family of Processing Resources for language analysis included with GATE • Stands for A Nearly-New Information Extraction system. • Using finite state techniques to implement various tasks: tokenization, semantic tagging, verb phrase chunking, and so on.
  • 11. ANNIE IE Modules http://gate.ac.uk/sale/tao/splitch6.html#chap:annie
  • 12. Some ANNIE Components • Tokenizer – word, number, symbol, punctuation, and spaceToken. • Sentence Splitter – Segments text into sentences • Part of Speech Tagger – produces a part-of-speech tag as an annotation on each word or symbol – Nouns, verbs etc. • GATE Morphological Analyser – detecting morphemes in a piece of text (e.g. car, caring) • OntoGazetteer – Semantic Tagging component – uses ontology
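
As a rough illustration of the first two components listed on slide 12, the Python sketch below classifies tokens into the five kinds named there (word, number, symbol, punctuation, spaceToken) and splits text into sentences. It is not GATE code: the regular expressions and the function names `tokenize` and `split_sentences` are assumptions made for illustration, and ANNIE's own rules are considerably richer.

```python
import re

# Hypothetical token pattern (not ANNIE's actual rules): one alternative per token kind.
TOKEN_PATTERN = re.compile(
    r"(?P<word>[^\W\d_]+)"                    # runs of letters
    r"|(?P<number>\d+)"                       # runs of digits
    r"|(?P<spaceToken>\s+)"                   # whitespace
    r"|(?P<punctuation>[.,;:!?\"'()\[\]-])"   # a small set of punctuation marks
    r"|(?P<symbol>.)",                        # anything else
    re.DOTALL,
)


def tokenize(text):
    """Return (kind, string, start, end) tuples that cover the whole text."""
    return [(m.lastgroup, m.group(), m.start(), m.end())
            for m in TOKEN_PATTERN.finditer(text)]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


for kind, string, start, end in tokenize("Apple Corps, 1968."):
    print(kind, repr(string), start, end)
```

Each tuple keeps its character offsets, which mirrors the way annotations refer back to spans of the original document rather than copying the text.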
  • 13. Demo: • From: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps. • To: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps. http://dbpedia.org/Ontology/New_York_City http://dbpedia.org/Ontology/Apple_Corps
  • 14. Step: Download & Start the GATE application • Download GATE from: http://gate.ac.uk/download/ • Note: the demonstration is using GATE 6.0
  • 15. Step: From Language Resources select • GATE document -> Make sure that String content is selected in the last field, see screenshot below. Name the file “Test”
  • 16. Paste the following text into the file • Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
  • 17. Step: From Processing Resources select the following resources • ANNIE English Tokeniser • ANNIE Sentence Splitter • ANNIE POS Tagger • GATE Morphological Analyser • Note: For all the above, leave the “Name” field empty
  • 18. Step: From Processing Resources select the following resources
  • 19. Step: From Language Resources select • OWLIM Ontology – Specify the location of the ontology you would like to use for semantic augmentation – For example, we are using the DBpedia ontology
  • 21. From Processing Resources select • Select OntoRoot Gazetteer • & specify parameters as follows:
  • 22. Final steps: Create Corpus • Go to Language Resources and click on GATE Corpus, and add the “Test” document created earlier
  • 23. Final steps: Create Corpus Pipeline • From Applications • And add the processing resources in the order shown below and press “run this application”
  • 24. Results: Go to the file, click on Annotation Set, Annotation List, Lookup Semantic Augmentation
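
The Lookup annotations that the demo ends with can be pictured with a small Python sketch of gazetteer-style matching. It is only an approximation of the idea, not how GATE's gazetteer is implemented: the hand-written `LABEL_TO_URI` table stands in for labels read from an ontology, the function name `lookup_annotations` is hypothetical, and the two URIs are copied from the slides as they appear there.

```python
import re

# Toy stand-in for labels loaded from an ontology; URIs copied from the slides.
LABEL_TO_URI = {
    "New York": "http://dbpedia.org/Ontology/New_York_City",
    "Apple Corps": "http://dbpedia.org/Ontology/Apple_Corps",
}


def lookup_annotations(text, label_to_uri=LABEL_TO_URI):
    """Return Lookup-style annotations as dicts with offsets, matched string and URI."""
    # Try longer labels first so that a multi-word label wins over a shorter one.
    labels = sorted(label_to_uri, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b")
    return [
        {"start": m.start(), "end": m.end(), "string": m.group(1),
         "URI": label_to_uri[m.group(1)]}
        for m in pattern.finditer(text)
    ]


text = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")
for annotation in lookup_annotations(text):
    print(annotation)
```

Exact string matching like this misses inflected or differently cased mentions, which is one reason the demo pipeline runs a tokeniser, POS tagger and morphological analyser before the gazetteer.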
  • 25. Other features • JAPE – a Java Annotation Patterns Engine, provides regular-expression based pattern/action rules over annotations. – Grammar to detect entities, validate detected entities, pre & post processing – Example: “at the Carnegie Stadium”, “at the Emirates Stadium”, “at the O2 Arena” – See Tutorial: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
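
The pattern/action idea behind the venue examples on slide 25 can be sketched in Python. This is not JAPE syntax, and it matches raw text, whereas JAPE rules match over existing annotations. The rule, the `Venue` annotation type and the function name `annotate_venues` are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: "at the" + one to three capitalised or alphanumeric words,
# the last of which is followed by a venue keyword.
VENUE_RULE = re.compile(r"\bat the ((?:[A-Z0-9]\w*\s){1,3}(?:Stadium|Arena))\b")


def annotate_venues(text):
    """Action part of the rule: emit a 'Venue' annotation for every pattern match."""
    return [
        {"type": "Venue", "start": m.start(1), "end": m.end(1), "string": m.group(1)}
        for m in VENUE_RULE.finditer(text)
    ]


sample = "They played at the Emirates Stadium and later at the O2 Arena."
for annotation in annotate_venues(sample):
    print(annotation)
```

The left-hand side (the regular expression) decides what matches; the right-hand side (the list comprehension) decides which annotation to create, which is the same division of labour a JAPE rule expresses.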
  • 26. Some Links • Home page is http://gate.ac.uk/ • Some good short tutorial videos for getting started: http://gate.ac.uk/demos/developer-videos/. These are only a few minutes each, so they’re fast • User Guide: http://gate.ac.uk/sale/tao/index.html. This is apparently for version 7.1, which is a development build, but again it seems to be fine. • Lots of documentation: http://gate.ac.uk/documentation.html • The wiki: http://gate.ac.uk/wiki/ • JAPE grammar by Dhaval Thakker et al. http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
  • 27. Challenge: Term Ambiguity • ...this apple on the palm of my hand... • ...Apple tried to acquire Palm Inc.... • ...eating an apple sitting by a palm tree... • What do “apple” and “palm” mean in each case? • Objective is to recognize entities and disambiguate their meaning. DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
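
One simple way to picture disambiguation is to compare the words around a mention with a few context words associated with each candidate meaning and pick the meaning with the largest overlap. The Python sketch below does this for the three example sentences. It is a deliberately naive heuristic and not the method used by DBpedia Spotlight; the sense labels, the context word sets and the function name `disambiguate` are invented for illustration.

```python
import re

# Invented context words per candidate sense; a real system would learn these from data.
CANDIDATES = {
    "apple": {
        "Apple (fruit)": {"eating", "eat", "tree", "fruit", "hand", "juice"},
        "Apple (company)": {"acquire", "inc", "company", "shares"},
    },
    "palm": {
        "Palm (hand)": {"hand", "finger", "wrist"},
        "Palm (company)": {"acquire", "inc", "company", "device"},
        "Palm (tree)": {"tree", "beach", "coconut", "sitting"},
    },
}


def disambiguate(mention, sentence, candidates=CANDIDATES):
    """Pick the sense whose context words overlap most with the rest of the sentence."""
    context = set(re.findall(r"[a-z0-9]+", sentence.lower())) - {mention.lower()}
    senses = candidates[mention.lower()]
    # On a tie, max() keeps the sense listed first in the dictionary.
    return max(senses, key=lambda sense: len(senses[sense] & context))


for sentence in ("this apple on the palm of my hand",
                 "Apple tried to acquire Palm Inc.",
                 "eating an apple sitting by a palm tree"):
    print(sentence, "->", disambiguate("apple", sentence), "/", disambiguate("palm", sentence))
```

The heuristic only works when the sentence happens to contain one of the listed context words, which hints at why unknown entities and scale appear alongside disambiguation as open challenges.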
  • 28. Challenges • Disambiguation • Unknown entities • Ontology learning • Scale and speed • Co-referencing
  • 29. Existing Services for Semantic Augmentation
  • 30. Existing Services for Semantic Augmentation
  • 31. DBpedia Spotlight • DBpedia is a collection of entity descriptions extracted from Wikipedia & shared as linked data • DBpedia Spotlight uses data from DBpedia and text from associated Wikipedia pages • Learns how to recognize that a DBpedia resource was mentioned • Given plain text as input, generates annotated text http://dbpedia-spotlight.github.com/demo/
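
The "plain text in, annotated text out" behaviour can also be used programmatically. The Python sketch below shows one plausible way to do so with only the standard library. Everything specific in it is an assumption not taken from the slides and should be checked against the current DBpedia Spotlight documentation: the endpoint URL, the `text` and `confidence` parameters, and the JSON field names `Resources`, `@surfaceForm`, `@offset` and `@URI`. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public web service; verify before relying on it.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_url(text, confidence=0.5, base_url=SPOTLIGHT_URL):
    """Build the request URL; 'text' and 'confidence' are assumed parameter names."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return f"{base_url}?{query}"


def extract_resources(payload):
    """Pull (surface form, offset, URI) triples out of a response dict (field names assumed)."""
    return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"])
            for r in payload.get("Resources", [])]


def annotate(text, confidence=0.5):
    """Send the text to the service and return the extracted triples (needs network access)."""
    request = urllib.request.Request(
        build_annotate_url(text, confidence),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_resources(json.load(response))
```

Keeping URL construction and response parsing in separate functions means both can be checked without contacting the service; only `annotate` performs a network call.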
  • 34. References • DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011. • Introduction to GATE, Dr. Paula Matuszek • Various resources from gate.ac.uk

Editor's Notes

  1. It is not just tagging
  2. Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
  3. Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.