Using Knowledge Graph for
Promoting Cognitive
Computing
Presenter: Dr. Saeedeh Shekarpour
2/10/2017
About me
Education
• 2010-2013: PhD student, AKSW Research
Group, Leipzig University, Germany
• 2014-2015: PhD/Postdoc, EIS Research
Group, Bonn University, Germany
• 2016-present: Postdoc, Knoesis Center, USA
About me
Research Interests
6+ years of research experience in the following directions:
• Previously:
• Question Answering Systems, Semantic Search.
• Linked Data and Semantic Web Technologies.
• Statistical classifier models (e.g. HMM).
• Ontology Development.
• Natural Language Processing.
• Currently:
• Information Extraction and Knowledge Graph Creation.
• Social Network Mining.
• Gaining experience with Deep Learning.
About me
Selected Publications
• Saeedeh Shekarpour, Edgard Marx, Sören Auer, Amit Sheth:
RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to
Alleviate the Vocabulary Mismatch Problem. AAAI 2017
• Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, Sören Auer:
Question answering on interlinked data. WWW 2013: 1145-1156
• Andreas Both, Dennis Diefenbach, Kuldeep Singh, Saeedeh Shekarpour,
Didier Cherix, Christoph Lange: Qanary - A Methodology for Vocabulary-
Driven Open Question Answering Systems. ESWC2016: 625-641
• Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel
Gerber, Sebastian Hellmann, Claus Stadler: Keyword-Driven SPARQL Query
Generation Leveraging Background Knowledge. Web Intelligence 2011:
203-210
About me
Selected Publications
• Saeedeh Shekarpour, Konrad Höffner, Jens Lehmann, Sören Auer: Keyword
Query Expansion on Linked Data Using Linguistic and Semantic Features.
ICSC 2013: 191-197
• Saeedeh Shekarpour, Edgard Marx, Axel-Cyrille Ngonga Ngomo, Sören Auer:
SINA: Semantic interpretation of user queries for question answering on
interlinked data. J. Web Sem. 2015
Outline
 Introduction
 Part 1: Vision
Advantages of using Knowledge Graph in
 Question Answering
 Machine Learning
 NLP
 Information Retrieval
 Part 2: Research in depth
 RQUERY: Rewriting Natural Language Queries on Knowledge Graphs
to Alleviate the Vocabulary Mismatch Problem
 HeadEX: Triple Extraction from Stream of News Headlines on Twitter
using n-ary Relations
Prevalence of KGs
• Google Knowledge Graph
• IBM Watson
• Knowledge graphs in smartphones
 Google Now
The growth of Linked Open Data
• May 2007: 12 datasets
• January 2017: 2973 datasets, more than 140 billion triples
Outline
 Introduction
 Part 1: Vision
Advantages of using Knowledge Graph in
 Question Answering
 Machine Learning
 NLP
 Information Retrieval
 Part 2: Research in depth
 RQUERY: Rewriting Natural Language Queries on Knowledge Graphs
to Alleviate the Vocabulary Mismatch Problem
 HeadEX: Triple Extraction from Stream of News Headlines on Twitter
using n-ary Relations
SINA Architecture
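[Figure: SINA architecture. A client sends keywords to the server and receives the query result. Server components: Query Preprocessing, Segment Validation, Query Expansion, Resource Retrieval, Disambiguation, Query Construction, Representation, on top of the underlying interlinked knowledge bases. Data passed between components: keywords, valid segments, reformulated query, mapped resources, tuple of resources, SPARQL queries. Libraries shown: OWL API, HTTP client, Stanford CoreNLP.]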
Saeedeh Shekarpour, Edgard Marx, Axel-Cyrille Ngonga Ngomo, Sören Auer: SINA: Semantic interpretation of user queries
for question answering on interlinked data. J. Web Sem. 30: 39-51 (2015)
Objective: Transformation from Textual
Query to Formal Query
Which television shows were created by Walt Disney?
SELECT * WHERE
{ ?v0 a dbo:TelevisionShow.
?v0 dbo:creator dbr:Walt_Disney. }
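To illustrate what the formal query asks of the knowledge graph, the following minimal sketch evaluates the same two triple patterns over a tiny in-memory list of triples. The `ex:` resources and the helper names `match` and `evaluate` are hypothetical, and the toy facts are illustrative rather than taken from DBpedia; a real system would send the SPARQL query to an endpoint instead.

```python
# Toy triples; the ex: resources are made up for illustration.
TRIPLES = [
    ("ex:ShowA", "rdf:type", "dbo:TelevisionShow"),
    ("ex:ShowA", "dbo:creator", "dbr:Walt_Disney"),
    ("ex:ShowB", "rdf:type", "dbo:TelevisionShow"),
    ("ex:ShowB", "dbo:creator", "ex:SomeoneElse"),
    ("ex:FilmC", "dbo:creator", "dbr:Walt_Disney"),
]


def match(pattern, triple, binding):
    """Unify one triple pattern with one triple.

    Returns the extended variable binding, or None if they do not unify.
    """
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def evaluate(patterns, triples):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# The two patterns of the SPARQL query above ("a" is shorthand for rdf:type).
QUERY = [
    ("?v0", "rdf:type", "dbo:TelevisionShow"),
    ("?v0", "dbo:creator", "dbr:Walt_Disney"),
]

results = evaluate(QUERY, TRIPLES)
print([b["?v0"] for b in results])
```

Only resources that satisfy both patterns with the same value for `?v0` survive, which is the join that the SPARQL engine performs.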
How can KG facilitate exploiting answers from several sources?
• Traditional QA systems select a window of text and try to
extract the answer from it.
• Extracting answers from different sources requires
decomposing the question.
Query: What are the side effects of drugs used for Tuberculosis?
How can KG facilitate exploiting answers from several sources?
• Using interlinked datasets enables exploiting information
that is spread across diverse datasets.
• Horizontal search is applicable; decomposing the question is not
necessary.
Figure 1: Schema interlinking for three datasets, i.e. DrugBank, Sider, Diseasome.
Figure 2: Resources from three different datasets.
Query: What are the side effects of drugs used for Tuberculosis?
Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, Sören Auer: Question answering on
interlinked data. WWW 2013: 1145-1156
How can KG benefit machine learning approaches?
The structure and semantics of data can be employed as
emerging features in machine learning approaches.
• Structural features are mainly graph-based parameters such
as
Paths between entities
Popularity degree
 Frequency
 In-degree
 Out-degree
Cliques in the graph
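As a rough illustration of the structural features listed above, the sketch below derives in-degree, out-degree and the length of the shortest path between two entities from a list of triples. The function names and the toy triples are assumptions made for this example; the slides do not prescribe a particular implementation.

```python
from collections import Counter, defaultdict, deque


def degree_features(triples):
    """Return {node: {"in": in-degree, "out": out-degree}} for a triple list."""
    out_degree = Counter(s for s, _, _ in triples)
    in_degree = Counter(o for _, _, o in triples)
    nodes = set(out_degree) | set(in_degree)
    return {n: {"in": in_degree[n], "out": out_degree[n]} for n in nodes}


def shortest_path_length(triples, start, goal):
    """Breadth-first search over the graph, ignoring edge direction.

    Returns the number of hops, or None if the entities are not connected.
    """
    neighbours = defaultdict(set)
    for s, _, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None


# Hypothetical toy graph.
TOY = [("a", "p", "b"), ("b", "p", "c"), ("d", "p", "b")]
features = degree_features(TOY)
print(features["b"], shortest_path_length(TOY, "a", "c"))
```

Such values can be fed to a classifier as numeric features alongside the semantic features discussed on the next slide.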
How can KG benefit machine learning approaches?
• Semantic features include
Schema-aware features:
 Hierarchy of concepts
 Label of properties
 Direction of properties
 Domain and range of properties
 Aligning ontologies and vocabularies across various domains
Data-driven features:
 Type of entities
 Traversing owl:sameAs links
Query Expansion Task
Linguistic vs. Semantic Features for the Query Expansion Task
• Linguistic features from WordNet:
 Synonyms: words having a similar meaning.
 Hyponyms: words representing a specialization of the input.
 Hypernyms: words representing a generalization of the input.
• Semantic features from Linked Data:
 Using owl:sameAs and rdfs:seeAlso.
 Using owl:equivalentClass and owl:equivalentProperty.
 Following rdfs:subClassOf or rdfs:subPropertyOf (super classes and super properties).
 Following rdfs:subClassOf or rdfs:subPropertyOf in the reverse direction (sub classes and sub properties).
 Using skos:broader and skos:broadMatch.
 Using skos:narrower and skos:narrowMatch.
 Using skos:closeMatch, skos:mappingRelation and skos:exactMatch.
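The semantic features above amount to following a fixed set of predicates from the resource a keyword maps to. A minimal sketch of one expansion hop is given below, assuming the graph is available as a list of triples; the function name `expand`, the choice to follow links in both directions, and the toy data are assumptions for illustration.

```python
# Predicates named on the slide as semantic expansion features.
EXPANSION_PREDICATES = {
    "owl:sameAs", "rdfs:seeAlso",
    "owl:equivalentClass", "owl:equivalentProperty",
    "rdfs:subClassOf", "rdfs:subPropertyOf",
    "skos:broader", "skos:broadMatch",
    "skos:narrower", "skos:narrowMatch",
    "skos:closeMatch", "skos:mappingRelation", "skos:exactMatch",
}


def expand(resource, triples, predicates=EXPANSION_PREDICATES):
    """Collect resources one hop away from `resource` via expansion predicates.

    Links are followed in both directions, so rdfs:subClassOf yields
    super classes as well as sub classes.
    """
    related = set()
    for s, p, o in triples:
        if p in predicates:
            if s == resource:
                related.add(o)
            elif o == resource:
                related.add(s)
    return related


# Hypothetical toy graph.
TOY_GRAPH = [
    ("ex:Movie", "owl:sameAs", "other:Film"),
    ("ex:Telefilm", "rdfs:subClassOf", "ex:Movie"),
    ("ex:Movie", "ex:unrelated", "ex:Popcorn"),
]
print(sorted(expand("ex:Movie", TOY_GRAPH)))
```

The labels of the returned resources would then serve as candidate expansion words.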
Exemplary expansion graph of the word "movie"
[Figure: expansion graph with the nodes movie, home movie, production, film, motion picture show, video, telefilm]
Saeedeh Shekarpour, Konrad Höffner, Jens Lehmann, Sören Auer: Keyword Query Expansion on Linked
Data Using Linguistic and Semantic Features. ICSC 2013: 191-197
How can KG promote NLP approaches?
• The entity types recognized by NER are still limited to types
such as Person, Organization, Place, Date, Time.
• With the support of a KG, NER tools can be schema-aware and
extended in order to
Find new entities, e.g. names of drugs, animals
Remove case sensitivity from NER
Have schema-aware annotations, e.g.
President Barack Obama tweeted the American people in his final hours as head of state promising to continue his
work with them, and unveiling a new website.
[Annotation labels shown in the figure: Person, President, Father, Spouse]
How can KG promote disambiguation approaches?
• Using a KG as background knowledge enriches the context.
• A richer context leads to better-performing disambiguation
approaches.
Query Disambiguation
Concurrent segmentation & disambiguation using a
hidden Markov model
[Figure: hidden Markov model with a Start state, states 1 to 9 plus an Unknown Entity state, and the observations Keyword 1 to Keyword 4]
How can KG benefit IR approaches?
• Search engines are no longer limited to keyword-based retrieval.
• Search engines are moving towards semantic retrieval & QA.
• KGs enable template-based approaches.
Template-based approach for semantic search
Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, Claus
Stadler: Keyword-Driven SPARQL Query Generation Leveraging Background Knowledge. Web
Intelligence 2011: 203-210
Categorization
based on the matter of information
 Finding special characteristics of an instance
 Finding similar instances
 Finding associations between instances
Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, Claus
Stadler: Keyword-Driven SPARQL Query Generation Leveraging Background Knowledge. Web
Intelligence 2011: 203-210
Samples of keywords and results
Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, Claus
Stadler: Keyword-Driven SPARQL Query Generation Leveraging Background Knowledge. Web
Intelligence 2011: 203-210
Outline
 Introduction
 Part 1: Vision
Advantages of using Knowledge Graph in
 Question Answering
 Machine Learning
 NLP
 Information Retrieval
 Part 2: Research in depth
 RQUERY: Rewriting Natural Language Queries on Knowledge Graphs
to Alleviate the Vocabulary Mismatch Problem
 HeadEX: Triple Extraction from Stream of News Headlines on Twitter
using n-ary Relations
Input Query & Vocabulary Mismatch Problem
• The words of an input query are likely not to match the vocabulary of the background
knowledge.
• Query expansion and query rewriting are solutions to this problem.
• However, they risk yielding a large number of
irrelevant words, which in turn negatively influences runtime as well
as accuracy.
[Figure: an input query with the keywords k1, k2, k3, annotated 10 × 10 × 10]
Saeedeh Shekarpour, Edgard Marx, Sören Auer, Amit Sheth: RQUERY: Rewriting Natural Language
Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem. AAAI 2017
RQUERY Overview
I. Segment Generation: (1) Tokenization and stop word removal. (2) We generate all possible
segments which can be derived from the input query q.
II. Segment Expansion: This module expands the segments derived from the previous module using
the linguistic features of the WordNet thesaurus, namely (1) synonyms and (2) hypernyms.
III. Derived Word Validation: Each derived word is validated against the background knowledge
base.
IV. Detecting and ranking possible query rewrites: We aim at distinguishing and ranking possible
query rewrites. We address the problem of finding the appropriate query rewrite by employing
a Hidden Markov Model (HMM) in three steps:
i. The state space is populated.
ii. Transitions between states are established.
iii. Parameters are bootstrapped.
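Step I can be sketched in a few lines. The example below assumes that a "segment" is any contiguous run of the remaining keywords and uses a small hand-picked stop word list; both are assumptions made for illustration, not details confirmed by the slides.

```python
import re

# Illustrative stop word list; a real system would use a fuller one.
STOP_WORDS = {"what", "which", "is", "are", "were", "the", "of", "a", "an", "by"}


def tokenize(query):
    """Lowercase the query, split it into word tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def generate_segments(tokens):
    """Return every contiguous sub-sequence of the keyword list as a segment."""
    n = len(tokens)
    return [" ".join(tokens[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


keywords = tokenize("What is the profession of bandleader?")
segments = generate_segments(keywords)
print(keywords, segments)
```

For n keywords this yields n(n+1)/2 segments, each of which is then passed on to the expansion and validation steps.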
[Figure: RQUERY architecture. Input: textual query. Modules: segment generation, segment expansion, derived word validation, detecting and ranking query rewrites (model construction). External resources: WordNet and the RDF knowledge base. Output: ranked list of rewritten queries.]
Example – Part 1
• Input Query: ‘What is the profession of bandleader?’
• Steps:
1) RQUERY derives and validates 10 words for the two given input keywords.
2) The state space is populated with all of these 10 validated words.
3) Then, all the transitions between states are recognized and established.
[Figure: HMM state space with a Start state. Observation 1 (profession): occupation, profession, line, business, vacation, job. Observation 2 (bandleader): band leader, director, music director, conductor.]
Example – Part 2
4) Finally, we run the Viterbi algorithm, which is a dynamic programming approach for
finding the optimal path through a HMM. This algorithm discovers the most likely states
that the sequence of input keywords is observable through.
5) Thus, after running the Viterbi algorithm for the running query “profession of
bandleader”, the generated top-6 outputs are as follows:
Methodology: Modeling by HMM
Formally, a HMM is a quintuple $\lambda = (X, Y, A, B, \pi)$ where:
• $X$ is a finite set of states. In our case, $X$ equals the set of the validated derived words $W$. In other words, each word $w \in W$ forms a state.
• $Y$ denotes the set of observations. Here, $Y$ equals the set of all segments $seg \in S$ derived from the input n-tuple of keywords $q$.
• $A : X \times X \to [0, 1]$ is the transition matrix. Each entry $a_{ij}$ is the transition probability $P(S_j \mid S_i)$ from state $S_i$ to state $S_j$.
• $B : X \times Y \to [0, 1]$ represents the emission matrix. Each entry $b_{i\text{-}seg} = P(seg \mid S_i)$ is the probability of emitting the segment $seg$ from the state $S_i$.
• $\pi : X \to [0, 1]$ denotes the initial probability of states.
We define the basic problem as follows: the sequence of input keywords $q$ and the model $\lambda$ are given, and the problem is to find the optimal sequence of states $q_r = (S_1, S_2, \ldots, S_m)$ which explains the given observation, i.e. the input query $q(k_1, \ldots, k_n)$. Please note that there are possibly multiple distinct sequences of states which the given input query $q$ is observable through, thus the aim is obtaining the optimal one; formally:
$\gamma = \arg\max_{q_r} P(q_r \mid q, \lambda)$
$P(q_r \mid q, \lambda)$ is the probability of observing the given query $q$ through the sequence of states $q_r$. For computing the probability of any query rewrite $q_r$, the model $\lambda$ plays a role as a constant parameter, thus we assume
$P(q_r \mid q, \lambda) \approx P(q_r \mid q) \implies \gamma = \arg\max_{q_r} P(q_r \mid q)$
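The decoding problem, finding the most likely state sequence for the observed keywords, is what the Viterbi algorithm mentioned in the example solves. The sketch below is a textbook Viterbi over dictionaries, under the simplifying assumption that each state emits exactly one keyword. All probabilities and the reduced state set are made-up toy values, not the bootstrapped parameters of RQUERY.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state sequence, its probability) for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, predecessor)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
             for s in states}]
    for obs in observations[1:]:
        previous = best[-1]
        layer = {}
        for s in states:
            prob, pred = max(
                (previous[r][0] * trans_p[r].get(s, 0.0), r) for r in states
            )
            layer[s] = (prob * emit_p[s].get(obs, 0.0), pred)
        best.append(layer)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][last][0]
    path = [last]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    path.reverse()
    return path, probability


# Hypothetical toy model for the query "profession of bandleader".
STATES = ["job", "line", "conductor", "director"]
START = {"job": 0.3, "line": 0.2, "conductor": 0.25, "director": 0.25}
TRANS = {
    "job": {"conductor": 0.5, "director": 0.5},
    "line": {"conductor": 0.2, "director": 0.8},
    "conductor": {},
    "director": {},
}
EMIT = {
    "job": {"profession": 0.6},
    "line": {"profession": 0.4},
    "conductor": {"bandleader": 0.7},
    "director": {"bandleader": 0.3},
}

path, probability = viterbi(["profession", "bandleader"], STATES, START, TRANS, EMIT)
print(path, probability)
```

Ranking several rewrites, as in the top-6 list of the example, would require keeping the k best paths rather than only the single best one.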
Triples
• A triple has subject–predicate–object structure
• Jack knows Ann
[Figure: the triple drawn as a graph. Subject: Jack; Predicate: knows; Object: Ann]
Triple-based Co-occurrence
Assuming that $q_r$ is a sequence of states $(S_1 \ldots S_m)$ (please note that $S_i$ corresponds to the word $w_i$), we express $P(q_r \mid q) = P(S_1 \ldots S_m \mid k_1 \ldots k_n)$. [...] As from a state $S_i$ either one or multiple keywords can be observable, the number of states is less than or equal to the number of keywords, $m \le n$. [...] Markov property, the probability of reaching [...] observing the keyword $k_n$ is equal to [...] $P(k_n \mid S_m)$. [...]
[...] so the keyword profession is emitted from the state associated with the word job.
Transitions between States. We define transitions between states based on the concept of co-occurrence of words. We adopt the concept of co-occurrence of words from the traditional information retrieval context and move it to RDF knowledge bases. Triple-based co-occurrence means co-occurrence of words in literals found in the resource descriptions of the two resources of a given triple.
Figure 3: The graph patterns employed for recognising co-occurrence of the two given words w1 and w2: (a) subject-predicate, (b) subject-object, (c) subject-literal, (d) predicate-object, (e) predicate-literal, (f) predicate-type of subject, (g) predicate-type of object. Please note that the letters s, p, o, c, l and a respectively stand for subject, predicate, object, class, rdfs:label and rdf:type.
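As an illustration of pattern (b), subject-object, the sketch below counts the triples in which w1 occurs in the label of the subject and w2 in the label of the object. The data layout (a triple list plus a label dictionary), the function name and the toy data are assumptions for this example; the remaining six patterns would be handled analogously.

```python
def cooccur_subject_object(w1, w2, triples, labels):
    """Count triples whose subject label contains w1 and whose object label contains w2."""
    count = 0
    for subject, _, obj in triples:
        subject_words = set(labels.get(subject, "").lower().split())
        object_words = set(labels.get(obj, "").lower().split())
        if w1 in subject_words and w2 in object_words:
            count += 1
    return count


# Hypothetical toy data: rdfs:label values keyed by resource.
LABELS = {
    "ex:r1": "jazz bandleader",
    "ex:r2": "music profession",
    "ex:r3": "pianist",
}
GRAPH = [("ex:r1", "ex:p", "ex:r2"), ("ex:r3", "ex:p", "ex:r2")]

print(cooccur_subject_object("bandleader", "profession", GRAPH, LABELS))
```

Counts of this kind, aggregated over all patterns, could serve as the raw material for the transition probabilities between the states for w1 and w2.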
Evaluation
 Evaluation Criteria: The goal of our evaluation is to investigate the positive as well as
negative impacts of the proposed approach by raising the following two
questions:
① How effective is the approach in addressing the vocabulary mismatch problem for
queries that have a vocabulary mismatch problem?
② How effective is the approach in avoiding noise for queries
that do not have a vocabulary mismatch problem?
 Metric: We employ Mean Reciprocal Rank (MRR).
 Benchmark: we use an evaluation test collection for schema-agnostic query
mechanisms on RDF datasets (i.e. DBpedia) presented in ESWC 2015.
 https://sites.google.com/site/eswcsaq2015/documents
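For reference, Mean Reciprocal Rank averages, over all queries, the reciprocal of the rank at which the first correct result appears (0 if none appears). A minimal sketch follows; the function names and the toy rankings are illustrative and unrelated to the benchmark above.

```python
def reciprocal_rank(ranked_results, relevant):
    """Return 1/rank of the first relevant result, or 0.0 if there is none."""
    for rank, item in enumerate(ranked_results, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_results, relevant_set) pairs."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical toy runs: first hit at rank 1, at rank 2, and no hit at all.
RUNS = [
    (["a", "b"], {"a"}),
    (["x", "y"], {"y"}),
    (["m"], {"z"}),
]
mrr = mean_reciprocal_rank(RUNS)
print(mrr)
```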
Evaluation
• Bootstrapping:
• Issue: The modeling is dynamic, meaning that the state space as well as the issued
observation (i.e., the sequence of input keywords) vary from query to query. The learned probability
values should therefore be generic and not query-dependent, because learning model probabilities for
each individual query is not feasible.
• Solution: We rely on bootstrapping, a technique used to estimate an unknown
probability distribution function. We apply three distributions (i.e., normal, uniform and
Zipfian) to find out the most appropriate distribution.
[Figure: Mean Reciprocal Rank (y-axis, 0 to 1) under the uniform, normal and Zipfian distributions for three query groups: All Queries, Q1-10, Q11-20. Data labels as extracted: 0.76, 0.51, 0.69, 0.85, 0.44, 0.82, 0.68, 0.58, 0.63]
Evaluation Results
[Figures: reciprocal rank per query (y-axis, 0.00 to 1.00) for HMM with Implicit Frequency, HMM with Explicit Frequency and the n-gram language model. Panel titles: "Queries which have a mismatch problem" and "Queries which do not have a mismatch problem". Queries on the x-axes: Q12, Q15, Q18, Q20, Q21, Q24, Q29, Q31, Q40, Q51, Q54, Q65, Q70, Q76, Q78, Q84; and Q2, Q3, Q5, Q8, Q10, Q16, Q22, Q34, Q37, Q46, Q48, Q49, Q50, Q58, Q59, Q63, Q64, Q69, Q85, Q91, Q93]
Outline
 Introduction
 Part 1: Vision
Advantages of using Knowledge Graph in
 Question Answering
 Machine Learning
 NLP
 Information Retrieval
 Part 2: Research in depth
 RQUERY: Rewriting Natural Language Queries on Knowledge Graphs
to Alleviate the Vocabulary Mismatch Problem
 HeadEX: Triple Extraction from Stream of News Headlines on Twitter
using n-ary Relations
Knowledge Graph Creation
HeadEx: Triple Extraction from Stream of News
Headlines on Twitter using n-ary Relations
Stream of News Headlines
CEVO: Cognitive annotation on relations
• Problem:
 Relation Extraction
 Contextual equivalence of relations
 Diversity in Conceptualization
• Requirements:
 Relation tagging on textual data
 Relation linking
 Integration and alignment of properties
 Simplicity
 Reusability
CEVO: Cognitive annotation on relations
• CEVO is built upon Levin's categorization of
English verbs.
• CEVO has an abstract conceptualization.
• You can find CEVO at http://eventontology.org
Background Data Model
the meet event is associated with entities with type of Participant and Topic
(i.e., topic discussed in the meeting). Considering the sample of tweets in Table ??, the
tweets no1, no4, no7 are instances of the event Communication with the mentions
tell, say, announce. The tweets no2, no5, no8 are instances of the event Meet
with the mentions meet, visit. The tweets no3, no6, no9 are instances of the event
Murder with the mention kill.
Fig. 1: Subclasses of the Generic Event: (a) subclasses of Event (Communication, Meet, Murder), (b) Meet class (Participant, Topic), (c) Communication class (Giver, Addressee, Message), (d) Murder class (Killer, Victim, cause, quantity).
Example
Tweet #2: Instagram CEO meets with @Pontifex to discuss "the power
of images to unite people".
:Meet#1 a :Meet ; rdfs:label "meets" .
:e1 a :Participant ; rdfs:label "Instagram CEO" .
:e2 a :Participant ; rdfs:label "@Pontifex" .
:t1 a :Topic ;
    :body "to discuss the power of images to unite people" .
:e1 :attendedIn :Meet#1 .
:e2 :attendedIn :Meet#1 .
:Meet#1 :about :t1 .
:Meet#1 :publisher :CNN .
:Meet#1 :date "26/2/2016" .
Overview
[Figure: pipeline components. Crawling News Tweets; Disambiguation & Validation & URI assignment; Filtering; Event Recognition; Entity Extraction]
Entity Extraction using Linguistic Analysis
Fig. 2: Dependency tree for the running example (ROOT: meets; arc labels: nsubj, xcomp, compound, case, mark, det, nmod, dobj, acl).
Definition 3 (Dependent Chunk of ROOT). Dependent Chunk of ROOT (DCR) is the
longest sequence of tokens of a given tweet that satisfies the following conditions: (i)
There is one token that is (directly) dependent on the root, and (ii) any other token
included in a given chunk is dependent on a token already within the given chunk.
Moreover, ROOT is an individual chunk.
Example 2 (Chunking Tweet). We chunk the running example based on the concept
of ROOT Dependent Chunk (RDC). Figure 3 shows the resulting chunks. Except for
the chunk of root (because root is an individual chunk), any other chunk has only one
token that is dependent on the root (only one outgoing arrow to the root) and other
tokens inside that chunk co-reference interior tokens (interior arrows). According to this
definition, the example tweet contains four individual chunks. For the chunk ‘Instagram
CEO’, only the token ‘CEO’ is dependent on the root and the other token ‘Instagram’
is dependent on the interior token ‘CEO’.
Fig. 3: Chunking the running example based on the concept of Root Dependent Chunk.
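The chunking in Definition 3 can be sketched as grouping every token under the direct dependent of the root it descends from, with the root forming its own chunk. The head indices below are one reading of the arcs in the dependency figure and should be treated as an assumption, as should the function name; in practice the parse would come from a dependency parser.

```python
def root_dependent_chunks(tokens, heads):
    """Split a parsed sentence into the root plus one chunk per root dependent.

    heads[i] is the index of the head of token i, or -1 for the root.
    Assumes a well-formed, projective tree, so each chunk is contiguous.
    """
    root = heads.index(-1)

    def top_ancestor(i):
        # Climb the tree until the parent is the root.
        while heads[i] != root:
            i = heads[i]
        return i

    groups = {}
    for i in range(len(tokens)):
        key = root if i == root else top_ancestor(i)
        groups.setdefault(key, []).append(i)
    # Order the chunks by the position of their first token.
    return [" ".join(tokens[i] for i in idxs) for idxs in sorted(groups.values())]


# Running example; heads are an assumed reading of the dependency figure.
TOKENS = ["Instagram", "CEO", "meets", "with", "@Pontifex", "to", "discuss",
          "the", "power", "of", "images", "to", "unite", "people"]
HEADS = [1, 2, -1, 4, 2, 6, 2, 8, 6, 10, 8, 12, 8, 12]

chunks = root_dependent_chunks(TOKENS, HEADS)
print(chunks)
```

With this parse the function yields four chunks, matching the count stated in Example 2.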
The best observed accuracy for Entity
Extraction Tasks
Entity Extraction
Sequence Labeling Using Deep Learning
Thank you
Any questions?
Annotation Evolution
Metadata Annotation
• PROV Ontology
• Dublin Core Metadata
Linguistic Annotation
• OLiA Ontologies
• Language Annotation Framework (LAF)
Interoperability Annotation
• MEX (Machine Learning)
• QANARY (Question Answering)
• NLP Interchange Format (NIF)
Cognitive Annotation
• CEVO (Comprehensive Event Ontology)
• Universal Conceptual Cognitive Annotation (UCCA)
CEVO use case 1: Annotating Text
BBC Tweet#1 on 10/3/2016:
Obama and Justin Trudeau announce efforts to fight climate change.
NYT Tweet#2 14/3/2016:
State elections were "difficult day," German Chancellor Angela Merkel says.
CEVO:Communication
CEVO:Communication
CEVO use case 2: Annotating Ontological
Properties
We use the Web Annotation Data Model (WADM) for annotating
ontological properties.
example:annotation1 a oa:Annotation ;
    oa:hasTarget dbo:spouse ;
    oa:hasBody cevo:Amalgamate .
CEVO use case 3: Relation Linking
• Example: Rupert Murdoch and Jerry Hall marry.
<exam:headline#char=31,35> a nif:String ;
    nif:beginIndex 31 ;
    nif:endIndex 35 ;
    nif:anchorOf "marry" ;
    nif:oliaCategory Olia:MainVerb ;
    a cevo:Amalgamate .
example:annotation3 a oa:Annotation ;
    oa:hasTarget <exam:headline#char=31,35> ;
    oa:hasBody dbo:spouse .

Mais conteúdo relacionado

Mais procurados

Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-cziPaul Groth
 
Data science as a science
Data science as a scienceData science as a science
Data science as a sciencejtleek
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsPaul Groth
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkPaul Groth
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text Paul Groth
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data ShowcasingPaul Groth
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for SciencePaul Groth
 
Data Science Education at JHSPH
Data Science Education at JHSPHData Science Education at JHSPH
Data Science Education at JHSPHjtleek
 
Fixing the leaks in the pipeline from public genomics data to the clinic
Fixing the leaks in the pipeline from public genomics data to the clinicFixing the leaks in the pipeline from public genomics data to the clinic
Fixing the leaks in the pipeline from public genomics data to the clinicjtleek
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsPaul Groth
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
An Introduction and Applications of DOI
An Introduction and Applications of DOIAn Introduction and Applications of DOI
An Introduction and Applications of DOINader Ale Ebrahim
 
Share Scientific Data to Improve Research Visibility and Impact
Share Scientific Data to Improve Research Visibility and ImpactShare Scientific Data to Improve Research Visibility and Impact
Share Scientific Data to Improve Research Visibility and ImpactNader Ale Ebrahim
 
Natural Language Processing on Non-Textual Data
Natural Language Processing on Non-Textual DataNatural Language Processing on Non-Textual Data
Natural Language Processing on Non-Textual Datagpano
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionUniversity of Washington
 
On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...Susanna-Assunta Sansone
 

Mais procurados (20)

Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
 
Data science as a science
Data science as a scienceData science as a science
Data science as a science
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic Framework
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
 
NLP Structured Data Investigation on Non-Text

Using Knowledge Graph for Promoting Cognitive Computing

  • 1. Using Knowledge Graph for Promoting Cognitive Computing Presenter: Dr. Saeedeh Shekarpour 2/10/2017 1
  • 2. About me: Education • 2010-2013: PhD student, AKSW Research Group, Leipzig University, Germany • 2014-2015: PhD/Postdoc, EIS Research Group, Bonn University, Germany • 2016-present: Postdoc, Knoesis Center, USA
  • 3. About me: Research Interests. 6+ years of research experience in the following directions: • Previously: • Question Answering Systems, Semantic Search. • Linked Data and Semantic Web Technologies. • Statistical classifier models (e.g. HMM). • Ontology Development. • Natural Language Processing. • Currently: • Information Extraction and Knowledge Graph Creation. • Social Network Mining. • Gaining experience with Deep Learning.
  • 4. About me: Selected Publications • Saeedeh Shekarpour, Edgard Marx, Sören Auer, Amit Sheth: RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem. AAAI 2017 • Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, Sören Auer: Question answering on interlinked data. WWW 2013: 1145-1156 • Andreas Both, Dennis Diefenbach, Kuldeep Singh, Saeedeh Shekarpour, Didier Cherix, Christoph Lange: Qanary - A Methodology for Vocabulary-Driven Open Question Answering Systems. ESWC 2016: 625-641 • Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, Claus Stadler: Keyword-Driven SPARQL Query Generation Leveraging Background Knowledge. Web Intelligence 2011: 203-210
  • 5. About me: Selected Publications • Saeedeh Shekarpour, Konrad Höffner, Jens Lehmann, Sören Auer: Keyword Query Expansion on Linked Data Using Linguistic and Semantic Features. ICSC 2013: 191-197 • Saeedeh Shekarpour, Edgard Marx, Axel-Cyrille Ngonga Ngomo, Sören Auer: SINA: Semantic interpretation of user queries for question answering on interlinked data. J. Web Sem. 2015
  • 6. Outline • Introduction • Part 1: Vision. Advantages of using Knowledge Graph in: Question Answering, Machine Learning, NLP, Information Retrieval • Part 2: Research in depth: RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem; HeadEX: Triple Extraction from Stream of News Headlines on Twitter using n-ary Relations
  • 7. Prevalence of using KG • Google Knowledge Graph • IBM Watson • Using knowledge graphs in smartphones, e.g. Google Now
  • 8. The growth of Linked Open Data • May 2007: 12 datasets • January 2017: 2973 datasets, more than 140 billion triples
  • 9. Outline • Introduction • Part 1: Vision. Advantages of using Knowledge Graph in: Question Answering, Machine Learning, NLP, Information Retrieval • Part 2: Research in depth: RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem; HeadEX: Triple Extraction from Stream of News Headlines on Twitter using n-ary Relations
  • 10. SINA Architecture (figure). Client and Server over the underlying interlinked knowledge bases. Server components: Query Preprocessing, Segment Validation, Query Expansion, Resource Retrieval, Disambiguation, Query Construction, Representation. Data passed between components: query, keywords, valid segments, reformulated query, mapped resources, tuple of resources, SPARQL queries, result. Tools: OWL API, HTTP client, Stanford CoreNLP. Saeedeh Shekarpour, Edgard Marx, Axel-Cyrille Ngonga Ngomo, Sören Auer: SINA: Semantic interpretation of user queries for question answering on interlinked data. J. Web Sem. 30: 39-51 (2015)
  • 11. Objective: Transformation from a textual query to a formal query. Textual query: Which television shows were created by Walt Disney? Formal query: SELECT * WHERE { ?v0 a dbo:TelevisionShow . ?v0 dbo:creator dbr:Walt_Disney . }
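To make the target of the transformation concrete, the following minimal sketch evaluates the two triple patterns of the query above against a tiny in-memory set of triples. The `ex:` resources and the data are invented for illustration (they are not DBpedia facts), and the `match` and `evaluate` helpers are hypothetical names for this sketch, not part of SINA or of any SPARQL engine.

```python
# Toy data: (subject, predicate, object) triples. The ex: names are made up.
TRIPLES = [
    ("ex:ShowA", "rdf:type", "dbo:TelevisionShow"),
    ("ex:ShowA", "dbo:creator", "dbr:Walt_Disney"),
    ("ex:ShowB", "rdf:type", "dbo:TelevisionShow"),
    ("ex:ShowB", "dbo:creator", "ex:SomeoneElse"),
    ("ex:FilmC", "rdf:type", "dbo:Film"),
    ("ex:FilmC", "dbo:creator", "dbr:Walt_Disney"),
]

# The two triple patterns of the query; "a" in SPARQL abbreviates rdf:type.
QUERY = [
    ("?v0", "rdf:type", "dbo:TelevisionShow"),
    ("?v0", "dbo:creator", "dbr:Walt_Disney"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(patterns, triples):
    """Join the patterns one by one, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


print(evaluate(QUERY, TRIPLES))
```

Only `ex:ShowA` satisfies both patterns: `ex:ShowB` has a different creator and `ex:FilmC` is not a television show.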
  • 12. How can KG facilitate exploiting answers from several sources? • Traditional QA systems take a window over a portion of text and try to extract the answer from there. • Exploiting answers from different sources requires decomposing the question. Query: What are the side effects of drugs used for Tuberculosis?
  • 13. How can KG facilitate exploiting answers from several sources? • Using interlinked datasets enables exploiting information that is spread across diverse datasets. • Horizontal search is applicable; decomposing the question is not necessary. Query: What are the side effects of drugs used for Tuberculosis? Figure 1: Schema interlinking for three datasets, i.e. DrugBank, Sider, Diseasome (labels: Disease, Drug, Drugs, Side Effect, Genes, enzymes, Drug interactions, references, targets, sameAs). Figure 2: Resources from three different datasets, Diseasome, Sider and DrugBank (labels: Asthma, Disease, Drug, Side Effect, Enzymes, side effect, sameAs, enzyme, variables ?v0 to ?v3). Saeedeh Shekarpour, Axel-Cyrille Ngonga Ngomo, Sören Auer: Question answering on interlinked data. WWW 2013: 1145-1156
  • 14. How can KG benefit machine learning approaches? The structure and semantics of data can be employed as emerging features in machine learning approaches. • Structural features are mainly graph-based parameters such as: paths between entities; popularity degree (frequency, in-degree, out-degree); cliques in the graph.
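As a rough illustration of the structural features listed above, the sketch below derives in-degree, out-degree and shortest directed path length from a list of triples. The `ex:` triples and the function names `degree_features` and `path_length` are assumptions made for this example; the slides do not prescribe how these features are computed.

```python
from collections import Counter, deque

# Toy graph with made-up resources: A -> B -> C and D -> C.
TRIPLES = [
    ("ex:A", "ex:linksTo", "ex:B"),
    ("ex:B", "ex:linksTo", "ex:C"),
    ("ex:D", "ex:linksTo", "ex:C"),
]


def degree_features(triples):
    """Return {entity: {"in_degree": n, "out_degree": m}} for every entity."""
    out_degree = Counter(s for s, _, _ in triples)
    in_degree = Counter(o for _, _, o in triples)
    entities = set(out_degree) | set(in_degree)
    return {
        e: {"in_degree": in_degree[e], "out_degree": out_degree[e]}
        for e in entities
    }


def path_length(triples, source, target):
    """Length of the shortest directed path from source to target, else None."""
    neighbours = {}
    for s, _, o in triples:
        neighbours.setdefault(s, []).append(o)
    queue, seen = deque([(source, 0)]), {source}
    while queue:  # breadth-first search
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


features = degree_features(TRIPLES)
print(features["ex:C"], path_length(TRIPLES, "ex:A", "ex:C"))
```

Such per-entity numbers can then be used as columns of a feature vector alongside any non-graph features.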
  • 15. How can KG benefit machine learning approaches? • Semantic features include: schema-aware features (hierarchy of concepts; labels of properties; direction of properties; domain and range of properties; alignment of ontologies and vocabularies across various domains) and data-driven features (types of entities; traversal of owl:sameAs links).
  • 16. Query Expansion Task: Linguistic vs. Semantic Features for Query Expansion • Linguistic features from WordNet: Synonyms: words having a similar meaning. Hyponyms: words representing a specialization of the input. Hypernyms: words representing a generalization of the input. • Semantic features from Linked Data: Using owl:sameAs. Using rdfs:seeAlso. Using owl:equivalentClass and owl:equivalentProperty. Following rdfs:subClassOf or rdfs:subPropertyOf from the input resource (superclasses and superproperties). Following rdfs:subClassOf or rdfs:subPropertyOf towards the input resource (subclasses and subproperties). Using skos:broader and skos:broadMatch. Using skos:narrower and skos:narrowMatch. Using skos:closeMatch, skos:mappingRelation and skos:exactMatch.
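The semantic features above amount to collecting resources that are one link away from the input resource via a fixed set of properties. The sketch below shows that idea on made-up `ex:` triples; the `expand` helper is hypothetical, and marking the reverse direction with a leading `^` (borrowed from SPARQL property-path notation) is a choice of this sketch rather than something stated on the slide.

```python
# Properties used for expansion, taken from the list on the slide.
EXPANSION_PROPERTIES = {
    "owl:sameAs", "rdfs:seeAlso",
    "owl:equivalentClass", "owl:equivalentProperty",
    "rdfs:subClassOf", "rdfs:subPropertyOf",
    "skos:broader", "skos:broadMatch",
    "skos:narrower", "skos:narrowMatch",
    "skos:closeMatch", "skos:mappingRelation", "skos:exactMatch",
}

# Toy data with invented resources.
TRIPLES = [
    ("ex:Movie", "owl:sameAs", "ex2:Film"),
    ("ex:Movie", "rdfs:subClassOf", "ex:Work"),
    ("ex:HomeMovie", "rdfs:subClassOf", "ex:Movie"),
    ("ex:Movie", "rdfs:label", "movie"),  # not an expansion property
]


def expand(resource, triples):
    """Group resources one hop away from `resource` by the property used.

    Keys starting with "^" hold resources that point *to* the input,
    e.g. "^rdfs:subClassOf" collects its subclasses.
    """
    related = {}
    for s, p, o in triples:
        if p not in EXPANSION_PROPERTIES:
            continue
        if s == resource:
            related.setdefault(p, set()).add(o)
        elif o == resource:
            related.setdefault("^" + p, set()).add(s)
    return related


print(expand("ex:Movie", TRIPLES))
```

The labels of the collected resources would then serve as candidate expansion words for the input keyword.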
  • 17. Exemplary expansion graph of the word movie (figure; nodes: movie, home movie, production, film, motion picture, show, video, telefilm). Saeedeh Shekarpour, Konrad Höffner, Jens Lehmann, Sören Auer: Keyword Query Expansion on Linked Data Using Linguistic and Semantic Features. ICSC 2013: 191-197
  • 18. How can KG promote NLP approaches? • The entity types recognized by NER tools are still limited to types such as Person, Organization, Place, Date, Time. • With the support of KG, NER tools can be schema-aware and extended in order to: find new entities, e.g. names of drugs or animals; remove case sensitivity from NER; produce schema-aware annotations, e.g. Person, President, Father, Spouse for the sentence: President Barack Obama tweeted the American people in his final hours as head of state promising to continue his work with them, and unveiling a new website.
  • 19. How can KG promote disambiguation approaches? • Using KG as the background knowledge enriches the context. • A richer context leads to better-performing disambiguation approaches.
  • 20. Query Disambiguation: concurrent segmentation & disambiguation using a hidden Markov model (figure: a Start node, states 1 to 9 and an Unknown Entity state, with observations Keyword 1 to Keyword 4).
  • 21. How can KG benefit IR approaches? • Search engines are no longer limited to keyword-based retrieval. • Search engines are moving towards semantic retrieval & QA. • KG enables template-based approaches.
  • 22. Template-based approach for semantic search. Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, Claus Stadler: Keyword-Driven SPARQL Query Generation Leveraging Background Knowledge. Web Intelligence 2011: 203-210
  • 23. Categorization based on the matter of information • Finding special characteristics of an instance • Finding similar instances • Finding associations between instances. Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, Claus Stadler: Keyword-Driven SPARQL Query Generation Leveraging Background Knowledge. Web Intelligence 2011: 203-210
  • 24. Samples of keywords and results. Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, Claus Stadler: Keyword-Driven SPARQL Query Generation Leveraging Background Knowledge. Web Intelligence 2011: 203-210
  • 25. Outline • Introduction • Part 1: Vision. Advantages of using Knowledge Graph in: Question Answering, Machine Learning, NLP, Information Retrieval • Part 2: Research in depth: RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem; HeadEX: Triple Extraction from Stream of News Headlines on Twitter using n-ary Relations
  • 26. Input Query & Vocabulary Mismatch Problem • It is likely that the input query does not match the vocabulary of the background knowledge. • Query expansion and query rewriting are solutions for this problem. • But they risk yielding a large number of irrelevant words, which in turn negatively influences runtime as well as accuracy, e.g. 10 derived words for each of three input keywords k1, k2, k3 give 10 × 10 × 10 combinations. Saeedeh Shekarpour, Edgard Marx, Sören Auer, Amit Sheth: RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem. AAAI 2017
  • 27. RQUERY Overview. Input: textual query. Output: ranked list of rewritten queries. Resources: WordNet (external resource) and an RDF knowledge base. I. Segment Generation: (1) Tokenization and stop word removal. (2) We generate all possible segments which can be derived from q. II. Segment Expansion: This module expands the segments derived from the previous module using linguistic features of WordNet, namely (1) synonyms and (2) hypernyms. III. Derived Word Validation: Each derived word is validated against the background knowledge base. IV. Detecting and ranking possible query rewrites: We aim at distinguishing and ranking possible query rewrites. We address the problem of finding the appropriate query rewrite by employing a Hidden Markov Model (HMM) in three steps: i. The state space is populated. ii. Transitions between states are established. iii. Parameters are bootstrapped.
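Step I of the overview (tokenization, stop word removal, segment generation) can be sketched as follows. This is an illustration only: the stop word list is a small made-up sample, the function names are hypothetical, and it assumes that a "segment" is a contiguous run of keywords, which the slide does not state explicitly.

```python
import re

# Small illustrative stop word list; a real system would use a fuller one.
STOP_WORDS = {"what", "is", "the", "of", "a", "an", "are", "by", "were", "which"}


def keywords(query):
    """Lowercase the query, tokenize it and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def segments(kws):
    """All contiguous sub-sequences of the keyword list, joined by spaces."""
    return [
        " ".join(kws[i:j])
        for i in range(len(kws))
        for j in range(i + 1, len(kws) + 1)
    ]


kws = keywords("What is the profession of bandleader?")
print(kws)            # ['profession', 'bandleader']
print(segments(kws))  # ['profession', 'profession bandleader', 'bandleader']
```

For n keywords this produces n(n+1)/2 segments, each of which would then be passed on to segment expansion and validation.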
  • 28. Example – Part 1 • Input Query: ‘What is the profession of bandleader?’ • Steps: 1) RQUERY derives and validates 10 words for the two given input keywords. 2) The state space is populated with all of these 10 validated words. 3) Then, all the transitions between states are recognized and established. Figure: a Start node and the state space (band leader, director, music director, conductor, occupation, profession, line, business, vacation, job); Observation 1 is the keyword profession and Observation 2 is the keyword bandleader.
  • 29. Example – Part 2 4) Finally, we run the Viterbi algorithm, which is a dynamic programming approach for finding the optimal path through an HMM. This algorithm discovers the most likely sequence of states through which the sequence of input keywords is observable. 5) Thus, after running the Viterbi algorithm for the running query “profession of bandleader”, the generated top-6 outputs are as follows:
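For readers unfamiliar with the Viterbi algorithm mentioned in step 4, here is a minimal textbook-style sketch. All probabilities below are invented for illustration and are not the bootstrapped RQUERY parameters; the state set is a four-word subset of the example's state space. The sketch returns only the single best path, whereas a ranked top-k list as on the slide would need an extension that is not shown here.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state sequence, its probability)."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               predecessor state on that path)
    first = observations[0]
    best = [{s: (start_p[s] * emit_p[s].get(first, 0.0), None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            prob, back = max(
                (prev[r][0] * trans_p[r].get(s, 0.0), r) for r in states
            )
            column[s] = (prob * emit_p[s].get(obs, 0.0), back)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Made-up parameters for the running example.
STATES = ["job", "occupation", "conductor", "director"]
START_P = {"job": 0.3, "occupation": 0.3, "conductor": 0.2, "director": 0.2}
EMIT_P = {
    "job": {"profession": 0.6},
    "occupation": {"profession": 0.4},
    "conductor": {"bandleader": 0.7},
    "director": {"bandleader": 0.3},
}
TRANS_P = {
    "job": {"conductor": 0.2, "director": 0.5},
    "occupation": {"conductor": 0.6, "director": 0.1},
    "conductor": {},
    "director": {},
}

path, prob = viterbi(["profession", "bandleader"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

With these toy numbers the best rewrite is "occupation conductor": the path through "job" starts with a higher probability but loses on the transition and emission for the second keyword.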
  • 30. Methodology: Modeling by HMM. Formally, an HMM is a quintuple $\lambda = (X, Y, A, B, \pi)$ where: • $X$ is a finite set of states. In our case, $X$ equals the set of the validated derived words $W$. In other words, each word $w \in W$ forms a state. • $Y$ denotes the set of observations. Here, $Y$ equals the set of all segments $seg \in S$ derived from the input n-tuple of keywords $q$. • $A : X \times X \to [0, 1]$ is the transition matrix. Each entry $a_{ij}$ is the transition probability $P(S_j \mid S_i)$ from state $S_i$ to state $S_j$. • $B : X \times Y \to [0, 1]$ represents the emission matrix. Each entry $b_{i\text{-}seg} = P(seg \mid S_i)$ is the probability of emitting the segment $seg$ from the state $S_i$. • $\pi : X \to [0, 1]$ denotes the initial probability of states. We define the basic problem as follows: the sequence of input keywords $q$ and the model $\lambda$ are given, and the problem is to find the optimal sequence of states $q_r = (S_1, S_2, \ldots, S_m)$ which explains the given observation, i.e. the input query $q(k_1, \ldots, k_n)$. Please note that there are possibly multiple distinct sequences of states through which the given input query $q$ is observable, thus the aim is obtaining the optimal one; formally: $\gamma = \arg\max_{q_r} P(q_r \mid q, \lambda)$, where $P(q_r \mid q, \lambda)$ is the probability of observing the given query $q$ through the sequence of states $q_r$. For computing the probability of any query rewrite $q_r$, the model $\lambda$ plays a role as a constant parameter, thus we assume $P(q_r \mid q, \lambda) \approx P(q_r \mid q) \implies \gamma = \arg\max_{q_r} P(q_r \mid q)$.
  • 31. Triples • A triple has subject–predicate–object structure • Example: Jack knows Ann, with subject Jack, predicate knows and object Ann.
  • 32. Triple-based Co-occurrence
    The probability of a query rewrite expands to $P(q_r \mid q) = P(S_1 \dots S_m \mid k_1 \dots k_n)$, where the state $S_i$ corresponds to the word $w_i$. Since either one or multiple keywords can be observable from a state $S_i$, the number of states is at most the number of keywords, $m \le n$. For instance, the keyword profession is emitted from the state associated with the word job.
    Transitions between States. We define transitions between states based on the concept of co-occurrence of words. We adopt the concept of co-occurrence of words from the traditional information retrieval context and move it to RDF knowledge bases. Triple-based co-occurrence means co-occurrence of words in literals found in the resource descriptions of the two resources of a given triple.
    Figure 3: The graph patterns employed for recognising co-occurrence of the two given words w1 and w2: (a) subject-predicate, (b) subject-object, (c) subject-literal, (d) predicate-object, (e) predicate-literal, (f) predicate-type of subject, (g) predicate-type of object. Please note that the letters s, p, o, c, l and a respectively stand for subject, predicate, object, class, rdfs:label and rdf:type.
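To make triple-based co-occurrence concrete, here is a minimal sketch covering patterns (a) to (e). It assumes an in-memory list of triples and a dictionary of rdfs:label values, and it tests words by case-insensitive substring matching. The `Triple` type, the `ex:` identifiers and the data are hypothetical. The type-based patterns (f) and (g) are left out.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str                      # a resource identifier or a literal value
    obj_is_literal: bool = False


def cooccur(w1, w2, triples, labels):
    """Return the names of the patterns under which w1 and w2 co-occur."""
    found = set()
    w1, w2 = w1.lower(), w2.lower()
    for t in triples:
        # Text attached to each position of the triple.
        s_text = labels.get(t.subject, "").lower()
        p_text = labels.get(t.predicate, "").lower()
        if t.obj_is_literal:
            o_kind, o_text = "literal", t.obj.lower()
        else:
            o_kind, o_text = "object", labels.get(t.obj, "").lower()
        positions = {"subject": s_text, "predicate": p_text, o_kind: o_text}
        names = list(positions)
        # Check every pair of positions, with the two words in either order.
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                in_order = w1 in positions[a] and w2 in positions[b]
                swapped = w2 in positions[a] and w1 in positions[b]
                if in_order or swapped:
                    found.add(f"{a}-{b}")
    return found


# Hypothetical data for illustration only.
labels = {
    "ex:Jack": "Jack",
    "ex:occupation": "occupation",
    "ex:Bandleader": "Bandleader",
    "ex:nickname": "nickname",
}
triples = [
    Triple("ex:Jack", "ex:occupation", "ex:Bandleader"),
    Triple("ex:Jack", "ex:nickname", "Jazz Jack", obj_is_literal=True),
]

patterns_1 = cooccur("occupation", "bandleader", triples, labels)
patterns_2 = cooccur("jack", "jazz", triples, labels)
```

A real knowledge base would call for SPARQL queries or an index rather than a scan over a list.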
  • 33. Evaluation
    • Evaluation Criteria: The goal of our evaluation is to investigate the positive as well as the negative impact of the proposed approach by raising the following two questions:
      ① How effective is the approach at addressing the vocabulary mismatch problem for queries that have a vocabulary mismatch problem?
      ② How effective is the approach at avoiding noise for queries that do not have a vocabulary mismatch problem?
    • We employ Mean Reciprocal Rank (MRR) as the evaluation metric.
    • Benchmark: We use an evaluation test collection for schema-agnostic query mechanisms on RDF datasets (i.e. DBpedia) presented at ESWC 2015.
    • https://sites.google.com/site/eswcsaq2015/documents
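For reference, Mean Reciprocal Rank averages, over all queries, the reciprocal of the rank at which the first correct result appears (0 if none appears). A minimal sketch follows; the function names and sample rankings are hypothetical.

```python
def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant item in the ranked list, or 0.0 if absent."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: iterable of (ranked_list, set_of_relevant_items), one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical rankings: first hit at rank 1, at rank 2, and no hit at all.
sample_runs = [
    (["a", "b", "c"], {"a"}),
    (["a", "b", "c"], {"b"}),
    (["a", "b", "c"], {"z"}),
]
sample_mrr = mean_reciprocal_rank(sample_runs)
```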
  • 34. Evaluation
    • Bootstrapping:
    • Issue: Our modeling is dynamic, meaning that the state space as well as the issued observation (i.e., the sequence of input keywords) vary from query to query. The learned probability values should therefore be generic rather than query-dependent, because learning model probabilities for each individual query is not feasible.
    • Solution: We rely on bootstrapping, a technique used to estimate an unknown probability distribution function. We apply three distributions (i.e., normal, uniform and Zipfian) to find out the most appropriate one.
    [Figure: bar chart of Mean Reciprocal Rank (axis 0 to 1) for the Uniform, Normal and Zipfian distributions over the series All Queries, Q1-10 and Q11-20. Bar values in extraction order: 0.76, 0.51, 0.69, 0.85, 0.44, 0.82, 0.68, 0.58, 0.63.]
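The slide does not say how the three distributions are turned into model probabilities. The sketch below shows one possible reading, under the assumption that candidates are ranked 1..n and each distribution assigns a normalized weight per rank. The function names, the Zipf exponent and the default mean and standard deviation are all hypothetical choices.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def uniform_probs(n):
    """Every rank gets the same probability."""
    return [1.0 / n] * n


def zipfian_probs(n, s=1.0):
    """Probability proportional to 1 / rank**s (assumed exponent s)."""
    return normalize([1.0 / (rank ** s) for rank in range(1, n + 1)])


def normal_probs(n, mean=None, std=None):
    """Probability proportional to a normal density evaluated at each rank."""
    mean = (n + 1) / 2 if mean is None else mean   # assumed: centred on the middle rank
    std = n / 4 if std is None else std            # assumed spread
    return normalize(
        [math.exp(-0.5 * ((rank - mean) / std) ** 2) for rank in range(1, n + 1)]
    )
```

Each function returns a list that sums to 1, so the values can be plugged into a transition or emission row for comparison.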
  • 35. Evaluation Results
    [Figure: two bar charts of Reciprocal Rank (axis 0.00 to 1.00) per query, comparing HMM with Implicit Frequency, HMM with Explicit Frequency and an n-gram language model. Panel titles: "Queries which do not have a mismatch problem" and "Queries which have a mismatch problem". Query sets in extraction order: Q12, Q15, Q18, Q20, Q21, Q24, Q29, Q31, Q40, Q51, Q54, Q65, Q70, Q76, Q78, Q84; and Q2, Q3, Q5, Q8, Q10, Q16, Q22, Q34, Q37, Q46, Q48, Q49, Q50, Q58, Q59, Q63, Q64, Q69, Q85, Q91, Q93.]
  • 36. Outline
    • Introduction
    • Part 1: Vision
      Advantages of using Knowledge Graph in
      • Question Answering
      • Machine Learning
      • NLP
      • Information Retrieval
    • Part 2: Research in depth
      • RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem
      • HeadEx: Triple Extraction from Stream of News Headlines on Twitter using n-ary Relations
  • 37. Knowledge Graph Creation HeadEx: Triple Extraction from Stream of News Headlines on Twitter using n-ary Relations
  • 38. Stream of News Headlines
  • 39. CEVO: Cognitive annotation on relations
    • Problem:
      • Relation Extraction
      • Contextual equivalence of relations
      • Diversity in Conceptualization
    • Requirements:
      • Relation tagging on textual data
      • Relation linking
      • Integration and alignment of properties
      • Simplicity
      • Reusability
  • 40. CEVO: Cognitive annotation on relations • CEVO is built upon Levin's categorization of English verbs. • CEVO has an abstract conceptualization. • You can find CEVO at http://eventontology.org
  • 41. Background Data Model
    The Meet event is associated with entities of type Participant and Topic (i.e., the topic discussed in the meeting). Considering the sample of tweets in Table ??, the tweets no1, no4, no7 are instances of the event Communication with the mentions tell, say, announce. The tweets no2, no5, no8 are instances of the event Meet with the mentions meet, visit. The tweets no3, no6, no9 are instances of the event Murder with the mention kill.
    [Fig. 1: Subclasses of the Generic Event. Panels: (a) SubClasses of Event; (b) Meet Class; (c) Communication Class; (d) Murder Class. Labels in (a): Generic Event, Communication, Meet, Murder, Publisher, Location, Time, xs:date, subclass, Published By, published date, occurredIn, occurredon. Labels in (b): Meet, Participant, Topic, about, Attended in. Labels in (c): Communication, Giver, Addressee, Message, expressed, says, addressed. Labels in (d): Murder, Victim, Killer, cause, quantity, kills, killed, caused, expression, xs:string, xs:integer.]
  • 42. Example
    Tweet #2: Instagram CEO meets with @Pontifex to discuss "the power of images to unite people".
      :Meet#1 a :Meet ; rdfs:label "meets" .
      :e1 a :Participant ; rdfs:label "Instagram CEO" .
      :e2 a :Participant ; rdfs:label "@Pontifex" .
      :t1 a :Topic ; :body "to discuss the power of images to unite people" .
      :e1 :attendedIn :Meet#1 .
      :e2 :attendedIn :Meet#1 .
      :Meet#1 :about :t1 .
      :Meet#1 :publisher :CNN .
      :Meet#1 :date "26/2/2016" .
  • 43. Overview [Figure: pipeline overview with the stages Crawling News Tweets, Filtering, Event Recognition, Entity Extraction, and Disambiguation & Validation & URI assignment.]
  • 44. Entity Extraction using Linguistic Analysis
    [Fig. 2: Dependency tree for the running example, rooted at "meets" (ROOT), with the relation labels nsubj, xcomp, compound, case, mark, det, nmod, dobj and acl.]
    Definition 3 (Dependent Chunk of ROOT). A Dependent Chunk of ROOT (DCR) is the longest sequence of tokens of a given tweet that satisfies the following conditions: (i) there is one token that is (directly) dependent on the root, and (ii) any other token included in a given chunk is dependent on a token already within the given chunk. Moreover, ROOT is an individual chunk.
    Example 2 (Chunking Tweet). We chunk the running example based on the concept of Dependent Chunk of ROOT (DCR). Figure 3 shows the resulting chunks. Except for the chunk of the root (because the root is an individual chunk), every other chunk has only one token that is dependent on the root (only one outgoing arrow to the root), and the other tokens inside that chunk depend on interior tokens (interior arrows). According to this definition, the example tweet contains four individual chunks. For the chunk ‘Instagram CEO’, only the token ‘CEO’ is dependent on the root, and the other token ‘Instagram’ is dependent on the interior token ‘CEO’.
    [Fig. 3: Chunking the running example based on the concept of Dependent Chunk of ROOT.]
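Definition 3 can be sketched as follows: every non-root token is assigned to the chunk of its ancestor that attaches directly to the root, and the root forms a chunk of its own. The head indices below are a reading of the dependency tree in Fig. 2 and should be treated as an assumption, as should the function name. The sketch expects a well-formed tree with a single root.

```python
def chunk_by_root(tokens, heads):
    """Group tokens into Dependent Chunks of ROOT.

    tokens: list of words; heads[i]: index of the head of token i, -1 for ROOT.
    """
    root = heads.index(-1)

    def top_ancestor(i):
        # Walk up the tree until reaching a token attached directly to the root.
        while heads[i] != root:
            i = heads[i]
        return i

    chunks = {}
    for i in range(len(tokens)):
        key = root if i == root else top_ancestor(i)
        chunks.setdefault(key, []).append(tokens[i])
    # Order the chunks by the position of their root-attached token.
    return [chunks[k] for k in sorted(chunks)]


# Running example; head indices are assumed from the dependency figure.
tokens = ["Instagram", "CEO", "meets", "with", "@Pontifex", "to", "discuss",
          "the", "power", "of", "images", "to", "unite", "people"]
heads = [1, 2, -1, 4, 2, 6, 2, 8, 6, 10, 8, 12, 10, 12]

example_chunks = chunk_by_root(tokens, heads)
```

With these heads the sketch yields four chunks, counting the root as one, in line with Example 2.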
  • 45. The best observed accuracy for Entity Extraction Tasks
  • 46. Entity Extraction Sequence Labeling Using Deep Learning
  • 48. Annotation Evolution
    • Metadata Annotation: PROV Ontology, Dublin Core Meta Data
    • Linguistic Annotation: OLiA Ontologies, Language Annotation Framework (LAF)
    • Interoperability Annotation: MEX (Machine Learning), QANARY (Question Answering), NLP Interchange Format (NIF)
    • Cognitive Annotation: CEVO (Comprehensive Event Ontology), Universal Conceptual Cognitive Annotation (UCCA)
  • 49. CEVO use case 1: Annotating Text
    • BBC Tweet#1 on 10/3/2016: Obama and Justin Trudeau announce efforts to fight climate change. Annotation: CEVO:Communication
    • NYT Tweet#2 on 14/3/2016: State elections were "difficult day," German Chancellor Angela Merkel says. Annotation: CEVO:Communication
  • 50. CEVO use case 2: Annotating Ontological Properties
    We use the Web Annotation Data Model (WADM) for annotating ontological properties.
      example:annotation1 a oa:Annotation ;
          oa:hasTarget dbo:spouse ;
          oa:hasBody cevo:Amalgamate .
  • 51. CEVO use case 3: Relation Linking
    • Example: Rupert Murdoch and Jerry Hall marry.
      <exam:headline#char=31,35> a nif:String ;
          nif:beginIndex 31 ;
          nif:endIndex 35 ;
          nif:anchorOf "marry" ;
          nif:oliaCategory Olia:MainVerb ;
          a cevo:Amalgamate .
      example:annotation3 a oa:Annotation ;
          oa:hasTarget <exam:headline#char=31,35> ;
          oa:hasBody dbo:spouse .
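The character offsets of a mention can be derived programmatically. The sketch below uses zero-based offsets with an exclusive end index, which is an assumed convention. For the headline string as written here it gives 30 and 35 for "marry", whereas the slide lists 31 and 35, which would match a one-based start. The helper names are hypothetical, and the emitted text is only a fragment of the Turtle shown on the slide.

```python
def char_span(text, mention):
    """Zero-based begin offset and exclusive end offset of the first occurrence."""
    begin = text.index(mention)  # raises ValueError if the mention is absent
    return begin, begin + len(mention)


def nif_string_fragment(doc_iri, text, mention):
    """Build a Turtle-like fragment describing the mention as a nif:String."""
    begin, end = char_span(text, mention)
    return (
        f"<{doc_iri}#char={begin},{end}> a nif:String ;\n"
        f"    nif:beginIndex {begin} ;\n"
        f"    nif:endIndex {end} ;\n"
        f'    nif:anchorOf "{mention}" .'
    )


headline = "Rupert Murdoch and Jerry Hall marry."
span = char_span(headline, "marry")
fragment = nif_string_fragment("exam:headline", headline, "marry")
```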

Editor's Notes

  1. We encounter two issues. First, we need to find a set of IRIs corresponding to each keyword. Second, we have to construct suitable triple patterns based on the anchor points extracted previously so as to retrieve appropriate data. Figure 1 shows an overview of our approach. Our approach first retrieves relevant IRIs related to each user-supplied keyword from the underlying knowledge base and then injects them into a series of graph pattern templates for constructing formal queries. To find these relevant IRIs, the following two steps are carried out
  2. The categorization is based on the kind of information which is retrieved from the knowledge base. Finding special characteristics of an instance: Datatype properties, which emanate from instances/classes to literals or simple types, and also some kinds of object properties state characteristics of an entity and information around them. So, in the simplest case of a query, a user intends to retrieve specific information about an entity, such as “Population of Canada” or “Language of Malaysia”. Since this information is explicit, the simple graph patterns IP.P1, IP.P4 and IP.P6 can be used for retrieving this kind of information. Finding similar instances: In this case, the user asks for a list of instances which have a specific characteristic in common. Examples of this type of query are “Germany Island” or “Countries with English as official language”. A possible graph structure capturing potential answers for this query type is depicted in Figure 4. It shows a set of instances from the same class which have a certain property in common. Graph pattern templates CI.P7, CI.P8, and CP.P14 retrieve this kind of information. Fig. 4. Similar instances with an instance in common. Finding associations between instances: Associations between instances in knowledge bases are defined as a sequence of properties and instances connecting two given instances (cf. Figure 5). Therefore, each association contains a set of instances and the object properties connecting them, which is the purpose of the user query. As an example, the query “Volkswagen Porsche” can be used to find associations between the two car makers. The graph pattern templates II.P9 and II.P10 extract these associations.
  3. We observed that the hyponym relationship (words representing a specialization of the input word) leads to deriving a large number of terms, whereas their contribution to the vocabulary mismatch task is trivial. We therefore validate the derived words: we check the occurrence of each word w by sub-string matching against all literals (L) of the underlying RDF knowledge base. If no occurrence is observed, the word w is removed from W.
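The validation step in this note can be sketched in a few lines. The sketch assumes the literals fit in an in-memory list and that matching is case-insensitive; the function name and sample data are hypothetical.

```python
def validate_words(candidates, literals):
    """Keep only the candidate words that occur as a substring of some literal."""
    lowered = [literal.lower() for literal in literals]
    return [w for w in candidates if any(w.lower() in lit for lit in lowered)]


# Hypothetical derived words and knowledge-base literals.
derived = ["job", "occupation", "vocation"]
kb_literals = ["Occupation of a person", "job title"]
validated = validate_words(derived, kb_literals)
```

For a large knowledge base, a full-text index over the literals would replace the linear scan.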
  4. We have to redefine traditional concepts from IR. 1. One of them is the concept of co-occurrence: in traditional IR, two terms co-occur when they appear in a specific window, paragraph or document. 2. The concept of frequency needs to be adapted.