Event-based systems are loosely coupled in space, time and synchronization, providing a scalable infrastructure for information exchange and distributed workflows. However, event-based systems are tightly coupled, via event subscriptions and patterns, to the semantics of the underlying event schema and values. The high degree of semantic heterogeneity of events in large and open deployments such as smart cities and the sensor web makes it difficult to develop and maintain event-based systems. To address semantic coupling within event-based systems, we propose vocabulary-free subscriptions together with approximate semantic matching of events. This paper examines the requirement of event semantic decoupling and discusses approximate semantic event matching and its consequences for event processing systems. We introduce a semantic event matcher and evaluate the suitability of an approximate hybrid matcher based on both thesauri-based and distributional semantics-based similarity and relatedness measures. The matcher is evaluated over a structured representation of Wikipedia and Freebase events. Initial evaluations show that the approach matches events with a maximal combined precision-recall F1 score of 75.89% on average in all experiments with a subscription set of 7 subscriptions. The evaluation shows how a hybrid approach to semantic event matching outperforms a single similarity measure approach.
Hasan S, O'Riain S, Curry E. Approximate Semantic Matching of Heterogeneous Events. In: 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012).
2. Further Reading
Digital Enterprise Research Institute www.deri.ie
Hasan S, O’Riain S, Curry E. Approximate Semantic Matching of Heterogeneous Events. In: 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012).
www.edwardcurry.org
3. Outline
- Introduction
  - Smart Environments
  - Motivational Scenario
  - Related Work
- Proposal
  - Approximate Semantic Matching
- Experiments
  - Wikipedia
  - Freebase
- Conclusions
- Q&A
4. Smart Environments
- Smart Homes, Grids, Cities…
- Internet-of-Things, Sensor Web…
- By 2020, 50 billion devices connected to mobile networks (OECD, 2012)
- Non-technical users
- High heterogeneity
- Trend for dynamic data-driven decision making
Event/Situation of Interest:
- Soccer match played in Berlin
- New free parking space near me
- …
5. Motivational Scenario- Enterprise
Users: CIO, CSO, Helpdesk, Maintenance Personnel
Environment: Building, Data Center
Situations of Interest:
- Company CO2 emissions performance
- Energy usage by global IT department
- PUE of the Data Center in Dublin
- kWhs used by server 172.16.0.8
Various terms used:
- energy consumption, energy usage…
- room, space, zone…
Dynamic Environments:
- New events from equipment joining and leaving
6. Requirements
- Handling of semantically heterogeneous events
- Handling of dynamic environments, with event types changing as sources join and leave
- Low cost of rules management
- Usability
- Precision
7. Event Processing
Situation of Interest: when a floor is empty and its energy usage for an hour is above threshold w.r.t. budget, then it is an excessive usage.
Roles: User (non-technical users with natural language needs), Translation, Developer
CEP Engine components: UI, EPL Interface and Parser, Rules Repository, Execution, Pattern Matcher, Single Event Matcher, Templates Repository
Observations:
- UI separated from the engine
- Rules tied to vocabulary
- High cost in case of heterogeneity or change
Rule (EPL):
    INSERT INTO ExcessiveEnergyUsageByFloor
    SELECT a.floor as floor
    FROM PATTERN
    [(a=FloorEmptySensor -> every b=DeviceEnergyUsageSensor
    (a.floor=b.floor))]
    .WIN:TIME(1 hour)
    GROUP BY a.floor
    WHERE (b.usage) > GetAcceptableThreshold(a.budgetValue)
Event sources: ERP, BMS
Example events:
- PC NO XDG26359, Floor: 1st, usage: 3 kWh
- VM: vmdgsit01.deri.ie, Floor: 1st, usage: 15 kWh
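The cost of exact matching under heterogeneity can be illustrated with a minimal Python sketch. The event dictionaries and the `exact_match` helper are hypothetical and do not belong to any CEP engine; the terms "energy usage" and "energy consumption" are taken from the enterprise scenario.

```python
# Hypothetical illustration: an exact matcher compares property names and
# values verbatim, so a synonym chosen by another source breaks the rule.

def exact_match(subscription: dict, event: dict) -> bool:
    """Return True only if every subscription property appears in the
    event with exactly the same name and value."""
    return all(event.get(prop) == value for prop, value in subscription.items())

subscription = {"type": "energy usage", "floor": "1st"}

event_a = {"type": "energy usage", "floor": "1st", "usage_kwh": 3}
# Same meaning, but a different term for the event type.
event_b = {"type": "energy consumption", "floor": "1st", "usage_kwh": 15}

print(exact_match(subscription, event_a))  # True
print(exact_match(subscription, event_b))  # False
```

Covering `event_b` with an exact matcher would require a second rule, which is where the rule-management cost comes from as vocabularies multiply.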
8. Exact Event Processing Paradigm
| Requirement | How the paradigm addresses it |
| --- | --- |
| Semantic Heterogeneity | Does not scale out to highly heterogeneous environments |
| Dynamic Environment | Does not scale out to highly dynamic environments |
| Rule Management | High cost under large heterogeneity and dynamicity |
| Usability | Low |
| Precision | 100% (typically) |
9. Decoupling in Event Systems
- Space: producers and consumers don’t know each other
- Time: participants don’t need to be actively involved in the interaction at the same time
- Synchronization: event producers and consumers don’t get blocked to send/receive events
[Figure: Event Producer and Event Consumer decoupled in Space, Time and Synchronization]
10. Decoupling in Event Systems
- Principle
  - “Removal of explicit dependencies between participants” (Eugster et al., 2003)
- Outcome
  - Scalability
[Figure: Event Producer and Event Consumer decoupled in Space, Time and Synchronization]
11. Semantic Coupling
- Current event-based systems keep an explicit semantic dependency between participants
- Limited scalability in highly heterogeneous and dynamic environments
[Figure: Event Producer and Event Consumer decoupled in Space, Time and Synchronization, but coupled in Semantics (event types, properties, values)]
12. Current Approaches
- Ontology-based
  - (Petrovic et al., 2003), (Zhang & Ye, 2008)…
  - Does not “remove explicit dependency”
  - Hard to achieve ontology agreement a priori at large scales of heterogeneity and dynamism
  - Medium usability, 100% precision typically
- Fuzzy sets
  - (Liu & Jacobsen, 2002)
  - Addresses only numerical event values vs. string values in subscriptions
  - Medium usability, high precision
13. Proposed Approach
- Approximate semantic matching of events
Inputs:
- Event: Type(s), Properties, Values
- Subscription: Type(s), Properties, Values
Steps:
1. Types & properties possible mappings
2. Values possible mappings
3. Pick best overall mapping
4. Post-matching event processing
14. Background
- Semantic Similarity
  - f: Terms × Terms → [0,1]
  - term1, term2 are Terms
  - f(term1, term2) = 0: absolute semantic mismatch
  - f(term1, term2) = 1: exact match
  - E.g. Football Match and Soccer Match are similar
- Relatedness: a general case of similarity
  - E.g. Football Match and Referee are related but not similar
- Thesaurus-based: e.g. WordNet-based
- Distributional semantics-based: e.g. Wikipedia ESA
  - The more Wikipedia articles two terms co-occur in, the more related they are
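As a rough illustration of a relatedness function f: Terms × Terms → [0,1], the following Python sketch computes the cosine between toy concept vectors, in the spirit of ESA. The concept names and weights are invented for illustration; a real ESA model derives them from a Wikipedia index, which is not reproduced here.

```python
import math

# Toy concept vectors: term -> {concept (article title): weight}.
# All weights are invented for illustration only.
CONCEPTS = {
    "football match": {"Association football": 0.9, "FIFA World Cup": 0.7},
    "soccer match":   {"Association football": 0.8, "FIFA World Cup": 0.6},
    "referee":        {"Association football": 0.4, "Rugby union": 0.5},
    "election":       {"Parliament": 0.9},
}

def relatedness(term1: str, term2: str) -> float:
    """Cosine of the two concept vectors; lies in [0, 1] because all
    weights are non-negative. 0 means no shared concept."""
    v1, v2 = CONCEPTS[term1], CONCEPTS[term2]
    dot = sum(w * v2.get(concept, 0.0) for concept, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2)

for other in ("soccer match", "referee", "election"):
    print(other, round(relatedness("football match", other), 2))
```

With these toy weights, "soccer match" scores close to 1, "referee" scores in between, and "election" scores 0, mirroring the similar / related / mismatched cases listed above.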
15. Proposed Approach Instantiation
Event:
- type: Football Match
- name: 2010 FIFA World Cup Final
- referee: Howard Webb
- team: Spain National Football Team
- team: Netherlands National Football Team
- location: Johannesburg
- location: FNB stadium
Subscription:
- Event type “Soccer Match”
- Event team “Spain”
- Event place “South Africa”
16. Proposed Approach Instantiation
Step: types & properties possible mappings
Event properties: type, name, referee, team, location
Subscription properties: type, place, team
[Figure: precision-recall curves (x: Recall, y: Precision, both 0 to 1) for the similarity measures Lin, Jiang&Conrath, Leacock&Chodorow, Lesk, Path, Resnik and Gloss Vector]
17. Proposed Approach Instantiation
Step: types & properties possible mappings
Event properties: type, name, referee, team, location
Subscription properties: type, place, team
- Determine top m correspondence candidates: RankSimJiang&Conrath(ps, pe)
- Measure properties relatedness: fP = Min(1, m - RankSimJiang&Conrath(ps, pe) + 1) * WikipediaESA(ps, pe)
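The property relatedness formula can be sketched in Python as follows. The function name and argument names are hypothetical. One assumption goes beyond the slide: for a candidate ranked below the top m, the factor `m - rank + 1` is zero or negative, and this sketch clamps it to 0 so that such candidates are simply discarded.

```python
def property_relatedness(rank: int, esa: float, m: int) -> float:
    """fP = Min(1, m - rank + 1) * esa, with the rank factor clamped at 0.

    rank: 1-based position of the event property among the candidates ranked
          by Jiang&Conrath similarity to the subscription property.
    esa:  Wikipedia ESA relatedness of the two properties, in [0, 1].
    m:    number of top-ranked candidates to keep.
    """
    factor = min(1, m - rank + 1)
    # Assumption (not stated on the slide): candidates outside the top m
    # contribute nothing instead of receiving a negative score.
    return max(0, factor) * esa

# With m = 2, the two best-ranked candidates keep their ESA score,
# and anything ranked lower is filtered out.
print(property_relatedness(rank=1, esa=0.9, m=2))  # 0.9
print(property_relatedness(rank=3, esa=0.9, m=2))  # 0.0
```

The thesaurus-based rank acts as a filter and the distributional score acts as the weight, which is what makes the matcher a hybrid.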
18. Proposed Approach Instantiation
Step: types & properties possible mappings (event property → subscription property)
- Top 1 (90%): type → type, location → place, team → team
- Top 2 (40%): type → type, name → place, referee → team
19. Proposed Approach Instantiation
Step: values possible mappings
Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
Subscription values: Soccer Match, South Africa, Spain
- Measure values relatedness: fV = WikipediaESA(vs, ve)
20. Proposed Approach Instantiation
Step: values possible mappings (event value → subscription value)
- Spain National Football Team → Spain: 95%
- Netherlands National Football Team → Spain: 30%
21. Proposed Approach Instantiation
Step: pick best overall mapping
Event properties: type, name, referee, team, location
Subscription properties: type, place, team
Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
Subscription values: Soccer Match, South Africa, Spain
- Calculate statements relatedness: fSTMT = fP(ps, pe) * fV(vs, ve)
22. Proposed Approach Instantiation
Step: pick best overall mapping
Event properties: type, name, referee, team, location
Subscription properties: type, place, team
Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
Subscription values: Soccer Match, South Africa, Spain
- Determine the correspondent event statement: the correspondent is the event statement with Max fSTMT
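A minimal Python sketch of the last two steps: each candidate event statement gets fSTMT = fP * fV, and the correspondent is the candidate with the maximal score. The value scores 0.95 and 0.30 are the ones shown for the Spain example; the property scores and the "referee" candidate are invented for illustration, and the helper names are hypothetical.

```python
def statement_relatedness(f_p: float, f_v: float) -> float:
    """fSTMT = fP(ps, pe) * fV(vs, ve)."""
    return f_p * f_v

def best_correspondent(candidates: dict) -> tuple:
    """Return the event statement with the maximal fSTMT."""
    return max(candidates, key=lambda stmt: statement_relatedness(*candidates[stmt]))

# Candidates for the subscription statement (team, "Spain").
# Keys are event statements (property, value); values are (fP, fV).
# The fP scores are assumed, not taken from the slides.
candidates = {
    ("team", "Spain National Football Team"): (0.9, 0.95),
    ("team", "Netherlands National Football Team"): (0.9, 0.30),
    ("referee", "Howard Webb"): (0.4, 0.10),
}

print(best_correspondent(candidates))
# ('team', 'Spain National Football Team')
```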
23. Proposed Approach Instantiation
Step: post-matching event processing
- Rank within a window
- Complex Event Processing
- …
24. Experiments Overview
- Methodology
  - Prepare an event set that reflects the required semantic heterogeneity (Wikipedia events)
  - Prepare a gold standard set of subscriptions that stress multiple aspects of semantic coupling
  - Validate the suitability of semantic approximation from a precision perspective
  - Use a different event set and the same subscriptions to validate low maintainability cost (Freebase events)
- Evaluation Criteria
  - Average interpolated Precision-Recall Curve on 11 recall points
  - Maximal F1 Score over the average curve
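The two evaluation criteria can be sketched in Python as below. This uses the standard definition of 11-point interpolated precision (the highest precision observed at any recall at or above each level); whether the original evaluation used exactly this variant is an assumption, and all function names are hypothetical.

```python
def interpolated_precision(points, levels=11):
    """points: (recall, precision) pairs measured along one ranked result list.
    Returns precision at recall 0.0, 0.1, ..., 1.0, where the value at level r
    is the maximum precision observed at any recall >= r."""
    curve = []
    for i in range(levels):
        r = i / (levels - 1)
        # Small tolerance guards against floating-point noise in recall values.
        eligible = [p for rec, p in points if rec >= r - 1e-12]
        curve.append(max(eligible, default=0.0))
    return curve

def average_curves(curves):
    """Average several interpolated curves (one per subscription) pointwise."""
    return [sum(column) / len(column) for column in zip(*curves)]

def max_f1(curve):
    """Maximal F1 = 2PR / (P + R) over the recall levels of a curve."""
    levels = len(curve)
    best = 0.0
    for i, p in enumerate(curve):
        r = i / (levels - 1)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best

# Toy measurements for a single subscription.
curve = interpolated_precision([(0.5, 1.0), (1.0, 0.5)])
print(curve)
print(max_f1(curve))
```

In the experiments, one curve per subscription would be averaged with `average_curves` before taking the maximal F1.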
25. Experiment 1- Wikipedia Events
Event Set Statistics

| Statistic | Value |
| --- | --- |
| Source | Structured Wikipedia Infoboxes, DBpedia 31 August 2011 |
| Collection | Triples directly associated to instances of dbpedia-owl:Event class |
| Data model | RDF |
| Total # of events | 20,156 |
| Total # of distinct event types | 4,950 |
| Total # of distinct event properties | 1,459 |
| Total # of distinct event values | 500,717 |
| Total # of triples | 1,502,599 |
| Average # of distinct types per event | 7.42 |
| Average # of distinct properties per event | 30.52 |
| Average # of distinct values per event | 54.16 |
| Average # of triples per event | 64.67 |
26. Experiment 1- Wikipedia Events
- Example Event Types
  - Football Match
  - Race
  - Music Festival
  - Space Mission
  - Election
  - 10th-Century BC Conflicts
  - Academic Conference
  - Aviation Accident
  - …
27. Experiment 1- Subscription Set
- Manually created gold standard set of subscriptions

| ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | NO | NO | NO |
| 2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 2 | 2 | NO | YES | NO |
| 3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 219 | 5 | NO | YES | Syntactic |
| 4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 29 | 6 | YES | YES | Semantic + Syntactic |
| 5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 2 | 2 | YES | YES | Semantic + Syntactic |
| 6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 505 | 603 | NO | YES | Background Knowledge |
| 7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 20 | 123,774 | NO | YES | Background Knowledge |
28. Experiment 1- Subscription Set
- Manually created gold standard set of subscriptions
Example: subscription 3, "Events taking place in Wembley stadium" (219 relevant events, 5 needed exact rules; event type approximation: NO; event properties approximation: YES; literals and resources approximation: Syntactic)
Subscription:
    event type "Event"
    event place "Wembley Stadium"
SPARQL pattern 1:
    ?event rdf:type dbpedia-owl:Event.
    ?event dbpprop:stadium dbpedia:Wembley_Stadium.
SPARQL pattern 2:
    ?event rdf:type dbpedia-owl:Event.
    ?event dbpedia-owl:location dbpedia:Wembley_Stadium.
…
29. Experiment 1- Results
[Figure: precision-recall curves (x: Recall, y: Precision) for the subscription "Events taking place in Wembley stadium", comparing Jiang&Conrath and Wikipedia ESA]
[Figure: precision-recall curves (x: Recall, y: Precision) for the subscription "Football matches played in the UK", comparing Jiang&Conrath and Wikipedia ESA]
[Figure: frequency (0% to 45%) of semantic similarity or relatedness scores on a log scale (0, 2^-25, 2^-20, 2^-15, 2^-10, 2^-5, 1) for Jiang&Conrath and Wikipedia ESA]
- Need for a hybrid matcher that combines both
30. Experiment 1- Results
- Hybrid matcher outperforms a single similarity or relatedness measure matcher.

| Matcher | Jiang&Conrath | Wikipedia ESA | Hybrid |
| --- | --- | --- | --- |
| Maximal F1 Score | 70.06% | 44.26% | 75.45% |
| Recall | 80% | 80% | 90% |
| Precision | 62.31% | 30.59% | 64.94% |

[Figure: precision-recall curves (x: Recall, y: Precision) for Jiang&Conrath, Wikipedia ESA and Hybrid]
31. Experiment 2- Freebase Event Set
Event Set Statistics

| Statistic | Value |
| --- | --- |
| Source | Freebase events dump 1 December 2011, triples current |
| Collection | Triples directly associated to instances of "fbase:time.event" class |
| Data model | RDF |
| Total # of events | 84,529 |
| Total # of distinct event types | 858 |
| Total # of distinct event properties | 1,242 |
| Total # of distinct event values | 1,199,627 |
| Total # of triples | 1,859,338 |
| Average # of distinct types per event | 3.33 |
| Average # of distinct properties per event | 10.67 |
| Average # of distinct values per event | 21.66 |
| Average # of triples per event | 21.99 |
32. Experiment 2- Subscription Set
- Same as in Experiment 1.

| ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | YES | YES | NO |
| 2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 8 | 2 | YES | YES | NO |
| 3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 29 | 5 | NO | YES | NO |
| 4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 0 | - | - | - | - |
| 5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 0 | - | - | - | - |
| 6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 34 | 1,398 | YES | YES | Background Knowledge |
| 7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 2 | 219,600 | YES | YES | Background Knowledge |
33. Experiment 2- Results
- Hybrid matcher gives similar results in Freebase as in DBpedia

| Matcher | Jiang&Conrath | Wikipedia ESA | Hybrid |
| --- | --- | --- | --- |
| Maximal F1 Score | 44.60% | 70.73% | 76.33% |
| Recall | 60% | 80% | 80% |
| Precision | 35.49% | 63.39% | 72.98% |

[Figure: precision-recall curves (x: Recall, y: Precision) for Jiang&Conrath, Wikipedia ESA and Hybrid]
34. Conclusions
- Approximate semantic matcher addresses subscriptions/rules maintainability cost in heterogeneous and dynamic environments
- Approximate semantic matcher is suitable when less than 100% precision is acceptable

| | Exact Matcher | Approximate Semantic Matcher |
| --- | --- | --- |
| Number of Required Subscriptions | 345,000 | 7 |
| Maximal F1-Score | 100% | 75.89% |

- A hybrid matcher outperforms a single similarity or relatedness measure matcher.
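The headline figures can be cross-checked against the per-experiment tables with a short Python calculation. All inputs are copied from the results and subscription-set tables. That 345,000 is the rounded sum of the needed exact rules over both experiments, and that 75.89% is the mean of the two hybrid F1 scores, are interpretations that the arithmetic supports rather than statements made on the slides.

```python
def f1(precision: float, recall: float) -> float:
    """F1 = 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)

# Hybrid matcher operating points (precision, recall) from the result tables.
f1_dbpedia = f1(0.6494, 0.90)    # reported maximal F1: 75.45%
f1_freebase = f1(0.7298, 0.80)   # reported maximal F1: 76.33%

# Mean of the two reported maximal F1 scores of the hybrid matcher.
reported_f1 = [75.45, 76.33]
average_f1 = sum(reported_f1) / len(reported_f1)

# "# of needed exact rules" columns (rows without a value are skipped).
exact_rules_dbpedia = [1, 2, 5, 6, 2, 603, 123_774]
exact_rules_freebase = [1, 2, 5, 1_398, 219_600]
total_exact_rules = sum(exact_rules_dbpedia) + sum(exact_rules_freebase)

print(f"{f1_dbpedia:.2%} {f1_freebase:.2%}")
print(f"{average_f1:.2f}%")
print(total_exact_rules)
```

The recomputed F1 values agree with the reported ones up to rounding of the published precision figures, and the rule total comes to 345,399 against 7 approximate subscriptions.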
35. Future Work
- Need to enhance the subscription set to make it more representative.
- The approximate semantic matcher generates “uncertain” results, whose impact on further event processing functions such as CEP needs to be studied.