Dancing with
publish/subscribe
(Distributed event based systems)
Lightning Talk on Top-k publish/subscribe
By Y.S. Horawalavithana
BSc(Hons.) Computer Science
MSc. Distributed System 1
Today
For whom?
Outline
Discussion
Communication paradigms
Point-to-point communication
• Participants need to exist at the
same time
• Direct coupling
• Strict Identity management
• Not good for volatile
environment
• Not a good way to communicate
with several participants
Indirect communication
• Communication through an
intermediary between sender(s)
& receiver(s)
• No direct coupling
• Space uncoupling
• Anonymity
• Time uncoupling
• Independent lifetimes
• Through persistent communication
channel
Indirect communication
• Scenarios where users connect and disconnect very often
• Mobile environments, messaging services, forums
• Event dissemination where receivers may be unknown and change
often
• RSS, events feeds in financial services
• Scenarios with very large number of participants
• Google Ads system, Spotify
• Commonly used in cases when change is anticipated
• Need to provide dependable services
Taxonomy
Indirect communication
• Communication based: group communication, message queues, publish/subscribe
• State based: tuple spaces, distributed shared memory
Publish/Subscribe
“Notify me of all stock quotes of Google from NYSE if the price is greater than 150”
Introduction: pub/sub systems
• Information consumers express their interests in information with
subscriptions, identifying which items are of interest.
• Information producers, publish information by submitting
publications (a.k.a. publication events or event notifications).
• A pub/sub system:
• Subscription processing: Indexing and storing subscriptions.
• Event processing: upon event arrival, access subscription indices and identify
all matched subscriptions.
• Event delivery: deliver the event to clients with matched subscriptions.
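The three stages listed above can be sketched as a toy in-memory broker. This is only an illustration: the `Broker` class, its method names, and the per-client inbox are hypothetical, and a linear scan stands in for a real subscription index.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]


class Broker:
    """Toy in-memory pub/sub broker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # Subscription processing: store the subscriptions of each client.
        self._subscriptions: Dict[str, List[Predicate]] = defaultdict(list)
        # Event delivery: one inbox per client stands in for a network channel.
        self.inboxes: Dict[str, List[Event]] = defaultdict(list)

    def subscribe(self, client: str, predicate: Predicate) -> None:
        self._subscriptions[client].append(predicate)

    def publish(self, event: Event) -> None:
        # Event processing: find every client with at least one matching
        # subscription, then deliver the event to that client's inbox.
        for client, predicates in self._subscriptions.items():
            if any(predicate(event) for predicate in predicates):
                self.inboxes[client].append(event)


broker = Broker()
broker.subscribe("dealer", lambda e: e["name"] == "Google" and e["price"] > 150)
broker.publish({"name": "Google", "price": 200})
broker.publish({"name": "Google", "price": 120})
```

Only the first quote reaches the dealer's inbox; the publisher never learns who received it, which is the space uncoupling described earlier.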
Programming model
Figure adapted from Instructor’s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edn. 5 © Pearson Education 2012
Introduction: DB view at pub/sub
• Events correspond to data (“data-carrying events”).
• Subscriptions correspond to continuous queries:
• Define predicates on attributes
• Fundamentally different model:
• Instead of storing/indexing data and issuing queries to access it
• Queries (subscriptions) are stored/indexed and incoming data (events) is
matched against stored queries.
Introduction: Communications view at
pub/sub
• Akin to multicasting (group IPC, 1-N communication)
• Each publisher (through its events) communicates to a large number of subscribers.
• However, communication is,
• Anonymous
• Subscribers do not “know” publishers and vice versa
• Asynchronous
• publishers and subscribers do not block when publishing/subscribing
• Mutually out-of-sync: no rendezvous in time
• Heterogeneous
• can be used to connect heterogeneous components
Example: Real-world Implementation
Pub/sub: System Space
Figure adapted from K. Pripužić, I. Podnar Žarko, and K. Aberer, Top-k/w publish/subscribe, 2012
Pub/sub: Subscription models
Topic based
• Independent channels
• Hierarchical topics
Type based
• Context type
• Object types
Content based
• (Un)structured queries
• Complex event processing
Pub/sub: Real-world Applications
• Too numerous…some representative application classes
• News alerts
• Online stock quotes
• Internet games
• Sensor networks
• Location-based services
• Network management
• Internet auctions
• …….
Case study: Dealing Room
Case study: Spotify
Spotify at First glance…
• End-to-end architecture to support social interaction
• Topic-based subscriptions
• Friends (Spotify + Facebook): Facebook friends who are Spotify users and share
music
• Playlists (URI): other users' playlists (updates), either “collaborative” or
modifiable only by the creator
• Artist pages (follow artist): new albums or news related to the artist
Spotify at First glance…
• Hybrid engine
• Relay events to online users in real
time
• Store and forward selected events to
offline users
• DHT based overlay
• 3 sites: Stockholm Sweden, London
UK, Ashburn USA
• Designed to scale
• Stores approx. 600 million
subscriptions at any given time
• Matches billions of publication events
every day
Large scale publish/subscribe systems
Boolean matching at pub/sub
• Assume the dealing room system is implemented on top of the pub/sub
paradigm
• Dealer submits a subscription
• [Name = ‘Google’ , price > 150 , volume < 5000]
• Stock Exchange publishes a stock quote (publication)
• [Name = ‘Google’ and price = 200 and volume = 3000]
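A minimal sketch of Boolean matching for the subscription and publication above. The tuple encoding of predicates and the `matches` helper are assumptions made for illustration, not part of any specific pub/sub system.

```python
import operator
from typing import Any, Dict, List, Tuple

# A subscription is a conjunction of (attribute, operator, value) predicates.
Predicate = Tuple[str, str, Any]

OPS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def matches(subscription: List[Predicate], publication: Dict[str, Any]) -> bool:
    """Return True only if every predicate holds (no partial matching)."""
    for attribute, op, value in subscription:
        if attribute not in publication:
            return False
        if not OPS[op](publication[attribute], value):
            return False
    return True


subscription = [("name", "=", "Google"), ("price", ">", 150), ("volume", "<", 5000)]
quote = {"name": "Google", "price": 200, "volume": 3000}
large_trade = {"name": "Google", "price": 200, "volume": 9000}

is_match = matches(subscription, quote)
is_large_match = matches(subscription, large_trade)
```

The result is strictly yes or no: `large_trade` fails on a single predicate and is dropped entirely, with no notion of how close it came to matching.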
Drawbacks at Boolean pub/sub
• A subscriber may be either overloaded with publications or receive too few publications.
• It is impossible to compare different matching publications, as ranking functions are not defined.
• Partial matching between subscriptions and publications is not supported.
Real-world Requirements: Sensor Web
• Real-time environmental monitoring
• Environmental scientists would like to identify and monitor up to 10 sites with
the largest pollution readings over the course of a single day - NSF's Ocean
Observatories Initiative (OOI)
• Identify 10 sensors closest to a particular location measuring the largest
pollution levels over time (e.g. top-10 readings are provided on hourly basis) -
SNSF’s Sensor Scope project
• Power grid monitoring
• Operators would like to monitor over time the 100 sites with the largest or the
lowest power production, using solar panel current and voltage readings, so
that they can identify power grid hot-spots
Real-world Requirements: Forest Fire rescue
Real-world Requirements: Social Media
• Personalized newspaper
• A Facebook user is exposed to more than 1,500 stories per day,
but an average user engages with only about 100 stories from the current news
feed.
• What if each user had a personalized newspaper at the end of the day?
• Social annotation of news stories
• Serving Yahoo! News page views with a fresh set of Top-k tweets, by
treating each news story as a subscription and tweets as incoming
publications
Top-k publish/subscribe
“Notify me of all Top-10 stock quotes of Google hourly from NYSE if
the price is greater than 150”
Top-k publish/subscribe
• How many matching publications will be delivered to a subscriber
during a period of time?
• Actually we don’t know in state-of-the-art pub/sub systems
• Top-k pub/sub models are powered by,
• Expressive stateful query processing engines
• User defined parameter k restricts the delivered publications
• Time (in)dependent Top-k computing methods
• Sliding window model for handling streaming publications
• Methods to deliver Top-k notifications
• Pro-active
• On-demand
Abstract Top-k/w matching
• Limit the number of matching and delivered publications to k best within a sliding
window of size w
[Figure: a sliding window moves over the matching publication stream P1 … P10 with jumping step h (h = 1, h = 3); the Top-2 sets shown for successive windows are {P5, P1}, {P5, P6}, and {P5, P9}.]
[Pripužić 2012] Top-k/w model: DaZaLaPS
• Subscriber controls the number of publications it receives per
subscription (top-k) within a sliding window
• Subscription is defined by
• Totally-ordered and time-independent scoring function
• Parameter k ∈ N
• Parameter w ∈ R+* (time-based) or n ∈ N (count-based sliding window)
• Ranks publications according to the degree of relevance (score) to a
subscription
• Each publication is competing with other publications from the sliding
window for a position among top-k publications
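As a rough illustration of the model above, the following sketch keeps a count-based sliding window of the n most recent publications and reports the current top-k by score. The `TopKWindow` class is hypothetical, and re-sorting the whole window on every arrival is a deliberate simplification rather than the candidate-set maintenance a real top-k/w engine would use.

```python
from collections import deque
from typing import Deque, List, Tuple


class TopKWindow:
    """Top-k over a count-based sliding window of the n most recent publications."""

    def __init__(self, k: int, n: int) -> None:
        self.k = k
        # A bounded deque drops the oldest publication once n is exceeded.
        self.window: Deque[Tuple[str, float]] = deque(maxlen=n)

    def add(self, pub_id: str, score: float) -> List[str]:
        """Insert a publication and return the ids of the current top-k."""
        self.window.append((pub_id, score))
        ranked = sorted(self.window, key=lambda item: item[1], reverse=True)
        return [pid for pid, _ in ranked[: self.k]]


topk = TopKWindow(k=2, n=3)
topk.add("P1", 0.4)
topk.add("P2", 0.9)
topk.add("P3", 0.1)
result = topk.add("P4", 0.5)  # P1 has now left the window
```

Here P4 enters the top-2 immediately, while a lower-scored publication such as P3 could only enter later, once better-scored publications expire from the window.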
[Pripužić 2012] Top-k/w model: DaZaLaPS
• When can a publication become a Top-k object in the subscription
window?
• Immediately upon publication
• Later on when it becomes a Top-k object in the subscription window
• Maintain a set of candidate
(potential Top-k) publications
in memory!
[Pripužić 2012] Distributed Top-k/w model
• Network of processing nodes, where each node is responsible for
computing Top-k/w publications
• Publication Flooding
[Figure: network of nodes A to F with the operations subscribe(s), change(th_s), and publish(p); publication p is flooded across the links.]
[Pripužić 2012] Distributed Top-k/w model
• Subscription Flooding
• Proxy subscriptions:
• Replicas of original subscriptions, which are advertised
over the network
[Figure: network of nodes A to F with the operations subscribe(s), change(th_s), and publish(p); the proxy subscription s_p and the threshold th_s are propagated across the links.]
[Pripužić 2012] Distributed Top-k/w model
• Rendezvous routing
• Often implemented on top of a structured peer-to-peer network
• Rendezvous node is responsible for
• Matching mapped publications & subscriptions
• Delivering matching publications to subscribers directly
[Figure: network of nodes A to F with the operations subscribe(s), publish(p), and change(th_s); subscription s and publication p are each routed to the node where they meet.]
[Pripužić 2012] Distributed Top-k/w model
• Basic gossiping
• Similar to publication flooding, but randomly spread through an overlay
network as a gossip
• Cannot provide any guarantee regarding publication delivery
• Purely probabilistic
[Figure: network of nodes A to F with the operations subscribe(s), change(th_s), and publish(p); publication p is gossiped over some of the links.]
[Pripužić 2012] Distributed Top-k/w model
• Informed gossiping
• Each node additionally stores subscriptions of its close neighbors and also
processes the subscriptions of its neighbors
• Partially probabilistic and partially deterministic
[Figure: network of nodes A to F with the operations subscribe(s), change(th_s), and publish(p); the links carry publication p, the proxy subscription s_p, and the threshold th_s.]
[Shraer 2014] Google Top-k pub/sub
News story as a subscription, tweets as publications
[Shraer 2014] Google Top-k pub/sub
• Annotating news stories with social updates (tweets), at a news website
serving high volume of page-views
• Billions of page views at Yahoo! News per day
• More than 100 million related tweets per day
• Top-k pub/sub approach
• stories are standing subscriptions on tweets
• Story Index is queried frequently,
• but it is updated infrequently
• based on DAAT, TAAT algorithms
• Tweet Index updated frequently
• but queried only for new stories
[Drosou 2009] PrefSIENA
• Say Addison is more interested in horror movies than comedies
• Addison would like to receive notifications about (various) comedies only if
there are no (or just a few) notifications about horror movies
Published events:
• title = The Godfather, genre = drama, showing time = 21:10
• title = Ratatouille, genre = comedy, showing time = 21:15
• title = Fight Club, genre = drama, showing time = 23:00
• title = Casablanca, genre = drama, showing time = 23:10
• title = Vertigo, genre = drama, showing time = 23:20
User subscriptions:
• genre = drama
• genre = horror
[Drosou 2009] PrefSIENA
• To express some form of ranking among subscriptions, PrefSIENA
allows users to define priorities among them
• To do this, they introduce preferential subscriptions
• Based on preferential subscriptions, we deliver to users only the k most
interesting events
• Covering/Matching relation
string director = Peter Jackson
time release date > 1 Jan 2003
string director = Steven Spielberg
string genre = fantasy
string release date > 1 Jan 2003
string title = LOTR: The Return of the King
string director = Peter Jackson
time release date = 1 Dec 2003
string genre = fantasy
integer oscars = 11


[Drosou 2009] PrefSIENA
• Ordering subscriptions
• To order user subscriptions according to the preference relation, they use
the winnow operator, applying it on various levels
• Step 01: Construct DAG
User preferences:
• genre = drama ≻ genre = horror
• genre = comedy ≻ genre = romance
• genre = romance ≻ genre = action
• genre = comedy ≻ genre = horror
Preference graph:
• Top: genre = drama, genre = comedy
• Middle: genre = horror, genre = romance
• Bottom: genre = action
[Drosou 2009] PrefSIENA
• Step 02: Perform a topological sort to compute winnow levels. The
subscriptions of level l are associated with a preference rank 𝒢(l):
• 𝒢 is a monotonically decreasing function with values in [0, 1]
• e.g. 𝒢(l) = (D + 1 − (l − 1)) / (D + 1)
Preference graph with winnow levels:
• Level 1: genre = drama, genre = comedy (preference rank = 1)
• Level 2: genre = horror, genre = romance (preference rank = 2/3)
• Level 3: genre = action (preference rank = 1/3)
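The winnow levels and preference ranks of this example can be reproduced with a short sketch. The function names are hypothetical, and D is assumed here to be the number of levels minus one, which is the value that yields the ranks 1, 2/3, and 1/3 shown above.

```python
from fractions import Fraction
from typing import List, Set, Tuple


def winnow_levels(prefs: List[Tuple[str, str]]) -> List[Set[str]]:
    """Peel the preference DAG level by level.

    Each pair (better, worse) means better ≻ worse. A level holds the
    subscriptions that no remaining subscription is preferred to.
    """
    remaining = {s for pair in prefs for s in pair}
    levels: List[Set[str]] = []
    while remaining:
        dominated = {worse for better, worse in prefs
                     if better in remaining and worse in remaining}
        level = remaining - dominated
        if not level:
            raise ValueError("preference relation contains a cycle")
        levels.append(level)
        remaining -= level
    return levels


def preference_rank(level: int, depth: int) -> Fraction:
    # G(l) = (D + 1 - (l - 1)) / (D + 1)
    return Fraction(depth + 1 - (level - 1), depth + 1)


prefs = [("drama", "horror"), ("comedy", "romance"),
         ("romance", "action"), ("comedy", "horror")]
levels = winnow_levels(prefs)
depth = len(levels) - 1  # assumption: D = number of levels - 1
ranks = {genre: preference_rank(i, depth)
         for i, level in enumerate(levels, start=1) for genre in level}
```

Using exact fractions keeps the ranks comparable without floating-point noise.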
[Drosou 2009] PrefSIENA
• Step 03: Computing Event Ranks
• Step 04: Based on the ranks, they deliver to users only the k most
interesting events
• Continuous, periodic & sliding window
User subscriptions (preference rank):
• genre = adventure: 0.9
• director = Peter Jackson: 0.7
Event:
• string title = King Kong
• string director = Peter Jackson
• time release date = 14 Dec 2005
• string genre = adventure
Event rank with ℱ = max: 0.9
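A small sketch of the event-rank step with ℱ = max, using the King Kong example. The data layout, the `event_rank` name, and the rank of 0.0 for an event that matches nothing are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# (equality predicates, preference rank) -- a simplified subscription form
Subscription = Tuple[Dict[str, str], float]


def event_rank(event: Dict[str, str], subscriptions: List[Subscription]) -> float:
    """Aggregate the ranks of all matching subscriptions with F = max."""
    matching_ranks = [
        rank
        for predicates, rank in subscriptions
        if all(event.get(attr) == value for attr, value in predicates.items())
    ]
    # Assumption: an event that matches no subscription gets rank 0.0.
    return max(matching_ranks, default=0.0)


subscriptions = [({"genre": "adventure"}, 0.9),
                 ({"director": "Peter Jackson"}, 0.7)]
king_kong = {"title": "King Kong", "director": "Peter Jackson",
             "release date": "14 Dec 2005", "genre": "adventure"}

rank = event_rank(king_kong, subscriptions)
```

Both subscriptions match the event, and the maximum of 0.9 and 0.7 gives the rank 0.9.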
[Drosou 2009] PrefSIENA: Sliding window
Delivery
[Figure: sliding-window delivery with k = 2 and w = 4]
User subscriptions (preference rank):
• genre = comedy: 0.9
• genre = romance: 0.9
• genre = drama: 0.8
• genre = horror: 0.6
Matching events:
• title = The Big Parade, genre = romance, showing time = 21:00
• title = The Apartment, genre = comedy, showing time = 21:10
• title = The Godfather, genre = drama, showing time = 21:25
• title = Forrest Gump, genre = romance, showing time = 21:10
• title = Jaws, genre = horror, showing time = 20:55
• title = Vertigo, genre = horror, showing time = 21:45
• title = Psycho, genre = horror, showing time = 21:50
• title = Pulp Fiction, genre = drama, showing time = 21:25
Timestamps shown in the figure: 20:00, 20:15, 20:22, 20:25, 20:50, 20:40, 20:45, 20:55
Delivered events:
• title = The Big Parade, genre = romance, showing time = 21:00
• title = The Apartment, genre = comedy, showing time = 21:10
• title = Forrest Gump, genre = romance, showing time = 21:10
• title = The Godfather, genre = drama, showing time = 21:25
• title = Psycho, genre = horror, showing time = 21:50
• title = Pulp Fiction, genre = drama, showing time = 21:25
[Drosou 2009] PrefSIENA But wait..
• The most highly ranked events may be very similar to each other…
• We wish to retrieve results on a broader variety of user interests
• Two different perspectives on achieving diversity:
• Avoid overlap: choose notifications that are dissimilar to each other
• Increase coverage: choose notifications that cover as many user interests as possible
• How to measure diversity?
• Many alternative ways
• Common ground: measure similarity/distance among the selected items
Diversity: Top-k representative set
Representative Top-k: method to retrieve Top-k publications from matching publications
• Drawback (without diversity)
• What we want (with diversity)
MAX* k-diversity problem
Find:
$S^* = \operatorname{argmax}_{S \subseteq P,\ |S| = k} f(S, d)$
where
1. P = {p1, …, pn}
2. k ≤ n
3. d: a distance metric
4. f: a diversity function
Proposed: MAXDIVREL k-diversity problem
$S^* = \operatorname{argmax}_{S \subseteq P} f(S, d, r) = \operatorname{argmax}_{S \subseteq P} \frac{g(S, d, r)}{h(S, d, r)}$
• g(S, d, r): maximize the dissimilarity & relevancy in S
• h(S, d, r): minimize the dissimilarity & relevancy in P − S
where
1. P = {p1, …, pn}
2. d: a distance metric
3. r: a relevance metric
4. f: a diversity function
Formal Definition: MAXDIVREL k-diversity
$\operatorname{argmax}\ g(S, d, r) = \frac{1}{|S|} \sum_{p_i, p_j \in S} \frac{r(p_j)}{r(p_i)}\, d(p_i, p_j)$, holds independence
$\operatorname{argmin}\ h(S, d, r) = \frac{1}{|P - S|} \sum_{p_i \in S,\ p_j \in P - S} \frac{r(p_j)}{r(p_i)}\, d(p_i, p_j)$, holds dominance
where
1. P = {p1, …, pn}
2. d: a distance metric
3. r: a relevance metric
4. $\alpha > 0$
Independence condition:
$\forall p_i, p_j \in S:\ d(p_i, p_j) > \alpha$
Dominance condition:
$\forall p_i \in P,\ \exists p_j \in S\ \text{s.t.}\ d(p_i, p_j) \le \alpha;\ i \ne j$
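The two conditions can be checked directly for a candidate set S. This is a sketch under two assumptions: publications are points on a line with absolute difference as the distance d, and the dominance check is applied only to publications outside S. All names are illustrative.

```python
from itertools import combinations
from typing import Callable, Sequence

Distance = Callable[[float, float], float]


def is_independent(S: Sequence[float], d: Distance, alpha: float) -> bool:
    """Every pair of selected publications is more than alpha apart."""
    return all(d(p, q) > alpha for p, q in combinations(S, 2))


def is_dominating(S: Sequence[float], P: Sequence[float],
                  d: Distance, alpha: float) -> bool:
    """Every unselected publication lies within alpha of some selected one."""
    # Assumption: only publications outside S need a representative in S.
    outside = [p for p in P if p not in S]
    return all(any(d(p, q) <= alpha for q in S) for p in outside)


def distance(p: float, q: float) -> float:
    return abs(p - q)


P = [0.0, 1.0, 5.0, 6.0, 10.0]
alpha = 2.0
good = [0.0, 5.0, 10.0]  # spread out, and covers 1.0 and 6.0
bad = [0.0, 1.0]         # too close together, and leaves 5.0, 6.0, 10.0 uncovered
```

The set `good` satisfies both conditions, while `bad` violates both.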
NP-Hardness:
Minimum independent-dominating set
$\mathrm{Neighborhood}(p_i) = \{\, p_j \mid d(p_i, p_j) \le \alpha,\ p_i \ne p_j \,\}$
[Figure: publications p1 … p5 in the publication space with distance threshold α, and the corresponding graph model with vertices v1 … v5. Four example vertex sets are shown: three are independent and dominating, one is dominating but not independent.]
NAÏVE Greedy
$\operatorname{argmax}_{p_i} \frac{r(p_i)^2}{\sum_{p_j \in N(p_i)} r(p_j) \times d(p_i, p_j)}$
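One way to turn the greedy criterion above into a selection loop is sketched below. The slide gives only the scoring expression, so the loop itself is an assumption: pick the best-scoring remaining publication, then discard its α-neighborhood, and repeat until k items are chosen. Treating a publication with no remaining neighbors as having an infinite score is also an assumption, made to avoid dividing by zero.

```python
import math
from typing import Callable, Dict, List


def greedy_score(p: float, remaining: List[float], r: Dict[float, float],
                 d: Callable[[float, float], float], alpha: float) -> float:
    """r(p)^2 divided by the sum of r(q) * d(p, q) over the neighborhood of p."""
    neighbours = [q for q in remaining if q != p and d(p, q) <= alpha]
    denominator = sum(r[q] * d(p, q) for q in neighbours)
    # Assumption: an isolated publication always wins, since nothing dominates it.
    return math.inf if denominator == 0 else r[p] ** 2 / denominator


def naive_greedy(P: List[float], r: Dict[float, float],
                 d: Callable[[float, float], float],
                 alpha: float, k: int) -> List[float]:
    remaining = list(P)
    selected: List[float] = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda p: greedy_score(p, remaining, r, d, alpha))
        selected.append(best)
        # Drop the winner and everything it dominates (its alpha-neighborhood).
        remaining = [q for q in remaining if q != best and d(best, q) > alpha]
    return selected


def distance(p: float, q: float) -> float:
    return abs(p - q)


P = [0, 1, 5, 6, 10]
relevance = {0: 0.9, 1: 0.5, 5: 0.4, 6: 0.8, 10: 0.7}
top3 = naive_greedy(P, relevance, distance, alpha=2, k=3)
```

With these toy inputs the isolated publication 10 is chosen first, then 0 (which dominates 1), then 6 (which dominates 5).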
Handling streaming publications
[Figure: publication space with p1 … p5 and an additional publication p6, alongside the graph model with vertices v1 … v6 and distance threshold α]
Continuity Requirements
1. Durability
An item selected as diversified in the i-th window may still be selected in the (i+1)-th window
if it has not expired and the other valid items in the (i+1)-th window fail to compete with it.
2. Order
The publication stream follows chronological order.
We avoid selecting an item j as diverse later when we have already selected an item i that is not
older than j.
MAXDIVREL continuous k-diversity
[Figure: matching publication stream P1, P2, …, Pj, Pj+1, … with the i-th and (i+1)-th windows; MAXDIVREL k-diversity yields S*_i and S*_{i+1}, subject to independence, dominance, durability, and order.]
• Straightforward solution:
• Apply the naïve greedy method at each instance
• Proposed: an incremental index mechanism
• Avoids the curse of re-calculating the neighborhood
Locality Sensitive Hashing (LSH)
• Simple idea
• If two points are close together, then after a “projection” operation these two
points will remain close together
LSH Analysis
• For any given points $p, q \in R^d$:
$P_H[h(p) = h(q)] \ge P_1$ for $\|p - q\| \le d_1$
$P_H[h(p) = h(q)] \le P_2$ for $\|p - q\| \ge c\,d_1 = d_2$
• Hash function h is $(d_1, d_2, P_1, P_2)$-sensitive
• Ideally we need
• $(P_1 - P_2)$ to be large
• $(d_1 - d_2)$ to be small
LSH in MAXDIVREL:
Publications as categorical data
LSH in MAXDIVREL:
Characteristic Matrix
LSH in MAXDIVREL:
Minhashing
• No publications any more!
• A signature represents each publication
• Technique
• Randomly permute the rows of the characteristic matrix m times
• For each permutation, take the number of the first row, in the permuted order,
in which the publication's column has a 1
• Advantage:
• Reduces the dimensions to a small minhash signature
[Figure: first permutation of rows of the characteristic matrix]
LSH in MAXDIVREL:
Signature Matrix
Fast-minhashing
• Select m random hash functions
• They model the effect of m random permutations
• Mathematically proven only when the number of rows is a prime
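A sketch of fast minhashing under stated assumptions: each publication is given as the set of row indices where its column of the characteristic matrix holds a 1, the number of rows is the prime 13, and each hash function has the form (a·x + b) mod prime. The helper names and the seed are hypothetical.

```python
import random
from typing import List, Set, Tuple

HashParams = Tuple[int, int]


def make_hashes(m: int, prime: int, seed: int = 0) -> List[HashParams]:
    """Draw m hash functions h(x) = (a * x + b) mod prime with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, prime), rng.randrange(0, prime)) for _ in range(m)]


def minhash_signature(rows: Set[int], hashes: List[HashParams], prime: int) -> List[int]:
    """For each hash function, keep the smallest hashed row index with a 1."""
    return [min((a * row + b) % prime for row in rows) for a, b in hashes]


def estimated_similarity(sig_x: List[int], sig_y: List[int]) -> float:
    """Fraction of signature positions on which two publications agree."""
    agree = sum(1 for u, v in zip(sig_x, sig_y) if u == v)
    return agree / len(sig_x)


PRIME = 13  # assumed number of rows in the characteristic matrix
hashes = make_hashes(m=8, prime=PRIME)

x = {0, 3, 5, 8}
y = {1, 2, 9}  # shares no row with x
sig_x = minhash_signature(x, hashes, PRIME)
sig_y = minhash_signature(y, hashes, PRIME)
```

Because a is non-zero and the modulus is prime, each hash function is a true permutation of the row indices, which is why the prime row count matters. As a consequence, two publications with no common row can never agree on a signature position.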
LSH in MAXDIVREL:
LSH Buckets
• Take r-sized signature vectors from the m-sized minhash signature
• Map them into L hash tables, each with an arbitrary number b of buckets
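The bucket mapping can be sketched as follows, assuming m = L × r so that the signature splits evenly into L vectors of r values. The `build_tables` helper and the use of Python's built-in `hash` on the tuple are illustrative choices, not the author's implementation.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


def build_tables(signatures: Dict[str, List[int]], L: int, r: int,
                 b: int) -> List[DefaultDict[int, Set[str]]]:
    """Split each signature into L vectors of size r and hash each into b buckets."""
    tables: List[DefaultDict[int, Set[str]]] = [defaultdict(set) for _ in range(L)]
    for pub_id, signature in signatures.items():
        if len(signature) != L * r:
            raise ValueError("signature length must equal L * r")
        for t in range(L):
            vector = tuple(signature[t * r:(t + 1) * r])
            bucket = hash(vector) % b  # tuples of ints hash deterministically
            tables[t][bucket].add(pub_id)
    return tables


signatures = {
    "A": [1, 2, 3, 4],
    "B": [1, 2, 9, 9],  # agrees with A on the first r = 2 values
    "C": [7, 7, 7, 7],
}
tables = build_tables(signatures, L=2, r=2, b=16)
shared_bucket = tables[0][hash((1, 2)) % 16]
```

Publications A and B agree on their first vector, so they land in the same bucket of the first table and become neighbors there.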
LSH in MAXDIVREL:
How to select L, r?
For two vectors x, y:
$JD(x, y) = 1 - JSIM(x, y)$, where $JSIM(x, y) = \frac{|x \cap y|}{|x \cup y|}$
1. $L \times r = m$
2. similarity threshold $s \approx \left(\frac{1}{L}\right)^{1/r}$
LSH in MAXDIVREL:
Analysis
For two vectors x, y:
$JD(x, y) = 1 - JSIM(x, y)$, where $JSIM(x, y) = \frac{|x \cap y|}{|x \cup y|}$
• For publications x & y
$JSIM(x, y) \propto \mathrm{Prob}[H(x) = H(y)]$
• At a particular hash table
• x & y map into the same bucket: $JSIM(x, y)^r$
• x & y do not map into the same bucket: $1 - JSIM(x, y)^r$
• At L hash tables
• x & y do not map into the same bucket in any table: $(1 - JSIM(x, y)^r)^L$
• x & y map into the same bucket in at least one table: $1 - (1 - JSIM(x, y)^r)^L$
True near neighbors are unlikely to be unlucky in all the projections
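To make the trade-off concrete, the sketch below evaluates the collision probability and the approximate similarity threshold for one choice of parameters. It assumes each hash table uses r signature values, so two publications with Jaccard similarity s agree in a single table with probability s^r; the values r = 5 and L = 20 (m = 100) are arbitrary examples.

```python
def candidate_probability(s: float, r: int, L: int) -> float:
    """Probability that two publications share a bucket in at least one of L tables."""
    return 1 - (1 - s ** r) ** L


def similarity_threshold(r: int, L: int) -> float:
    """Approximate similarity at which the probability curve rises steeply."""
    return (1 / L) ** (1 / r)


r, L = 5, 20  # assumed example values, so m = L * r = 100
threshold = similarity_threshold(r, L)
p_similar = candidate_probability(0.8, r, L)     # well above the threshold
p_dissimilar = candidate_probability(0.2, r, L)  # well below the threshold
```

Pairs well above the threshold are almost always found in some table, while pairs well below it almost never collide. Raising r pushes the threshold up, and raising L pulls it down.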
LSH in MAXDIVREL:
Batch-wise Top-k computation
• Bucket “winner”: the publication with the highest relevancy score in the bucket
• The winner is dominant and represents its bucket neighborhood
• Top-k “winners” are those with a majority of votes
• The k winners are independent
[Figure: i-th window over publications P_A … P_H]
LSH in MAXDIVREL:
Incremental Top-k computation
1. New publication i arrives
2. Update the i-th characteristic vector (characteristic matrix)
3. Generate the i-th minhash signature (signature matrix)
4. Map the i-th signature into L hash tables
5. Update the “winner” at each bucket the i-th signature maps into
6. Vote for Top-k candidates
LSH in MAXDIVREL:
When new publication F arrives…
• Only buckets B13, B23, B32, B43 will vote
• Follow continuity requirements
• Durability
• Order
[Figure: i-th and (i+1)-th windows over publications P_A … P_H]
Implementation
Cloud service modules
Source: Amazon Kinesis
Source: Amazon ElastiCache
Top-k pub/sub: DEMO
P2P Pub/Sub
• Scribe: topic-based, built on top of Pastry, stateful, rendezvous.
• Hermes: topic & content-based, built on top of Pastry(-like) net, stateful,
rendezvous & flooding-like.
• Meghdoot: content-based, built on top of CAN, stateful, rendezvous.
• Tera: topic-based, built on unstructured P2P net, stateful, random walk-based flooding.
• Sub2Sub: content-based, built on unstructured P2P net, stateful, flooding-like.
• DHTStrings: content-based, DHT-independent, string support, stateless,
rendezvous.
• OP-DHT Pub/Sub: content-based, (can be) built on top of
Chord/Pastry/Bamboo.
DHT based pub/sub: Scribe
• Topic Based
• Based on DHT (Pastry)
• Rendezvous event routing
• A random identifier is assigned to each topic
• The Pastry node with the identifier closest to that of the topic
becomes responsible for that topic
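The rendezvous rule above can be illustrated without a real DHT. In this sketch, identifiers are derived from a SHA-1 hash truncated to 32 bits and closeness is measured on a circular identifier space; both are simplifying assumptions, and the node names are made up. Pastry's actual routing is not modelled.

```python
import hashlib
from typing import List

ID_BITS = 32  # assumed identifier width for this sketch


def identifier(name: str) -> int:
    """Derive a pseudo-random identifier from a name (stand-in for a DHT id)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:ID_BITS // 8], "big")


def circular_distance(a: int, b: int) -> int:
    """Distance between two identifiers on a ring of size 2**ID_BITS."""
    diff = abs(a - b)
    return min(diff, 2 ** ID_BITS - diff)


def rendezvous_node(topic: str, nodes: List[str]) -> str:
    """Return the node whose identifier is closest to the topic's identifier."""
    topic_id = identifier(topic)
    return min(nodes, key=lambda node: circular_distance(identifier(node), topic_id))


nodes = ["node-A", "node-B", "node-C", "node-D", "node-E", "node-F"]
owner = rendezvous_node("stock/GOOG", nodes)
```

Every node that evaluates `rendezvous_node` for the same topic and membership reaches the same owner, which lets publishers and subscribers meet without knowing each other.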
DHT based pub/sub: Meghdoot
• Content Based
• Based on Structured Overlay CAN
• Mapping the subscription language and the event space to CAN
space
• Subscription and event Routing exploit CAN routing algorithms
Top-k publish/subscribe at P2P
• Stateful approaches introduce some kind of state at (intermediate) nodes.
State can refer to :
• State needed to support specialized structures built on top of the network structure
• E.g. trees (parent, children pointers)
• Routing state – for ‘content-based routing’:
• Subscription paths to be followed by matching publications
• Subscriptions (meta)data: not just forward pointers to be followed and subscription
content (its predicates), but also possible info as to
• What about query inherent diversification?
• The controlled parameters (k & w) can change
• Updates and the need to maintain state consistency may stress the
system and negate any benefits.
• So we’ll be left with the complexity …
Future work
• Apply Top-k diversification modules at (un)structured P2P
• Exploiting overlap among diversified results of users who have similar interest
• Develop LSH based index over multi-threaded distributed
environment
• Develop large scale Top-k pub/sub applications by exploring other
suitable use-cases E.g.
• Personalized newspaper for every Facebook user
• Diverse set of personalized Twitter trends
• Social annotation of news-stories
Thank you!
sam2010ucsc@acm.org
@SamTube405
http://geektube405.wordpress.com
MSc. Distributed System 72

Mais conteúdo relacionado

Semelhante a Dancing with publish/subscribe

Information sharing pipeline
Information sharing pipelineInformation sharing pipeline
Information sharing pipelineVioleta Ilik
 
1. Overview of Distributed Systems
1. Overview of Distributed Systems1. Overview of Distributed Systems
1. Overview of Distributed SystemsDaminda Herath
 
Climb stateoftheartintro
Climb stateoftheartintroClimb stateoftheartintro
Climb stateoftheartintrothomasrconnor
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskQuantUniversity
 
Project Transfer: Five Years Later
Project Transfer:  Five Years LaterProject Transfer:  Five Years Later
Project Transfer: Five Years LaterJennifer Bazeley
 
InfoQ QCon San Francisco 2009
InfoQ QCon San Francisco 2009InfoQ QCon San Francisco 2009
InfoQ QCon San Francisco 2009Sean Dawson
 
Linkedin NUS QCon 2009 slides
Linkedin NUS QCon 2009 slidesLinkedin NUS QCon 2009 slides
Linkedin NUS QCon 2009 slidesruslansv
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobus
 
QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes Abdul Basit Munda
 
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Lucas Jellema
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Legacy Typesafe (now Lightbend)
 
Apache kafka- Onkar Kadam
Apache kafka- Onkar KadamApache kafka- Onkar Kadam
Apache kafka- Onkar KadamOnkar Kadam
 
Use of "NewGenLib" Open Source Software for Library Automation, Digital Libra...
Use of "NewGenLib" Open Source Software for Library Automation, Digital Libra...Use of "NewGenLib" Open Source Software for Library Automation, Digital Libra...
Use of "NewGenLib" Open Source Software for Library Automation, Digital Libra...Emmanuel E C
 
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...Joshwa Philip
 
Voxxed Athens 2018 - Eventing, Serverless, and the Extensible Enterprise
Voxxed Athens 2018 - Eventing, Serverless, and the Extensible EnterpriseVoxxed Athens 2018 - Eventing, Serverless, and the Extensible Enterprise
Voxxed Athens 2018 - Eventing, Serverless, and the Extensible EnterpriseVoxxed Athens
 
12.10.14 Slides, “The SHARE Notification Service”
12.10.14 Slides, “The SHARE Notification Service”12.10.14 Slides, “The SHARE Notification Service”
12.10.14 Slides, “The SHARE Notification Service”DuraSpace
 
From Open Access to Open Data
From Open Access to Open DataFrom Open Access to Open Data
From Open Access to Open DataBrian Hole
 
Discovery study detailed results 20140728
Discovery study detailed results 20140728Discovery study detailed results 20140728
Discovery study detailed results 20140728Michael Levine-Clark
 
High throughput data streaming in Azure
High throughput data streaming in AzureHigh throughput data streaming in Azure
High throughput data streaming in AzureAlexander Laysha
 

Semelhante a Dancing with publish/subscribe (20)

Information sharing pipeline
Information sharing pipelineInformation sharing pipeline
Information sharing pipeline
 
1. Overview of Distributed Systems
1. Overview of Distributed Systems1. Overview of Distributed Systems
1. Overview of Distributed Systems
 
Climb stateoftheartintro
Climb stateoftheartintroClimb stateoftheartintro
Climb stateoftheartintro
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit Risk
 
Project Transfer: Five Years Later
Project Transfer:  Five Years LaterProject Transfer:  Five Years Later
Project Transfer: Five Years Later
 
InfoQ QCon San Francisco 2009
InfoQ QCon San Francisco 2009InfoQ QCon San Francisco 2009
InfoQ QCon San Francisco 2009
 
Linkedin NUS QCon 2009 slides
Linkedin NUS QCon 2009 slidesLinkedin NUS QCon 2009 slides
Linkedin NUS QCon 2009 slides
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes
 
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)
 
Apache kafka- Onkar Kadam
Apache kafka- Onkar KadamApache kafka- Onkar Kadam
Apache kafka- Onkar Kadam
 
Use of "NewGenLib" Open Source Software for Library Automation, Digital Libra...
Use of "NewGenLib" Open Source Software for Library Automation, Digital Libra...Use of "NewGenLib" Open Source Software for Library Automation, Digital Libra...
Use of "NewGenLib" Open Source Software for Library Automation, Digital Libra...
 
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
 
Introduction
IntroductionIntroduction
Introduction
 
Voxxed Athens 2018 - Eventing, Serverless, and the Extensible Enterprise
Voxxed Athens 2018 - Eventing, Serverless, and the Extensible EnterpriseVoxxed Athens 2018 - Eventing, Serverless, and the Extensible Enterprise
Voxxed Athens 2018 - Eventing, Serverless, and the Extensible Enterprise
 
12.10.14 Slides, “The SHARE Notification Service”
12.10.14 Slides, “The SHARE Notification Service”12.10.14 Slides, “The SHARE Notification Service”
12.10.14 Slides, “The SHARE Notification Service”
 
From Open Access to Open Data
From Open Access to Open DataFrom Open Access to Open Data
From Open Access to Open Data
 
Discovery study detailed results 20140728
Discovery study detailed results 20140728Discovery study detailed results 20140728
Discovery study detailed results 20140728
 
High throughput data streaming in Azure
High throughput data streaming in AzureHigh throughput data streaming in Azure
High throughput data streaming in Azure
 

Dancing with publish/subscribe

  • 1. Dancing with publish/subscribe (Distributed event-based systems) Lightning Talk on Top-k publish/subscribe By Y.S. Horawalavithana BSc(Hons.) Computer Science MSc. Distributed System
  • 3. Communication paradigms: Point-to-point communication • Participants need to exist at the same time • Direct coupling • Strict identity management • Not good for volatile environments • Not a good way to communicate with several participants; Indirect communication • Communication through an intermediary between sender(s) & receiver(s) • No direct coupling • Space uncoupling • Anonymity • Time uncoupling • Independent lifetimes • Through a persistent communication channel
  • 4. Indirect communication • Scenarios where users connect and disconnect very often • Mobile environments, messaging services, forums • Event dissemination where receivers may be unknown and change often • RSS, event feeds in financial services • Scenarios with a very large number of participants • Google Ads system, Spotify • Commonly used in cases when change is anticipated • Need to provide dependable services
  • 5. Taxonomy: Indirect communication • Communication based: group communication, message queues, publish/subscribe • State based: tuple spaces, distributed shared memory
  • 6. Publish/Subscribe “Notify me of all stock quotes of Google from NYSE if the price is greater than 150”
  • 7. Introduction: pub/sub systems • Information consumers express their interests in information with subscriptions, identifying which items are of interest. • Information producers publish information by submitting publications (a.k.a. publication events or event notifications). • A pub/sub system: • Subscription processing: indexing and storing subscriptions. • Event processing: upon event arrival, access subscription indices and identify all matched subscriptions. • Event delivery: deliver the event to clients with matched subscriptions.
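The three stages listed on slide 7 can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the talk: the `Broker` class, its method names, and the in-memory `delivered` mailbox are all assumptions, and subscriptions are stored in a plain dictionary rather than a real index.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]


class Broker:
    """Hypothetical in-memory broker: stores subscriptions, matches, delivers."""

    def __init__(self) -> None:
        # Subscription processing: predicates stored per client.
        self._subs: Dict[str, List[Predicate]] = defaultdict(list)
        # Event delivery target: one mailbox per client (an assumption).
        self.delivered: Dict[str, List[Event]] = defaultdict(list)

    def subscribe(self, client: str, predicate: Predicate) -> None:
        self._subs[client].append(predicate)

    def publish(self, event: Event) -> None:
        # Event processing: find every client with a matching subscription.
        for client, predicates in self._subs.items():
            if any(p(event) for p in predicates):
                # Event delivery: hand the event to the matched client.
                self.delivered[client].append(event)
```

Publishers never name a receiver here, which is the space uncoupling described on slide 3.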
  • 8. Programming model • Figure adapted from Instructor’s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edn. 5 © Pearson Education 2012
  • 9. Introduction: DB view at pub/sub • Events correspond to data (“data-carrying events”). • Subscriptions correspond to continuous queries: • Define predicates on attributes • Fundamentally different model: • Instead of storing/indexing data and issuing queries to access it • Queries (subscriptions) are stored/indexed and incoming data (events) is matched against stored queries.
  • 10. Introduction: Communications view at pub/sub • Akin to multicasting (group IPC, 1-N communication) • Each publisher (through its events) communicates to a large number of subscribers. • However, communication is: • Anonymous • Subscribers do not “know” publishers and vice versa • Asynchronous • Publishers and subscribers do not block when publishing/subscribing • Mutually out-of-sync: no rendezvous in time • Heterogeneous • Can be used to connect heterogeneous components
  • 12. Pub/sub: System Space • Figure adapted from K. Pripužić, I. Podnar Žarko, and K. Aberer, Top-k/w publish/subscribe, 2012
  • 13. Pub/sub: Subscription models • Content based: (un)structured queries, complex event processing • Type based: context type, object types • Topic based: independent channels, hierarchical topics
  • 14. Pub/sub: Real-world Applications • Too numerous… some representative application classes • News alerts • Online stock quotes • Internet games • Sensor networks • Location-based services • Network management • Internet auctions • …
  • 15. Case study: Dealing Room
  • 16. Case study: Spotify
  • 17. Spotify at First glance… • End-to-end architecture to support social interaction • Topic-based subscriptions • Friends (Spotify + Facebook): FB friends who are Spotify users and share music • Playlists (URI): other users' playlists (updates), “Collaborative” playlists or playlists modifiable only by the creator • Artist pages (follow artist): new albums or news related to the artist
  • 18. Spotify at First glance… • Hybrid engine • Relay events to online users in real time • Store and forward selected events to offline users • DHT-based overlay • 3 sites: Stockholm Sweden, London UK, Ashburn USA • Designed to scale • Stores approx. 600 million subscriptions at any given time • Matches billions of publication events every day
  • 19. Large scale publish/subscribe systems
  • 20. Boolean matching at pub/sub • Assume the dealing room system is implemented on top of the pub/sub paradigm • Dealer submits a subscription • [Name = ‘Google’, price > 150, volume < 5000] • Stock Exchange publishes a stock quote (publication) • [Name = ‘Google’ and price = 200 and volume = 3000]
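The dealing room example on slide 20 can be written as a conjunction of attribute constraints. The sketch below is illustrative only: the `(attribute, operator, value)` tuple encoding and the `matches` helper are assumptions, not a format defined in the slides.

```python
import operator

# Comparison operators allowed in a constraint (a deliberately small set).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def matches(subscription, publication):
    """Boolean matching: every constraint must hold, otherwise no match."""
    return all(
        attr in publication and OPS[op](publication[attr], value)
        for attr, op, value in subscription
    )


# Subscription and publication taken from slide 20.
subscription = [("name", "=", "Google"), ("price", ">", 150), ("volume", "<", 5000)]
publication = {"name": "Google", "price": 200, "volume": 3000}
```

Here `matches(subscription, publication)` is true because all three constraints hold. The result is all-or-nothing, so a quote with price 149 is dropped entirely and two matching quotes cannot be ranked against each other.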
  • 21. Drawbacks at Boolean pub/sub • A subscriber may be either overloaded with publications or receive too few publications • It is impossible to compare different matching publications, as ranking functions are not defined • Partial matching between subscriptions and publications is not supported
  • 22. Real-world Requirements: Sensor Web • Real-time environmental monitoring • Environmental scientists would like to identify and monitor up to 10 sites with the largest pollution readings over the course of a single day - NSF's Ocean Observatories Initiative (OOI) • Identify the 10 sensors closest to a particular location measuring the largest pollution levels over time (e.g. top-10 readings are provided on an hourly basis) - SNSF’s Sensor Scope project • Power grid monitoring • Operators would like to monitor over time the 100 sites with the largest or the lowest power production, using solar panel current and voltage readings, so that they can identify power grid hot-spots
  • 23. Real-world Requirements: Forest Fire rescue
  • 24. Real-world Requirements: Social Media • Personalized newspaper • A Facebook user is exposed to approximately more than 1500 stories per day, but an average user engages with only 100 stories from the current news feed. • What if users had a personalized newspaper at the end of the day? • Social annotation of news stories • Serving Yahoo! News page views with a fresh set of Top-k tweets, by treating a news story as a subscription and tweets as incoming publications
  • 25. Top-k publish/subscribe “Notify me of all Top-10 stock quotes of Google hourly from NYSE if the price is greater than 150”
  • 26. Top-k publish/subscribe • How many matching publications will be delivered to a subscriber during a period of time? • Actually we don’t know in state-of-the-art pub/sub systems • Top-k pub/sub models are powered by: • Expressive stateful query processing engines • A user-defined parameter k that restricts the delivered publications • Time (in)dependent Top-k computing methods • A sliding window model for handling streaming publications • Methods to deliver Top-k notifications • Pro-active • On-demand
  • 27. Abstract Top-k/w matching • Limit the number of matching and delivered publications to the k best within a sliding window of size w [Figure: Top-2 over the matching publication stream P1 … P10, shown for jumping steps h = 1 and h = 3]
  • 28. [Pripužić 2012] Top-k/w model: DaZaLaPS • Subscriber controls the number of publications it receives per subscription (top-k) within a sliding window • Subscription is defined by • Totally-ordered and time-independent scoring function • Parameter k ∈ N • Parameter w ∈ R+* (time-based) or n ∈ N (count-based sliding window) • Ranks publications according to the degree of relevance (score) to a subscription • Each publication competes with the other publications in the sliding window for a position among the top-k publications
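A count-based version of the top-k/w idea can be sketched as follows. This is a naive illustration under stated assumptions: the `TopKWindow` class is hypothetical, it recomputes the top-k from scratch on every arrival, and it does not reproduce the candidate-set maintenance of the cited model.

```python
import heapq
from collections import deque


class TopKWindow:
    """Naive top-k over a count-based sliding window of the last n publications."""

    def __init__(self, k, n, score):
        self.k = k
        self.score = score              # time-independent scoring function
        self.window = deque(maxlen=n)   # the oldest publication expires automatically

    def add(self, publication):
        """Insert a publication and return the current top-k, best first."""
        self.window.append(publication)
        return heapq.nlargest(self.k, self.window, key=self.score)
```

With k = 2 and n = 3, feeding the scores 5, 1, 7, 3 yields [7, 3] after the fourth publication, because 5 has already left the window. A publication can therefore enter the top-k later than its arrival, once a stronger competitor expires.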
  • 29. [Pripužić 2012] Top-k/w model: DaZaLaPS • When can a publication become a Top-k object in the subscription window? • Immediately upon publication • Later on, when it becomes a Top-k object in the subscription window • Maintain a set of candidate (potential Top-k) publications in memory!
  • 30. [Pripužić 2012] Distributed Top-k/w model • Network of processing nodes, where each node is responsible for computing Top-k/w publications • Publication flooding [Figure: nodes A to F with operations subscribe(s), change(th_s), publish(p); publication p forwarded across the network]
  • 31. [Pripužić 2012] Distributed Top-k/w model • Subscription flooding • Proxy subscriptions: • Replicas of original subscriptions which are advertised over the network [Figure: nodes A to F with operations subscribe(s), change(th_s), publish(p); proxy subscription s_p and threshold th_s forwarded across the network]
  • 32. [Pripužić 2012] Distributed Top-k/w model • Rendezvous routing • Often implemented on top of a structured peer-to-peer network • Rendezvous node is responsible for • Matching mapped publications & subscriptions • Delivering matching publications to subscribers directly [Figure: nodes A to F with operations subscribe(s), publish(p), change(th_s)]
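The core of rendezvous routing is that subscriptions and publications sharing a key are mapped to the same node. The sketch below assumes a string key (for example a stock name) and uses plain modulo hashing over a fixed node list; a structured peer-to-peer overlay would use its own key-to-node mapping instead, so treat `rendezvous_node` and `NODES` as hypothetical.

```python
import hashlib

# Node names follow the figure on slide 32; the fixed list is an assumption.
NODES = ["A", "B", "C", "D", "E", "F"]


def rendezvous_node(key, nodes=NODES):
    """Map a key deterministically to the node responsible for matching it."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

Because `rendezvous_node("Google")` returns the same node for a `subscribe(s)` and a `publish(p)` on that key, the matching can happen at one place without flooding either message.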
  • 33. [Pripužić 2012] Distributed Top-k/w model • Basic gossiping • Similar to publication flooding, but publications spread randomly through an overlay network as gossip • Cannot provide any guarantee regarding publication delivery • Purely probabilistic [Figure: nodes A to F with operations subscribe(s), change(th_s), publish(p)]
  • 34. [Pripužić 2012] Distributed Top-k/w model • Informed gossiping • Each node additionally stores and processes the subscriptions of its close neighbors • Partially probabilistic and partially deterministic [Figure: nodes A to F with operations subscribe(s), change(th_s), publish(p)]
  • 35. [Shrarer 2014] Google Top-k pub/sub • News story as a subscription • Tweets as publications
  • 36. [Shrarer 2014] Google Top-k pub/sub • Annotating news stories with social updates (tweets) at a news website serving a high volume of page views • Billions of page views at Yahoo! News per day • More than 100 million related tweets per day • Top-k pub/sub approach • Stories are standing subscriptions on tweets • Story Index is queried frequently, • but it is updated infrequently • based on DAAT, TAAT algorithms • Tweet Index is updated frequently • but queried only for new stories
  • 37. [Drosou 2009] PrefSIENA • Say Addison is more interested in horror movies than comedies • Addison would like to receive notifications about (various) comedies only if there are no (or just a few) notifications about horror movies • Published events: The Godfather (genre = drama, showing time = 21:10); Ratatouille (genre = comedy, showing time = 21:15); Fight Club (genre = drama, showing time = 23:00); Casablanca (genre = drama, showing time = 23:10); Vertigo (genre = drama, showing time = 23:20) • User subscriptions: genre = drama; genre = horror
  • 38. [Drosou 2009] PrefSIENA • To express some form of ranking among subscriptions, PrefSIENA allows users to define priorities among them • To do this, they introduce preferential subscriptions • Based on preferential subscriptions, we deliver to users only the k most interesting events • Covering/Matching relation [Figure: example subscriptions (string director = Peter Jackson; time release date > 1 Jan 2003; string director = Steven Spielberg; string genre = fantasy; string release date > 1 Jan 2003) and event (string title = LOTR: The Return of the King; string director = Peter Jackson; time release date = 1 Dec 2003; string genre = fantasy; integer oscars = 11)]
  • 39. [Drosou 2009] PrefSIENA • Ordering subscriptions • To order user subscriptions according to the preference relation, they use the winnow operator, applying it on various levels • Step 01: Construct DAG • User preferences: genre = drama ≻ genre = horror; genre = comedy ≻ genre = romance; genre = romance ≻ genre = action; genre = comedy ≻ genre = horror [Figure: preference graph over genre = drama, genre = comedy, genre = horror, genre = romance, genre = action]
  • 40. [Drosou 2009] PrefSIENA • Step 02: perform a topological sort to compute winnow levels. The subscriptions of level i are associated with a preference rank 𝒢(i): • 𝒢 is a monotonically decreasing function with 𝒢 → [0, 1] • e.g. 𝒢(l) = (D + 1 − (l − 1)) / (D + 1) for level l [Figure: preference graph with genre = drama and genre = comedy at preference rank 1, genre = horror and genre = romance at preference rank 2/3, genre = action at preference rank 1/3]
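Steps 01 and 02 can be sketched by repeatedly peeling off the subscriptions that no remaining subscription is preferred over, which gives the same levels as a level-by-level topological sort. Several things here are assumptions: the function names, the preference pairs as read from the flattened slide 39, and taking D as the number of levels minus one, chosen because it reproduces the ranks 1, 2/3 and 1/3 shown on slide 40.

```python
from fractions import Fraction


def winnow_levels(items, prefers):
    """Split items into winnow levels; prefers holds (better, worse) pairs."""
    remaining = set(items)
    levels = []
    while remaining:
        # An item is in the current level if nothing still remaining beats it.
        level = {x for x in remaining
                 if not any(worse == x and better in remaining
                            for better, worse in prefers)}
        if not level:
            raise ValueError("preference relation contains a cycle")
        levels.append(level)
        remaining -= level
    return levels


def preference_rank(level, depth):
    """G(l) = (D + 1 - (l - 1)) / (D + 1), with levels numbered from 1."""
    return Fraction(depth + 1 - (level - 1), depth + 1)


GENRES = ["drama", "comedy", "horror", "romance", "action"]
PREFERS = {("drama", "horror"), ("comedy", "romance"),
           ("romance", "action"), ("comedy", "horror")}

levels = winnow_levels(GENRES, PREFERS)
D = len(levels) - 1  # assumption: D is the number of levels minus one
ranks = {genre: preference_rank(i, D)
         for i, level in enumerate(levels, start=1) for genre in level}
```

Exact fractions are used so the ranks can be compared with the slide values without floating-point noise.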
  • 41. [Drosou 2009] PrefSIENA • Step 03: Computing event ranks • Step 04: Based on the ranks, they deliver to users only the k most interesting events • Continuous, periodic & sliding window • Example: user subscriptions genre = adventure (0.9) and director = Peter Jackson (0.7); the event [string title = King Kong, string director = Peter Jackson, time release date = 14 Dec 2005, string genre = adventure] matches both and receives rank 0.9 with ℱ = max
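The King Kong example on slide 41 can be reproduced with a small helper that combines the ranks of all matching subscriptions using ℱ = max. The encoding of a subscription as a dictionary of equality constraints plus a rank, and the `event_rank` name, are assumptions made for this sketch.

```python
def event_rank(event, subscriptions, combine=max):
    """Combine the ranks of all subscriptions the event matches (F = max by default)."""
    scores = [
        rank
        for constraints, rank in subscriptions
        if all(event.get(attr) == value for attr, value in constraints.items())
    ]
    return combine(scores) if scores else None  # None: no subscription matches


subscriptions = [
    ({"genre": "adventure"}, 0.9),
    ({"director": "Peter Jackson"}, 0.7),
]
king_kong = {
    "title": "King Kong",
    "director": "Peter Jackson",
    "release date": "14 Dec 2005",
    "genre": "adventure",
}
```

`event_rank(king_kong, subscriptions)` evaluates to 0.9, the larger of the two matching ranks. Passing a different `combine` function would model another choice of ℱ.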
  • 42. [Drosou 2009] PrefSIENA: Sliding window Delivery • User subscriptions: genre = comedy (0.9), genre = romance (0.9), genre = drama (0.8), genre = horror (0.6) • k = 2, w = 4 • Matching events: The Big Parade (genre = romance, showing time = 21:00); The Apartment (genre = comedy, showing time = 21:10); The Godfather (genre = drama, showing time = 21:25); Forrest Gump (genre = romance, showing time = 21:10); Jaws (genre = horror, showing time = 20:55); Vertigo (genre = horror, showing time = 21:45); Psycho (genre = horror, showing time = 21:50); Pulp Fiction (genre = drama, showing time = 21:25) • Delivered events: The Big Parade; The Apartment; Forrest Gump; The Godfather; Psycho; Pulp Fiction [Figure: timeline with event arrivals labeled 20:00, 20:15, 20:22, 20:25, 20:50, 20:40, 20:45, 20:55]
  • 43. [Drosou 2009] PrefSIENA But wait.. • The most highly ranked events may be very similar to each other… • We wish to retrieve results on a broader variety of user interests • Two different perspectives on achieving diversity: • Avoid overlap: choose notifications that are dissimilar to each other • Increase coverage: choose notifications that cover as many user interests as possible • How to measure diversity? • Many alternative ways • Common ground: measure similarity/distance among the selected items
  • 44. Diversity: Top-k representative set • Method to retrieve Top-k publications from matching publications [Figure: representative Top-k; drawback (without diversity) versus what we want (with diversity)]
  • 45. MAX* k-diversity problem • Find: $S^{*} = \operatorname{argmax}_{S \subseteq P,\ |S| = k} f(S, d)$ where 1. $P = \{p_1, \dots, p_n\}$ 2. $k \le n$ 3. $d$: a distance metric 4. $f$: a diversity function
  • 46. Proposed: MAXDIVREL k-diversity problem • Find: $S^* = \operatorname{argmax}_{S \subseteq P} f(S, d, r) = \operatorname{argmax}_{S \subseteq P} \frac{g(S, d, r)}{h(S, d, r)}$ • g(S, d, r): maximize the dissimilarity & relevancy in S • h(S, d, r): minimize the dissimilarity & relevancy in P − S • where 1. P = {p1, …, pn} 2. d: a distance metric 3. r: a relevance metric 4. f: a diversity function
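  • 47. Formal definition: MAXDIVREL k-diversity • Maximize $g(S, d, r) = \frac{1}{|S|} \sum_{p_i, p_j \in S} \rho(p_i, p_j)\, d(p_i, p_j)$ such that independence holds • Minimize $h(S, d, r) = \frac{1}{|P - S|} \sum_{p_i \in S,\ p_j \in P - S} \rho(p_i, p_j)\, d(p_i, p_j)$ such that dominance holds • where 1. P = {p1, …, pn} 2. d: a distance metric 3. r: a relevance metric 4. $\alpha > 0$; $\rho(p_i, p_j)$ stands for the slide's weighting term built from $r(p_i)$ and $r(p_j)$, whose exact form is not legible in this transcript • Independence condition: $\forall p_i, p_j \in S,\ d(p_i, p_j) > \alpha$ • Dominance condition: $\forall p_i \in P,\ \exists p_j \in S$ s.t. $d(p_i, p_j) \le \alpha;\ i \ne j$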
  • 48. NP-hardness: Minimum independent dominating set • Figure: the publication space (p1 … p5, radius α) is mapped to a graph model (v1 … v5) in which two vertices are adjacent when their publications are neighbors • $\mathrm{Neighborhood}(p_i) = \{\, p_j \mid d(p_i, p_j) \le \alpha,\ p_i \ne p_j \,\}$ • Figure panels: three example vertex sets labeled "Independent, dominating" and one labeled "Dominating, not independent"
  • 49. Naïve greedy • Select $\operatorname{argmax}_{p_i}$ of a score that combines $r(p_i)^2$ with $\sum_{p_j \in N(p_i)} r(p_j) \times d(p_i, p_j)$
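The independence and dominance conditions, together with a greedy selection over α-neighborhoods, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the talk's algorithm: the scoring function is supplied by the caller because the exact greedy score is not reproduced here, and the example uses plain relevance as a stand-in. The 1-D points, the relevance values, and all function names are hypothetical.

```python
from itertools import combinations


def neighborhood(i, points, dist, alpha):
    """Indices of all other points within distance alpha of point i."""
    return {j for j in range(len(points))
            if j != i and dist(points[i], points[j]) <= alpha}


def is_independent(selected, points, dist, alpha):
    """No two selected points are within alpha of each other."""
    return all(dist(points[i], points[j]) > alpha
               for i, j in combinations(selected, 2))


def is_dominating(selected, points, dist, alpha):
    """Every point is selected or within alpha of some selected point."""
    chosen = set(selected)
    return all(i in chosen or any(dist(points[i], points[j]) <= alpha for j in chosen)
               for i in range(len(points)))


def greedy_select(points, score, dist, alpha, k=None):
    """Repeatedly pick the best-scoring remaining point and drop its neighborhood."""
    remaining = set(range(len(points)))
    selected = []
    while remaining and (k is None or len(selected) < k):
        best = max(remaining, key=lambda i: (score(i), -i))
        selected.append(best)
        remaining -= neighborhood(best, points, dist, alpha) | {best}
    return selected


# Hypothetical example: five publications on a line, alpha = 1.5.
points = [0.0, 1.0, 2.0, 5.0, 6.0]
relevance = [0.5, 0.9, 0.4, 0.7, 0.8]
distance = lambda a, b: abs(a - b)

selected = greedy_select(points, lambda i: relevance[i], distance, alpha=1.5)
```

Because each pick removes its whole neighborhood, the result is always independent; it is also dominating when the loop runs until no points remain (that is, when `k` does not cut it short).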
  • 50. Handling streaming publications • Figure: a new publication p6 arrives in the publication space (p1 … p5, radius α) and adds vertex v6 to the graph model (v1 … v5) • Continuity requirements • 1. Durability: an item selected as diversified in the i-th window may still be selected in the (i+1)-th window if it has not expired and the other valid items in the (i+1)-th window fail to outcompete it • 2. Order: the publication stream follows chronological order; we avoid selecting an item j as diverse later once we have already selected an item i that is not older than j
  • 51. MAXDIVREL continuous k-diversity • Figure: the matching publication stream P1, P2, P3, P4, …, Pj, Pj+1, … is covered by the i-th window and the (i+1)-th window; MAXDIVREL k-diversity is solved on each window, giving $S_i^*$ and $S_{i+1}^*$ • Requirements: Independence, Dominance, Durability, Order • Straightforward solution: apply the naïve greedy method at each instance • Proposed: an incremental index mechanism • Avoids the curse of re-calculating the neighborhood
  • 52. Locality Sensitive Hashing (LSH) • Simple idea: if two points are close together, then after a "projection" operation these two points will remain close together
  • 53. LSH analysis • For any given points $p, q \in \mathbb{R}^d$: $P_H[h(p) = h(q)] \ge P_1$ for $\lVert p - q \rVert \le d_1$, and $P_H[h(p) = h(q)] \le P_2$ for $\lVert p - q \rVert \ge c\,d_1 = d_2$ • Such a hash function h is $(d_1, d_2, P_1, P_2)$-sensitive • Ideally we need • $(P_1 - P_2)$ to be large • the gap $(d_2 - d_1)$ to be small
  • 54. LSH in MAXDIVREL: Publications as categorical data
  • 55. LSH in MAXDIVREL: Characteristic Matrix
  • 56. LSH in MAXDIVREL: Minhashing • No publications any more: each publication is represented by a signature • Technique • Randomly permute the rows of the characteristic matrix m times • For each permutation and each publication's column, take the number of the first row, in the permuted order, in which that column has a 1 • Figure: first permutation of the rows of the characteristic matrix • Advantage: reduces the dimensions to a small minhash signature
  • 57. LSH in MAXDIVREL: Signature Matrix • Fast minhashing: select m random hash functions to model the effect of m random permutations • This is mathematically proven only when the number of rows is a prime
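Fast minhashing as described on slides 56 and 57 can be sketched as follows. This is a minimal sketch under assumptions of my own: each publication is given as the set of row numbers where its column of the characteristic matrix holds a 1, the hash family is h(x) = (a·x + b) mod p with p the smallest prime not below the number of rows, and all names are hypothetical. It is not the talk's implementation.

```python
import random


def next_prime(n):
    """Smallest prime >= n (trial division; fine for small row counts)."""
    def is_prime(x):
        if x < 2:
            return False
        i = 2
        while i * i <= x:
            if x % i == 0:
                return False
            i += 1
        return True
    while not is_prime(n):
        n += 1
    return n


def make_hash_functions(m, num_rows, seed=0):
    """m random hash functions (a, b, p), each standing in for one row permutation."""
    rng = random.Random(seed)
    p = next_prime(num_rows)
    return [(rng.randrange(1, p), rng.randrange(0, p), p) for _ in range(m)]


def minhash_signature(rows_with_one, hash_functions):
    """For each hash function, the smallest hashed row number among the column's 1-rows."""
    return [min((a * row + b) % p for row in rows_with_one)
            for a, b, p in hash_functions]


def estimated_jaccard(sig_x, sig_y):
    """Fraction of signature positions on which two publications agree."""
    return sum(x == y for x, y in zip(sig_x, sig_y)) / len(sig_x)


# Hypothetical characteristic matrix with 7 rows (a prime) and three publications.
hashes = make_hash_functions(m=50, num_rows=7)
sig_x = minhash_signature({0, 2, 4}, hashes)
sig_y = minhash_signature({0, 2, 4}, hashes)   # identical to x
sig_z = minhash_signature({1, 3, 5}, hashes)   # disjoint from x
```

With a prime modulus and a ≠ 0, each hash function is a bijection on the row numbers, so it behaves like a true permutation: identical columns always agree and disjoint columns never do.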
  • 58. LSH in MAXDIVREL: LSH Buckets • Take r-sized signature vectors from the m-sized minhash signature • Map them into L hash tables, each with an arbitrary number b of buckets
  • 59. LSH in MAXDIVREL: How to select L, r? • For two vectors x, y: $JD(x, y) = 1 - JSIM(x, y)$, where $JSIM(x, y) = \frac{|x \cap y|}{|x \cup y|}$ • 1. $L \times r = m$ • 2. similarity threshold $s \approx \left(\frac{1}{L}\right)^{1/r}$
  • 60. LSH in MAXDIVREL: Analysis • For two vectors x, y: $JD(x, y) = 1 - JSIM(x, y)$, where $JSIM(x, y) = \frac{|x \cap y|}{|x \cup y|}$ • For publications x & y: $JSIM(x, y) \propto \mathrm{Prob}[H(x) = H(y)]$ • At a particular hash table • x & y map into the same bucket: $JSIM(x, y)^r$ • x & y do not map into the same bucket: $1 - JSIM(x, y)^r$ • Across L hash tables • x & y do not map into the same bucket in any table: $(1 - JSIM(x, y)^r)^L$ • x & y map into the same bucket in at least one table: $1 - (1 - JSIM(x, y)^r)^L$ • True near neighbors are unlikely to be unlucky in all the projections
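The threshold rule and the collision probabilities above can be checked numerically. The sketch below assumes the exponent in the per-table probability is the band size r (so that L × r = m), and the function names are hypothetical.

```python
def similarity_threshold(L, r):
    """Approximate similarity at which the collision curve rises steeply: (1/L)^(1/r)."""
    return (1.0 / L) ** (1.0 / r)


def candidate_probability(s, L, r):
    """Probability that two items with Jaccard similarity s share a bucket
    in at least one of L tables, each keyed on r signature rows."""
    same_bucket_one_table = s ** r
    miss_all_tables = (1.0 - same_bucket_one_table) ** L
    return 1.0 - miss_all_tables


# Example split of an m = 100 signature into L = 20 tables of r = 5 rows each.
L, r = 20, 5
threshold = similarity_threshold(L, r)
p_similar = candidate_probability(0.8, L, r)      # well above the threshold
p_dissimilar = candidate_probability(0.2, L, r)   # well below the threshold
```

For a fixed m, raising r (and so lowering L) pushes the threshold up and makes the index stricter; lowering r admits more candidates per bucket.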
  • 61. LSH in MAXDIVREL: Batch-wise top-k computation • Bucket "winner": the publication with the highest relevancy score in the bucket • The winner is dominant and represents its bucket neighborhood • Top-k: the "winners" that have a majority of votes • The k winners are independent • Figure: publications $P_A, P_B, P_C, P_D, P_E, P_F, P_G, P_H, \ldots$ in the i-th window
  • 62. LSH in MAXDIVREL: Incremental top-k computation • New publication i → update the i-th characteristic vector (Characteristic Matrix) → generate the i-th minhash signature (Signature Matrix) → map the i-th signature into L hash tables → update the "winner" at each bucket the i-th signature maps into → vote → top-k candidate
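The bucket-winner and voting pipeline on slides 61 and 62 can be sketched as a small index class. This is a simplified sketch under assumptions that the slides do not confirm: the r-row band itself is used as the bucket key (rather than hashing it into b buckets), a publication gets one vote per bucket it currently wins, window expiry is not modeled, and a newcomer must have strictly higher relevance to replace a winner. The class and method names are hypothetical.

```python
from collections import Counter


class LshWinnerIndex:
    """L hash tables; each bucket remembers only its most relevant publication."""

    def __init__(self, L, r):
        self.L, self.r = L, r
        # One dict per table: band key -> (relevance, publication id) of the winner.
        self.tables = [dict() for _ in range(L)]

    def insert(self, pub_id, signature, relevance):
        """Incremental update: touch only the L buckets this signature maps into."""
        if len(signature) != self.L * self.r:
            raise ValueError("signature length must equal L * r")
        for t in range(self.L):
            key = tuple(signature[t * self.r:(t + 1) * self.r])
            current = self.tables[t].get(key)
            if current is None or relevance > current[0]:
                self.tables[t][key] = (relevance, pub_id)

    def top_k(self, k):
        """Publications that win the most buckets, most votes first."""
        votes = Counter(pub for table in self.tables for _, pub in table.values())
        return [pub for pub, _ in votes.most_common(k)]


# Hypothetical signatures with L = 2 tables of r = 2 rows each.
index = LshWinnerIndex(L=2, r=2)
index.insert("A", [1, 2, 3, 4], relevance=0.9)
index.insert("B", [1, 2, 3, 4], relevance=0.5)   # same buckets as A, lower relevance
index.insert("C", [5, 6, 7, 8], relevance=0.7)
index.insert("D", [5, 6, 9, 9], relevance=0.8)   # beats C in the first table only
top_two = index.top_k(2)
```

Each insert costs L bucket lookups regardless of how many publications are already indexed, which is the point of avoiding neighborhood re-calculation.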
  • 63. LSH in MAXDIVREL: When new publication F arrives… • Only buckets $B_{13}$, $B_{23}$, $B_{32}$, $B_{43}$ will vote • Follow the continuity requirements • Durability • Order • Figure: publications $P_A, P_B, P_C, P_D, P_E, P_F, P_G, P_H, \ldots$ across the i-th window and the (i+1)-th window
  • 64. Implementation
  • 65. Cloud service modules • Figure sources: Amazon Kinesis; Amazon ElastiCache
  • 66. Top-k pub/sub: DEMO
  • 67. P2P Pub/Sub • Scribe: topic-based, built on top of Pastry, stateful, rendezvous. • Hermes: topic & content-based, built on top of Pastry(-like) net, stateful, rendezvous & flooding-like. • Meghdoot: content-based, built on top of CAN, stateful, rendezvous. • Tera: topic-based, built on unstructured P2P net, stateful, random-walk-based flooding. • Sub2Sub: content-based, built on unstructured P2P net, stateful, flooding-like. • DHTStrings: content-based, DHT-independent, string support, stateless, rendezvous. • OP-DHT Pub/Sub: content-based, (can be) built on top of Chord/Pastry/Bamboo.
  • 68. DHT-based pub/sub: Scribe • Topic-based • Based on a DHT (Pastry) • Rendezvous event routing • A random identifier is assigned to each topic • The Pastry node whose identifier is closest to the topic's identifier becomes responsible for that topic
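The rendezvous rule on the Scribe slide can be sketched as follows. This is a simplified illustration, not Scribe or Pastry code: it searches a known list of node identifiers instead of routing through an overlay, derives the topic identifier from a SHA-1 hash of the topic name as a stand-in for the "random identifier", and measures closeness on a circular identifier space. All function names and the example identifiers are hypothetical.

```python
import hashlib


def topic_id(topic, bits=128):
    """Map a topic name to an identifier in [0, 2**bits) (hash used as a stand-in)."""
    digest = hashlib.sha1(topic.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def circular_distance(a, b, bits=128):
    """Distance between two identifiers on a ring of size 2**bits."""
    size = 1 << bits
    diff = abs(a - b) % size
    return min(diff, size - diff)


def rendezvous_node(key, node_ids, bits=128):
    """The node whose identifier is closest to the key; ties go to the smaller id."""
    return min(node_ids, key=lambda n: (circular_distance(key, n, bits), n))


# Tiny 8-bit identifier space for readability.
owner = rendezvous_node(100, [10, 90, 200], bits=8)
wrapped_owner = rendezvous_node(250, [5, 200], bits=8)   # 250 is 11 steps from 5 across the wrap
stock_topic = topic_id("stocks/GOOG", bits=8)
```

Publishers and subscribers of a topic both route toward the same identifier, so they meet at the responsible node without knowing each other.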
  • 69. DHT-based pub/sub: Meghdoot • Content-based • Based on the structured overlay CAN • Maps the subscription language and the event space to the CAN space • Subscription and event routing exploit CAN routing algorithms
  • 70. Top-k publish/subscribe at P2P • Stateful approaches introduce some kind of state at (intermediate) nodes. State can refer to: • State needed to support specialized structures built on top of the network structure • E.g. trees (parent, children pointers) • Routing state for "content-based routing": • Subscription paths to be followed by matching publications • Subscription (meta)data: not just forward pointers to be followed and subscription content (its predicates), but also possible info as to • What about query-inherent diversification? • The controlled parameters (k & w) can change • Updates and the need to maintain state consistency may stress the system and negate any benefits • So we'll be left with the complexity …
  • 71. Future work • Apply top-k diversification modules on (un)structured P2P networks • Exploit the overlap among the diversified results of users who have similar interests • Develop an LSH-based index for a multi-threaded distributed environment • Develop large-scale top-k pub/sub applications by exploring other suitable use cases, e.g. • Personalized newspaper for every Facebook user • Diverse set of personalized Twitter trends • Social annotation of news stories

Editor's Notes

  1. This design has three main scenarios: (1) every new tweet is used as a query for the Story Index and, for every story s, if it is part of the top-k results for s, we add it to Rs. We also add the new tweet to the Tweet Index; (2) for every new story we query the Tweet Index and retrieve the top-k tweets, which are used to initialize Rs. We also add the new story to the Story Index; (3) for every page view we simply fetch the top-k set of tweets R
  2. Given a notification n and a subscription s, s covers n (or n matches s) if and only if every attribute constraint of s is satisfied by some attribute of n
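The covering relation in note 2 can be sketched in a few lines. This is a minimal sketch assuming a notification is a flat attribute-to-value mapping and a subscription is a list of (attribute, operator, value) constraints; the operator table, the function name `covers`, and the attribute names in the example are hypothetical.

```python
import operator

# Comparison operators a constraint may use (an assumed, minimal set).
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def covers(subscription, notification):
    """True iff every constraint of the subscription is satisfied by some
    attribute of the notification (extra notification attributes are ignored)."""
    return all(
        name in notification and OPERATORS[op](notification[name], value)
        for name, op, value in subscription
    )


# "Notify me of all stock quotes of Google from NYSE if the price is greater than 150"
subscription = [("symbol", "=", "GOOG"), ("exchange", "=", "NYSE"), ("price", ">", 150)]

matching = {"symbol": "GOOG", "exchange": "NYSE", "price": 151.2, "volume": 1000}
too_cheap = {"symbol": "GOOG", "exchange": "NYSE", "price": 149.0}
no_exchange = {"symbol": "GOOG", "price": 200.0}
```

A notification with a missing attribute never matches a constraint on that attribute, which is why `no_exchange` is rejected even though its price qualifies.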