This document discusses publish/subscribe systems and top-k publish/subscribe systems. It provides background on publish/subscribe communication paradigms and taxonomies. It then discusses requirements for top-k publish/subscribe systems to limit the number of matching publications delivered to k best within a time window. Several research papers on distributed top-k publish/subscribe systems are summarized, including their approaches to ranking publications, computing top-k over sliding windows, and delivering top-k results.
3. Communication paradigms
Point-to-point communication
• Participants need to exist at the
same time
• Direct coupling
• Strict Identity management
• Not good for volatile
environment
• Not a good way to communicate
with several participants
Indirect communication
• Communication through an
intermediary between sender(s)
& receiver(s)
• No direct coupling
• Space uncoupling
• Anonymity
• Time uncoupling
• Independent lifetimes
• Through persistent communication
channel
MSc. Distributed System 3
4. Indirect communication
• Scenarios where users connect and disconnect very often
• Mobile environments, messaging services, forums
• Event dissemination where receivers may be unknown and change
often
• RSS, events feeds in financial services
• Scenarios with very large number of participants
• Google Ads system, Spotify
• Commonly used in cases when change is anticipated
• Need to provide dependable services
5. Taxonomy
• Indirect communication
  • Communication based
    • Group communication
    • Message queues
    • Publish/subscribe
  • State based
    • Tuple spaces
    • Distributed shared memory
6. Publish/Subscribe
“Notify me of all stock quotes of Google from NYSE if the price is greater than 150”
7. Introduction: pub/sub systems
• Information consumers express their interests in information with
subscriptions, identifying which items are of interest.
• Information producers, publish information by submitting
publications (a.k.a. publication events or event notifications).
• A pub/sub system:
• Subscription processing: Indexing and storing subscriptions.
• Event processing: upon event arrival, access subscription indices and identify
all matched subscriptions.
• Event delivery: deliver the event to clients with matched subscriptions.
9. Introduction: DB view at pub/sub
• Events correspond to data (“data-carrying events”).
• Subscriptions correspond to continuous queries:
• Define predicates on attributes
• Fundamentally different model:
• Instead of storing/indexing data and issuing queries to access it
• Queries (subscriptions) are stored/indexed and incoming data (events) is
matched against stored queries.
10. Introduction: Communications view at pub/sub
• Akin to multicasting (group IPC, 1-N communication)
• Each publisher (through its events) communicates to a large number of subscribers.
• However, communication is,
• Anonymous
• Subscribers do not “know” publishers and vice versa
• Asynchronous
• publishers and subscribers do not block when publishing/subscribing
• Mutually out-of-sync: no rendezvous in time
• Heterogeneous
• can be used to connect heterogeneous components
17. Spotify at First glance…
• End-to-end architecture to support social interaction
• Topic-based subscriptions
• Friends (Spotify + Facebook): Facebook friends who are Spotify users, and users connected by sharing music
• Playlists (URI): other users' playlists (updates), which are either “collaborative” or modifiable only by their creator
• Artist pages (follow artist): new albums or news related to the artist
18. Spotify at First glance…
• Hybrid engine
• Relay events to online users in real
time
• Store and forward selected events to
offline users
• DHT based overlay
• 3 sites: Stockholm Sweden, London
UK, Ashburn USA
• Designed to scale
• Stores approx. 600 million subscriptions at any given time
• Matches billions of publication events every day
20. Boolean matching at pub/sub
• Assume the dealer room system is implemented on top of the pub/sub paradigm
• Dealer submits a subscription
• [Name = ‘Google’ , price > 150 , volume < 5000]
• Stock Exchange publishes a stock quote (publication)
• [Name = ‘Google’ and price = 200 and volume = 3000]
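The Boolean matching step can be sketched as follows. This is a minimal illustration, not the matching engine of any particular system: the tuple-based predicate format, the `OPS` table and the function name `matches` are assumptions made for the example.

```python
import operator

# Comparison operators allowed in a subscription predicate (assumed set).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt,
       ">=": operator.ge, "<=": operator.le}

def matches(subscription, publication):
    """Return True only if every predicate is satisfied by the publication."""
    for attribute, op, value in subscription:
        if attribute not in publication:
            return False
        if not OPS[op](publication[attribute], value):
            return False
    return True

# The dealer's subscription and the stock quote from the slide.
subscription = [("name", "=", "Google"), ("price", ">", 150), ("volume", "<", 5000)]
quote = {"name": "Google", "price": 200, "volume": 3000}
cheap_quote = {"name": "Google", "price": 120, "volume": 3000}

print(matches(subscription, quote))        # True
print(matches(subscription, cheap_quote))  # False
```

The result is strictly yes or no: a publication that misses a single predicate is dropped, and two publications that both match cannot be ordered against each other.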
21. Drawbacks at Boolean pub/sub
• A subscriber may be either overloaded with publications or receive too few publications.
• It is impossible to compare different matching publications, as ranking functions are not defined.
• Partial matching between subscriptions and publications is not supported.
22. Real-world Requirements: Sensor Web
• Real-time environmental monitoring
• Environmental scientists would like to identify and monitor up to 10 sites with
the largest pollution readings over the course of a single day - NSF's Ocean
Observatories Initiative (OOI)
• Identify 10 sensors closest to a particular location measuring the largest
pollution levels over time (e.g. top-10 readings are provided on hourly basis) -
SNSF’s Sensor Scope project
• Power grid monitoring
• Operators would like to monitor over time 100 sites with the largest or the lowest power production using solar panel current and voltage readings, so that they can identify power grid hot-spots
24. Real-world Requirements: Social Media
• Personalized newspaper
• A Facebook user is exposed to approximately 1500 or more stories per day, but an average user engages with only 100 stories from the current news feed.
• What if each user had a personalized newspaper at the end of the day?
• Social Annotation of news-stories
• Serving of Yahoo! News page-views with a fresh set of Top-k tweets, by
considering news-story as a subscription while tweets as incoming
publications
25. Top-k publish/subscribe
“Notify me of the Top-10 stock quotes of Google hourly from NYSE if the price is greater than 150”
26. Top-k publish/subscribe
• How many matching publications will be delivered to a subscriber
during a period of time?
• Actually we don’t know in state-of-the-art pub/sub systems
• Top-k pub/sub models are powered by,
• Expressive stateful query processing engines
• User defined parameter k restricts the delivered publications
• Time (in)dependent Top-k computing methods
• Sliding window model for handling streaming publications
• Methods to deliver Top-k notifications
• Pro-active
• On-demand
27. Abstract Top-k/w matching
• Limit the number of matching and delivered publications to k best within a sliding
window of size w
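A minimal sketch of Top-k/w matching over a count-based window is given below. The scores are hypothetical, and the function name `top_k_per_window` and the explicit jumping step `h` are assumptions made for the illustration; the window is recomputed from scratch each time, which a real engine would avoid.

```python
import heapq

def top_k_per_window(scores, k, w, h=1):
    """Return the k best (score, index) pairs of each count-based window.

    scores[i] is the relevance score of the i-th matching publication,
    w is the window size and h the jumping step between windows.
    """
    results = []
    for start in range(0, len(scores) - w + 1, h):
        window = [(scores[i], i) for i in range(start, start + w)]
        results.append(heapq.nlargest(k, window))
    return results

# Hypothetical scores for publications P1..P10.
scores = [0.4, 0.1, 0.3, 0.2, 0.9, 0.5, 0.1, 0.2, 0.6, 0.3]
for best in top_k_per_window(scores, k=2, w=5, h=1):
    print([f"P{i + 1}" for _, i in best])
```

With a larger jumping step the windows overlap less, so fewer Top-k sets are produced for the same stream.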
[Figure: a matching publication stream P1, P2, …, P10 shown for three successive window positions separated by a jumping step h (h = 1, h = 3); the Top-2 publications of the three windows are (P5, P1), (P5, P6) and (P5, P9).]
28. [Pripužić 2012] Top-k/w model: DaZaLaPS
• Subscriber controls the number of publications it receives per
subscription (top-k) within a sliding window
• Subscription is defined by
• Totally-ordered and time-independent scoring function
• Parameter k ∈ N
• Parameter w ∈ R+* (time-based) or n ∈ N (count-based sliding window)
• Ranks publications according to the degree of relevance (score) to a
subscription
• Each publication is competing with other publications from the sliding
window for a position among top-k publications
29. [Pripužić 2012] Top-k/w model: DaZaLaPS
• When can a publication become a Top-k object in the subscription
window?
• Immediately upon publication
• Later on when it becomes a Top-k object in the subscription window
• Maintain a set of candidate (potential Top-k) publications in memory!
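One way to bound the candidate set is sketched below. It assumes, since the slide does not name a specific structure, the rule that a publication can be discarded once k newer publications outscore it: those k expire later, so the older one can never re-enter the Top-k. The function name `prune_candidates` and the quadratic scan are choices made for brevity.

```python
def prune_candidates(window, k):
    """Keep only publications that could still become Top-k.

    window is a list of (arrival_time, score) pairs ordered by arrival.
    A publication is dropped once k newer publications outscore it,
    because those k will outlive it in the sliding window.
    """
    candidates = []
    for i, (t, score) in enumerate(window):
        newer_and_better = sum(1 for _, s in window[i + 1:] if s > score)
        if newer_and_better < k:
            candidates.append((t, score))
    return candidates

window = [(1, 0.9), (2, 0.2), (3, 0.5), (4, 0.7), (5, 0.3)]
print(prune_candidates(window, k=1))  # [(1, 0.9), (4, 0.7), (5, 0.3)]
print(prune_candidates(window, k=2))  # [(1, 0.9), (3, 0.5), (4, 0.7), (5, 0.3)]
```

The publication with score 0.3 is kept even though it is not currently in the Top-1: once the items with scores 0.9 and 0.7 expire, it may become a Top-k object later on.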
30. [Pripužić 2012] Distributed Top-k/w model
• Network of processing nodes, where each node is responsible for
computing Top-k/w publications
• Publication Flooding
[Figure: overlay of nodes A to F with the operations subscribe(s), change(th_s) and publish(p); copies of publication p are forwarded across the overlay.]
31. [Pripužić 2012] Distributed Top-k/w model
• Subscription Flooding
• Proxy subscriptions:
• Replicas of original subscriptions, which are advertised over the network
[Figure: overlay of nodes A to F with the operations subscribe(s), change(th_s) and publish(p); the labels s_p and th_s are propagated across the overlay.]
32. [Pripužić 2012] Distributed Top-k/w model
• Rendezvous routing
• Often implemented on top of a structured peer-to-peer network
• Rendezvous node is responsible for
• Matching mapped publications & subscriptions
• Delivering matching publications to subscribers directly
[Figure: overlay of nodes A to F with the operations subscribe(s), publish(p) and change(th_s); s and p are routed towards a node where both meet.]
33. [Pripužić 2012] Distributed Top-k/w model
• Basic gossiping
• Similar to publication flooding, but randomly spread through an overlay
network as a gossip
• Cannot provide any guarantee regarding publication delivery
• Purely probabilistic
[Figure: overlay of nodes A to F with the operations subscribe(s), change(th_s) and publish(p); publication p is forwarded to only some of the nodes.]
34. [Pripužić 2012] Distributed Top-k/w model
• Informed gossiping
• Each node additionally stores subscriptions of its close neighbors and also
processes the subscriptions of its neighbors
• Partially probabilistic and partially deterministic
[Figure: overlay of nodes A to F with the operations subscribe(s), change(th_s) and publish(p); the labels p, s_p and th_s mark what is exchanged between nodes.]
35. [Shrarer 2014] Google Top-k pub/sub
[Figure: a news story acts as a subscription; tweets act as publications.]
36. [Shrarer 2014] Google Top-k pub/sub
• Annotating news stories with social updates (tweets), at a news website
serving high volume of page-views
• Billions of page views at Yahoo! News per day
• More than 100 million related tweets per day
• Top-k pub/sub approach
• stories are standing subscriptions on tweets
• Story Index is queried frequently,
• but it is updated infrequently
• based on DAAT, TAAT algorithms
• Tweet Index updated frequently
• but queried only for new stories
37. [Drosou 2009] PrefSIENA
• Say Addison is more interested in horror movies than comedies
• Addison would like to receive notifications about (various) comedies only if
there are no (or just a few) notifications about horror movies
Published events:
• title = The Godfather, genre = drama, showing time = 21:10
• title = Ratatouille, genre = comedy, showing time = 21:15
• title = Fight Club, genre = drama, showing time = 23:00
• title = Casablanca, genre = drama, showing time = 23:10
• title = Vertigo, genre = drama, showing time = 23:20
User subscriptions:
• genre = drama
• genre = horror
38. [Drosou 2009] PrefSIENA
• To express some form of ranking among subscriptions, PrefSIENA allows users to define priorities among them
• To do this, they introduce preferential subscriptions
• Based on preferential subscriptions, we deliver to users only the k most
interesting events
• Covering/Matching relation
Example subscription constraints:
• string director = Peter Jackson
• time release date > 1 Jan 2003
• string director = Steven Spielberg
• string genre = fantasy
• string release date > 1 Jan 2003
Example event:
• string title = LOTR: The Return of the King
• string director = Peter Jackson
• time release date = 1 Dec 2003
• string genre = fantasy
• integer oscars = 11
39. [Drosou 2009] PrefSIENA
• Ordering subscriptions
• To order user subscriptions according to the preference relation, they use the winnow operator, applying it on various levels
• Step 01: Construct DAG
User preferences:
• genre = drama ≻ genre = horror
• genre = comedy ≻ genre = romance
• genre = romance ≻ genre = action
• genre = comedy ≻ genre = horror
Preference graph, one level per line:
• genre = drama, genre = comedy
• genre = horror, genre = romance
• genre = action
40. [Drosou 2009] PrefSIENA
• Step 02: perform a topological sort to compute winnow levels. The subscriptions of level l are associated with a preference rank 𝒢(l):
• 𝒢 is a monotonically decreasing function with values in [0, 1]
• e.g. 𝒢(l) = (D + 1 − (l − 1)) / (D + 1)
Preference graph with preference ranks:
• genre = drama, genre = comedy: preference rank = 1
• genre = horror, genre = romance: preference rank = 2/3
• genre = action: preference rank = 1/3
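The level computation and the example rank function can be sketched as below. This is an illustration only: the function names are hypothetical, and the choice D = (number of levels − 1) is an assumption that happens to reproduce the ranks 1, 2/3 and 1/3 shown for the genre example.

```python
def winnow_levels(items, prefers):
    """Split items into winnow levels.

    prefers is a set of (better, worse) pairs. Level 1 holds the items
    that no other item is preferred over; each later level is obtained
    by repeating this on the remaining items.
    """
    remaining = set(items)
    levels = []
    while remaining:
        level = {x for x in remaining
                 if not any(b in remaining and w == x for b, w in prefers)}
        if not level:
            raise ValueError("preference relation contains a cycle")
        levels.append(level)
        remaining -= level
    return levels

def preference_rank(level, depth):
    """G(l) = (D + 1 - (l - 1)) / (D + 1) for a 1-based level l."""
    return (depth + 1 - (level - 1)) / (depth + 1)

genres = ["drama", "comedy", "horror", "romance", "action"]
prefers = {("drama", "horror"), ("comedy", "romance"),
           ("romance", "action"), ("comedy", "horror")}

levels = winnow_levels(genres, prefers)
D = len(levels) - 1   # assumption: D + 1 equals the number of levels
for l, level in enumerate(levels, start=1):
    print(l, sorted(level), round(preference_rank(l, D), 3))
```

Repeatedly peeling off the undominated items is equivalent to a level-by-level topological sort of the preference DAG.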
41. [Drosou 2009] PrefSIENA
• Step 03: Computing Event Ranks
• Step 04: Based on the ranks, they deliver to users only the k most
interesting events
• Continuous, periodic & sliding window
User subscriptions (with preference ranks):
• genre = adventure: 0.9
• director = Peter Jackson: 0.7
Event:
• string title = King Kong
• string director = Peter Jackson
• time release date = 14 Dec 2005
• string genre = adventure
Event rank with ℱ = max: 0.9
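The event-rank computation for the King Kong example can be sketched as follows. Only equality constraints are modelled, and the names `covers` and `event_rank` are hypothetical; the aggregation function ℱ is passed in as `combine`, with `max` as in the example.

```python
def covers(subscription, event):
    """True if every (attribute, value) constraint of the subscription is met.

    Only equality constraints are modelled in this sketch.
    """
    return all(event.get(attr) == value for attr, value in subscription.items())

def event_rank(event, ranked_subscriptions, combine=max):
    """Combine the ranks of all covering subscriptions (F = max by default)."""
    ranks = [rank for sub, rank in ranked_subscriptions if covers(sub, event)]
    return combine(ranks) if ranks else None

subscriptions = [({"genre": "adventure"}, 0.9),
                 ({"director": "Peter Jackson"}, 0.7)]
king_kong = {"title": "King Kong", "director": "Peter Jackson",
             "release date": "14 Dec 2005", "genre": "adventure"}

print(event_rank(king_kong, subscriptions))  # 0.9
```

Both subscriptions cover the event, so the ranks 0.9 and 0.7 are combined; an event covered by no subscription gets no rank and is not a delivery candidate.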
42. [Drosou 2009] PrefSIENA: Sliding window delivery
User subscriptions (with preference ranks):
• genre = comedy: 0.9
• genre = romance: 0.9
• genre = drama: 0.8
• genre = horror: 0.6
Parameters: k = 2, w = 4
Matching events:
• title = The Big Parade, genre = romance, showing time = 21:00
• title = The Apartment, genre = comedy, showing time = 21:10
• title = The Godfather, genre = drama, showing time = 21:25
• title = Forrest Gump, genre = romance, showing time = 21:10
• title = Jaws, genre = horror, showing time = 20:55
• title = Vertigo, genre = horror, showing time = 21:45
• title = Psycho, genre = horror, showing time = 21:50
• title = Pulp Fiction, genre = drama, showing time = 21:25
Timeline labels in the figure: 20:00, 20:15, 20:22, 20:25, 20:40, 20:45, 20:50, 20:55
Delivered events:
• title = The Big Parade, genre = romance, showing time = 21:00
• title = The Apartment, genre = comedy, showing time = 21:10
• title = Forrest Gump, genre = romance, showing time = 21:10
• title = The Godfather, genre = drama, showing time = 21:25
• title = Psycho, genre = horror, showing time = 21:50
• title = Pulp Fiction, genre = drama, showing time = 21:25
43. [Drosou 2009] PrefSIENA But wait..
• The most highly ranked events may be very similar to each other…
• We wish to retrieve results on a broader variety of user interests
• Two different perspectives on achieving diversity:
• Avoid overlap: choose notifications that are dissimilar to each other
• Increase coverage: choose notifications that cover as many user interests as possible
• How to measure diversity?
• Many alternative ways
• Common ground: measure similarity/distance among the selected items
44. Diversity: Top-k representative set
• Method to retrieve Top-k publications from matching publications
[Figure: representative Top-k, with two panels: "Drawback (without diversity)" and "What we want (with diversity)".]
45. MAX* k-diversity problem
Find:
$$S^* = \underset{S \subseteq P,\ |S| = k}{\operatorname{argmax}}\ f(S, d)$$
where
1. P = {p1, …, pn}
2. k ≤ n
3. d: a distance metric
4. f: a diversity function
46. Proposed: MAXDIVREL k-diversity problem
$$S^* = \underset{S \subseteq P}{\operatorname{argmax}}\ f(S, d, r) = \underset{S \subseteq P}{\operatorname{argmax}}\ \frac{g(S, d, r)}{h(S, d, r)}$$
• g(S, d, r): maximize the dis-similarity & relevancy in S
• h(S, d, r): minimize the dis-similarity & relevancy in P − S
where
1. P = {p1, …, pn}
2. d: a distance metric
3. r: a relevance metric
4. f: a diversity function
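47. Formal Definition: MAXDIVREL k-diversity
• g(S, d, r): argmax of a sum, normalized by 1/|S|, over pairs $p_i, p_j \in S$ ($i \ne j$) of a term built from $r(p_i)$, $r(p_j)$ and $d(p_i, p_j)$, such that independence holds
• h(S, d, r): argmin of a sum, normalized by 1/|P − S|, over pairs $p_i \in S$, $p_j \in P - S$ of a term built from $r(p_i)$, $r(p_j)$ and $d(p_i, p_j)$, such that dominance holds
where
1. P = {p1, …, pn}
2. d: a distance metric
3. r: a relevance metric
4. α > 0
Independence condition:
$$\forall p_i, p_j \in S:\ d(p_i, p_j) > \alpha$$
Dominance condition:
$$\forall p_i \in P,\ \exists p_j \in S \ \text{s.t.}\ d(p_i, p_j) \le \alpha;\ i \ne j$$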
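The two conditions can be checked for a candidate set S as sketched below. Several things here are assumptions made for the illustration: publications are modelled as sets of terms, Jaccard distance stands in for the metric d, and dominance is checked only for publications outside S (a selected item trivially represents itself).

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two non-empty sets."""
    return 1 - len(a & b) / len(a | b)

def is_independent(S, d, alpha):
    """Every pair of distinct selected items is more than alpha apart."""
    return all(d(p, q) > alpha for p, q in combinations(S, 2))

def is_dominating(P, S, d, alpha):
    """Every non-selected item lies within alpha of some selected item."""
    return all(any(d(p, q) <= alpha for q in S) for p in P if p not in S)

# Hypothetical publications, each a set of terms.
P = [frozenset("abc"), frozenset("abd"), frozenset("xyz"), frozenset("xyw")]
alpha = 0.6

S = [P[0], P[2]]
print(is_independent(S, jaccard_distance, alpha))    # True
print(is_dominating(P, S, jaccard_distance, alpha))  # True

S_bad = [P[0], P[1]]
print(is_independent(S_bad, jaccard_distance, alpha))    # False
print(is_dominating(P, S_bad, jaccard_distance, alpha))  # False
```

A set that satisfies both conditions picks one representative per neighborhood of radius α, which is the intuition behind a diverse Top-k.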
50. Handling streaming publications
[Figure: publications p1 to p5 with associated labels v1 to v5 and a distance α; a second panel adds p6 and v6.]
Continuity Requirements
1. Durability
An item selected as diversified in the i-th window may still be selected in the (i+1)-th window if it has not expired and the other valid items in the (i+1)-th window fail to compete with it.
2. Order
The publication stream follows chronological order. We avoid selecting an item j as diverse later when we have already selected an item i that is not older than j.
52. Locality Sensitive Hashing (LSH)
Simple idea: if two points are close together, then after a “projection” operation these two points will remain close together.
53. LSH Analysis
For any given points $p, q \in R^d$:
$$P_H[h(p) = h(q)] \ge P_1 \quad \text{for } \lVert p - q \rVert \le d_1$$
$$P_H[h(p) = h(q)] \le P_2 \quad \text{for } \lVert p - q \rVert \ge c\,d_1 = d_2$$
• Hash function h is $(d_1, d_2, P_1, P_2)$-sensitive
• Ideally we need
• $(P_1 - P_2)$ to be large
• $(d_1 - d_2)$ to be small
56. LSH in MAXDIVREL: Minhashing
• No publications any more: each one is represented by a signature
• Technique
• Randomly permute the rows of the characteristic matrix m times
• For each permutation, take the number of the first row, in the permuted order, in which the publication's column has a 1
• Advantage: reduces the dimensions to a small minhash signature
[Figure: first permutation of the rows of the characteristic matrix.]
57. LSH in MAXDIVREL: Signature Matrix
• Fast minhashing
• Select m random hash functions to model the effect of m random permutations
• Mathematically proved only when the number of rows is a prime
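Fast minhashing can be sketched as follows. The hash family h(x) = (a·x + b) mod p with a ≠ 0 is an assumption (a common textbook choice); when p is prime and equals the number of rows, each such function is a true permutation of the row numbers, which is why the prime requirement matters. Function names and the sample sets are hypothetical.

```python
import random

def make_hash_functions(m, prime, seed=0):
    """Draw m hash functions h(x) = (a*x + b) mod prime with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, prime), rng.randrange(0, prime)) for _ in range(m)]

def minhash_signature(rows, hash_functions, prime):
    """Signature of one publication, given the rows where its column has a 1."""
    return [min((a * x + b) % prime for x in rows) for a, b in hash_functions]

def estimated_similarity(sig_x, sig_y):
    """Fraction of signature positions on which two publications agree."""
    return sum(1 for u, v in zip(sig_x, sig_y) if u == v) / len(sig_x)

PRIME = 13   # number of rows in the characteristic matrix (a prime)
hashes = make_hash_functions(m=100, prime=PRIME)

x = {0, 1, 2, 3}
y = {0, 1, 2, 4}
z = {7, 8, 9}

sig_x, sig_y, sig_z = (minhash_signature(s, hashes, PRIME) for s in (x, y, z))
print(estimated_similarity(sig_x, sig_y))  # an estimate of the Jaccard similarity 3/5
print(estimated_similarity(sig_x, sig_z))  # 0.0, the sets are disjoint
```

Because every hash function permutes the rows, two disjoint sets can never share a minimum, so their estimated similarity is exactly zero; for overlapping sets the estimate improves as m grows.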
58. LSH in MAXDIVREL: LSH Buckets
• Take r-sized signature vectors from the m-sized minhash signature
• Map them into L hash tables, each with an arbitrary number b of buckets
59. LSH in MAXDIVREL: How to select L, r?
For two vectors x, y:
$$J_D(x, y) = 1 - J_{SIM}(x, y), \quad \text{where } J_{SIM}(x, y) = \frac{|x \cap y|}{|x \cup y|}$$
1. $L \times r = m$
2. Similarity threshold $s \approx (1/L)^{1/r}$
60. LSH in MAXDIVREL: Analysis
For two vectors x, y:
$$J_D(x, y) = 1 - J_{SIM}(x, y), \quad \text{where } J_{SIM}(x, y) = \frac{|x \cap y|}{|x \cup y|}$$
For publications x & y:
$$J_{SIM}(x, y) \propto \mathrm{Prob}[H(x) = H(y)]$$
At a particular hash table:
• x & y map into the same bucket: $J_{SIM}(x, y)^r$
• x & y do not map into the same bucket: $1 - J_{SIM}(x, y)^r$
At L hash tables:
• x & y do not map into the same bucket in any table: $(1 - J_{SIM}(x, y)^r)^L$
• x & y map into the same bucket in at least one table: $1 - (1 - J_{SIM}(x, y)^r)^L$
True near neighbors will be unlikely to be unlucky in all the projections.
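The effect of L and r can be explored numerically with the sketch below. It assumes each hash table is keyed on r signature values, so two publications with Jaccard similarity s collide in one table with probability s^r; the function names and the sample values L = 20, r = 5 are hypothetical.

```python
def candidate_probability(s, r, L):
    """Probability that two publications with Jaccard similarity s share
    a bucket in at least one of L hash tables keyed on r signature values."""
    return 1 - (1 - s ** r) ** L

def similarity_threshold(r, L):
    """Approximate similarity at which the probability rises steeply."""
    return (1 / L) ** (1 / r)

m = 100          # signature length, with L * r = m
L, r = 20, 5
print(round(similarity_threshold(r, L), 3))
for s in (0.2, 0.4, 0.6, 0.8):
    print(s, round(candidate_probability(s, r, L), 4))
```

Pairs well above the threshold are almost certain to meet in some table, while pairs well below it rarely do; raising r moves the threshold up and raising L moves it down.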
61. LSH in MAXDIVREL: Batch-wise Top-k computation
• Bucket “winner”: the publication with the highest relevancy score in the bucket
• The winner is dominant and represents its bucket neighborhood
• The Top-k are the “winners” that have a majority of votes
• The k winners are independent
[Figure: publication stream P_A, P_B, …, P_H with the i-th window marked.]
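The voting step can be sketched as follows. This is one possible reading, not the authors' implementation: every bucket in every hash table casts one vote for its most relevant publication, and the k publications with the most votes are returned. The data layout, the tie-breaking rule and the name `top_k_by_votes` are assumptions.

```python
from collections import Counter

def top_k_by_votes(hash_tables, relevance, k):
    """Pick k publications by counting how many buckets each one wins.

    hash_tables is a list of dicts mapping a bucket id to the publications
    hashed into it; relevance maps a publication to its score.
    """
    votes = Counter()
    for table in hash_tables:
        for bucket in table.values():
            winner = max(bucket, key=lambda p: relevance[p])
            votes[winner] += 1
    # Most votes first; ties broken by higher relevance (an assumption).
    ranked = sorted(votes, key=lambda p: (votes[p], relevance[p]), reverse=True)
    return ranked[:k]

relevance = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.3, "E": 0.6}
hash_tables = [
    {"b1": ["A", "B"], "b2": ["C", "D"], "b3": ["E"]},
    {"b1": ["A", "B", "D"], "b2": ["C", "E"]},
    {"b1": ["A", "D"], "b2": ["B", "E"], "b3": ["C"]},
]
print(top_k_by_votes(hash_tables, relevance, k=2))  # ['A', 'C']
```

Publications that share a bucket are likely to be similar, so letting only the winner of each bucket compete keeps near-duplicates out of the result; the sketch does not verify the independence condition explicitly.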
62. LSH in MAXDIVREL: Incremental Top-k computation
1. New publication i: update the i-th characteristic vector (Characteristic Matrix)
2. Generate the i-th minhash signature (Signature Matrix)
3. Map the i-th signature into L hash tables
4. Update the “winner” at each bucket the i-th signature maps into
5. Vote for the Top-k candidate
63. LSH in MAXDIVREL: When new publication F arrives…
• Only buckets B13, B23, B32, B43 will vote
• Follow continuity requirements
• Durability
• Order
[Figure: publication stream P_A, P_B, …, P_H with the i-th and (i+1)-th windows marked.]
67. P2P Pub/Sub
• Scribe: topic-based, built on top of Pastry, stateful, rendezvous.
• Hermes: topic & content-based, built on top of Pastry(-like) net, stateful, rendezvous & flooding-like.
• Meghdoot: content-based, built on top of CAN, stateful, rendezvous.
• Tera: topic-based, built on unstructured P2P net, stateful, random walk-based flooding.
• Sub2Sub: content-based, built on unstructured P2P net, stateful, flooding-like.
• DHTStrings: content-based, DHT-independent, string support, stateless, rendezvous.
• OP-DHT Pub/Sub: content-based, (can be) built on top of Chord/Pastry/Bamboo.
68. DHT based pub/sub: Scribe
• Topic Based
• Based on DHT (Pastry)
• Rendezvous event routing
• A random identifier is assigned to each topic
• The Pastry node with the identifier closest to that of the topic becomes responsible for that topic
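Choosing the rendezvous node for a topic can be sketched as below. The sketch does not use any Pastry or Scribe API: the 16-bit circular identifier space, the SHA-1-derived topic identifier and the helper names are all assumptions made for the illustration, and the node is found by a direct scan rather than by overlay routing.

```python
import hashlib

ID_BITS = 16
RING = 2 ** ID_BITS

def topic_id(topic):
    """Derive an identifier for a topic (here: truncated SHA-1 of its name)."""
    digest = hashlib.sha1(topic.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING

def ring_distance(a, b):
    """Distance between two identifiers on a circular id space."""
    diff = abs(a - b)
    return min(diff, RING - diff)

def rendezvous_node(key, node_ids):
    """Node whose identifier is closest to the key."""
    return min(node_ids, key=lambda n: ring_distance(n, key))

nodes = [1000, 20000, 45000, 60000]
print(rendezvous_node(21000, nodes))   # 20000
print(rendezvous_node(65000, nodes))   # 1000, wrapping around the ring
key = topic_id("stock-quotes/GOOG")
print(key, rendezvous_node(key, nodes))
```

Publishers and subscribers of a topic compute the same identifier, so their messages converge on the same node without either side knowing the other.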
69. DHT based pub/sub: Meghdoot
• Content Based
• Based on Structured Overlay CAN
• Mapping the subscription language and the event space to CAN
space
• Subscription and event Routing exploit CAN routing algorithms
70. Top-k publish/subscribe at P2P
• Stateful approaches introduce some kind of state at (intermediate) nodes.
State can refer to :
• State needed to support specialized structures built on top of the network structure
• E.g. trees (parent, children pointers)
• Routing state – for ‘content-based routing’:
• Subscription paths to be followed by matching publications
• Subscriptions (meta)data: not just forward pointers to be followed and subscription
content (its predicates), but also possible info as to
• What about query inherent diversification?
• The controlled parameters (k & w) can change
• Updates and the need to maintain state consistency may stress the system and negate any benefits.
• So we’ll be left with the complexity …
71. Future work
• Apply Top-k diversification modules at (un)structured P2P
• Exploiting overlap among diversified results of users who have similar interest
• Develop LSH based index over multi-threaded distributed
environment
• Develop large-scale Top-k pub/sub applications by exploring other suitable use cases, e.g.
• Personalized newspaper for every Facebook user
• Diverse set of personalized Twitter trends
• Social annotation of news-stories
This design has three main scenarios: (1) every new tweet is used as a query for the Story Index and, for every story s, if it is part of the top-k results for s, we add it to Rs. We also add the new tweet to the Tweet Index; (2) for every new story we query the Tweet Index and retrieve the top-k tweets, which are used to initialize Rs. We also add the new story to the Story Index; (3) for every page view we simply fetch the top-k set of tweets R
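The three scenarios can be sketched as follows. This is a toy model under stated assumptions: plain dictionaries with linear scans stand in for the Story Index and Tweet Index, relevance is the number of shared terms, and the class and method names are hypothetical.

```python
import heapq

def score(story_terms, tweet_terms):
    """Toy relevance: number of shared terms (a stand-in for the real ranking)."""
    return len(story_terms & tweet_terms)

class Annotator:
    def __init__(self, k):
        self.k = k
        self.stories = {}   # Story Index: story id -> set of terms
        self.tweets = {}    # Tweet Index: tweet id -> set of terms
        self.results = {}   # R_s: story id -> list of (score, tweet id)

    def _keep_top_k(self, story_id, scored):
        self.results[story_id] = heapq.nlargest(self.k, scored)

    def on_tweet(self, tweet_id, terms):
        """Scenario 1: use the tweet as a query against the Story Index."""
        self.tweets[tweet_id] = terms
        for story_id, story_terms in self.stories.items():
            s = score(story_terms, terms)
            if s > 0:
                self._keep_top_k(story_id, self.results[story_id] + [(s, tweet_id)])

    def on_story(self, story_id, terms):
        """Scenario 2: query the Tweet Index to initialise R_s."""
        self.stories[story_id] = terms
        scored = [(score(terms, t), tid) for tid, t in self.tweets.items()]
        self._keep_top_k(story_id, [x for x in scored if x[0] > 0])

    def on_page_view(self, story_id):
        """Scenario 3: simply fetch the stored top-k tweets."""
        return [tid for _, tid in self.results[story_id]]

a = Annotator(k=2)
a.on_tweet("t1", {"election", "debate"})
a.on_story("s1", {"election", "debate", "poll"})
a.on_tweet("t2", {"poll"})
a.on_tweet("t3", {"election", "poll", "debate"})
print(a.on_page_view("s1"))  # ['t3', 't1']
```

The page-view path does no matching at all, which reflects the workload: page views are by far the most frequent operation, so the work is shifted to tweet and story arrival.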
Given a notification n and a subscription s, s covers n (or n matches s) if and only if every attribute constraint of s is satisfied by some attribute of n