The document provides an overview of the publish-subscribe model from the perspective of a database. It discusses key aspects of the publish-subscribe model including decoupling of publishers and subscribers, subscription models, and quality measures. It also examines applying publish-subscribe concepts in databases through expressions, continuous queries, and using XML with XFilters and SQL queries.
3. Introduction
O Traditional system-centric approach
O Request/Response query of data.
O Data volume and response time.
O Data-centric approach
O Publishers
O Subscribers
O Notification system
5. Introduction Cont.
O Publish-Subscribe model advantages:
O Enhanced response time.
O Enhanced results.
O Database resources utilization and
increased capacity.
O Loosely coupled relationship between
publishers and subscribers.
O Scalability.
6. Publish-Subscribe Model
Overview
O Described as events or patterns of events
produced by publishers; subscribers register
interest in them and are notified when they
become available.
O Information is referred to as
notifications in this paradigm.
O Subscribers can continue their tasks until
the notification service delivers
notifications.
8. Publish-Subscribe Basic Model
Overview Cont.
O Decoupling types between publishers and
subscribers:
O Space decoupling: in which publishers and
subscribers do not need to know each other.
O Time decoupling: in which publishers and
subscribers do not need to be running at the
same time.
O Synchronization decoupling: in which publishers'
and subscribers' operations and tasks are not
halted during publishing and receiving
notifications.
O Scalable system that fits well in distributed
systems.
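The three kinds of decoupling can be illustrated with a minimal in-memory sketch. The `Broker` class, its method names and the per-subscriber mailboxes are assumptions made for illustration; they are not part of the slides or of any particular product.

```python
from collections import defaultdict, deque


class Broker:
    """Hypothetical notification service; names are illustrative only."""

    def __init__(self):
        # topic -> subscriber names; subscriber name -> pending notifications
        self.subscribers = defaultdict(set)
        self.mailboxes = defaultdict(deque)

    def subscribe(self, name, topic):
        self.subscribers[topic].add(name)

    def publish(self, topic, event):
        # Space decoupling: the publisher never sees who is subscribed.
        # Synchronization decoupling: publish() returns without waiting
        # for any subscriber to consume the event.
        for name in self.subscribers[topic]:
            self.mailboxes[name].append((topic, event))

    def fetch(self, name):
        # Time decoupling: a subscriber that was offline collects its
        # stored notifications whenever it comes back.
        pending = list(self.mailboxes[name])
        self.mailboxes[name].clear()
        return pending


broker = Broker()
broker.subscribe("alice", "cars")
broker.publish("cars", {"Model": "Cadillac", "Price": 30000})
received = broker.fetch("alice")
```

Here the publisher and "alice" only ever talk to the broker, and the notification waits in the mailbox until it is fetched.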
9. Publish-Subscribe Model
Overview Cont.
O Other communication models exist besides the
publish-subscribe model:
O Message passing:
O Relies on messages for establishing communication
between the sender and the receiver.
O Message production is done asynchronously.
O Message consumption is done synchronously.
O Both need to be available at the same time.
O Not decoupled in terms of time and space.
10. Publish-Subscribe Model
Overview Cont.
O Other communication models exist besides the
publish-subscribe model:
O Remote procedure call (RPC):
O Intends to make remote interactions look the same as
local interactions.
O Coupled in time, space and synchronization.
O Notifications:
O Notifications sent by client to the server including
callback arguments.
O Notifications sent by server to the client including the
result.
O Coupled in time and space.
11. Publish-Subscribe Model
Overview Cont.
O Other communication models exist besides the
publish-subscribe model:
O Shared space:
O Based on tuple-space: ordered collection of tuples
accessed by all parties.
O Tuples are added to and deleted from the tuple-space
synchronously.
O Decoupled in time and space.
O Message queuing:
O Builds on tuple-space: queues are filled with messages
from producers, and the message queue adds transactional,
ordering and timing functionality.
O Same decoupling as shared space.
12. Subscription models
O Topic-based subscription model
O Also referred to as Subject-based models.
O Subscriber shows interest in a particular topic and
receives notifications filtered based on that.
O Similar to joining a group but more dynamic.
O Hierarchy based.
O Limited amount of expressions provided for
subscribers to filter and limit their interested criteria.
O Subscribe to more than one topic in a single
subscription.
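Hierarchical topic matching can be sketched as a prefix test on topic paths. The "/" separator, the function names and the sample topics are assumptions for illustration; the slides do not prescribe a topic syntax.

```python
def topic_matches(subscribed, published):
    """True if `published` equals `subscribed` or lies below it in the hierarchy."""
    sub_parts = subscribed.split("/")
    pub_parts = published.split("/")
    return pub_parts[:len(sub_parts)] == sub_parts


def interested(subscription, published):
    # A single subscription may list more than one topic.
    return any(topic_matches(topic, published) for topic in subscription)


subscription = {"cars/sale", "stocks"}
```

With this subscription, an event published under "cars/sale/cadillac" is delivered, while one under "cars/rent" is not.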
13. Subscription models Cont.
O Content-based subscription model
O Bound to the content of events themselves
rather than external criteria.
O Subscription language is used for filtering
O CarBrand = 'Mercedes' and Price <= 20,000
O StockName = 'T*' and change > 3
O Needs more expressive criteria, which can
generate a lot of traffic on the network.
O Needs a more advanced and complex notification
system that can filter each event and match it
against subscriptions.
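A content-based filter such as the two examples above can be sketched as a list of attribute tests joined by AND. The tuple representation, the `like` operator name and the use of shell-style wildcards for 'T*' are assumptions for illustration, not a real subscription language.

```python
from fnmatch import fnmatchcase
import operator

OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


def matches(subscription, event):
    """A subscription is a list of (attribute, op, value) terms joined by AND."""
    for attr, op, value in subscription:
        if attr not in event:
            return False
        if op == "like":  # wildcard pattern such as 'T*'
            if not fnmatchcase(str(event[attr]), value):
                return False
        elif not OPS[op](event[attr], value):
            return False
    return True


car_sub = [("CarBrand", "=", "Mercedes"), ("Price", "<=", 20000)]
stock_sub = [("StockName", "like", "T*"), ("change", ">", 3)]
```

Every event has to be tested against every subscription in this naive form, which is why content-based systems need a more capable notification service than topic-based ones.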
14. Subscription models Cont.
O Type-based subscription model
O Built using concepts from object-oriented programming.
O Events are objects that can hold attributes and
methods and notifications are objects of specific
type.
O Subscribers of specific object types will only
receive instances of that type or its sub-types.
O Performance issues arise when a large number of
events need to be processed at runtime.
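Type-based delivery can be sketched with ordinary class inheritance. The event classes and the `TypedBroker` below are hypothetical names chosen for illustration.

```python
class Event:
    """Base type for all notifications (hypothetical hierarchy)."""


class StockEvent(Event):
    def __init__(self, name, change):
        self.name = name
        self.change = change


class StockCrash(StockEvent):
    """Sub-type: delivered to StockEvent subscribers as well."""


class CarForSale(Event):
    def __init__(self, model, price):
        self.model = model
        self.price = price


class TypedBroker:
    def __init__(self):
        self.handlers = []  # list of (event_type, callback)

    def subscribe(self, event_type, callback):
        self.handlers.append((event_type, callback))

    def publish(self, event):
        # Each publish checks every subscription at runtime, which is
        # where the cost grows with the number of events.
        for event_type, callback in self.handlers:
            if isinstance(event, event_type):
                callback(event)


stock_events = []
broker = TypedBroker()
broker.subscribe(StockEvent, stock_events.append)
broker.publish(StockCrash("TSLA", -12.5))
broker.publish(CarForSale("Cadillac", 30000))
broker.publish(StockEvent("IBM", 0.4))
```

The `StockEvent` subscriber receives the `StockCrash` and `StockEvent` instances but not the `CarForSale` one.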
15. Subscription related
characteristics
O Push and Pull
O Time driven and data driven
O Full update and incremental update
O Broadcast and unicast data delivery
16. Quality measures of publish-
subscribe services
O Quality measures and metrics when
designing any publish-subscribe model:
O Reliability.
O Security.
O Priority.
O Latency.
17. Publish-Subscribe model:
Database Perspective
O Publish-subscribe with expressions.
O Continuous Query.
O XML.
18. Publish-subscribe with
expressions
O Boolean expressions are used to specify subscribers'
interest in an event by filtering their criteria using
name-value pairs, comparison operators
(=, >, >=, <, <=) and regular expressions.
O We will use SQL and relational database.
19. Publish-subscribe with
expressions Cont.
O Example: Interested in cars for sale
O Brand Cadillac and price less than 35000
O Rules :
ON Car4Sale
IF (Model = 'Cadillac' and Price < 35000)
THEN notify('abc@yahoo.com')
20. Publish-subscribe with
expressions Cont.
SubscriberID  Address  …  Interest
100           Amman    …  Model = 'Cadillac' and Price < 35000
101           Irbid    …  Model = 'Mercedes' and Year >= 2007
SELECT * FROM [SUBSCRIBERS]
WHERE
EVALUATE(SUBSCRIBERS.Interest,
<DATA ITEM>) = 1
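The idea behind the query above can be tried out with SQLite by registering a user-defined function that plays the role of EVALUATE. This is only a sketch under stated assumptions: the `evaluate` helper, the JSON encoding of the data item and the scratch connection are illustrative choices, not how a database with a native EVALUATE operator implements it, and the stored Interest text is assumed to be trusted because it is spliced into a query.

```python
import json
import sqlite3

scratch = sqlite3.connect(":memory:")  # used only to evaluate expressions


def evaluate(interest, item_json):
    """Return 1 if the stored condition holds for the data item, else 0.

    Assumption: `interest` is trusted SQL text and the data item supplies
    every attribute the condition refers to.
    """
    item = json.loads(item_json)
    columns = ", ".join(f'? AS "{name}"' for name in item)
    query = f"SELECT 1 FROM (SELECT {columns}) WHERE {interest}"
    return 1 if scratch.execute(query, list(item.values())).fetchone() else 0


db = sqlite3.connect(":memory:")
db.create_function("EVALUATE", 2, evaluate)
db.execute("CREATE TABLE SUBSCRIBERS (SubscriberID INTEGER, Address TEXT, Interest TEXT)")
db.executemany("INSERT INTO SUBSCRIBERS VALUES (?, ?, ?)", [
    (100, "Amman", "Model = 'Cadillac' and Price < 35000"),
    (101, "Irbid", "Model = 'Mercedes' and Year >= 2007"),
])

car = json.dumps({"Model": "Cadillac", "Price": 30000, "Year": 2005})
rows = db.execute(
    "SELECT SubscriberID FROM SUBSCRIBERS WHERE EVALUATE(Interest, ?) = 1",
    (car,),
).fetchall()
```

For this Cadillac priced at 30000, only subscriber 100 is returned.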
21. Publish-subscribe with
expressions Cont.
O Queries can be simple, complex, with any type of
join.
O Publishers can put limitations on predicates.
SELECT * FROM SUBSCRIBERS
WHERE
EVALUATE(SUBSCRIBERS.Interest,
<CAR DETAILS>) = 1
AND SUBSCRIBERS.Address = 'Amman'
ORDER BY SUBSCRIBERS.SubscriberID
DESC
22. Publish-subscribe with
expressions Cont.
O Storing expressions as Table data
O Store these conditions as data in special type
columns.
O Metadata is needed
O To store information about values stored in the
condition predicates.
O A list of built-in and user-defined functions
referenced by the condition.
O Validate values stored when new or existing
columns are modified.
O Indexes can be added.
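Validation against metadata can be sketched by checking a new expression against the attribute list of the event before it is stored. The metadata dictionary and the `is_valid_expression` helper are assumptions for illustration; the check simply lets SQLite parse the condition over a one-row sample and treats any parse or unknown-column error as a rejection.

```python
import sqlite3

# Hypothetical metadata for the Car4Sale event: attribute name -> sample value.
CAR4SALE_METADATA = {"Model": "", "Price": 0, "Year": 0}


def is_valid_expression(expression, metadata=CAR4SALE_METADATA):
    """Accept an expression only if it parses and uses known attributes."""
    columns = ", ".join(f'? AS "{name}"' for name in metadata)
    query = f"SELECT 1 FROM (SELECT {columns}) WHERE {expression}"
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(query, list(metadata.values())).fetchall()
        return True
    except sqlite3.OperationalError:
        # Raised for syntax errors and for attributes missing from the metadata.
        return False
    finally:
        conn.close()
```

A condition such as `Colour = 'red'` is rejected here because `Colour` is not listed in the metadata.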
23. Publish-subscribe with
expressions Cont.
O Evaluating expressions
O Evaluate operator is new to SQL.
O Conditional expression is translated into a
WHERE condition in SQL.
O Expression Metadata used to determine
the structure of the FROM clause.
O The result returned is 1 (true) when the
condition is satisfied.
24. Publish-subscribe with
expressions Cont.
SELECT DISTINCT SUBSCRIBERS.SubscriberID,
(CASE WHEN SUBSCRIBERS.annual_income > 100000
THEN notify_salesperson(SUBSCRIBERS.PhoneNumber)
ELSE
create_email_msg(SUBSCRIBERS.EmailAddress)
END)
FROM SUBSCRIBERS, INVENTORY
WHERE
EVALUATE(SUBSCRIBERS.Interest, <car details FROM
INVENTORY>) = 1
AND
Sub_DISTANCE(SUBSCRIBERS.Address, :DealerLoc, 'distance=50')
= 'TRUE'
GROUP BY SUBSCRIBERS.SubscriberID
25. Continuous Query
O Queries constructed only once and stored
for continuous use over the database.
O Used over Append-Only databases.
O First used to support Tapestry systems.
O Uses time-based approach rather than
triggers.
O Uses a language called TQL (Tapestry
Query Language).
27. Continuous Query Cont.
O Continuous queries have some drawbacks:
O Non-deterministic results.
O Duplicate results.
O Inefficiency of the system.
O To overcome this:
O Incremental queries which run periodically.
O Have two timestamps: last execution time (T) and
current time (t).
O Only results in the period between T and t are returned.
28. Continuous Query Cont.
O Incremental queries:
Set T := -∞
FOREVER DO
set t := current time
Execute query Q(T, t)
Return result to user
set T := t
Sleep for some period of time
ENDLOOP
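The loop above can be sketched over an append-only SQLite table. This is an illustration under assumptions: timestamps are simulated positive integers, the starting value 0 stands in for -∞, the FOREVER loop is replaced by two simulated rounds, and the table and column names are borrowed from the monotone query example in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (field TEXT, ts INTEGER)")  # append-only


def run_incremental(last, now):
    """Q(T, t): matching rows appended in the half-open period [last, now)."""
    return conn.execute(
        "SELECT field, ts FROM tbl WHERE field = 'test' AND ts >= ? AND ts < ?",
        (last, now),
    ).fetchall()


last = 0  # stand-in for T := -infinity, assuming timestamps are positive
deliveries = []
rounds = [(10, [("test", 3), ("other", 5)]), (20, [("test", 12)])]
for now, new_rows in rounds:
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", new_rows)  # publishers append
    deliveries.append(run_incremental(last, now))  # t := now; execute Q(T, t)
    last = now                                     # T := t
```

Each round returns only the rows that arrived since the previous execution, so the row with timestamp 3 is not delivered a second time.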
29. Continuous Query Cont.
O To overcome duplicates, queries are transformed into monotone
queries.
O Queries whose results never shrink as new tuples are added to
the database: a row returned once is returned at every later time.
SELECT * FROM tbl
WHERE tbl.field = 'test'
AND tbl.ts < t
SELECT m.msgid
FROM m
WHERE NOT EXISTS(
SELECT * FROM m ml
WHERE ml.inreplyto = m.msgid
AND t< ml.ts + 2 weeks
)
30. XML
O Importance of XML as a standard
information exchange mechanism.
O Its capabilities of encoding structural
information in documents.
O Using XML in creating user profiles.
O Using XFilters, a mechanism that matches
XML documents against user profiles with
the help of relational databases; matched
documents are returned to interested users
using XPath.
32. XML Cont.
O XPath query is decomposed into a set of
path nodes using XPath parser.
O Tags are extracted from these nodes and
stored in a TagPath table.
O A linear path is extracted from each user
subscription XPath profile and stored in
the LinearPath table.
O TagPath table is used to match linear
paths in users' subscriptions with TagPath
from XML documents.
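The matching step can be sketched in memory, without the relational tables. This is a deliberately simplified illustration, not the XFilter algorithm itself: it assumes profiles are absolute XPath expressions that use only child steps, and the function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def linear_path(xpath):
    """Decompose a simple absolute XPath such as '/cars/car/model' into tags."""
    return tuple(step for step in xpath.split("/") if step)


def tag_paths(xml_text):
    """Collect every root-to-element tag path that occurs in the document."""
    paths = set()

    def walk(element, prefix):
        path = prefix + (element.tag,)
        paths.add(path)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return paths


def matching_subscribers(profiles, xml_text):
    """profiles: subscriber -> XPath profile. Returns the subscribers that match."""
    doc_paths = tag_paths(xml_text)
    return sorted(s for s, xp in profiles.items() if linear_path(xp) in doc_paths)


profiles = {"u1": "/cars/car/model", "u2": "/cars/truck"}
document = "<cars><car><model>Cadillac</model><price>30000</price></car></cars>"
matched = matching_subscribers(profiles, document)
```

In the relational version described on the slide, the two sets compared here would live in the LinearPath and TagPath tables and the comparison would be a join.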
33. XML Cont.
O SQL queries can perform any DML
operation on these tables.
O SQL query is run recursively to match
XML messages with subscriptions.
O Values of stored path tags are used as
predicates in the join.
34. Conclusion
O Publish-subscribe system consists of:
O Publishers, who wish to disseminate
messages in the form of events to interested
users.
O Subscribers, who wish to be notified of
these events by subscribing to them.
O Notification management system that
maintains a database with all publishers
and subscribers.
35. Conclusion Cont.
O Database is used to match events and
subscriptions by evaluating events based on:
O Expressions provided by subscribers in the form
of queries stored in the database.
O Continuous queries that target Append-only
databases using a time-based approach.
O XML that uses XFilters to match XML
documents against user profiles, filter and
return them using SQL queries in relational
databases and XPath queries.
Editor's Notes
Enhanced response time: subscribers to these services do not need to wait for their queries to be processed. Instead, the results are delivered to them as soon as they are available and their filtering options are met.
Enhanced results: query results relate only to the filter criteria subscribers are interested in, allowing subscribers to filter the huge amount of available information.
Database resources utilization and increased capacity: instead of processing user queries one at a time, multiple subscriptions can be processed together, resulting in reduced query processing time and increased overall capacity of the database.
Loosely coupled relationship between publishers and subscribers: publishers and subscribers do not need to know each other, so their identities can remain anonymous and only the notification system is responsible for managing them. In addition, neither publishers nor subscribers need to know the system topology.
Scalability: the publish-subscribe model is highly scalable for small systems, providing parallel and multiple query processing, message caching and routing functionality. For larger and more complex publish-subscribe implementations, scalability becomes a challenge that needs more research effort.
events or pattern of events
Space decoupling: publishers and subscribers do not need to know each other. As stated before, the integration between publishers and subscribers is done by the notification service. Publishers therefore do not know who subscribes to their events or how many subscribers there are, and subscribers do not know who publishes these events or how many publishers there are.
Time decoupling: publishers and subscribers do not need to be running at the same time. For example, some subscribers can be offline when notifications are published; they receive these notifications later, when they are back online, even if the publishers are offline by then.
Synchronization decoupling: publishers' and subscribers' operations and tasks are not halted during publishing and receiving notifications. Notifications are delivered asynchronously to subscribers when these events occur.
The result of decoupling publishers from subscribers is a scalable system that removes dependencies between these parties and makes this model fit very well in distributed systems.
They have some similarities with this model, but they fail to fully decouple publishers and subscribers in terms of time, space and synchronization.
RPC was later enhanced to avoid synchronization issues through an asynchronous version that does not return any acknowledgment messages, which reduces reliability. To overcome this, another approach was proposed in which acknowledgments are accessed only when needed.
There are a number of different subscription models, which determine how a subscriber shows interest in some events and how events are filtered to match that interest. The degree to which events can be filtered to match subscribers' interests depends heavily on the expressive power of the subscription language used.
In traditional request/response querying of data, clients usually pull information from the database into their applications. Another method is to push information from the server to the client, as news services do. Since request/response protocols such as HTTP do not support this directly, a smart pull can be used instead: a system service installed in the background periodically asks for new information.
A guarantee of successful delivery is needed since publishers and subscribers interact asynchronously with each other.
Since publish-subscribe systems work with a large, dynamic community of publishers and subscribers using various heterogeneous platforms, security issues should be carefully addressed and assessed in terms of authentication, confidentiality, integrity and accountability.
Metadata is needed to store information about the values used in the condition predicates, since expressions are usually not self-descriptive and the same stored value can produce different results depending on its type.
To sum up, expressions allow us to list all subscribers of an event using a single query that can be scaled up to include more subscribers with the same interests. In addition, they allow us to use relational databases and utilize their capabilities in constructing expression queries.
A continuous query is a type of query that is constructed only once and stored for continuous use over the database. Continuous queries were first introduced to support Tapestry systems, which store electronic documents such as emails and news articles in a database. Additional information about authors, title, date and other keywords is stored too. Continuous queries are used over append-only databases, in which newly added documents remain in the database and are never removed. TQL allows users to run a query against the database, refine it until they are satisfied with it, and then store it as a continuous query.
Non-deterministic results: the results of executing a query depend on its execution time. If the same query is executed at different times, different results are obtained.
Duplicates: as a continuous query is executed over an append-only database, all old and new results are returned to users although they might be interested only in the new ones.
Inefficiency of the system: since all new and old data is retrieved each time, a large amount of data is returned on every execution, resulting in longer execution time and degraded system performance.
It allows us to use relational databases to execute queries more efficiently with enhanced performance.
It supports time-oriented queries without the need to use triggers.
It provides the flexibility to execute scheduled queries on a user-preference basis.