An increasing number of Software Architects are realising that data is the most important asset of a system and are staring to embrace the Data-Centric revolution (datacentricmanifesto.org) — setting data at the center of their architecture and modelling applications as “visitors” to the data. At the same time, architect have also realised that reactive architectures (reactivemanifesto.org) facilitates the design of scalable, fault-tolerant and high performance systems.
Yet, few architects have realised that reactive and data-centric architectures are the two sides to the same coin and should always go hand in hand.
This presentation shows how reactive data-centric systems can be designed and built taking advantage of Vortex data sharing capabilities along with its integration with reactive and data-centric processing technologies such as Apache Spark and ReactiveX.
3. CopyrightPrismTech,2015
Up to recently Reactive Systems were mainstream only domains such as
Industrial, Military, Telco, Finance and Aerospace
To address the challenges imposed by the Internet of Thing and Big Data
Applications an increasing number of architects and developers is recognising
the importance of the techniques and technologies used for building Reactive
Systems
The Raising Importance of being Reactive
4. CopyrightPrismTech,2015
The techniques and technologies
traditionally used in a class of
Reactive and Interactive Systems
is gaining quite a bit of attention
from the mainstream developers
community
Recently, the Reactive Manifesto
was defined to summarise the
key traits of “Reactive
Applications”
Reactive Manifesto
Responsive
Scalable
Event-Driven Resilient
http://www.reactivemanifesto.org/
5. CopyrightPrismTech,2015
Event-Driven: modular,
asynchronous
Responsive: Timely react to stimuli
Resilient: Tolerate failures
functionally and ideally temporally
Scalable: Gracefully scale up and
down depending on load demands
Reactive Manifesto
Responsive
Scalable
Event-Driven Resilient
http://www.reactivemanifesto.org/
7. CopyrightPrismTech,2015
The prevailing application-/service-centric mindset by focusing on the control-
flow as opposed than the data-flow has lead to design systems that are hard to
extend, scale and make fault-tolerant
An increasing number of architects has realised that the remedy is to change the
main point of attention: Data is the centre of the universe — applications are
ephemeral
The Importance of being Data-Centric
12. CopyrightPrismTech,2015
Data-Centric: data is what matters.
Application autonomously and
asynchronously transform data
Responsive: Timely react to stimuli
Resilient: Tolerate failures
functionally and ideally temporally
Scalable: Gracefully scale up and
down depending on load demands
Reactive Data-Centric Manifesto
Responsive
Scalable
Data-Centric Resilient
http://www.reactivemanifesto.org/
15. CopyrightPrismTech,2015
Vortex provides a Distributed Data
Space abstraction where applications
can autonomously and
asynchronously read and write data
enjoying spatial and temporal
decoupling
Its built-in dynamic discovery isolates
applications from network topology
and connectivity details
Vortex’ Data Space is completely
decentralised
High Level Abstraction
DDS Global Data Space
...
Data
Writer
Data
Writer
Data
Writer
Data
Reader
Data
Reader
Data
Reader
Data
Reader
Data
Writer
TopicA
QoS
TopicB
QoS
TopicC
QoS
TopicD
QoS
16. CopyrightPrismTech,2015
Vortex supports the definition of Data
Models.
These data models allow to naturally
represent physical and virtual entities
characterising the application
domain
Vortex types are extensible and
evolvable, thus allowing incremental
updates and upgrades
Data Centricity
17. CopyrightPrismTech,2015
A Topic defines a domain-wide information’s class
A Topic is defined by means of a (name, type, qos)
tuple, where
• name: identifies the topic within the domain
• type: is the programming language type
associated with the topic. Types are
extensible and evolvable
• qos: is a collection of policies that express
the non-functional properties of this topic,
e.g. reliability, persistence, etc.
Topic
Topic
Type
Name
QoS
struct
TemperatureSensor
{
@key
long
sid;
float
temp;
float
hum;
}
18. CopyrightPrismTech,2015
As explained in the previous slide a topic defines a class/type of information
Topics can be defined as Singleton or can have multiple Instances
Topic Instances are identified by means of the topic key
A Topic Key is identified by a tuple of attributes -- like in databases
Remarks:
- A Singleton topic has a single domain-wide instance
- A “regular” Topic can have as many instances as the number of different key
values, e.g., if the key is an 8-bit character then the topic can have 256 different
instances
Topic and Instances
19. CopyrightPrismTech,2015
Vortex “knows” about
application data types
and uses this
information provide
type-safety and content-
based routing
Content Awareness
struct
TemperatureSensor
{
@key
long
sid;
float
temp;
float
hum;
}
sid temp hum
101 25.3 0.6
507 33.2 0.7
913 27,5 0.55
1307 26.2 0.67
“temp
>
25
OR
hum
>=
0.6”
sid temp hum
101 25.3 0.6
507 33.2 0.7
1307 26.2 0.67
Type
TempSensor
20. CopyrightPrismTech,2015
For data to flow from a DataWriter (DW) to
one or many DataReader (DR) a few
conditions have to apply:
The DR and DW domain participants have
to be in the same domain
The partition expression of the DR’s
Subscriber and the DW’s Publisher should
match (in terms of regular expression
match)
The QoS Policies offered by the DW should
exceed or match those requested by the DR
Quality of Service
Domain
Participant
DURABILITY
OWENERSHIP
DEADLINE
LATENCY BUDGET
LIVELINESS
RELIABILITY
DEST. ORDER
Publisher
DataWriter
PARTITION
DataReader
Subscriber
Domain
Participant
offered
QoS
Topic
writes reads
Domain Id
joins joins
produces-in consumes-from
RxO QoS Policies
requested
QoS
22. CopyrightPrismTech,2015
Reactive Systems1
A Reactive System designates a permanently operating system that responds,
reacts, to external stimuli at a speed determined by its environment, e.g. an
industrial control system.
Reactive Systems
[1] Nicolas Halbwachs, Synchronous Programming of Reactive Systems
25. CopyrightPrismTech,2015
Reactive
react to an environment which
“cannot wait”, e.g. reacting late has
harsh consequences
response time needs to be
deterministic
distributed concurrent systems
data-centric
Reactive vs. Interactive Systems
The term reactive systems is increasingly used to denote interactive systems with stringent
response time, performance and scalability constraints.
Interactive
react to an environment which
“doesn’t like to wait”, e.g. reacting
late induces QoS degradation
response time needs to be acceptable
distributed concurrent systems
data-centric
26. CopyrightPrismTech,2015
Concurrency
Aside from the concurrency between the system and its environment, it is natural to decompose
these systems as an ensemble of concurrent components that cooperate to achieve the intended
behaviour
Strict Temporal Requirements
Reactive Systems have strict requirements with respect to the rate at which the need to process
input as well as the response time for their outputs
Determinism
The output of these systems are generally determined by the input and the occurrence time, e.g. no
scheduling effects on the temporal properties of the output
Reliability
Reactive Systems are often life/mission critical, as such reliability is of utmost importance
Reactive Systems Key Traits
28. CopyrightPrismTech,2015
Stream Processing is a commonly adopted architectural style for building
systems that operate over continuous (theoretically infinite) streams of data
Stream Processing is often applied in:
- Reactive Systems
- Interactive Systems
- Signal Processing Systems
- Functional Stream Programming
- Big Data Analytics
Stream Processing
29. CopyrightPrismTech,2015
Stream Processing Architectures are ideal for
modelling systems that react to streams of data
produced by the cyber-phisical-systems, such as
data produced by sensors, the stock market, etc.
Stream Processing Systems operate in real-time
over streams and generate in turns other
streams of data providing information on what
is happening or suggesting actions to perform,
such as by stock X, raise alarm Y, or detected
spatial violation, etc.
Stream Processing
30. CopyrightPrismTech,2015
Stream Processing Systems are typically modelled as collection of concurrent modules
communicating via typed data channels
Modules usually play one of the following roles:
- Sources: Injecting data into the System
- Filters/Actors: Performing some computation over sources
- Sinks: Consuming the data produced by the system
Stream Processing
Filter
Source
Filter
Filter
Stream Sink
32. CopyrightPrismTech,2015
A stream represents a potentially infinite sequence of data samples of a given
type T
As such, in Vortex, a stream can be naturally represented with a Topic
Example:
- Topic<PressureType>
- Topic<MarketData>
Streams in Vortex
Filter
Filter
Filter
Stream
33. CopyrightPrismTech,2015
Sources in Vortex are mapped to
DataWriters for a given Topic
Sinks in Vortex are mapped to DataReaders
for a given Topic
Source and Sinks in Vortex
Filter
Source
Filter
Filter
Sink
34. CopyrightPrismTech,2015
Filters implement bit and pieces of the
application business logic
Each Filter has one or more input stream and one
or more output streams
Obviously, a Filter consumes data from an input
stream through a Vortex data reader and pushed
data into an output stream through a Vortex data
writer
Filters in Vortex
Filter
Filter
Filter
36. CopyrightPrismTech,2015
Stream Processing Architectures in general and Vortex-based architecture in
particular promote loose coupling
Anonymous Communication => No Spatial Coupling
Non-Blocking and Asynchronous Interaction => No control or temporal
coupling
In Vortex the only thing that matters is the kind of information a module is
interested in consuming or producing — who is producing the data or when the
data has been produced is not relevant
Loose Coupling
37. CopyrightPrismTech,2015
Location Transparency is an important
architectural property as it makes it possible to
completely decouple the logical decomposition
of a system from its actual physical deployment
Vortex Automatic Discovery brings location
transparency to the next level by enabling
systems that are zero-conf, self-forming and
self-healing
Location Transparency
Module
Module
Module
Module
Module
Module
Physical
Deployment Unit
Logical
Deployment Unit
38. CopyrightPrismTech,2015
Vortex provides full control over temporal
decoupling by means of its Durability Policy
As such the data produced by a Source can be:
- Volatile: available for those that are
around at the time at production
- Transient: available as far as the system/
source is running
- Durable: available forever
Temporal Decoupling
tSource
t
t
Sink
Sink
Volatile Durability
tSource
t
t
Sink
Sink
Transient Durability
39. CopyrightPrismTech,2015
Vortex provides mechanism for detecting traditional
faults as well as performance failures
The Fault-Detection mechanism is controlled by
means of the Vortex Liveliness policy
Performance Failures can be detected using the
Deadline Policy which allows to receive notification
when data is not received within the expected
delays
Failure Detection
Source
tSink
Fault Notification
TD
Source
t
Sink
Performance Failure Notification
P
t
P P
40. CopyrightPrismTech,2015
Vortex provides a built-in fault-masking
mechanism that allow to replicate
Sources and transparently switch over
when a failure occurs
At any point in time the “active” source is
the one with the highest strength. Where
the strength is an integer parameter
controller by the user
Fault-Masking
Source
tSink
Source t
41. CopyrightPrismTech,2015
Vortex provides a built-in fault-masking
mechanism that allow to replicate
Sources and transparently switch over
when a failure occurs
At any point in time the “active” source is
the one with the highest strength. Where
the strength is an integer parameter
controller by the user
Fault-Masking
Source
tSink
Source t
42. CopyrightPrismTech,2015
Performance
Device-2-Device
• Peer-to-Peer Intra-core latency
as low as 8 µs
• Peer-to-Peer latency as low as
30 µs
• Point-to-Point throughput
well over 2.5M msg/sec
Device-2-Cloud
• Routing latency as low as 4 µs
• Linear scale out
• 44K* msgs/sec with a single
router, 4x times more the
average Tweets per second in
the world (~6000 tweets/sec)!
*2048 bytes message payload
43. CopyrightPrismTech,2015
Introduced in 2004, Vortex is an Object
Management Group (OMG) Standard for
efficient, secure and interoperable, platform-
and programming-language independent
data sharing
The Vortex Standard
Vortex has been
recently recommended
as one of the IIoT data-
sharing
platform for the Industrial Internet Consortium
Reference Architecture
45. CopyrightPrismTech,2015
Reactive and Interactive Systems continuously interact with their environment
Stimuli coming from the cyber-physical world are often represented as events to
be handled by the Reactive/Interactive System
- e.g. think about the event from a GUI, or the data available event in Vortex
Events are often handled through some form of callbacks, such as listeners,
functors, etc. This stile of event handling leads to inversion of control
Dealing with Events
46. CopyrightPrismTech,2015
The Inversion of Control induced by callback-based code in known to make
applications brittle and harder to understand
As an mental exercise, think about the call-back style code you’d have to write to
in Java for drawing when the mouse is being dragged. Although the logic is
simple, it is far cry from having a “declarative style”
The problem of callback management, infamously known as Callback Hell, has
become even more important due to the surge of Reactive Systems…
Can we escape the Callback Hell?
The Callback Hell
47. CopyrightPrismTech,2015
Reactive Programming is a paradigm based on a relaxed form of Synchronous
Dataflow Programming
The Reactive Programming is built around the notion of continuous time-varying
values and propagation of change
Reactive Programming facilitates the declarative development of non-blocking
event driven applications — in essence you express what should be done and the
runtime decides when to do it
Reactive Programming was popularised in the context of Functional
Programming Languages by the seminal done by Elliot and Hudak in 1997 as part
of Fran — a framework for composing richly interactive multimedia animations
Reactive Programming
48. CopyrightPrismTech,2015
There is an increasing number of Reactive Programming framework. The Rx
framework introduced by Erik Meijer is becoming a reference
.NET Rx
Arguably the first framework that brought Reactive Programming to the masses.
Available for the .NET platform
ReactiveX (http://reactivex.io)
An Open Source implementation of Rx for the JVM contributed by NetFlix, it
supports, Java, Scala, Kotlin, Clojure, Groovy and JRuby
Reactive Programming Libraries
49. CopyrightPrismTech,2015
ReactiveX is a Java implementation of .NET Reactive Extensions — a library for
composing asynchronous and event-based programs by using observable
sequences.
ReactiveX allow you to compose streams of data declaratively while abstracting
away low level concerns such as threading, synchronization, concurrent data
structures, and non-blocking I/O
The two main concepts at the foundation of ReactiveX are Observables and
Observers
ReactiveX Overview
50. CopyrightPrismTech,2015
ReactiveX Observables allow the composition of flows and sequences of
asynchronous data
You can think of Observables as some kinds of asynchronous collections that
push data to you
In essence an observable can be created from any synchronous or asynchronous
stream of data.
- you can create an observables that represents the data coming into a Vortex Data
Reader, the events from a Button, or even time
- you can also create an observable that represent an actual container
ReactiveX Observables
51. CopyrightPrismTech,2015
Observable Examples
t1 2 3
val
ticks:
Observable[Long]
=
Observable.interval(1
seconds)
n4 50
t1 2 3
val
evens:
Observable[Long]
=
ticks.filter(n
=>
n
%
2
==
0)
n4 50 x x x
t
val
bufs:
Observable[Seq[Long]]
=
evens.buffer(2,1)
n0 4 6 82
52. CopyrightPrismTech,2015
The Observer receives
notification concerning new
values, errors or the completion
of the stream — e.g. no more
data
ReactiveX Observers
trait
Observer[-‐T]
{
def
onNext(value:
T):
Unit
def
onError(error:
Throwable):
Unit
def
onCompleted():
Unit
}
53. CopyrightPrismTech,2015
Beside from the standard ReactiveX/RxScala Observable, each framework that
wants to integrate with ReactiveX/RxScala has to provide its own factory
methods to create observables
For Vortex you have to use the DddsObservable defined as follows:
Vortex Observables
object
DdsObservable
{
def
fromDataReaderData[T](dr:
DataReader[T]):
Observable[T]
def
fromDataReaderEvents[T](dr:
DataReader[T]):
Observable[ReaderEvent[T]]
//
more
methods
which
we’ll
ignore
for
the
time
being
}
55. CopyrightPrismTech,2015
We will use the omnipresent Shapes
Application to illustrate the benefits of
using Reactive Programming with Vortex
Three Topics
- Circle, Square, Triangle
One Type:
Shapes Application
struct ShapeType {
string color;
long x;
long y;
long shapesize;
};
#pragma keylist ShapeType color
Spotted shapes represent subscriptions
Pierced shapes represent publications
56. CopyrightPrismTech,2015
We want to show a triangle that
positioned midway between every
single pair of circle and square of the
same colour
Problem 1: Shape in the Middle
60. CopyrightPrismTech,2015
Midpoint Triangle
val
circles
=
VortexObservable.fromDataReaderData
{
DataReader[ShapeType](Topic[ShapeType](circle))
}
val
squares
=
VortexObservable.fromDataReaderData
{
DataReader[ShapeType](Topic[ShapeType](square))
}
val
ttopic
=
Topic[ShapeType](triangle)
val
tdw
=
DataWriter[ShapeType](ttopic)
//
Compute
the
average
between
circle
and
square
of
matching
color
with
flatMap
val
triangles
=
circles.flatMap
{
c
=>
squares.dropWhile(_.color
!=
c.color).take(1).map
{
s
=>
new
ShapeType(s.color,
(s.x
+
c.x)/2,
(s.y
+
c.y)/2,
(s.shapesize
+
c.shapesize)/4)
}
}
triangles.subscribe(tdw.write(_))
61. CopyrightPrismTech,2015
Using for-comprehension
//
Compute
the
average
between
circle
and
square
of
matching
colour
with
flatMap
val
triangles
=
circles.flatMap
{
c
=>
squares.dropWhile(_.color
!=
c.color).take(1).map
{
s
=>
new
ShapeType(s.color,
(s.x
+
c.x)/2,
(s.y
+
c.y)/2,
(s.shapesize
+
c.shapesize)/4)
}
}
//
Compute
the
average
between
circle
and
square
of
matching
colour
with
for
comprehension
val
triangles
=
for
{
c
<-‐
circles;
s
<-‐
squares.dropWhile(_.color
!=
c.color).take(1)
}
yield
new
ShapeType(s.color,
(s.x
+
c.x)/2,
(s.y
+
c.y)/2,
(s.shapesize
+
c.shapesize)/4)
62. CopyrightPrismTech,2015
Count the number of times a circle bounces on the left or top border and
make the number of bounces proportional to the position of a square of the
same color
Problem 2: Square Race
63. CopyrightPrismTech,2015
Square Race
val circles = DdsObservable.fromReaderData {
DataReader[ShapeType](Topic[ShapeType]("Circle"))
}
val sdw = DataWriter[ShapeType](Topic[ShapeType]("Square"))
val squares = circles.filter(s => {
s.x == 0 || s.y == 0
}).map(s => {
val x = xcoord(s.color) + delta
xcoord(s.color) = x
new ShapeType(s.color, x, ycoord(s.color), size)
})
squares.subscribe(sdw.write(_))
65. CopyrightPrismTech,2015
Spark is a fast and general-purpose — in memory —
cluster computing system that provides high-level
APIs in Java, Scala and Python
Spark has an optimised engine that supports general
execution graphs
Spark supports a rich set of higher-level tools
including Spark SQL for SQL and structured data
processing, MLlib for machine learning, GraphX for
graph processing, and Spark Streaming
Apache Spark
70. CopyrightPrismTech,2015
Vortex can be used as a source
or a sink for Apache Spark
Streams
This allows to seamlessly
integrate Vortex Application
with Spark
… And at the same time
Vortex & Spark