2. What it takes to build Real-Time Operational Intelligence and Big Data solutions

[Architecture diagram: the end-to-end pipeline and the tools traditionally used at each stage]

• Pre work: Database design and development. Solution design, rules for data manipulation, rules for monitoring conditions and KPIs, rules for detecting events...
• Acquire Data: Streaming data consumption (APIs, Enterprise Service Bus); static data (connectors)
• Process Data (structured, semi-structured, unstructured): Event stream processing – developer focused (Tibco, Microsoft, IBM); data transformation (Ascential Software, Cognos, Microsoft Integration Services); generate insights (correlation, KPIs, data denormalization) – manual custom development
• Store & Manage Data: Massively Parallel Processing systems (e.g. Vertica, Greenplum, Netezza, ParStream); NoSQL databases (e.g. MongoDB, Amazon DynamoDB, Cassandra); relational databases (e.g. Microsoft SQL Server, IBM DB2, Oracle, Sybase); OLAP (e.g. Microsoft SSAS, Cognos PowerPlay); NewSQL (e.g. NuoDB); Hadoop (e.g. Hortonworks, Cloudera)
• Data Access & Security: Row-level data security – manual development; real-time and historical data publishing – manual development; API data export
• Data Access & Visualization: Discovery & analysis (Tableau, QlikView, Cognos, SiSense); reporting (many); data mining (R, SAS, SPSS); custom applications
3. Innovations in Big Data technologies over the last 5 years

[Repeats the pipeline diagram from slide 2.]
4.
5. Challenging bits not addressed in this innovation cycle

[Repeats the pipeline diagram from slide 2.]

This causes:
• Lots of systems integration of point solutions
• Custom code
• Specialist skills
• Hard to change and evolve
6. Rapidly industrialize the use of data by designing, building and running real-time business intelligence and big data solutions with StreamCentral.

• Workbench – Easy to Design: Solution Designer (data consumption, data transformations, conditions, events, correlation); Security Designer; API Designer; Systems Management; Meta Data Manager
• Information Warehouse Manager – Auto Build: denormalized schema generation for data marts; security schema generation; normalized schema generation for facts and dimensions. Auto-generates database design, database and application code, and infers relationships in data
• BI Server – Run with scale: data collection; data processing; business event detection; data publishing (SQL Server, Vertica, MongoDB); data export; caching
• Analytic Applications: BI/reporting; data exploration/visualization; functional applications; event-driven predictive analytics; industry applications; association analysis
7. Putting it together – High impact real-time solutions in a fraction of the time

• Pre work: Database development – StreamCentral auto-generates database design and database code. StreamCentral Workbench – no coding required (solution design, rules for data manipulation, rules for monitoring conditions and KPIs, rules for detecting events...) – for a broad set of people with varying technical skills
• Acquire Data: Built-in StreamToMe API (stream any data from any application or device to StreamCentral); static data (connectors)
• Process Data (structured, semi-structured, unstructured): Event stream processing (no coding); data transformation (no coding); generate insights (correlation, KPIs, data denormalization) (no coding)
• Store & Manage Data (StreamCentral + Big Data): Massively Parallel Processing systems (Vertica); NoSQL databases (MongoDB); relational databases (Microsoft SQL Server); Hadoop
• Data Access & Security: StreamCentral auto-builds security infrastructure; built-in API builder; API data export
• Data Access & Visualization: Discovery & analysis (Tableau, QlikView, Cognos, SiSense); reporting (many); data mining (R, SAS, SPSS); custom applications
8. StreamCentral BI Server Scalability

• Massively Parallel Processing architecture
• Distributed processing
• Scale out and distribute any component of StreamCentral independently on commodity hardware
• Integrates with best-of-breed database technologies

Services: Collector Service, Processing Service, Business Event Service, Data Publishing Service, Cache Service
9. Data available via StreamCentral

Processed Source Data (API access: real-time push, historical pull; database access: historical pull)
• Data validation
• Association to entities
• Evaluated for conditions
• Time and location standardization
• Custom dimension standardization

Single Event Stream (API access: real-time push, historical pull; database access: historical pull)
• Correlated data across multiple data sources
• Event detection based on condition evaluation

Event Analysis Data Marts (API access: real-time push, historical pull; database access: historical pull)
• Data mart built on highly correlated data
• Updated real-time
• Analyze multiple events and conditions
• Bring together relevant data

360° Analysis Data Marts (API access: historical pull; database access: historical pull)
• Data mart built on loosely correlated data
• Updated periodically
• Analyze any data
10. Example Big Data Solutions: Telco

Sources of real-time streaming data from networks, devices, services and other internal applications:
• Telco's core IMS network data
• Data, voice & video performance data (data stream)
• Data from telco towers

External sources of data that add understanding of what's happening when events are detected:
• Weather data (Weather Underground)
• Traffic incidents (MapQuest, USA Today)
• Population data (Census data)

Business solutions:
• Network test
• New service – investment planning
• Adaptive bit rate – video streaming QoE
• 360° customer QoE for 1st-level customer service
• Video QoE for IPTV
• New revenue sources from marketing operations
• Service disruption
11. Making changes to definitions
• StreamCentral allows updates to data sources, entities, dimensions, rules for conditions, event detection rules and data mart definitions
• When changes are made, the Workbench updates the schema change information in the StreamCentral metadata database. It also makes the corresponding changes to the underlying database schema
• Configuration data for all services running within StreamCentral is also held in the distributed cache. The next step is to update this distributed cache; the cache then notifies the various services of the updates to the schema definition
• The correlation and publishing engines evaluate the schema changes and make the appropriate changes to their in-memory data before sending the data to the database
• Rollback is built in to account for errors
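The update flow described on this slide (metadata change, cache update, then service notification) can be sketched with a simple observer pattern. This is purely illustrative: the class and method names are hypothetical, not StreamCentral's implementation.

```python
# Illustrative sketch of the definition-update flow: the Workbench
# writes a schema change, the distributed cache is updated, and the
# cache notifies subscribed services. All names are hypothetical.

class ConfigCache:
    """Stands in for the distributed cache holding service config."""
    def __init__(self):
        self.config = {}
        self.subscribers = []

    def subscribe(self, service):
        self.subscribers.append(service)

    def update(self, key, value):
        self.config[key] = value
        for service in self.subscribers:   # cache notifies each service
            service.on_schema_change(key, value)

class Service:
    """A processing/correlation/publishing service reacting to changes."""
    def __init__(self, name):
        self.name = name
        self.seen = []

    def on_schema_change(self, key, value):
        # A real service would adjust its in-memory data here
        self.seen.append((key, value))

cache = ConfigCache()
correlation = Service("correlation")
publishing = Service("publishing")
cache.subscribe(correlation)
cache.subscribe(publishing)
cache.update("event_mart_v2", {"columns": ["weather", "qoe"]})
```

Both subscribed services receive the same change notification, mirroring how the cache fans out schema updates before data is sent to the database.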
12. StreamCentral advantage: Agility to change how you use data in real-time

Data: real-time or historical | streaming or batch | structured or unstructured
(Charts compare risk and value over time for each approach; the y-axis is agility in meeting changing customer needs in real-time.)

Current technology and approach – Years to Value = High Risk, High Cost:
• Many point solutions from multiple vendors
• High learning curve
• Maximum time spent integrating
• Manual design and coding
• Many steps to solution
• Older technology
Steps: business analysis; detailed solution design; manual database design; database development; CEP development platform; Enterprise Service Bus; traditional ETL tools; application development

StreamCentral – Weeks to Value = Low Risk, Reduced Cost:
• High automation
• No coding required
• Contains multiple components that work together (ETL, CEP, data mart builder, location intelligence and more)
• Fewer steps to solution
• Modern technology
Components:
• Workbench – Business Solutions Designer: consume data, design transformations, conditions, events, analytics, security, APIs to export and share data
• Information Warehouse Manager: auto-generate design, auto-generate code, infer relationships, reduce manual design
• BI Server: built-in event processing, high-speed data processing, scalable, secure, runs on modern database platforms

Stages covered by both approaches: pre-work; data acquisition, transformation and enrichment; data correlation & event management; analytics & insight-specific data marts; data-level security; export of enriched data & real-time analytics
14. Definitions of key concepts in StreamCentral
• Entity: An entity represents a group of people or things that incoming data is directly connected to. Examples include departments, customers, sites, products etc. By defining entities you tell StreamCentral how distributed data is connected to the things core to your business
• Data Source: StreamCentral can pull data from a variety of sources using standard web interfaces, and data can also be streamed directly to the StreamCentral API for processing by devices, sensors, applications and services
• Dimension: Common attributes across a variety of data sources that can be used to categorize and analyze data
15. Definitions of key concepts in StreamCentral
• Conditions: A condition is a rule-based measurement that is applied to incoming data. A condition has three parts: the condition name (for example Voice Quality), the condition range (a range of quality from hard to hear, poor, average, toll quality, to excellent) and the condition KPI (for example a RED KPI when the range is hard to hear or poor). Individual conditions can be grouped together into a condition set, which can then be used to detect events as an aggregate
• Events: An event happens when patterns of multiple conditions with specific ranges, from different data streams and environmental data sources, are detected as the data streams in. While StreamCentral allows sophisticated rule-based event detection, it goes further than that: StreamCentral auto-builds a data mart around the event that contains a variety of context, such as entities, environmental data, dimensions and detailed data from data sources
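The three-part condition structure described above (name, ranges, KPI per range) can be sketched as follows. This is a minimal illustration: the score scale and threshold values are hypothetical, not StreamCentral's actual rules.

```python
# Hypothetical sketch of a StreamCentral-style condition: a named,
# rule-based measurement with ordered ranges and a KPI colour per
# range. The thresholds below are illustrative, not real values.

VOICE_QUALITY_RANGES = [
    # (upper bound of score, range label, KPI colour)
    (2.0, "hard to hear", "RED"),
    (3.0, "poor", "RED"),
    (3.5, "average", "AMBER"),
    (4.0, "toll quality", "GREEN"),
    (5.0, "excellent", "GREEN"),
]

def evaluate_condition(measurement: float):
    """Map an incoming measurement to its condition range and KPI."""
    for upper, label, kpi in VOICE_QUALITY_RANGES:
        if measurement <= upper:
            return label, kpi
    return "excellent", "GREEN"  # anything above the top bound

# A measured voice score of 2.7 falls in the "poor" range (RED KPI)
label, kpi = evaluate_condition(2.7)
```

Grouping several such conditions into a set, and firing an event when a pattern of ranges co-occurs, is the aggregation step the Events bullet describes.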
16. Converting data to insights by continuously adding context

Generating insights from data requires context to be added to the data. This context is a continuous thread that connects all types of data throughout the BI solution lifecycle. Four typical examples of context: who (entities like customer, patient), when (time), where (location) and what (streaming & static data correlation).

• StreamCentral automatically builds and maintains time and location dimensions
• Entities like customer, department and site can be created and defined in StreamCentral. Entity data can be imported for initial load and continuously kept in sync
• All incoming data in StreamCentral is continuously and automatically connected to time, location and defined entities
• Resultant real-time events and analytical data marts automatically inherit this context without the need for any programming or development work
17. Types of data sources: Regular
• Data sources used to measure performance
• Examples include any data that will be measured for conditions, ranges and events
• This data can be connected to entities directly – for example, data from a device can be connected to a customer, or sales data can be connected to a product and a customer
• Can be used in correlation, event detection and data marts
18. Types of data sources: Environmental
• This type of data source is used to add context when measuring performance – these are also called environmental data sources
• Examples typically include external data that adds context about external factors in play
• Does not have to be connected to the entities directly. StreamCentral will use implicit
relations with time and location dimension to tie environmental data to other enterprise
data. For example, consider an environmental data source called weather. Weather has
location information associated with it. There are two entities namely “Customer” and
“Tower”. Both also have location information associated with them. StreamCentral
standardizes all three to the location dimension but StreamCentral also implicitly connects
Customer to weather and Tower to weather because weather was created as an
environmental data source. Now when analyzing data, StreamCentral will be able to provide
real-time or historical context as to what the weather is where the customer is and what the
weather is where the tower is
• Great to use in data marts for analyzing associations with other data
• Can be used in event detection as part of conditions set and to evaluate events
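The implicit weather-to-entity association in the example above can be sketched as a join through a shared location key rather than an explicit foreign key. All the table names and values below are hypothetical, chosen only to mirror the Customer/Tower/weather example.

```python
# Simplified sketch: environmental data (weather) is tied to entities
# (customer, tower) through the shared location dimension, not by an
# explicit relationship. All names and values here are hypothetical.

weather = {"loc_boston": {"temp_f": 28, "condition": "snow"}}

customers = [{"id": "C1", "location_key": "loc_boston"}]
towers = [{"id": "T7", "location_key": "loc_boston"}]

def weather_context(entity):
    """Implicitly join an entity to weather via the location dimension."""
    return weather.get(entity["location_key"])

# "What is the weather where customer C1 is?" - same lookup works for
# towers, because both were standardized to the location dimension.
ctx = weather_context(customers[0])
```

Because both entities and the environmental source were standardized to the same dimension, no per-entity relationship had to be defined, which is the point the slide makes.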
19. A note on time and location data
• StreamCentral auto creates time and location dimensions.
• Extended data types allow very specific association of a variety of time and
location based attributes
• Data types can be assigned to attributes in entities, regular data sources and
environmental data sources
• For every incoming attribute that is associated with one of the special time or
location data types, StreamCentral looks to see if a specific record for that data
already exists in the dimension. If not, it creates a new record for that value. If it
exists already, then the key value of that data is substituted in the data source
• Time and location data is stored in the database and in the distributed cache
though the real-time lookups are done against the data stored in the cache
• StreamCentral can dynamically feed time or location data to REST or SOAP based
web services from these dimensions
• StreamCentral supports standardizing location data at any geographic level, including standardizing within a specific radius
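The lookup-or-create behaviour described above, where an incoming value either reuses an existing dimension record or creates a new one and substitutes its key, can be sketched against an in-memory store standing in for the distributed cache. The names are hypothetical, not StreamCentral's code.

```python
# Sketch of the dimension lookup described on this slide: for each
# incoming location attribute, reuse the existing dimension key if the
# value is already known, otherwise create a new dimension record and
# return its key. The dict stands in for the distributed cache.

location_dim = {}     # value -> surrogate key
_next_key = [1]       # simple key generator

def standardize(value: str) -> int:
    """Return the dimension key for value, creating a record if needed."""
    if value not in location_dim:
        location_dim[value] = _next_key[0]   # new record for this value
        _next_key[0] += 1
    return location_dim[value]               # substitute the key

k1 = standardize("Boston, MA")   # first sighting: record created
k2 = standardize("Boston, MA")   # already exists: same key substituted
```

In the fact data, the raw location string would then be replaced by the returned key, with the lookup served from the cache rather than the database.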
20. Types of data outputs available from StreamCentral
• Processed Source Data – Once real-time streaming data, or static data via scheduled pull, is received by StreamCentral, it is validated, evaluated for conditions, and associated to entities and to dimensions like time and location; the data is then available to be published
• Event data – Processed data is evaluated for events. If an event is detected, the event data along with its associated context is available as a real-time stream. In addition, StreamCentral builds a data mart just for this event. Access to historical data for an event is also available
• Events data mart analysis – Custom data marts that evaluate multiple events, and the conditions that were recorded when the events were detected, are available via the events data mart. Historical access is available
• Aggregate 360-degree data mart analysis – Bring together disparate data that is standardized to common themes, and StreamCentral automatically builds a scalable data mart structure for this data
21. Access methods by type of data available

Processed Source Data
• Real-time access: ActiveMQ messages; JMS-based HornetQ; Oracle AQ; Microsoft MSMQ; WCF-based pub/sub model; format options XML/JSON
• Historical access: REST API (format options XML/JSON); method name: getFactualData; input parameters: source name, filter parameters (location, time), numOfRecords

Event Data with context
• Real-time access: ActiveMQ messages; JMS-based HornetQ; Oracle AQ; Microsoft MSMQ; WCF-based pub/sub model; format options XML/JSON
• Historical access: REST API (format options XML/JSON); method name: getEventData; input parameters: event name or id, filter parameters (location, time), entity id array, numOfRecords

Events Data Mart
• Real-time access: ActiveMQ messages; JMS-based HornetQ; Oracle AQ; Microsoft MSMQ; WCF-based pub/sub model; format options XML/JSON
• Historical access: REST API (format options XML/JSON); method name: getAnalysisData; input parameters: analysis collection name or id, filter parameters (location, time), entity id array, numOfRecords
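A call to the historical REST API described in the table might be assembled as below. Only the method name (getEventData) and its input parameters come from the table; the host, path layout and parameter spellings are assumptions for illustration.

```python
# Hypothetical sketch of building a getEventData REST call. The base
# URL, path and exact parameter names are assumptions; the method name
# and inputs (event name, location/time filters, entity ids,
# numOfRecords) come from the access-method table.
from urllib.parse import urlencode

def build_get_event_data_url(base_url, event_name, location,
                             time_from, entity_ids, num_records):
    params = {
        "eventName": event_name,
        "location": location,            # filter parameter
        "timeFrom": time_from,           # filter parameter
        "entityIds": ",".join(entity_ids),
        "numOfRecords": num_records,
        "format": "JSON",                # format option XML/JSON
    }
    return f"{base_url}/getEventData?{urlencode(params)}"

url = build_get_event_data_url(
    "https://streamcentral.example.com/api", "VideoQoEDegraded",
    "Boston", "2014-01-01T00:00:00Z", ["C1", "T7"], 100)
```

The real-time alternatives in the table (ActiveMQ, HornetQ, MSMQ, WCF pub/sub) would push the same event payloads to subscribers instead of being polled.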
22. Choosing the right technology for visualization
• Don't select a delivery technology for these reasons – it is best to use StreamCentral to centralize business logic in one place and use many tools to deliver the insight:
• Definition of KPIs
• Rules for events
• Aliases for data attributes
• Connectivity and transformation requirements of source data
• Adding context to data
• Select one or more delivery technologies for these reasons
• Performance (in-memory aggregation)
• Cross browser support, support for various tablets and mobile device platforms
• Broad portfolio of charts and visualizations
• Highly interactive
• Ability to be integrated in portals for internal (employees) or external (partners or
customers) consumption
• Standards based like HTML5 and CSS3
• Can be hosted in a SaaS model
24. Managing row-level data security
• Centralize data security with StreamCentral
• The Workbench administrator defines roles, specifies data access rules, and assigns users to roles. StreamCentral builds and manages the metadata for row-level access
• Custom applications and analytical/reporting tools only pass a user id as part of their query to the StreamCentral database; the StreamCentral row-level security layer determines which rows are returned
• Two types of row-level security:
1. Underlying fact data, based on dimensions (like time, location) and entities (like customer, department, site)
2. Denormalized aggregated data, based on and/or rules
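The role-based row filtering described on this slide can be sketched as follows: resolve the user to a role, then keep only the fact rows whose dimension values that role may see. This is a simplified, hypothetical illustration of the idea, not StreamCentral's schema.

```python
# Sketch of row-level security: a user id is resolved to a role, and
# the role's data access rules filter fact rows by dimension values
# (here, location). All names and rules are hypothetical.

user_roles = {"alice": "northeast_sales"}
role_rules = {"northeast_sales": {"location": {"Boston", "NYC"}}}

fact_rows = [
    {"location": "Boston", "revenue": 100},
    {"location": "Austin", "revenue": 250},
]

def secure_query(user, rows):
    """Return only the rows the user's role is permitted to see."""
    allowed = role_rules[user_roles[user]]
    return [r for r in rows
            if all(r[dim] in values for dim, values in allowed.items())]

visible = secure_query("alice", fact_rows)   # the Austin row is hidden
```

The calling application never evaluates the rules itself; it only supplies the user id, which matches the slide's point about centralizing security.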
25. [Diagram: row-level security architecture]
• StreamCentral Database (MS SQL / HP Vertica): factual tables; security tables (StreamCentralSecurity, ScrtyRoleID); role processing tables
• StreamCentral Metadata DB and StreamCentral Workbench: the Workbench administrator manages data security by creating data access rules for roles and assigning users to roles
• For data accessed from the StreamCentral database via reporting/analytical tools or the API, StreamCentral determines the data access permissions for that user
26. Distributed Caching
• Storing Time and Location dimension data for fast lookups and data
standardization
• Maintaining configuration information about the system which aids in
managing updates to definitions
• Storing entity data required for adding context to incoming data
• Managing correlation of real-time data
• Managing event detection
• Storing processed data formatted to data mart specifications
• Managing batch data inserts into the database
28. StreamCentral High Availability

• Cache cluster: Microsoft AppFabric Cache is a distributed caching technology that makes the cache highly available by configuring more than one server to participate in storing cache data, a configuration often called a cache cluster
• Software Network Load Balancing (NLB): Microsoft IIS web servers configured with the software NLB provided by Microsoft Windows Server keep all websites highly available (web application, StreamCentral public API, Workbench application, reports/analytics)
• Messaging: Microsoft Message Queue persists unread messages in the queue (inbound message queue, publish message queue) in the event of a sudden server shutdown. The physical hardware can be clustered to ensure failover in case of hardware failure
• Services: the Processing Service, Correlation Service and Publish Service can be run on multiple physical servers to keep these services highly available
• Databases: Workbench database (StreamCentral metadata); StreamCentral database (fact and aggregate data – Vertica/MS SQL Server)
29. Thank you
for your time
Raheel Retiwalla
CTO - Virtus IT Ltd
E: raheel.retiwalla@virtus-it.com
M: +1 617 901 8370
A trusted partner