Do you need to move enterprise database information into a Data Lake in real time, and keep it current? Or maybe you need to track real-time customer actions in order to engage them while they are still accessible. Perhaps you have been tasked with ingesting and processing large amounts of IoT data.
IMC Summit 2016 Breakout - Steve Wilkes - Making IMC Enterprise Grade
1. Making IMC Enterprise Grade
Steve Wilkes – Striim Founder / CTO
See all the presentations from the In-Memory Computing Summit
at http://imcsummit.org
2. What is Enterprise Grade?
[Figure: Scalability, Reliability, Security, and Integration together make up Enterprise Grade.]
3. Scalability
"Scalability is a characteristic of a
system that describes its capability
to cope and perform under an
increased or expanding workload"
Scalability in IMC:
• Ingestion volume
• Processing
• In-Memory Data
• Stored Data
4. Reliability
"Reliability is the ability of a system
to consistently perform its intended
or required function, on demand
without degradation or failure."
Reliability in IMC:
• Ingestion
• Processing
• Results
• Exactly Once
5. Security
"Security is the mechanism by which
a system is protected from data
corruption, destruction, interception,
loss, or unauthorized access"
Security in IMC:
• Authentication
• Authorization
• Protection
• Encryption
6. Integration
"Integration is the bringing together
of component subsystems into one
system and ensuring that the
subsystems function together."
Integration in IMC:
• Ingestion
• Enrichment
• Processing
• Delivery
7. The Striim Platform
End-to-End Distributed IMC Platform with Continuous Ingest, Processing, Enrichment, Analysis, Delivery, Alerting, and Visualization of Streaming Data
[Figure: Sources (Databases, Log files, Sensors, Messaging) flow through two layers, Streaming Integration and Streaming Intelligence. Capabilities shown: Streaming CDC, Parallel Log Collection, Edge Processing, Continuous Event Collection, Filtering, Enrichment, Aggregation, Transformation, Windowing, Continuous Queries, Correlation, Detection, Matching, Triggers, External Context. Outputs: Alerts, Results, Real-time Dashboards. Targets: Databases & Data Warehouses, Messaging, Big Data & NOSQL, Cloud, Files.]
8. Part of Overall Data Architecture
[Figure: Multiple data sources feed a Batch / High-Latency path (existing ETL Jobs) and a Realtime / Low-Latency path (Streaming Integration). Stores shown: Hadoop (HDFS) (existing), ODS / EDW. Consumers shown: Real-Time Applications, Legacy Applications, Big Data Applications (Spark, Hive), Users.]
13. Scalability
• Processing
– Scale via stream partitioning
– Queries compiled to bytecode
– Queries multi-threaded
– Events routed to cached data
– Partitionable in-memory windows
• Results Storage
– Scale via partitioning
– Replicated
– Pre-Indexed
– Parallel queries
– Sub-clustering
[Figure: a TQL continuous query (CQ), compiled to bytecode, joins incoming events or batches with a cache to produce enriched events. Results are partitioned (1, 4, 7 / 2, 5, 8 / 3, 6, 9) across nodes A to F. Query shown:]
SELECT *
FROM stream s, cache c
WHERE s.id = c.id
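The ideas of "scale via stream partitioning" and "events routed to cached data" can be sketched in a few lines of Python. This is a hypothetical illustration, not Striim's implementation. It assumes a fixed three-node cluster, CRC32 as the partitioning hash, and dict-based cache slices. It mimics the `WHERE s.id = c.id` join by sending each event to the node that owns its id, so the join never crosses nodes.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size for this sketch

def partition_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a key to a node index with a stable hash, so every node agrees."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def build_cache_shards(cache_rows, num_nodes: int = NUM_NODES):
    """Split reference data by id so each node holds only its own slice."""
    shards = [dict() for _ in range(num_nodes)]
    for row in cache_rows:
        shards[partition_for(row["id"], num_nodes)][row["id"]] = row
    return shards

def enrich(events, shards, num_nodes: int = NUM_NODES):
    """Route each event to the node owning its id and join it with that
    node's local cache slice (an inner join on id)."""
    per_node = defaultdict(list)
    for ev in events:
        node = partition_for(ev["id"], num_nodes)
        cached = shards[node].get(ev["id"])
        if cached is not None:  # events with no cache match are dropped
            per_node[node].append({**ev, **cached})
    return dict(per_node)
```

CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, so separate nodes would not agree on where a key lives.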
15. Reliability
• Data Ingest
– Rewind sources on failure
– Utilize persistent messaging
for non-rewindable sources
• High Speed Messaging
– Repartition on failure
• Persistent Messaging
– Data sent in 'sync' mode
– Data replication ensured
– Read and written events are checkpointed
– Replay from last checkpoint on failure
[Figure: Collection Agents …, Persistent Messaging, and a Processing Cluster; events partitioned over the cluster and repartitioned on failure.]
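The pattern "read and written events are checkpointed, replay from last checkpoint on failure" can be illustrated with a hypothetical in-memory log and consumer. Both classes are stand-ins invented for this sketch. A real persistent stream would be durable and replicated, and the checkpoint would be stored outside the consumer's memory.

```python
class PersistentStream:
    """Append-only in-memory stand-in for a persistent message log."""

    def __init__(self):
        self._log = []

    def append(self, event) -> int:
        self._log.append(event)
        return len(self._log) - 1  # offset of the written event

    def read_from(self, offset: int):
        # Yield (offset, event) pairs starting at the given position.
        for i in range(offset, len(self._log)):
            yield i, self._log[i]


class CheckpointingConsumer:
    """Consumer that records the next offset to read after each event."""

    def __init__(self, stream: PersistentStream):
        self.stream = stream
        self.checkpoint = 0  # next offset to read; persisted in a real system
        self.processed = []

    def run(self, fail_at=None):
        # Always resume from the last checkpoint, so a restart replays
        # exactly the events that were not yet checkpointed.
        for offset, event in self.stream.read_from(self.checkpoint):
            if offset == fail_at:
                raise RuntimeError(f"simulated crash at offset {offset}")
            self.processed.append(event)
            self.checkpoint = offset + 1
```

In this sketch the output and the checkpoint are updated together in memory. In a real deployment they are separate writes, so a crash between them replays the last events. That gives at-least-once delivery unless the output is idempotent or is committed atomically with the checkpoint.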
16. Reliability
• Metadata / Control IMDG
– Inherently replicated
– Watch for node failure
– Detect application failure
– Failover of services
• Context IMDG
– Repartition on failure
– Rebalance on node addition
– Replicas ensure continued operation
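[Figure: context partitions a to i spread across three nodes (a d g / b e h / c f i), repartitioned on node failure, and rebalanced across four nodes on node addition (a e i / b f / c g / d h).]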
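Repartitioning on failure and rebalancing on node addition can be sketched with a simple round-robin placement that keeps one backup copy on the next node. This is an assumed scheme for illustration only, not the actual Context IMDG algorithm. With nine partitions `a` to `i`, its primaries happen to match the three-node and four-node layouts pictured.

```python
from typing import Dict, List

def assign(partitions: List[str], nodes: List[str], replicas: int = 1) -> Dict[str, List[str]]:
    """Round-robin each partition to a primary node, plus `replicas` backup
    copies on the following nodes in the ring. owners[0] is the primary."""
    if replicas >= len(nodes):
        raise ValueError("need more nodes than replicas")
    table = {}
    for i, p in enumerate(partitions):
        table[p] = [nodes[(i + r) % len(nodes)] for r in range(replicas + 1)]
    return table

def fail_node(table: Dict[str, List[str]], dead: str) -> Dict[str, List[str]]:
    """Drop a failed node. Where it was primary, the first surviving replica
    takes over, so the partition stays available."""
    new_table = {}
    for p, owners in table.items():
        survivors = [n for n in owners if n != dead]
        if not survivors:
            raise RuntimeError(f"partition {p} has no surviving replica")
        new_table[p] = survivors
    return new_table
```

Rebalancing on node addition here is just `assign` with the larger node list. That moves more partitions than strictly necessary. Consistent hashing, or moving only the minimum number of partitions, reduces data movement. After `fail_node`, a real system would also create new replicas to restore the replication factor.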
17. Reliability
• Processing
– Recovery restores window content
– Replay from checkpoint ensures
exactly once processing
– Queries repartitioned on failure
– Data exceptions handled and
written to separate stream
• Results Storage
– Exactly once results on failure
– Replicas ensure restore on failure
– Resharded on cluster changes
[Figure: on each of node1, node2, and node3, a source stream feeds window2, a CQ, and WS. Each node has a Checkpoint Manager connected to the Metadata Repository.]
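Three of the processing points above can be combined in one hypothetical sketch: recovery restores window content, replay from a checkpoint, and data exceptions written to a separate stream. The window snapshot and the input offset are captured together. Results are keyed by input offset, so a replayed event overwrites its earlier result instead of duplicating it. The count window, the `amount` field, and the dict-based "streams" are all assumptions made for illustration.

```python
import copy

class WindowedCQ:
    """Continuous query over a sliding count window that sums 'amount'."""

    def __init__(self, size=3):
        self.size = size
        self.window = []      # in-memory window content
        self.offset = 0       # next input offset to process
        self.exceptions = {}  # separate "exception stream", keyed by offset
        self.snapshot = None

    def checkpoint(self):
        # Capture window content and input position together.
        self.snapshot = (copy.deepcopy(self.window), self.offset)

    def restore(self):
        window, offset = self.snapshot
        self.window, self.offset = copy.deepcopy(window), offset

    def process(self, source, results, until=None, crash_at=None):
        end = len(source) if until is None else until
        for off in range(self.offset, end):
            if off == crash_at:
                raise RuntimeError(f"simulated node failure at offset {off}")
            event = source[off]
            try:
                amount = float(event["amount"])
            except (KeyError, TypeError, ValueError):
                self.exceptions[off] = event  # bad record goes to its own stream
            else:
                self.window = (self.window + [amount])[-self.size:]
                # Keyed by input offset: replays overwrite rather than duplicate.
                results[off] = sum(self.window)
            self.offset = off + 1
```

After a failure, calling `restore()` and then `process()` again produces the same results as a run that never failed. That outcome is the exactly-once effect the slide describes.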
19. Security
• Data Ingest
– Secure any passwords
– Prevent unauthorized access
to sources
• High Speed Messaging
– Encrypt data on the wire
– Prevent unauthorized access
• Persistent Messaging
– Encrypt data on the wire
– Prevent unauthorized access
to stored persisted streams
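For "encrypt data on the wire", one common approach for a client connecting to a messaging broker is TLS. The sketch below uses Python's standard `ssl` module. The optional CA bundle path is a hypothetical parameter, and the TLS 1.2 minimum is an assumed policy rather than something the slide specifies.

```python
import ssl
from typing import Optional

def make_client_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS settings for connecting to a broker.
    `ca_file` is a hypothetical path to an enterprise CA bundle; when it is
    None, the system's default trust store is loaded instead."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    # create_default_context already sets these; restated to make intent explicit.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    return ctx
```

A client would then wrap its socket with `ctx.wrap_socket(sock, server_hostname=host)`, where `host` is the broker's name, so the certificate is checked against the host it connects to.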
20. Security
• Metadata / Control IMDG
– Authentication / Authorization
– Integrate with enterprise
– Role-based access
– Fine-grained control
• Context IMDG
– Secure any passwords
– Prevent unauthorized access
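When users are authenticated locally rather than through an enterprise directory, credentials should be stored as salted, slow hashes, never as plaintext. This is a minimal sketch of that assumption using PBKDF2 from Python's standard library. The iteration count is an assumed value, not a recommendation from the slides.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor; tune for your hardware and policy

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, PBKDF2-HMAC-SHA256 digest). Store both, never the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per credential
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Passwords that the platform itself must present to external sources cannot be hashed, because they must be recoverable. Those need reversible encryption or a secrets store instead.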
21. Security
• Processing
– Secure access to in-memory
data structures and streams
– No intermediate data staging
• Results Storage
– Prevent direct access to results
– Secure access through roles
• Overall
– Use a single authentication scheme
– Define permissions to cover all aspects
[Figure: permissions span Sources, Streams, Caches, Processing, UI, Results, and Persistent Streams.]
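"Use a single authentication scheme" and "define permissions to cover all aspects" can be sketched as one deny-by-default, role-based check that covers every object type named on the slide. The permission model (action, object type, name or `*` wildcard) is hypothetical and is not Striim's actual security API.

```python
from dataclasses import dataclass, field

# Object types taken from the slide; the permission model itself is hypothetical.
OBJECT_TYPES = {"source", "stream", "cache", "processing", "ui", "results", "persistent_stream"}

@dataclass(frozen=True)
class Permission:
    action: str        # e.g. "read", "write", "deploy"
    object_type: str   # one of OBJECT_TYPES
    name: str = "*"    # "*" grants on every object of that type

    def matches(self, action: str, object_type: str, name: str) -> bool:
        return (self.action == action and self.object_type == object_type
                and self.name in ("*", name))

@dataclass
class AccessControl:
    roles: dict = field(default_factory=dict)       # role -> set of Permission
    user_roles: dict = field(default_factory=dict)  # user -> set of role names

    def grant(self, role: str, perm: Permission) -> None:
        if perm.object_type not in OBJECT_TYPES:
            raise ValueError(f"unknown object type {perm.object_type!r}")
        self.roles.setdefault(role, set()).add(perm)

    def is_allowed(self, user: str, action: str, object_type: str, name: str) -> bool:
        # Deny by default: access only through an explicitly granted role.
        return any(p.matches(action, object_type, name)
                   for role in self.user_roles.get(user, set())
                   for p in self.roles.get(role, set()))
```

Routing UI, results, and stream access through the same `is_allowed` check is what keeps permissions consistent. No component has a side door around it.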
23. Integration – Ingestion
• Message Queues / Kafka: Inherently Streaming
• Sensors / Devices: Might Need Edge Processing
• Files: Need Continuous Parallel Collection
• Databases: Can't Use SQL For Data; Need Change Data Capture (CDC)
Streaming Data Collection Allows Data to Move at its Own Speed, Including Non-Traditional Unstructured / Semi-Structured Data
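Change Data Capture reads inserts, updates, and deletes from the database's transaction log rather than re-querying tables. Periodic SQL polling adds load to the source and can miss deleted rows and intermediate updates. The sketch shows only the consuming side: applying an ordered stream of change events to keep a target copy current. The `ChangeEvent` shape is hypothetical; real CDC readers emit their own formats.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class ChangeEvent:
    op: str                                # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: Any                               # primary-key value
    after: Optional[Dict[str, Any]] = None # row image after the change

def apply_changes(replica: Dict[str, Dict[Any, dict]], events) -> Dict[str, Dict[Any, dict]]:
    """Apply change events in log order so the replica tracks the source."""
    for ev in events:
        rows = replica.setdefault(ev.table, {})
        if ev.op in ("INSERT", "UPDATE"):
            if ev.after is None:
                raise ValueError(f"{ev.op} for key {ev.key!r} has no row image")
            rows[ev.key] = dict(ev.after)
        elif ev.op == "DELETE":
            rows.pop(ev.key, None)  # deletes are visible to CDC, unlike polling
        else:
            raise ValueError(f"unknown operation {ev.op!r}")
    return replica
```

Applying events strictly in log order matters. An update applied before its insert, or after a later delete, would leave the replica inconsistent with the source.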
24. Integration – Processing
• Filter Out Unnecessary Data
• Transform to the Format You Need
• Aggregate to Remove Redundancy and Obtain Trends Over Time
• Integrate Existing Processing Through Java Functionality
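The filter, transform, and aggregate steps above can be sketched as a single pass over events. The field names (`ts`, `host`, `status`) and the 60-second tumbling window are assumptions chosen for illustration.

```python
from collections import defaultdict

def pipeline(events, window_secs: int = 60):
    """Filter to error events, normalize them, and count them per
    (window start, host) in tumbling windows."""
    counts = defaultdict(int)
    for ev in events:
        if ev.get("status") != "ERROR":                   # filter out unnecessary data
            continue
        window_start = ev["ts"] - ev["ts"] % window_secs  # transform: bucket the timestamp
        host = ev["host"].lower()                         # transform: normalize the host name
        counts[(window_start, host)] += 1                 # aggregate: one row per window and host
    return dict(counts)
```

Aggregating this way replaces many near-identical error events with one count per window. That is the redundancy removal the slide mentions, and the sequence of counts gives the trend over time.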
25. Integration – Delivery
• Databases / ODS / EDW
• Files For Up-Stream Processing
• Message Queues / Kafka for Data As a Service
• Cloud for Elastic Storage and Scalability
• Hadoop / NOSQL for Data Lake
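Delivering the same processed stream to several of these targets can be sketched as a fan-out over a common sink interface. Only a JSON-lines file sink is concrete here. `MemorySink` is a hypothetical stand-in for a database, queue, or cloud writer, and the `write(batch)` interface is an assumption of this sketch.

```python
import json
from typing import Iterable, List, Protocol

class Sink(Protocol):
    def write(self, batch: List[dict]) -> None: ...

class JsonLinesFileSink:
    """Appends each event as one JSON line, e.g. for file-based processing."""

    def __init__(self, path: str):
        self.path = path

    def write(self, batch: List[dict]) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            for ev in batch:
                f.write(json.dumps(ev) + "\n")

class MemorySink:
    """Hypothetical stand-in for a database, queue, or cloud target."""

    def __init__(self):
        self.rows: List[dict] = []

    def write(self, batch: List[dict]) -> None:
        self.rows.extend(batch)

def deliver(batch: List[dict], sinks: Iterable[Sink]) -> None:
    """Fan the same batch out to every configured target."""
    for sink in sinks:
        sink.write(batch)
```

Keeping every target behind one small interface means adding a new destination does not touch the processing logic.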
27. Integration – User Experience
Design Flows • Analyze • Deploy • Visualize • Monitor
UI Fully Integrated with Clustered Back-End Collection and Processing
28. Multiple Common Use Cases
• Collecting / Analyzing Database Change in Real-Time
• Preventing Fraud or Unusual Behavior
• Monitoring Infrastructure, Equipment, or Replication
• Enhancing Customer Experience
• Ensuring SLAs
• Handling Huge Amounts of IoT Data
• Reliably Provide Current, Accurate and Complete Decision Data
30. Why we are cool
"Striim's product enables mainstream organizations to
productively introduce IMC enabled innovation…"
"…through a single, consistent, easy-to-use and
enterprise-class IMC-enabled platform."
31. Two Things to Remember
Enterprise Grade Means
Scalable, Reliable, Secure
& Integrates Well With
Existing Resources
32. Two Things to Remember
Streaming Integration
should be part of
your Enterprise
Data Strategy