4. Data Crunching - Use Cases & Latency
www.subex.com
Figure: use cases plotted against latency and algorithmic complexity.
Latency tiers:
• Real Time (milliseconds)
• Near Real Time (seconds)
• Micro Batch (minutes)
• Batch (hours to days)
Use cases: reporting, aggregation, rule engine, profiling, machine learning, audits, graph/network analysis, text search, natural language processing
5. Stream Processing & Complex Event Processing (CEP)
Event Processing in the Eventful World
• (Aggregated) event data is combined/correlated with:
  • Users
  • Assets
  • Threats
  • Vulnerabilities
  • Location
  • Historical data
Techniques
• Rule engine
• Event filtering
• Event aggregation and transformation
• Operate on stored and streaming data
• SQL-like semantics over stream data
• Supervised/unsupervised machine learning
• Applying known models
• Event pattern detection
• Detecting event relationships
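Event filtering, windowing and event pattern detection can be combined in a few lines. The sketch below is a minimal illustration under stated assumptions, not the platform's rule engine: it assumes a hypothetical `Event` record with `user`, `kind` and `ts` fields, and flags a user once a hypothetical `login_failed` event occurs three times within 60 seconds.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str   # hypothetical field: entity the event belongs to
    kind: str   # hypothetical field: event type, e.g. "login_failed"
    ts: float   # event time in seconds


class RepeatedEventDetector:
    """Flags a user once `threshold` events of `kind` fall within `window` seconds."""

    def __init__(self, kind: str, threshold: int, window: float):
        self.kind = kind
        self.threshold = threshold
        self.window = window
        self.recent = defaultdict(deque)  # user -> timestamps inside the window

    def on_event(self, event: Event) -> bool:
        # Event filtering: ignore event types this pattern does not care about.
        if event.kind != self.kind:
            return False
        times = self.recent[event.user]
        times.append(event.ts)
        # Evict timestamps that slid out of the window (assumes in-order arrival).
        while times and event.ts - times[0] > self.window:
            times.popleft()
        # Pattern match: enough occurrences inside the window.
        return len(times) >= self.threshold


detector = RepeatedEventDetector("login_failed", threshold=3, window=60.0)
stream = [
    Event("alice", "login_failed", 0.0),
    Event("bob", "login_ok", 5.0),
    Event("alice", "login_failed", 20.0),
    Event("alice", "login_failed", 45.0),
]
alerts = [e for e in stream if detector.on_event(e)]
print(alerts)
```

Only the third failure for `alice` matches. As written, every further matching event inside the window would also fire, so a real detector would typically add alert suppression.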
Areas
• Real-time fraud detection
• Real-time rating
• Security Information and Event Management (SIEM)
• Sensor data/IoT
• DPI data – metadata, content, flow correlation
• M2M data
• Data fraud – malware
• Transaction risk scoring
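Correlating an event with user, location, historical and threat context is what turns a raw transaction into a risk score. A minimal sketch, assuming in-memory lookup tables (`USER_HOME_COUNTRY`, `USER_AVG_AMOUNT`, `WATCHLIST_USERS`) and additive weights that are all hypothetical; a production scorer might instead apply a trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    user: str
    amount: float
    country: str


# Hypothetical context data the transaction is correlated with.
USER_HOME_COUNTRY = {"alice": "IN", "bob": "GB"}   # user / location
USER_AVG_AMOUNT = {"alice": 40.0, "bob": 120.0}    # historical profile
WATCHLIST_USERS = {"mallory"}                      # known threats


def risk_score(txn: Transaction) -> float:
    """Additive score capped at 1.0; every weight here is illustrative."""
    score = 0.0
    if txn.user in WATCHLIST_USERS:
        score += 0.5
    home = USER_HOME_COUNTRY.get(txn.user)
    if home is not None and txn.country != home:   # location mismatch
        score += 0.2
    avg = USER_AVG_AMOUNT.get(txn.user)
    if avg is None:                                 # no history yet
        score += 0.1
    elif txn.amount > 5 * avg:                      # far above the usual amount
        score += 0.3
    return min(score, 1.0)


for txn in [Transaction("alice", 500.0, "US"),
            Transaction("bob", 100.0, "GB"),
            Transaction("mallory", 10.0, "US")]:
    print(txn.user, risk_score(txn))
```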
6. Stream Processing
Key considerations
• Keep the data moving (low latency) – in memory
• Distributed message queues
• Distributed in-memory caches
• Distributed in-memory stores
• Scalable, highly available distributed stream processing (partition data and scale out, data safety, high availability)
• Handle stream imperfections (delayed, missing, out-of-order data)
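A common way to handle delayed and out-of-order data is a watermark: buffer incoming events and emit only those older than the newest event time seen minus an allowed delay. The sketch below assumes events are `(event_time, payload)` tuples and a fixed `max_delay`; events that arrive behind the watermark are set aside as late instead of being silently reordered.

```python
import heapq
from typing import Iterable, List, Tuple

Event = Tuple[float, str]  # (event_time_seconds, payload)


def reorder(events: Iterable[Event], max_delay: float) -> Tuple[List[Event], List[Event]]:
    """Re-emit events in event-time order, tolerating up to `max_delay` of disorder."""
    buffer: List[Event] = []   # min-heap keyed on event time
    ordered: List[Event] = []
    late: List[Event] = []
    max_seen = float("-inf")
    for ev in events:
        watermark = max_seen - max_delay
        if ev[0] < watermark:
            late.append(ev)    # older than data already emitted
            continue
        heapq.heappush(buffer, ev)
        max_seen = max(max_seen, ev[0])
        watermark = max_seen - max_delay
        # Nothing on time can precede events at or below the watermark, so emit them.
        while buffer and buffer[0][0] <= watermark:
            ordered.append(heapq.heappop(buffer))
    # End of stream: flush whatever is still buffered, in order.
    while buffer:
        ordered.append(heapq.heappop(buffer))
    return ordered, late


events = [(1, "a"), (3, "c"), (2, "b"), (7, "e"), (4, "d"), (1.5, "x")]
ordered, late = reorder(events, max_delay=2)
print(ordered)
print(late)
```

A larger `max_delay` tolerates more disorder at the cost of holding data longer, which pulls against keeping the data moving with low latency.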
8. Rule Engine – In-Memory Aggregation
Figure: rule engine pipeline. Event / input data record → event filters → filtered records → rules (Rule 1, Rule 2, … Rule n) → aggregation layer → condition evaluation → actions. The aggregation layer is drawn three times; each instance is backed by shared memory divided into page pools of 8K, 16K, 32K … 256K pages, holding key/value byte streams through a SerDe, with a log component (labelled "MLog").
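The filter, aggregate, evaluate and act steps can be sketched with plain dictionaries standing in for the shared-memory page pools and SerDe layer, which are not modelled here. The `Rule` class, its fields and the `msisdn`/`type`/`mb` record fields are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Rule:
    """One rule: aggregate a value per key, then test a condition on the total."""
    name: str
    key: Callable[[Record], str]          # grouping key for the aggregation layer
    value: Callable[[Record], float]      # amount added to the key's running total
    condition: Callable[[float], bool]    # evaluated on the running total
    action: Callable[[str, float], None]  # fired whenever the condition holds
    totals: Dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def apply(self, record: Record) -> None:
        k = self.key(record)
        self.totals[k] += self.value(record)   # in-memory aggregation
        if self.condition(self.totals[k]):      # condition evaluation
            self.action(k, self.totals[k])      # action


def run(records: List[Record], event_filter: Callable[[Record], bool],
        rules: List[Rule]) -> None:
    for rec in records:
        if not event_filter(rec):               # event filters drop irrelevant records
            continue
        for rule in rules:                      # each filtered record reaches every rule
            rule.apply(rec)


# Usage with hypothetical usage records keyed by subscriber number.
alerts = []
high_usage = Rule(
    name="high_data_usage",
    key=lambda r: r["msisdn"],
    value=lambda r: r["mb"],
    condition=lambda total: total > 100,
    action=lambda k, total: alerts.append((k, total)),
)
records = [
    {"msisdn": "111", "type": "data", "mb": 60.0},
    {"msisdn": "222", "type": "voice", "mb": 0.0},
    {"msisdn": "111", "type": "data", "mb": 50.0},
]
run(records, event_filter=lambda r: r["type"] == "data", rules=[high_usage])
print(alerts)
```

Keeping running totals in memory means each record costs one lookup and one addition per rule, rather than a re-query of stored data.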
9. Data Placement Strategies
Application Data
• Application configuration data – rule libraries, DNA configurations, other configurations – MySQL
• Application-generated data – alarms, discrepancies – MySQL
• Operations data (application-generated, infrastructure monitoring) – logs, audit, metrics – Solr
• Application aggregations – summary/pre-aggregated data – Hive tables
• Statistical profiles, in-memory aggregation files – HDFS
Traditional Telco Data
• Telco entity data (with update semantics) – HBase/MySQL
• Telco historic transaction data – Hive tables in ORC file format, partitioned by date, stored in HDFS
• Switch input raw files – HDFS
Other Sources
• Social Media
• DPI Flow Data
• Location Data
• IoT Sensor Data
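Partitioning historic transaction data by date means each day's records land in their own directory, so date-range queries only scan the matching partitions. A minimal sketch of the Hive-style `column=value` directory layout; the table root `/warehouse/telco_txn` and partition column `call_date` are hypothetical, and writing the ORC files themselves is out of scope.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List

# Hypothetical table root and partition column; Hive keeps each partition
# of a table in its own sub-directory named <column>=<value>.
TABLE_ROOT = "/warehouse/telco_txn"
PARTITION_COLUMN = "call_date"


def partition_path(event_time: datetime) -> str:
    """Directory of the daily partition that a record belongs to."""
    return f"{TABLE_ROOT}/{PARTITION_COLUMN}={event_time:%Y-%m-%d}"


def group_by_partition(records: List[dict]) -> Dict[str, List[dict]]:
    """Bucket records so that each bucket is written into a single partition."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        buckets[partition_path(rec["event_time"])].append(rec)
    return dict(buckets)


records = [
    {"id": 1, "event_time": datetime(2015, 6, 1, 23, 59)},
    {"id": 2, "event_time": datetime(2015, 6, 2, 0, 1)},
    {"id": 3, "event_time": datetime(2015, 6, 1, 8, 30)},
]
for path, rows in sorted(group_by_partition(records).items()):
    print(path, [r["id"] for r in rows])
```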
10. Spark Streaming – Application Data Flow
Figure: Spark Streaming application data flow. Components shown, grouped by role:
• Sources: landing directory (SAN/HDFS), message queue, DB sources
• Ingest: Apache Flume (Flume directory source, Flume–Kafka source, Flume–Spark sink), Apache Kafka (distributed message queue), ETL adaptors, Sqoop/CDC tools
• Processing: Apache Spark Streaming with the in-memory rule engine, pre-aggregation and a distributed cache; Spark jobs submitted against the data lake; analytics applications …
• Data lake: HDFS (including raw file backup), Hive tables, HBase tables, Solr search indexes, audits, MySQL reference DB
• Data access: Hive/Presto
• Operations: data load stage, operational metrics
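Spark Streaming's DStream API processes a stream as a sequence of small batches cut at a fixed batch interval, which makes per-batch pre-aggregation cheap before results are written to the data lake. The plain-Python sketch below imitates that idea without using Spark; the `(arrival_time, key, value)` event shape and the cell-ID keys are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str, int]  # (arrival_time_seconds, key, value)


def micro_batches(events: Iterable[Event], interval: float) -> Dict[int, List[Event]]:
    """Assign each event to batch n covering [n*interval, (n+1)*interval)."""
    batches: Dict[int, List[Event]] = {}
    for ev in events:
        batches.setdefault(int(ev[0] // interval), []).append(ev)
    return batches


def pre_aggregate(batch: List[Event]) -> Counter:
    """Sum values per key inside one batch, before anything is persisted."""
    totals: Counter = Counter()
    for _, key, value in batch:
        totals[key] += value
    return totals


events = [(0.5, "cellA", 3), (1.2, "cellB", 1), (1.9, "cellA", 2), (2.1, "cellA", 5)]
summary = {n: pre_aggregate(b) for n, b in sorted(micro_batches(events, interval=2.0).items())}
print(summary)
```

Writing one summary row per key per batch, instead of one row per event, is what keeps downstream Hive/HBase writes small.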
11. Data Management
Data Platform – Business and Domain Packaging
Figure: data platform architecture with business and domain packaging. Components shown, grouped by role:
• Data acquisition/ingest: data federation framework, flexible ETL, real-time message-based distributed message queue, ESB, cloud connectors
• Data processing: pre-aggregation, distributed stream processing (Apache Spark), in-memory rule processing, real-time continuous query (CEP), analytics engine, reconciliation engine, BPM/workflow engine, machine learning, analytic models
• Data storage: Hadoop (HDFS, Hive, HBase), document store, graph data store, enterprise search, distributed cache, common data model
• Data visualization & analysis: mobile framework, ROC View, ROC Insights, case management, network analysis
• Domain packages: real-time rating, profiling, risk scoring, cloud metering
• Platform services: standard APIs (EAI & WS), API management, control panel, operations & admin, resource management, scheduler, multi-tenancy, data security, authorization & authentication, audit & logging
• Infrastructure: on-premise OS/servers/network/storage, IaaS (public/private cloud)