At the StampedeCon 2013 Big Data conference in St. Louis, Vinod Vydier, Middleware Specialist at Oracle, discussed real-time event processing and in-memory analysis of Big Data. Multiple projects (for example Cloudera’s Impala) do real-time or near-real-time analysis of Big Data. However, events that must be examined and answered in real time (for example credit card fraud, or vehicle metrics that alert a driver) can significantly affect data collection and analysis with traditional Big Data techniques. The session introduces strategies for using an Event Processing Engine to respond to events in real time and then filter and categorize data in memory before storing it in HDFS. This makes it easier to run Hadoop jobs on the collected data and lets end clients respond to critical events in real time.
Real Time Event Processing and In-memory analysis of Big Data - StampedeCon 2013
1. Event Processing for better
(Big) Data
Vinod Vydier
Middleware Specialist @ Oracle
2. Agenda
§ Why use event processing?
§ Event Processing Applications
§ Technical Architecture
§ Use of In-Memory data-grid
§ Use cases
3. Challenges Working with Big Data
• Storing data has become cheap; however, storage is not
infinite and has to be managed to make effective
use of the data.
• Hadoop has inherent latency when responding to real-time
events (which can produce high-volume data at
high velocity and typically require real-time responses).
• Event Processing helps in getting clean data with
context and less redundancy into HDFS, so the Hadoop
jobs can be more effective.
• Event Processing helps in responding back in real
time, and storing the data in HDFS for better historical
analysis.
4. Why use Event Processing Infrastructure
Application has any one or more of the following
conditions:
§ Requires high throughput and low latency
processing.
§ Has continuously streaming data.
§ Real-time correlation between multiple incoming data
sources.
§ Time-sensitive alerts, aggregations and calculations.
§ Needs to look for patterns in the data stream.
§ Data does not need to be stored, if there is nothing
of interest in it.
§ Problem is more easily solved by analyzing before
storing in HDFS.
5. Filtering, Real-time Intelligence for Big Data
[Figure: inputs SOCIAL, BLOG, SMART METER and raw binary data; dimensions VOLUME, VELOCITY, VARIETY, VALUE; labels FAST DATA, Event Processing Intelligence, GREATER]
6. Stay ahead of Big Data
Filter out, correlate
Move time-critical analysis to front of process
• Filter out noise (example: data ticks with no
changes), add context (by correlating multiple
sources), increase relevance.
• Identify certain critical conditions as you insert data
into the warehouse.
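The noise-filtering idea above (dropping data ticks that carry no change) can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation; the `(source, value)` event shape and the meter names are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, Tuple

Tick = Tuple[str, float]  # (source id, reading); hypothetical event shape


def drop_unchanged(ticks: Iterable[Tick]) -> Iterator[Tick]:
    """Yield a tick only when its value differs from the last value seen for that source."""
    last_seen: Dict[str, float] = {}
    for source, value in ticks:
        if last_seen.get(source) != value:
            last_seen[source] = value
            yield source, value


# Hypothetical readings: the repeated values are the "noise" to filter out.
readings = [
    ("meter-1", 5.0),
    ("meter-1", 5.0),
    ("meter-2", 7.5),
    ("meter-1", 5.2),
    ("meter-2", 7.5),
]
kept = list(drop_unchanged(readings))
```

Only the ticks that change a source's value survive, so fewer redundant rows would reach the warehouse.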
7. Getting ahead of the curve: Fast Data
Big Data
• Time scale: minutes
• Historical depth: deep
• Example: analysis of traffic patterns and congestion times for urban planning
Fast Data
• Time scale: ms
• Historical depth: shallow
• Example: monitoring of traffic cameras to ensure given license plates are not in use on multiple vehicles
Add “depth” to your fast data by merging output of MapReduce to stream processing
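As a rough sketch of the fast-data example (a license plate in use on multiple vehicles), the check below flags a plate seen at two different cameras within an implausibly short interval. The sighting tuple, the camera names, and the `min_gap_s` threshold are all assumptions for illustration; a real system would presumably derive the threshold from distance or from historical (batch) analysis.

```python
from typing import Dict, Iterable, List, Tuple

Sighting = Tuple[str, str, float]  # (plate, camera id, time in seconds); hypothetical shape


def find_plate_conflicts(sightings: Iterable[Sighting],
                         min_gap_s: float = 60.0) -> List[Tuple[str, str, str]]:
    """Return (plate, earlier camera, later camera) for suspiciously close sightings."""
    last: Dict[str, Tuple[str, float]] = {}
    conflicts: List[Tuple[str, str, str]] = []
    for plate, camera, ts in sightings:
        prev = last.get(plate)
        # Same plate, different camera, too little time to have travelled between them.
        if prev is not None and prev[0] != camera and ts - prev[1] < min_gap_s:
            conflicts.append((plate, prev[0], camera))
        last[plate] = (camera, ts)
    return conflicts


sightings = [
    ("ABC123", "cam-north", 0.0),
    ("XYZ789", "cam-north", 5.0),
    ("ABC123", "cam-south", 20.0),
    ("XYZ789", "cam-south", 400.0),
]
conflicts = find_plate_conflicts(sightings)
```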
9. Event Processing inputs
Ø Streams
Ø Continuous input, often in high-
volume
Ø Time ordered
Ø Does not end
Ø Impossible to process / analyze in
real-time with traditional relational
database systems
Example: Raw Sensor Event
streams, GPS, Market Data Feeds
AA ALCOA INC D 20.125 1000 20080305 10:03:01:55
AXP AMER EXPRESS CO D 45.875 500 20080305 10:03:02:10
BA BOEING D 77.575 800 20080305 10:03:02:78
C CITIGROUP D 34.125 2000 20080305 10:03:03:05
CAT CATERPILLAR D 22.5 600 20080305 10:03:03:46
DO DUPONT D 41.575 3000 20080305 10:03:04:12
Event Processing provides a new data
management infrastructure to support and
analyze Streams in real-time
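To make the stream input concrete, here is a small sketch that parses the market-data rows shown on this slide into typed records. The field meanings are assumptions inferred from the sample rows (symbol, company name, a one-letter flag whose meaning the slides do not give, price, volume, date, time), and the last time component is assumed to be hundredths of a second.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Tick:
    symbol: str
    name: str
    flag: str          # the "D" column; its meaning is not stated in the slides
    price: float
    volume: int
    timestamp: datetime


def parse_tick(line: str) -> Tick:
    """Parse one whitespace-separated row; the company name may contain spaces."""
    tokens = line.split()
    if len(tokens) < 7:
        raise ValueError(f"unexpected row: {line!r}")
    symbol = tokens[0]
    flag, price, volume, date, time = tokens[-5:]
    name = " ".join(tokens[1:-5])
    hh, mm, ss, frac = time.split(":")
    # Assumption: the final component is hundredths of a second.
    ts = datetime.strptime(date + hh + mm + ss, "%Y%m%d%H%M%S").replace(
        microsecond=int(frac) * 10_000
    )
    return Tick(symbol, name, flag, float(price), int(volume), ts)


def ticks(lines: Iterable[str]) -> Iterator[Tick]:
    """Lazily parse a (possibly unbounded) sequence of rows."""
    for line in lines:
        if line.strip():
            yield parse_tick(line)


sample = parse_tick("AXP AMER EXPRESS CO D 45.875 500 20080305 10:03:02:10")
```

Because `ticks` is a generator, it never needs the whole stream in memory, which matches the "does not end" property listed above.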
10. Filtering
Ø New stream filtered for specific criteria,
e.g. stock price > $22
Ø Correlation & Aggregation
Ø Scrolling, time-based window metrics,
e.g. average # of stock trades in the
last hour
Ø Pattern Matching
Ø Notification of detected event patterns,
e.g. price changes A, B and C occurred
within 15 minute window
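The first two output types above, filtering and a time-based window metric, might look like the following sketch. The `(seconds, symbol, price)` event tuple is an assumed shape, and the metric shown (average price over the last hour of event time) is chosen only for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[float, str, float]  # (event time in seconds, symbol, price); hypothetical shape


def price_above(events: Iterable[Event], threshold: float) -> Iterator[Event]:
    """Filtering: emit a new stream containing only events above the price threshold."""
    return (e for e in events if e[2] > threshold)


class SlidingWindowAverage:
    """Aggregation: average of the values whose timestamps fall in the last window_s seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._events: Deque[Tuple[float, float]] = deque()

    def add(self, ts: float, value: float) -> float:
        self._events.append((ts, value))
        # Evict events that have scrolled out of the window.
        while self._events[0][0] <= ts - self.window_s:
            self._events.popleft()
        return sum(v for _, v in self._events) / len(self._events)


events = [(0.0, "AA", 20.125), (10.0, "AXP", 45.875), (20.0, "BA", 77.575), (4000.0, "C", 34.125)]
window = SlidingWindowAverage(window_s=3600.0)
averages = [window.add(ts, price) for ts, _, price in price_above(events, 22.0)]
```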
CAT CATERPILLAR D 22.5 600 20080305 10:03:03:46
DO DUPONT D 41.575 3000 20080305 10:03:04:12
AA ALCOA INC D 20.125 1000 20080305 10:03:01:55
AXP AMER EXPRESS CO D 45.875 500 20080305 10:03:02:10
BA BOEING D 77.575 800 20080305 10:03:02:78
……
• Event Processing done in-Memory (not in Database)
• Logic is defined through Continuous Queries on the data
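A continuous query is a standing query that is evaluated as each event arrives, rather than being run once against stored data. The sketch below imitates one in plain Python for the pattern-matching case mentioned earlier (events A, B and C occurring in order within a time window). The class name and event shape are hypothetical; real engines typically express this declaratively.

```python
from typing import List, Sequence, Tuple


class SequenceQuery:
    """Standing query: report when the given event kinds occur in order within window_s seconds."""

    def __init__(self, sequence: Sequence[str], window_s: float) -> None:
        if len(sequence) < 2:
            raise ValueError("sequence needs at least two event kinds")
        self.sequence = list(sequence)
        self.window_s = window_s
        self._partials: List[Tuple[float, int]] = []  # (start time, index of next expected kind)

    def on_event(self, ts: float, kind: str) -> List[Tuple[float, float]]:
        """Feed one event; return (start, end) times of any patterns it completes."""
        matches: List[Tuple[float, float]] = []
        survivors: List[Tuple[float, int]] = []
        for start, idx in self._partials:
            if ts - start > self.window_s:
                continue  # partial match expired
            if kind == self.sequence[idx]:
                idx += 1
                if idx == len(self.sequence):
                    matches.append((start, ts))
                    continue
            survivors.append((start, idx))
        if kind == self.sequence[0]:
            survivors.append((ts, 1))  # this event may start a new match
        self._partials = survivors
        return matches


query = SequenceQuery(["A", "B", "C"], window_s=900.0)  # 15-minute window
stream = [(0.0, "A"), (100.0, "B"), (200.0, "X"), (800.0, "C"),
          (1000.0, "A"), (1200.0, "B"), (2000.0, "C")]
results = [query.on_event(ts, kind) for ts, kind in stream]
```

All state lives in memory and only partial matches are kept, so no event has to be written to a database before the pattern is detected.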
COMPLEX QUERIES
Event Processing outputs
11. Data crunching for Event Processing done in an
in-memory data grid
• High throughput for storing data
• Aggregation and event querying
• Pattern implementation flexibility combining complementary
technologies
• Handle and correlate events in real time, including support for
multiple patterns:
• Pre processing (buffer inputs)
• In Event Processing (to cache reference data)
• Post Processing (to expose processed events to consuming
apps)
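One of the roles listed above, caching reference data during event processing, amounts to a join between the event stream and an in-memory lookup. In this sketch a plain dict stands in for the data grid, and the event fields and meter ids are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Mapping

Event = Dict[str, Any]


def enrich(events: Iterable[Event],
           reference_cache: Mapping[str, Mapping[str, Any]]) -> Iterator[Event]:
    """Join each event with cached reference data keyed by meter id, if any exists."""
    for event in events:
        context = reference_cache.get(event["meter_id"], {})
        # Events without reference data pass through unchanged.
        yield {**event, **context}


# A dict standing in for the data grid's reference-data cache (hypothetical contents).
cache = {"m1": {"region": "north"}}
incoming = [{"meter_id": "m1", "kwh": 1.2}, {"meter_id": "m9", "kwh": 0.4}]
enriched = list(enrich(incoming, cache))
```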
[Diagram labels: Data Grid; Event Processing; Consolidated & in-context Data; Filtered/Aggregated Data; HDFS and traditional storage]
12. In-memory events on the data stream
• Threshold Management
  • Detecting threshold conditions across multiple event streams
  • Using cache to:
    • Allow dynamic configuration of thresholds
    • Add (via join) contextual data to support aggregation
  • Using pattern matching to find sustained conditions
• Alert Generation
  • Using relations to represent state and state transitions
  • Using “missing event” patterns to monitor expected response(s)
• Alarm Management
  • Using pattern matching to remove extraneous alarm events
  • e.g. power off alarm preceded by tamper alarm within (n) minutes
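The alarm-management example can be sketched as follows. The slide does not say which alarm is the extraneous one; this sketch assumes the power-off alarm is a consequence of the earlier tamper alarm on the same device and suppresses it. The alarm tuple and kind names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

Alarm = Tuple[float, str, str]  # (time in seconds, device id, alarm kind); hypothetical shape


def suppress_power_off(alarms: Iterable[Alarm], window_s: float) -> Iterator[Alarm]:
    """Drop POWER_OFF alarms that follow a TAMPER alarm on the same device within window_s."""
    last_tamper: Dict[str, float] = {}
    for ts, device, kind in alarms:
        if kind == "TAMPER":
            last_tamper[device] = ts
        elif kind == "POWER_OFF":
            tamper_ts = last_tamper.get(device)
            if tamper_ts is not None and ts - tamper_ts <= window_s:
                continue  # assumed to be explained by the earlier tamper alarm
        yield ts, device, kind


alarms = [
    (0.0, "m1", "TAMPER"),
    (60.0, "m1", "POWER_OFF"),
    (90.0, "m2", "POWER_OFF"),
    (1000.0, "m1", "POWER_OFF"),
]
remaining = list(suppress_power_off(alarms, window_s=300.0))
```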
14. Visualizing events on the data stream
[Diagram labels: Resource Locations; Geo-Fencing Definitions; Matches and Alerts; Event Processing Application; MapViewer; Manager; along with JMS and SQL labels]
15. JMS Protocol Integration
• Common integration touch point with Service Bus
• Business Activity Monitoring integration
HTTP Publish/Subscribe
• Support pub/sub events between Event server and web clients.
• Clients don’t need to poll for updates (unlike traditional HTTP).
• Clients subscribe to and publish to event channels.
• Bayeux protocol
• Lightweight, and the payload is JSON
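To show what a lightweight JSON payload looks like, the sketch below builds a subscribe message and a publish message using the field names from the Bayeux message format (`channel`, `clientId`, `subscription`, `data`, `id`). The channel name, client id, and payload are made up for illustration, and no handshake or transport is shown.

```python
import json
from typing import Any, Dict


def subscribe_message(client_id: str, channel: str) -> Dict[str, Any]:
    """Bayeux-style request to subscribe a client to an event channel."""
    return {"channel": "/meta/subscribe", "clientId": client_id, "subscription": channel}


def publish_message(client_id: str, channel: str, data: Any, message_id: str) -> Dict[str, Any]:
    """Bayeux-style event published to a channel; 'data' carries the JSON payload."""
    return {"channel": channel, "clientId": client_id, "data": data, "id": message_id}


# Hypothetical client id (normally issued by the server during the handshake) and channel.
client = "client-1"
alert = publish_message(client, "/alerts/geofence", {"plate": "ABC123", "zone": "depot"}, "1")
# Messages travel as a JSON array, so several can share one request.
wire = json.dumps([subscribe_message(client, "/alerts/geofence"), alert])
```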
Visual/SOA integration with Event Processing
16. Event Processing High Level Architecture
JSON
Adapter Cache Processor POJO
EPN (Event Processing Network) Elements
HTTP Pub/S
18. Event Driven SOA: Simplify Business
Complexity
• Real-time business insight
• Preempt and react instantaneously to Enterprise, Environmental and Global
Business conditions
• Gain business insight using previously untapped, raw event sources
• Hot-pluggable integration
• Transparent SOA infrastructure interoperability
• Distributed, deployment ready, pre-integrated, in-memory Data Grid,
and Java low latency determinism.
• Lightweight high performance Java Event Server platform
• Real-time business friendly analyst oriented
visualization layers
• Powerful, extensible Event Processing Analysis abstraction
• Business user dashboards
• Business user domain specific natural language layers
• Real-time predictive analytics
19. Event Processing use cases in different
industries
1. Customer Experience
2. Transportation, Logistics & Fleet Management
3. Utilities: Demand & Response, Smart Meter
4. Public Sector: Emergency Response,
Intelligence
5. Telcos: Real Time billing & WiFi offloading,
Mobile billboard
20. Customer Experience
• Industry focus on new buzzword: Customer Experience
• Desire to harness potential of social networks for better targeted marketing
Event Processing can help with:
• Monitoring customer activity in real time (social networks, location such as proximity to stores, etc.) and identifying opportunities as they arise
• Correlating with existing information (customer/shopping profiles, etc.)
• Generating real-time alerts
21. Transportation, Logistics and Fleet Management
• Constant industry pressure for greater efficiency
• Need to differentiate through premium services and greater reliability and visibility
• Availability of cheap wireless sensors (temperature, GPS, etc.) that can be included in packages/containers/trucks
Event Processing can help with:
• Real-time monitoring of inflow of data from sensors
• Trend detection / prediction (to rise, etc.)
• Leveraging spatial/geo-location capabilities.
22. Utilities
• Adoption of Smart Meters: concerns about bandwidth/processing power required to handle the information they generate, desire to offer value-add services
• Ever increasing electricity demand
• Demand for real-time billing & analytics
• Greater customer expectations re: outage & response times
• Regulations
Event Processing can help with:
• Alerting of consumption trends in real-time, enabling “Demand/Response”
• Real-time detection of problems (abnormal spikes in consumption indicative of leaks, etc.)
• Filtering out redundant or nested (ex: tree fell on the line) outage errors and problems
• Tracking of resources and personnel
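Real-time detection of abnormal consumption spikes could be approximated as below: a reading is flagged when it exceeds a multiple of the average of the preceding readings. The history length and factor are arbitrary assumptions, and this rule is only one of many possible definitions of "abnormal".

```python
from collections import deque
from typing import Deque, Iterable, List


def detect_spikes(readings: Iterable[float], history: int = 4, factor: float = 2.0) -> List[int]:
    """Return indexes of readings greater than factor times the mean of the previous `history` readings."""
    recent: Deque[float] = deque(maxlen=history)
    spikes: List[int] = []
    for index, value in enumerate(readings):
        # Only judge a reading once a full history window is available.
        if len(recent) == history and value > factor * (sum(recent) / history):
            spikes.append(index)
        recent.append(value)
    return spikes


consumption = [1.0, 1.0, 1.0, 1.0, 5.0, 1.0]  # hypothetical meter readings
spike_indexes = detect_spikes(consumption)
```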
23. Telco
• Overloaded data networks and new strategies to offload traffic: real-time billing vs. unlimited, offloading to WiFi, degradation of service from 4G to 3G, etc.
• GPS-enabled phones offer new location-based marketing opportunities: “mobile billboards”
How can Event Processing help:
• Event Processing infrastructure can handle massive amounts of data generated by mobile devices, filter out, correlate and aggregate in real-time to only retain valuable information
• Event Processing can plug into all types of feeds, from devices to social networks
• Event Processing can be integrated with spatial and geo-location technology to send location specific data to the user.
24. Public Sector
• Heightened security requirements
• Ever increasing population in urban areas drives optimization requirements
• Increasing number of real-time data sources: video feeds, GPS data, traffic data, etc.
• Applications: Security Intelligence, geo-fencing, “Smart Cities”, traffic control, gateless tolls
How Event Processing can help:
• Event Processing can be integrated with spatial and geo-location technology to track location-specific data for a user.
• Event Processing can plug in any data feed such as video / face recognition
• Event Processing meets performance & availability requirements in this space