Welcome to real-time analytics for Hadoop! ScaleOut hServer V2 is the world's first in-memory execution engine for Hadoop MapReduce. Now you can analyze live data using standard Hadoop MapReduce code, in memory and in parallel, without installing and managing the Hadoop software stack. (Only one small change to your Hadoop program is needed.) Gone are disk I/O latencies, slow start-up times, and software environment management headaches. Benchmark tests have demonstrated 20x faster execution than the Apache Hadoop distribution. Now you can use Hadoop MapReduce in live applications in financial services, e-commerce, logistics, and countless other scenarios where results are needed in seconds instead of minutes or hours.
Learn more: http://www.scaleoutsoftware.com/products/scaleout-hserver
Watch the presentation video: http://inside-bigdata.com/2013/10/15/enabling-real-time-analytics-using-hadoop-mapreduce/
2. What’s New Today
ScaleOut hServer V2:
• World’s first Hadoop MapReduce engine integrated with a scalable, in-memory data grid
• Full Hadoop MapReduce support for “live,” fast-changing data
• 20x performance improvement in benchmark tests
• Significant new technology to simplify development and maximize ease of use
ScaleOut Software, Inc.
3. About ScaleOut Software
• Develops and markets software middleware for scaling application performance and performing real-time analytics using in-memory data storage and computing
• Executive Team:
  • Dr. William Bain, Founder & CEO
    • Career focused on parallel computing at Bell Labs, Intel, and Microsoft
    • Three prior start-ups; the last was acquired by Microsoft, and its product now ships as Network Load Balancing in Windows Server
  • David Brinker, COO
    • 25 years of software business and executive management experience
    • Mentor Graphics, Cadence, Webridge
• Eight years of market experience on Windows & Linux; 400 customers
4. ScaleOut Software Products
• ScaleOut StateServer®
  • In-Memory Data Grid for Windows and Linux
  • Scales application performance
  • Industry-leading performance and ease of use
• ScaleOut GeoServer® adds:
  • WAN-based data replication for disaster recovery (DR)
  • Breakthrough technology for global data access
• ScaleOut Analytics Server® adds:
  • Real-time data analysis for “live” data
  • Comprehensive management tools
• Introducing ScaleOut hServer™ V2
  • Full Hadoop Map/Reduce engine (20X faster*)
  • Hadoop Map/Reduce on live, in-memory data
*in benchmark testing
[Diagram: ScaleOut StateServer In-Memory Data Grid, composed of multiple grid services]
5. IMDGs Perform Real-Time Analytics
ScaleOut Analytics Server stores and analyzes “live” data:
• In-memory storage holds live data sets that are continuously updated and accessed within operational systems.
  • Examples: stock ticker data, business rules, order & inventory data
• An integrated analytics engine tracks important patterns & trends.
• Data-parallel analysis delivers results in milliseconds to seconds.
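The combination described above, where live data keeps changing while a data-parallel scan tracks a pattern, can be sketched in plain Java. This is a single-JVM illustration under stated assumptions, not ScaleOut's API: a `ConcurrentHashMap` stands in for the IMDG, the ticker symbols and the price threshold are hypothetical, and a parallel stream stands in for the grid's distributed analysis engine.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an IMDG holding live stock ticker data (symbol -> last price). */
class LivePriceStore {
    private final Map<String, Double> prices = new ConcurrentHashMap<>();

    /** Operational systems call this continuously as new ticks arrive. */
    void update(String symbol, double price) {
        prices.put(symbol, price);
    }

    /**
     * Data-parallel analysis over the current contents: counts symbols trading
     * above a threshold. ConcurrentHashMap traversal is weakly consistent, so
     * updates that race with the scan may or may not be seen, but the scan
     * never throws ConcurrentModificationException.
     */
    long countAbove(double threshold) {
        return prices.values().parallelStream()
                .filter(p -> p > threshold)
                .count();
    }
}

class LivePriceDemo {
    public static void main(String[] args) throws InterruptedException {
        LivePriceStore store = new LivePriceStore();
        store.update("AAA", 101.5);   // hypothetical symbols and prices
        store.update("BBB", 98.0);
        store.update("CCC", 120.25);

        // A writer thread keeps updating the data while the analysis runs.
        Thread feed = new Thread(() -> {
            for (int i = 0; i < 1_000; i++) {
                store.update("BBB", 95.0 + (i % 10));
            }
        });
        feed.start();
        long during = store.countAbove(100.0); // reflects a moving snapshot
        feed.join();

        System.out.println("above 100 during feed: " + during);
        System.out.println("above 100 after feed: " + store.countAbove(100.0));
    }
}
```

The count taken while the feed is running can differ from run to run; that is the point of analyzing live data rather than a frozen copy.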
6. Example in Financial Services
Integrate analysis into a stock trading platform:
• The IMDG holds market data and hedging strategies.
• Updates to market data continuously flow through the IMDG.
• The IMDG performs repeated map/reduce analysis on hedging strategies and alerts traders in real time.
• The IMDG automatically and dynamically scales its throughput to handle new hedging strategies as servers are added.
7. Customers
• 400 unique customers
• 35 Fortune 500 customers
• 32 countries
• 9,000 servers licensed
• 50% have multiple deployments

Customer mix by industry (% in dollars):
• Financial & Insurance: 26%
• Ecommerce Services: 19%
• Ecommerce Sales: 17%
• Entertainment & Communications: 13%
• Government & Education: 10%
• Software: 8%
• Travel & Transportation: 4%
• Other: 3%

Example uses:
• Online loan apps & banking
• Portfolio management
• Trading systems
• Reservations systems
• Ecommerce shopping
• Customer service sites
• Streaming entertainment
• Configuration engines
• Gaming
8. IMDGs Seeing Wide Adoption
• In-Memory Data Grids have become key in several fast-growth markets.
• Drivers:
  • Cloud computing / virtualization
  • Hardware enablement
  • Competitive pressure
  • Exploding workloads
  • Big data analysis
• ScaleOut addresses scalability and analytics.

Market sizes (2013 estimates):
• Enterprise Software: $292B (2)
• HPC / Grid Computing: $25B (3)
• Big Data Analytics: $18B (1)
• In-Memory Data Grids: $355M (4)

Sources:
1 Wikibon 2013
2 Gartner 2010, rolled forward to 2013
3 Market Research Media 2015, rolled back to 2013
4 Gartner 2011, rolled forward to 2013
9. Analytics Market
Real-time: “Operational Intelligence”
• Live data sets
• Gigabytes to terabytes
• In-memory storage
• Results in minutes to seconds
• Best uses:
  • Tracking live data
  • Immediately identifying trends and capturing opportunities
Batch: “Business Intelligence”
• Static data sets
• Petabytes
• Disk storage
• Results in hours to minutes
• Best uses:
  • Analyzing warehoused data
  • Mining for long-term trends
[Figure: the Big Data Analytics market ($18B), split into real-time and batch segments; products shown: ScaleOut Analytics Server, ScaleOut hServer, Hadoop, IBM, Teradata, SAS, SAP]
10. ScaleOut hServer Targeted Use Cases
• Run Hadoop continuously on live data while it is being updated.
  “Capture perishable business opportunities and identify issues.”
  • Real-time risk analysis
  • Credit card fraud detection
  • ...
• Accelerate Hadoop on static data with a one-line code change.
  “Speed up Hadoop execution by >10X for faster business insights.”
  • Financial modeling
  • Process simulations
  • ...
• Quickly prototype Hadoop code.
  “Validate your Hadoop code before it goes into batch processing.”
  • No need to install the Hadoop stack
  • Fast-turn debugging and tuning
  • ...
11. Problem: Hadoop Cannot Efficiently Perform Real-Time Analytics
• Hadoop is typically used for very large, static, offline datasets.
• Data must be copied from disk-based storage (e.g., HDFS) into memory for analysis.
• Hadoop Map/Reduce adds lengthy batch scheduling overhead.
12. Solution: Integrate Hadoop M/R into an In-Memory Data Grid
Benefits:
• Enables real-time analysis using Hadoop M/R APIs.
• Accelerates data access by staging data in memory.
• Eliminates the batch scheduling and data shuffling overheads of standard Hadoop distributions.
• Analyzes “live” data.
• Allows Hadoop M/R programs to run without change.
• Eliminates complexity in Hadoop deployment.
• Enables rapid prototyping.
13. Introducing ScaleOut hServer™ V2
Enables Hadoop Map/Reduce to perform real-time analysis:
• Adds a full Map/Reduce engine to the ScaleOut Analytics Server (SOAS) IMDG.
• Delivers results in milliseconds to seconds instead of minutes or hours.
• Benchmark results show a 20X speedup.
• Offers flexible options for data storage and access:
  • Hadoop programs can access and store key/value pairs using either the IMDG or HDFS.
  • HDFS data is automatically cached in the IMDG for fast access.
• Allows dynamic updates to key/value pairs during analysis to support “live” data.
• Ships as an open source Java library combined with the SOAS IMDG.
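For readers new to the Map/Reduce model that hServer executes, its three phases (map, shuffle by key, reduce) can be sketched in memory with plain Java. This is a generic word-count illustration, not hServer's engine and not the Hadoop API; real Hadoop jobs subclass Hadoop's `Mapper` and `Reducer` classes instead. In this single-process sketch the shuffle is just an in-memory grouping, whereas a standard Hadoop distribution writes map output to local disk and has reducers fetch it over the network.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class InMemoryMapReduce {
    /** Map phase: emit (word, 1) for every word in every input line. */
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    pairs.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    /** Shuffle phase: group intermediate values by key, with keys sorted as Hadoop does. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    /** Reduce phase: sum the values for each key. */
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        groups.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("live data, live results", "data in memory");
        System.out.println(reduce(shuffle(map(input))));
        // prints {data=2, in=1, live=2, memory=1, results=1}
    }
}
```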
14. Enabling Access to IMDG Data
• ScaleOut hServer adds a Grid Record Reader for accessing key/value pairs held in the IMDG.
• Hadoop programs can optionally output results to the IMDG with the Grid Record Writer.
• The Grid Record Reader optimizes access to key/value pairs to eliminate network overhead.
• Applications can access and update key/value pairs as operational data during analysis.
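One way to see how a record reader over grid data can avoid network overhead: if the grid spreads key/value pairs across servers by key hash, each map task can be handed only the pairs stored on its own server, so every record it reads is local. The single-JVM simulation below illustrates that idea. The class names, the hash-based placement rule, and the server count are hypothetical; this is not the Grid Record Reader's actual design or API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory grid: key/value pairs spread over servers by key hash. */
final class SimulatedGrid {
    private final List<Map<String, String>> servers = new ArrayList<>();

    SimulatedGrid(int serverCount) {
        for (int i = 0; i < serverCount; i++) {
            servers.add(new HashMap<>());
        }
    }

    /** Placement rule (an assumption): server = non-negative key hash mod server count. */
    int serverFor(String key) {
        return Math.floorMod(key.hashCode(), servers.size());
    }

    void put(String key, String value) {
        servers.get(serverFor(key)).put(key, value);
    }

    /** A map task on server i reads only the pairs already stored on server i. */
    List<Map.Entry<String, String>> localRecords(int server) {
        return new ArrayList<>(servers.get(server).entrySet());
    }

    int serverCount() {
        return servers.size();
    }
}

final class LocalReaderDemo {
    public static void main(String[] args) {
        SimulatedGrid grid = new SimulatedGrid(3);
        for (int i = 0; i < 10; i++) {
            grid.put("order-" + i, "qty=" + i); // hypothetical operational data
        }
        int total = 0;
        for (int s = 0; s < grid.serverCount(); s++) {
            // One "map task" per server; no record crosses a server boundary.
            List<Map.Entry<String, String>> split = grid.localRecords(s);
            total += split.size();
            System.out.println("server " + s + " reads " + split.size() + " local records");
        }
        System.out.println("total records read: " + total);
    }
}
```

Every record is read exactly once, by the task running where it is stored; how many land on each server depends on the keys' hash codes.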
15. Enabling Fast Access to HDFS Data
• ScaleOut hServer adds a Dataset Record Reader (a wrapper) that caches HDFS data in the IMDG during program execution.
• On subsequent runs, Hadoop automatically retrieves the cached data from the ScaleOut IMDG.
• The Dataset Record Reader stores and retrieves data with minimal network and memory overhead.
• Tests with the TeraSort benchmark have demonstrated 11X faster data access than HDFS without the IMDG.
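The caching behavior described above acts like a read-through cache keyed by input split: the first run reads records from the slower source and stores them in memory, and later runs serve the same split from memory. A minimal single-JVM sketch follows. The `RecordSource` interface, the split identifiers, and the in-memory map standing in for the IMDG are all hypothetical; this is not how the actual Dataset Record Reader is implemented.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical slow source, standing in for records read from an HDFS split. */
interface RecordSource {
    List<String> read(String splitId);
}

/** Wrapper that caches each split's records in memory after the first read. */
final class CachingRecordSource implements RecordSource {
    private final RecordSource backing;
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>(); // stands in for the IMDG

    CachingRecordSource(RecordSource backing) {
        this.backing = backing;
    }

    @Override
    public List<String> read(String splitId) {
        // First run: cache miss, so read from the backing source and store the result.
        // Subsequent runs: cache hit, served from memory without touching the backing source.
        return cache.computeIfAbsent(splitId, backing::read);
    }
}

final class CachingDemo {
    public static void main(String[] args) {
        AtomicInteger diskReads = new AtomicInteger();
        RecordSource hdfs = splitId -> {
            diskReads.incrementAndGet();
            return List.of(splitId + ":record-1", splitId + ":record-2");
        };
        RecordSource reader = new CachingRecordSource(hdfs);

        for (int run = 1; run <= 3; run++) {
            reader.read("split-0");
        }
        System.out.println("backing reads after 3 runs: " + diskReads.get());
        // prints "backing reads after 3 runs: 1"
    }
}
```

Because the wrapper implements the same interface as the source it wraps, calling code does not change, which mirrors the slide's description of the reader as a wrapper.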
16. ScaleOut hServer Editions
• Offered in community and commercial editions
• Community Edition can be used for evaluation or production
• Hybrid open source / proprietary licensing

Editions                  Community          Commercial
# Servers (max)           Up to 4            100s
Expected data set size    256GB              GB - TBs
Pricing                   Free               Subscription & perpetual
Support                   Community forum    Full support
17. Summary
• IMDGs help scale application performance and analyze “live” data in real time.
• Hadoop focuses on analyzing large, static (offline) datasets held in file systems.
• ScaleOut hServer V2 introduces breakthrough technology enabling Hadoop applications to perform real-time analytics:
  • Integrates the Hadoop Map/Reduce engine with SOAS’s IMDG.
  • Accelerates Map/Reduce execution by 20X in benchmark tests.
  • Enables Hadoop applications to analyze “live,” in-memory data.
  • Offers flexible access to both in-memory and file-based data.
  • Eliminates complex Hadoop deployment and tuning.
  • Offers a fast, easy-to-use platform for rapid prototyping.
18. Online Systems Need Real-Time Analysis
A few examples:
• Equity trading: to minimize risk during a trading day
• Ecommerce: to optimize real-time shopping activity
• Reservations systems: to identify issues, reroute, etc.
• Credit cards: to detect fraud in real time
• Smart grids: to optimize power distribution & detect issues
19. Hadoop Users Need Real-Time Analytics
• ScaleOut Software conducted an informal survey at the Strata 2013 Conference (Santa Clara).
• Based on 150 responses:
  • 78% of organizations generate fast-changing data.
  • 60% use Hadoop, and 78% plan to expand their use of Hadoop within 12 months.
  • Only 42% consider Hadoop an effective platform for real-time analysis, but…
  • 93% would benefit from real-time data analytics.
  • 71% consider a 10X improvement in performance meaningful.
• Take-away: Hadoop users need real-time analytics.