Enabling Real-Time Analytics
Using Hadoop Map/Reduce
Briefing on New Product Release:
ScaleOut hServer™ V2
October 14, 2013
Bill Bain, CEO (wbain@scaleoutsoftware.com)
David Brinker, COO (daveb@scaleoutsoftware.com)
Copyright © 2013 by ScaleOut Software, Inc.
What’s New Today
ScaleOut hServer V2:
•  World’s first Hadoop MapReduce engine integrated with a
scalable, in-memory data grid
•  Full Hadoop MapReduce support for “live” fast-changing
data
•  20x performance improvement in benchmark tests
•  Significant new technology to simplify development and
maximize ease of use

ScaleOut Software, Inc.
About ScaleOut Software
•  Develops and markets software middleware for:
•  Scaling application performance and
•  Performing real-time analytics using
•  In-memory data storage and computing

•  Executive Team:
•  Dr. William Bain, Founder & CEO
•  Career focused on parallel computing – Bell Labs, Intel, Microsoft
•  3 prior start-ups, last acquired by Microsoft and product now ships as
Network Load Balancing in Windows Server

•  David Brinker, COO
•  25 years software business and executive management experience
•  Mentor Graphics, Cadence, Webridge

•  Eight years market experience in Windows & Linux; 400 customers
ScaleOut Software Products
•  ScaleOut StateServer®
•  In-Memory Data Grid for Windows and Linux
•  Scales application performance
•  Industry-leading performance and ease of use

•  ScaleOut GeoServer® adds
•  WAN-based data replication for DR
•  Breakthrough technology for global data access

•  ScaleOut Analytics Server® adds
•  Real-time data analysis for “live” data
•  Comprehensive management tools

•  Introducing ScaleOut hServer™ V2
•  Full Hadoop Map/Reduce engine (20X faster*)
•  Hadoop Map/Reduce on live, in-memory data

*in benchmark testing

[Figure: ScaleOut StateServer In-Memory Data Grid, with four Grid Service nodes]
IMDGs Perform Real-Time Analytics
ScaleOut Analytics Server stores and analyzes “live” data:
•  In-memory storage holds live data sets which are continuously
updated and accessed within operational systems.
•  Examples: stock ticker data, business rules, order & inventory data

•  Integrated analytics engine tracks important patterns & trends.
•  Data-parallel analysis delivers results in msec. to seconds.

Example in Financial Services
Integrate analysis into a stock trading platform:
•  The IMDG holds market data and hedging strategies.
•  Updates to market data
continuously flow through
the IMDG.
•  The IMDG performs
repeated map/reduce
analysis on hedging
strategies and alerts
traders in real time.
•  IMDG automatically and dynamically
scales its throughput to handle new
hedging strategies by adding servers.
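
The repeated map/reduce analysis described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plain dictionary standing in for the IMDG, the `Strategy` fields, and the alert rule (flag a symbol that trades below a stop price) are hypothetical and are not taken from ScaleOut's products. The map step here runs sequentially, whereas a grid would run it in parallel on each server.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Strategy:
    """Hypothetical hedging strategy: alert if `symbol` trades below `stop_price`."""
    trader: str
    symbol: str
    stop_price: float


def map_strategy(strategy, market_data):
    """Map step: evaluate one strategy against the current market data."""
    price = market_data.get(strategy.symbol)
    if price is not None and price < strategy.stop_price:
        message = f"{strategy.symbol} at {price:.2f} is below {strategy.stop_price:.2f}"
        return [(strategy.trader, message)]
    return []


def reduce_alerts(acc, alerts):
    """Reduce step: merge per-strategy alerts into one dict keyed by trader."""
    for trader, message in alerts:
        acc.setdefault(trader, []).append(message)
    return acc


def analyze(strategies, market_data):
    """One map/reduce pass over all strategies."""
    mapped = (map_strategy(s, market_data) for s in strategies)
    return reduce(reduce_alerts, mapped, {})


# Stand-ins for the data held in the grid (illustrative values only).
market_data = {"ACME": 101.0, "INIT": 55.0}
strategies = [Strategy("alice", "ACME", 100.0), Strategy("bob", "INIT", 50.0)]

first_pass = analyze(strategies, market_data)    # no price is below its stop yet
market_data["ACME"] = 98.5                       # a live market update flows in
second_pass = analyze(strategies, market_data)   # rerun on the updated data
```

Rerunning `analyze` after each update is what makes the analysis "repeated": the same map and reduce functions are applied to whatever the data set holds at that moment.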
Customers
•  400 unique customers
•  35 Fortune 500 customers
•  32 countries
•  9,000 servers licensed
•  50% have multiple deployments

Customers by industry (% in $$s)
Financial & Insurance     26%
Ecommerce Services        19%
Ecommerce Sales           17%
Entertain. & Commun.      13%
Gov't & Education         10%
Software                   8%
Travel & Transport.        4%
Other                      3%

Example Uses
•  Online loan apps & banking
•  Portfolio management
•  Trading systems
•  Reservations systems
•  Ecommerce shopping
•  Customer service sites
•  Streaming entertainment
•  Configuration engines
•  Gaming

IMDGs Seeing Wide Adoption
•  In-Memory Data Grids have become key in several fast-growth markets.
•  Drivers:
•  Cloud computing / virtualization
•  Hardware enablement
•  Competitive pressure
•  Exploding workloads
•  Big data analysis
•  ScaleOut addresses scalability and analytics.

Market sizes
Enterprise Software       $292B (2)
HPC / Grid Computing      $25B (3)
Big Data Analytics        $18B (1)
In-Memory Data Grids      $355M (4)

Sources:
1. Wikibon 2013
2. Gartner 2010, rolled fwd to 2013
3. Market Research Media 2015, rolled back to 2013
4. Gartner 2011, rolled fwd to 2013

Analytics Market
Real-time: “Operational Intelligence”
•  Live data sets
•  Gigabytes to terabytes
•  In-memory storage
•  Minutes to seconds
•  Best uses:
•  Tracking live data
•  Immediately identifying trends and capturing opportunities

Batch: “Business Intelligence”
•  Static data sets
•  Petabytes
•  Disk storage
•  Hours to minutes
•  Best uses:
•  Analyzing warehoused data
•  Mining for long-term trends

[Figure: Big Data Analytics market ($18B) split into Real-Time and Batch segments; labels shown: Analytics Server, hServer, Hadoop, IBM, Teradata, SAS, SAP]

ScaleOut hServer Targeted Use Cases
Run continuous Hadoop on live data, while it’s being updated.
“Capture perishable business opportunities and identify issues.”
•  Real-time risk analysis
•  Credit card fraud detection
•  ...

Accelerate Hadoop on static data with a one line code change.
“Speed-up Hadoop execution by >10X for faster business insights.”
•  Financial modeling
•  Process simulations
•  ...

Quickly prototype Hadoop code.
“Validate your Hadoop code before it goes into batch processing.”
•  No need to install Hadoop stack
•  Fast-turn debug and tuning
•  ...

Problem: Hadoop Cannot Efficiently
Perform Real-Time Analytics
•  Typically used for very large, static, offline datasets
•  Data must be copied from disk-based storage (e.g., HDFS)
into memory for analysis.
•  Hadoop Map/Reduce adds lengthy batch scheduling overhead.

Solution: Integrate Hadoop M/R
into In-Memory Data Grid
Benefits:
•  Enables real-time analysis using Hadoop M/R APIs.
•  Accelerates data access by staging data in memory.
•  Eliminates batch scheduling and data shuffling overheads of
standard Hadoop distributions.
•  Analyzes “live” data.

•  Allows Hadoop
M/R programs to run
without change.
•  Eliminates complexity in
Hadoop deployment.
•  Enables rapid prototyping.
Introducing ScaleOut hServer™ V2
Enables Hadoop Map/Reduce to perform
real-time analysis:
•  Adds full Map/Reduce engine to the ScaleOut Analytics Server (SOAS) IMDG.
•  Delivers results in msec. to seconds instead of
minutes or hours.
•  Benchmark results show 20X speedup.

•  Has flexible options for data storage/access:
•  Hadoop programs can access/store
key/value pairs using either IMDG or HDFS.
•  Automatically caches HDFS data in IMDG for
fast access.

•  Allows dynamic updates to key/value pairs
during analysis to support “live” data.
•  Ships as open source Java library combined
with SOAS IMDG.
Enabling Access to IMDG Data
•  ScaleOut hServer adds Grid
Record Reader for accessing
key/value pairs held in the IMDG.
•  Hadoop programs optionally can
output results to IMDG with Grid
Record Writer.
•  Grid Record Reader optimizes
access to key/value pairs to
eliminate network overhead.
•  Applications can access and
update key/value pairs as
operational data during analysis.
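
The idea of reading key/value pairs where they are stored, writing results back to a grid, and allowing updates between runs can be sketched as follows. This is a toy model, not the hServer API: `ToyGrid`, `GridPartition`, `local_record_reader`, and `run_job` are hypothetical names, the partition placement rule is an arbitrary choice for the example, and everything runs in one process.

```python
class GridPartition:
    """Hypothetical stand-in for the key/value pairs held on one grid server."""

    def __init__(self):
        self.store = {}


class ToyGrid:
    """Toy in-memory grid that spreads keys across a fixed number of partitions."""

    def __init__(self, num_partitions):
        self.partitions = [GridPartition() for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Simple deterministic placement; a real grid would use its own scheme.
        return self.partitions[sum(key.encode()) % len(self.partitions)]

    def put(self, key, value):
        self._partition_for(key).store[key] = value

    def get(self, key):
        return self._partition_for(key).store.get(key)


def local_record_reader(partition):
    """Yield key/value pairs from one partition only (no cross-partition reads)."""
    # Snapshot the items so updates made during a run do not break iteration.
    yield from list(partition.store.items())


def run_job(input_grid, output_grid, mapper):
    """Run one map task per partition and write each result to the output grid."""
    for partition in input_grid.partitions:
        for key, value in local_record_reader(partition):
            out_key, out_value = mapper(key, value)
            output_grid.put(out_key, out_value)   # plays the record-writer role


orders = ToyGrid(num_partitions=4)
results = ToyGrid(num_partitions=4)
orders.put("order-1", 3)
orders.put("order-2", 5)

run_job(orders, results, lambda k, v: (k, v * 10))
orders.put("order-1", 4)                            # operational update between runs
run_job(orders, results, lambda k, v: (k, v * 10))  # the next run sees the new value
```

Because each map task reads only the partition it is assigned, input records never cross partitions in this model; that is the property the slide attributes to the Grid Record Reader.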

Enabling Fast Access to HDFS Data
•  ScaleOut hServer adds Dataset Record Reader (wrapper) to
cache HDFS data during program execution.
•  Hadoop automatically retrieves data from ScaleOut IMDG on
subsequent runs.
•  Dataset Record Reader
stores and retrieves data
with minimum network
and memory overheads.
•  Tests with the Terasort benchmark have demonstrated 11X lower access latency than HDFS without the IMDG.
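
The wrapper idea on this slide (read from the slow source once, then serve later runs from memory) can be illustrated with a small Python sketch. It assumes nothing about how hServer implements its Dataset Record Reader: `CachingRecordReader` and `read_file_blocks` are hypothetical names, and a list in local memory stands in for the IMDG.

```python
class CachingRecordReader:
    """Wraps a slow record source and keeps its records in memory after the first full read."""

    def __init__(self, read_from_source):
        self._read_from_source = read_from_source   # callable returning an iterable of records
        self._cache = None                          # filled after the first complete pass

    def records(self):
        if self._cache is not None:
            yield from self._cache                  # later runs are served from memory
            return
        buffered = []
        for record in self._read_from_source():
            buffered.append(record)
            yield record
        self._cache = buffered                      # cache only once the source was read fully


calls = {"source": 0}


def read_file_blocks():
    """Hypothetical slow source, standing in for a read from a file system."""
    calls["source"] += 1
    return ["line one", "line two", "line three"]


reader = CachingRecordReader(read_file_blocks)
first_run = list(reader.records())    # reads the source and fills the cache
second_run = list(reader.records())   # served from the cache; source is not touched
```

The cache is committed only after a complete pass, so a run that stops early does not leave a partial data set behind for later runs.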
ScaleOut hServer Editions
•  Offered in community
and commercial
editions
•  Community Edition
can be used for
evaluation or
production
•  Hybrid open source /
proprietary licensing

Editions                  Community          Commercial
# Servers (max)           Up to 4            100s
Expected data set size    256GB              GB - TBs
Pricing                   Free               Subscription & perpetual
Support                   Community Forum    Full support

Summary
•  IMDGs help scale application performance and analyze “live”
data in real time.
•  Hadoop focuses on analyzing large, static (offline) datasets
held in file systems.
•  ScaleOut hServer V2 introduces breakthrough technology
enabling Hadoop applications to perform real-time analytics:
•  Integrates Hadoop Map/Reduce engine with SOAS’s IMDG.
•  Accelerates Map/Reduce execution by 20X in benchmark tests.
•  Enables Hadoop applications to analyze “live,” in-memory data.
•  Offers flexible access to both in-memory and file-based data.
•  Eliminates complex Hadoop deployment and tuning.
•  Offers a fast, easy-to-use platform for rapid prototyping.
Online Systems Need Real-Time Analysis
A few examples:
•  Equity trading: to minimize risk during a trading day
•  Ecommerce: to optimize real-time shopping activity
•  Reservations systems: to identify issues, reroute, etc.
•  Credit cards: to detect fraud in real time
•  Smart grids: to optimize power distribution & detect issues

Hadoop Users Need
Real-Time Analytics
•  ScaleOut Software conducted informal survey at Strata 2013
Conference (Santa Clara).
•  Based on 150 responses:
•  78% of organizations generate fast-changing data.
•  60% use Hadoop and 78% plan to expand usage of Hadoop within
12 months.
•  Only 42% consider Hadoop to be an effective platform for real-time analysis, but…
•  93% would benefit from real-time data analytics.
•  71% consider a 10X improvement in performance meaningful.

•  Take-away: Hadoop users need real-time analytics.

ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

  • 1. Enabling Real-Time Analytics Using Hadoop Map/Reduce Briefing on New Product Release: ScaleOut hServer™ V2 October 14, 2013 Bill Bain, CEO (wbain@scaleoutsoftware.com) David Brinker, COO (daveb@scaleoutsoftware.com) Copyright © 2013 by ScaleOut Software, Inc.
  • 2. What’s New Today ScaleOut hServer V2: •  World’s first Hadoop MapReduce engine integrated with a scalable, in-memory data grid •  Full Hadoop MapReduce support for “live” fast-changing data •  20x performance improvement in benchmark tests •  Significant new technology to simplify development and maximize ease of use 2 ScaleOut Software, Inc.
  • 3. About ScaleOut Software •  Develops and markets software middleware for: •  Scaling application performance and •  Performing real-time analytics using •  In-memory data storage and computing •  Executive Team: •  Dr. William Bain, Founder & CEO •  Career focused on parallel computing – Bell Labs, Intel, Microsoft •  3 prior start-ups, last acquired by Microsoft and product now ships as Network Load Balancing in Windows Server •  David Brinker, COO •  25 years software business and executive management experience •  Mentor Graphics, Cadence, Webridge •  Eight years market experience in Windows & Linux; 400 customers 3 ScaleOut Software, Inc.
  • 4. ScaleOut Software Products •  ScaleOut StateServer® ScaleOut StateServer In-Memory Data Grid •  In-Memory Data Grid for Windows and Linux •  Scales application performance. •  Industry-leading performance and ease of use •  ScaleOut GeoServer® adds •  WAN based data replication for DR •  Breakthrough technology for global data access •  ScaleOut Analytics Server® adds •  Real-time data analysis for “live” data •  Comprehensive management tools •  Introducing ScaleOut hServer™ V2 •  Full Hadoop Map/Reduce engine (20X faster*) •  Hadoop Map/Reduce on live, in-memory data 4 *in benchmark testing ScaleOut Software, Inc. Grid Service Grid Service Grid Service Grid Service
  • 5. IMDGs Perform Real-Time Analytics ScaleOut Analytics Server stores and analyzes “live” data: •  In-memory storage holds live data sets which are continuously updated and accessed within operational systems. •  Examples: stock ticker data, business rules, order & inventory data •  Integrated analytics engine tracks important patterns & trends. •  Data-parallel analysis delivers results in msec. to seconds. 5 ScaleOut Software, Inc.
  • 6. Example in Financial Services Integrate analysis into a stock trading platform: •  The IMDG holds market data and hedging strategies. •  Updates to market data continuously flow through the IMDG. •  The IMDG performs repeated map/reduce analysis on hedging strategies and alerts traders in real time. •  IMDG automatically and dynamically scales its throughput to handle new hedging strategies by adding servers. 6 ScaleOut Software, Inc.
  • 7. Customers •  •  •  •  •  400 unique customers 35 Fortune 500 customers 32 countries 9,000 servers licensed 50% have multiple deployments Gov't)&) Education 10% Software 8% Example Uses Online loan apps & banking Portfolio management Other 3% Trading systems Entertain.)&) Commun. 13% Travel)&) Transport. 4% Ecommerce) Services 19% Ecommerce) Sales 17% Reservations systems Financial)&) Insurance 26% Ecommerce shopping Customer service sites Streaming entertainment Configuration engines Gaming % in $$s 7 ScaleOut Software, Inc.
  • 8. IMDGs Seeing Wide Adoption •  In-Memory Data Grids have become key in several fast-growth markets. •  Drivers: Big Data Analytics $18B 1 •  Cloud computing / virtualization •  Hardware enablement •  Competitive pressure HPC / Grid Computing •  Exploding workloads •  Big data analysis •  ScaleOut addresses scalability and analytics. 8 $25B ScaleOut Software, Inc. 3 In-Memory Data Grids $355M 4 Enterprise Software $292B 2 Sources: 1 Wikibon 2013 2 Gartner 2010, rolled fwd to 2013 3 Market Research Media 2015 rolled back to 2013 4. Gartner 2011 rolled fwd to 2013
  • 9. Analytics Market Real-time Batch “Operational Intelligence” “Business Intelligence” Live data sets Gigabytes to terabytes In-memory storage Minutes to seconds Best uses: Static data sets Petabytes Disk storage Hours to minutes Best uses: •  Tracking live data •  Immediately identifying trends and capturing opportunities 9 Big Data Analytics $18B Real-Time Batch Analytics Server Hadoop IBM Teradata SAS SAP hServer ScaleOut Software, Inc. •  Analyzing warehoused data •  Mining for longterm trends
  • 10. ScaleOut hServer Targeted Use Cases Run continuous Hadoop on live data, while it’s being updated. Accelerate Hadoop on static data with a one line code change. Quickly prototype Hadoop code. 10 “Capture perishable business opportunities and identify issues.” Real-time risk analysis Credit card fraud detection ... “Speed-up Hadoop execution by >10X for faster business insights.” Financial modeling Process simulations ... “Validate your Hadoop code before it goes into batch processing.” No need to install Hadoop stack ScaleOut Software, Inc. Fast-turn debug and tuning ...
  • 11. Problem: Hadoop Cannot Efficiently Perform Real-Time Analytics
      •  Typically used for very large, static, offline datasets
      •  Data must be copied from disk-based storage (e.g., HDFS) into memory for analysis.
      •  Hadoop Map/Reduce adds lengthy batch scheduling overhead.
  • 12. Solution: Integrate Hadoop M/R into In-Memory Data Grid
      Benefits:
      •  Enables real-time analysis using Hadoop M/R APIs.
      •  Accelerates data access by staging data in memory.
      •  Eliminates batch scheduling and data shuffling overheads of standard Hadoop distributions.
      •  Analyzes “live” data.
      •  Allows Hadoop M/R programs to run without change.
      •  Eliminates complexity in Hadoop deployment.
      •  Enables rapid prototyping.
  • 13. Introducing ScaleOut hServer™ V2
      Enables Hadoop Map/Reduce to perform real-time analysis:
      •  Adds a full Map/Reduce engine to the ScaleOut Analytics Server (SOAS) IMDG.
      •  Delivers results in milliseconds to seconds instead of minutes or hours.
      •  Benchmark results show 20X speedup.
      •  Has flexible options for data storage/access:
          •  Hadoop programs can access/store key/value pairs using either IMDG or HDFS.
          •  Automatically caches HDFS data in IMDG for fast access.
          •  Allows dynamic updates to key/value pairs during analysis to support “live” data.
      •  Ships as an open source Java library combined with the SOAS IMDG.
  • 14. Enabling Access to IMDG Data
      •  ScaleOut hServer adds Grid Record Reader for accessing key/value pairs held in the IMDG.
      •  Hadoop programs optionally can output results to IMDG with Grid Record Writer.
      •  Grid Record Reader optimizes access to key/value pairs to eliminate network overhead.
      •  Applications can access and update key/value pairs as operational data during analysis.
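The idea on this slide can be illustrated with a toy model. The sketch below is not the hServer API: `InMemoryGrid`, `run_map_reduce`, and the sample position data are hypothetical names invented for illustration, and the whole thing runs in one Python process. It assumes key/value pairs are hashed across partitions, that each map pass reads only the pairs in its own partition (the stand-in for a reader that avoids network hops), and that operational code updates a pair between two analysis runs rather than during one.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


class InMemoryGrid:
    """Toy stand-in for an IMDG (hypothetical): pairs hashed across partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[Dict[str, int]] = [{} for _ in range(num_partitions)]

    def _partition_for(self, key: str) -> Dict[str, int]:
        # Deterministic hash so the same key always lands in the same partition.
        return self.partitions[sum(key.encode()) % len(self.partitions)]

    def put(self, key: str, value: int) -> None:
        self._partition_for(key)[key] = value

    def get(self, key: str) -> int:
        return self._partition_for(key)[key]


def run_map_reduce(
    grid: InMemoryGrid,
    map_fn: Callable[[str, int], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Map over each partition's own pairs, group by output key, then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for partition in grid.partitions:
        # Snapshot the partition so concurrent-style updates cannot break iteration.
        for key, value in list(partition.items()):
            for out_key, out_value in map_fn(key, value):
                grouped[out_key].append(out_value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def by_symbol(key: str, quantity: int) -> Iterable[Tuple[str, int]]:
    # Keys look like "SYMBOL:account"; emit the quantity under the symbol.
    yield key.split(":")[0], quantity


def total(symbol: str, quantities: List[int]) -> int:
    return sum(quantities)


positions = InMemoryGrid(num_partitions=4)
for key, qty in [("ACME:acct1", 100), ("ACME:acct2", 50), ("INIT:acct1", 70)]:
    positions.put(key, qty)

before = run_map_reduce(positions, by_symbol, total)
positions.put("ACME:acct2", 80)  # operational update to "live" data
after = run_map_reduce(positions, by_symbol, total)

# Writing results back into a grid plays the role of the record writer.
results = InMemoryGrid(num_partitions=2)
for symbol, quantity in after.items():
    results.put(symbol, quantity)
```

The second run picks up the updated position without any reload step, which is the property the slide describes for analysis over operational data.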
  • 15. Enabling Fast Access to HDFS Data
      •  ScaleOut hServer adds Dataset Record Reader (wrapper) to cache HDFS data during program execution.
      •  Hadoop automatically retrieves data from ScaleOut IMDG on subsequent runs.
      •  Dataset Record Reader stores and retrieves data with minimum network and memory overheads.
      •  Tests with the Terasort benchmark have demonstrated 11X faster data access than HDFS without the IMDG.
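The wrapper pattern described on this slide can be sketched as a read-through cache around an underlying reader. This is a minimal illustration, not hServer code: `CachingRecordReader`, `slow_read`, and the split name `part-0` are hypothetical, a plain dictionary stands in for the IMDG, and a Python function stands in for the HDFS reader.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[int, str]  # (byte offset, line of text), as a text reader might emit


class CachingRecordReader:
    """Hypothetical wrapper: serves a split from the cache once it has been read."""

    def __init__(
        self,
        read_split: Callable[[str], Iterable[Record]],
        cache: Dict[str, List[Record]],
    ) -> None:
        self._read_split = read_split  # underlying (slow) reader
        self._cache = cache            # stand-in for the in-memory data grid

    def records(self, split_id: str) -> Iterator[Record]:
        cached = self._cache.get(split_id)
        if cached is not None:
            # Later runs: serve records from memory, skipping the underlying reader.
            yield from cached
            return
        buffered: List[Record] = []
        for record in self._read_split(split_id):
            buffered.append(record)
            yield record
        # Store only after the split was read completely, so a partial read
        # never leaves a truncated entry in the cache.
        self._cache[split_id] = buffered


underlying_reads: List[str] = []


def slow_read(split_id: str) -> Iterable[Record]:
    # Stand-in for a disk-based read; records each call so reuse is visible.
    underlying_reads.append(split_id)
    return [(0, "alpha beta"), (11, "beta gamma")]


reader = CachingRecordReader(slow_read, cache={})
first_run = list(reader.records("part-0"))
second_run = list(reader.records("part-0"))
```

Both runs return the same records, but the underlying reader is called only once. A real implementation would also need to handle invalidation when the file changes and memory limits in the grid, which this sketch leaves out.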
  • 16. ScaleOut hServer Editions
      •  Offered in community and commercial editions
      •  Community Edition can be used for evaluation or production
      •  Hybrid open source / proprietary licensing

      Editions                  Community          Commercial
      # Servers (max)           Up to 4            100s
      Expected data set size    256GB              GB - TBs
      Pricing                   Free               Subscription & perpetual
      Support                   Community Forum    Full support
  • 17. Summary
      •  IMDGs help scale application performance and analyze “live” data in real-time.
      •  Hadoop focuses on analyzing large, static (offline) datasets held in file systems.
      •  ScaleOut hServer V2 introduces breakthrough technology enabling Hadoop applications to perform real-time analytics:
          •  Integrates Hadoop Map/Reduce engine with SOAS’s IMDG.
          •  Accelerates Map/Reduce execution by 20X in benchmark tests.
          •  Enables Hadoop applications to analyze “live,” in-memory data.
          •  Offers flexible access to both in-memory and file-based data.
          •  Eliminates complex Hadoop deployment and tuning.
          •  Offers a fast, easy-to-use platform for rapid prototyping.
  • 18. Online Systems Need Real-Time Analysis
      A few examples:
      •  Equity trading: to minimize risk during a trading day
      •  Ecommerce: to optimize real-time shopping activity
      •  Reservations systems: to identify issues, reroute, etc.
      •  Credit cards: to detect fraud in real time
      •  Smart grids: to optimize power distribution & detect issues
  • 19. Hadoop Users Need Real-Time Analytics
      •  ScaleOut Software conducted an informal survey at the Strata 2013 Conference (Santa Clara).
      •  Based on 150 responses:
          •  78% of organizations generate fast-changing data.
          •  60% use Hadoop and 78% plan to expand usage of Hadoop within 12 months.
          •  Only 42% consider Hadoop to be an effective platform for real-time analysis, but…
          •  93% would benefit from real-time data analytics.
          •  71% consider a 10X improvement in performance meaningful.
      •  Take-away: Hadoop users need real-time analytics.