Performing Real-Time Analytics
with In-Memory Data Grids
Copyright © 2013 by ScaleOut Software, Inc.
Cloud Expo
June 10, 2013
Mikhail Sobolev (sobolev@scaleoutsoftware.com)
David Brinker (daveb@scaleoutsoftware.com)
2 ScaleOut Software, Inc.
• What is an In-Memory Data Grid (IMDG)?
• Top Benefits of IMDGs
• The Need for Real-Time Analytics
• Example: A Platform for Managing Hedging Strategies
• Using an IMDG to Perform Real-Time Analysis
• Benchmark Results
• Integrating an IMDG into Hadoop
Agenda
• Dr. Mikhail Sobolev, Lead Java Architect
• Ph.D. from Moscow Institute of Physics and Technology
• Research and consulting focus in parallel computing
• Responsible for development of scalable software services in Java
• David Brinker, COO
• 20 years software business and executive management experience
• Mentor Graphics, Cadence, Webridge
• Company: ScaleOut Software
• Develops and markets IMDG products
• Founded in September 2003
• Offices in Bellevue, WA and Beaverton, OR
• Eight years of market experience on Windows and Linux
About the Speakers
• ScaleOut StateServer®
• Flagship product
• IMDG middleware for Windows
and Linux
• Industry-leading performance and ease of use
• ScaleOut GeoServer® adds
• WAN based data replication for DR
• Breakthrough technology for global
data access
• ScaleOut Analytics Server™ adds
• Real-time data analysis for operational data
• Comprehensive management tools
• ScaleOut hServer™ adds
• First step toward real-time analytics for Hadoop
• Accelerates data access and execution.
ScaleOut Software Products
[Diagram: ScaleOut StateServer In-Memory Data Grid, showing four grid services]
In-memory storage for fast updates and retrieval of live data
• Fits in the business logic layer:
• Stores collections of Java/.NET
objects shared by multiple clients.
• Uses create/read/update/delete
and query APIs to access data.
• Implemented across a cluster of
servers or VMs:
• Scales storage and throughput
by adding servers.
• Provides high availability
in case a server fails.
What is an In-Memory Data Grid?
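The create/read/update/delete and query access pattern described on this slide can be sketched in a few lines of Python. This is a toy model, not the ScaleOut API: the class `ToyGrid`, its method names, and the CRC32 modulo placement are all assumptions made for illustration, and high availability (replicas) is not modeled.

```python
import zlib
from typing import Any, Callable, Dict, List, Optional


class ToyGrid:
    """Hypothetical stand-in for an IMDG; not a real product API."""

    def __init__(self, server_count: int) -> None:
        if server_count < 1:
            raise ValueError("need at least one server")
        # Each dict plays the role of one grid server's memory.
        self.servers: List[Dict[str, Any]] = [{} for _ in range(server_count)]

    def _owner(self, key: str) -> Dict[str, Any]:
        # A stable hash of the key picks the server that stores the object.
        return self.servers[zlib.crc32(key.encode("utf-8")) % len(self.servers)]

    def create(self, key: str, value: Any) -> None:
        owner = self._owner(key)
        if key in owner:
            raise KeyError(f"{key!r} already exists")
        owner[key] = value

    def read(self, key: str) -> Optional[Any]:
        return self._owner(key).get(key)

    def update(self, key: str, value: Any) -> None:
        owner = self._owner(key)
        if key not in owner:
            raise KeyError(f"{key!r} does not exist")
        owner[key] = value

    def delete(self, key: str) -> bool:
        owner = self._owner(key)
        if key in owner:
            del owner[key]
            return True
        return False

    def query(self, predicate: Callable[[Any], bool]) -> List[Any]:
        # Visit every server and keep the objects that match.
        return [v for server in self.servers for v in server.values() if predicate(v)]
```

A client would call, for example, `grid = ToyGrid(4)` followed by `grid.create("strategy-1", {"sector": "energy"})` and `grid.query(lambda s: s["sector"] == "energy")`. Because storage is split by key, adding servers adds capacity; a real grid would also rebalance and replicate objects, which this sketch leaves out.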
Scaling Data Access Using an IMDG
Example: Cloud-Hosted App
• Application runs as multiple virtual
servers (VS).
• Application instances store and
retrieve LOB data from cloud-based
file system or database.
• Applications need fast, scalable
storage for live data.
• In-memory data grid runs as
multiple virtual servers to provide
“elastic” in-memory storage for
live data.
• As a “vertical” storage tier:
• Runs as middleware software.
• Adds missing storage layer to boost
performance.
• Uses out-of-process memory.
• Avoids repeated trips to a backing store.
Where IMDGs Are Deployed
[Diagram: two servers, each with processor cache, L2 cache, and “in-process” application memory, above backing storage]
• As a “horizontal” storage tier:
• Allows data sharing among servers.
• Scales performance & capacity.
• Adds high availability.
• Can be used independently of backing
storage.
[Diagram labels, continued: In-Memory Data Grid (“Out-of-Process”), shown twice]
• IMDG incorporates a client-side in-process
cache (“near cache”):
• Transparent to the application
• Holds recently accessed data
• Boosts performance:
• Eliminates repeated network data transfers &
deserialization
• Reduces access times to near “in-process”
latency
• Is automatically updated if the grid is
updated
• Supports various coherency models
(coherent, polled, event-driven)
The Secret to Fast Access Time
[Diagram: “in-process” application memory, “in-process” client-side cache, “out-of-process” in-memory data grid]
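The near-cache idea on this slide can be illustrated with a minimal sketch. It assumes a version-check form of coherency, where the client asks the grid only for a small version number and transfers the full object again only when that version has changed. The classes `VersionedStore` and `NearCache` are hypothetical and do not reflect any product's API.

```python
from typing import Any, Dict, Tuple


class VersionedStore:
    """Hypothetical stand-in for the out-of-process grid."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}
        self.full_reads = 0  # counts transfers of whole objects

    def put(self, key: str, value: Any) -> None:
        # Every write bumps the object's version number.
        version = self._data.get(key, (0, None))[0] + 1
        self._data[key] = (version, value)

    def version(self, key: str) -> int:
        # Cheap call: returns only the version, not the object.
        return self._data.get(key, (0, None))[0]

    def get(self, key: str) -> Tuple[int, Any]:
        # Expensive call: models network transfer plus deserialization.
        self.full_reads += 1
        return self._data[key]


class NearCache:
    """Client-side, in-process cache in front of the store."""

    def __init__(self, store: VersionedStore) -> None:
        self._store = store
        self._local: Dict[str, Tuple[int, Any]] = {}

    def get(self, key: str) -> Any:
        cached = self._local.get(key)
        current = self._store.version(key)
        if cached is not None and cached[0] == current:
            return cached[1]  # still coherent: serve from local memory
        entry = self._store.get(key)  # stale or missing: fetch and remember
        self._local[key] = entry
        return entry[1]
```

Repeated reads of an unchanged object cost one version check each and no full transfer, while a write to the grid is picked up on the next read. Polled or event-driven coherency would replace the per-read version check with a timer or a notification.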
• IMDGs enable seamless data access across on-premise sites and
cloud-based deployments:
• Automatically access
remote data as needed.
• Efficiently manage
WAN bandwidth.
• Enable full data
coherency across sites.
• Support multiple usage
models:
• Replication for DR
• Remote access
• Synchronized read/write
Global Data Integration
• IMDG bridges on-premise and cloud-based in-memory storage of
Web session state.
• IMDG automatically migrates session-state objects into the cloud
on demand.
• This enables seamless access to data across multiple sites.
Example: Web Farm Cloud-Bursting
An in-memory data grid is middleware that provides:
1. Fast access time for fast-changing, “live” data
2. Scalable throughput and storage capacity to match a
growing workload and keep response times low
3. High availability to prevent data loss if a grid server (or
network link) fails
4. Shared access to data
across the server farm
5. Global data access across
multiple sites and the cloud
6. And … fast data analysis
for quickly and easily mining
data using “map/reduce”
Top Benefits of IMDGs
[Chart: Access Latency vs. Throughput, comparing grid and DBMS; axes: access latency, throughput; annotations: “Faster”, “Scales”]
• Traditional “big data” analysis
platforms analyze offline data:
• Example: Hadoop
• Very large, static datasets
• Data is often copied from other
disk-based storage systems to a
distributed file system for analysis.
• IMDGs store and analyze online data:
• Fast-changing, operational data
• Data storage is memory-based.
• Data motion is minimized for fast,
continuous analysis.
IMDGs Analyze Live Data
A few examples:
• Equity trading: to minimize risk during a trading day
• Ecommerce: to optimize real-time shopping activity
• Reservations systems: to identify issues, reroute, etc.
• Credit cards: to detect fraud in real time
• Smart grids: to optimize power distribution & detect issues
Online Systems Need Real-Time Analysis
A platform for managing hedging strategies:
• A hedge fund manages a set of hedging strategies:
• Strategies can cover various market
sectors, such as high-tech, automotive,
energy, consumer, real estate, etc.
• Each strategy contains a list of holdings
and rules for managing the holdings
(such as target allocations).
• Updates to market data
continuously arrive during
the trading day.
• Challenge: The hedge fund must be able to quickly update and
analyze its hedging strategies and provide alerts to traders.
Example in Financial Services
• Deliver a stream of alerts to traders
within a few seconds.
• Enable the trader to examine strategy details in real time:
The Result: Real-Time Alerts
• The IMDG holds the set of strategy objects as an in-memory collection.
• Updates to market data
continuously flow through
the IMDG.
• The IMDG performs
repeated map/reduce
analysis on hedging
strategies every
second.
• Each analysis iteration both updates
and analyzes every strategy object.
• The IMDG collects alerts after each
analysis and delivers them to the
trader.
The Solution: Real-Time Analytics
Using an IMDG
• Analyze every selected strategy object in parallel within the IMDG:
• Update the strategy’s positions with latest market prices.
• Evaluate the strategy’s rules to see if a trade is needed.
• Example: Alert if current allocation exceeds target threshold.
• Generate an alert if holdings need to be changed.
• Merge the results across all strategy objects to create a set of
alerts.
The Analysis Algorithm
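The per-strategy steps above can be sketched as follows. The `Strategy` fields, the `tolerance` parameter, and the alert text are assumptions for illustration; the slides say only that an alert fires when the current allocation exceeds a target threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Strategy:
    # All field names here are hypothetical.
    name: str
    shares: Dict[str, float]       # symbol -> number of shares held
    targets: Dict[str, float]      # symbol -> target fraction of total value
    tolerance: float = 0.05        # allowed drift above the target
    prices: Dict[str, float] = field(default_factory=dict)


def analyze(strategy: Strategy, market_prices: Dict[str, float]) -> List[str]:
    # Update the strategy's positions with the latest market prices.
    for symbol in strategy.shares:
        if symbol in market_prices:
            strategy.prices[symbol] = market_prices[symbol]

    values = {s: n * strategy.prices.get(s, 0.0) for s, n in strategy.shares.items()}
    total = sum(values.values())
    alerts: List[str] = []
    if total <= 0:
        return alerts

    # Evaluate the rule: alert if an allocation exceeds target plus tolerance.
    for symbol, value in values.items():
        allocation = value / total
        target = strategy.targets.get(symbol, 0.0)
        if allocation > target + strategy.tolerance:
            alerts.append(
                f"{strategy.name}: {symbol} at {allocation:.0%}, target {target:.0%}"
            )
    return alerts


def analyze_all(strategies: List[Strategy], market_prices: Dict[str, float]) -> List[str]:
    # Merge the per-strategy results into one set of alerts.
    merged: List[str] = []
    for strategy in strategies:
        merged.extend(analyze(strategy, market_prices))
    return merged
```

This version runs sequentially in one process. In the deck's design the same per-object function runs in parallel inside the grid, and the merge is performed by the grid rather than by a loop in the client.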
Shipping Analysis Code to the IMDG
• IMDG creates Java or .NET execution environment for analysis:
• Spans all IMDG servers.
• Ensures tight integration with memory-based data storage.
• IMDG client ships jars/assemblies to IMDG servers for execution:
• Keeps development model simple.
• Optionally allows pre-staging for multiple runs to shorten startup time.
• Optionally allows automatic re-staging if code changes between runs.
• Client starts analysis:
• Sends invocation to
the IMDG.
• IMDG returns
analysis results.
The parallel analysis executes in three steps:
• Step 1: The application first selects all relevant objects in the
collection with a parallel query run on all grid servers.
• Note: Query spec matches data’s object-oriented properties.
Running the Analysis
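Step 1 can be pictured as the same predicate evaluated on every server's local objects at once. In this sketch each dictionary stands in for one grid server and a thread pool stands in for the grid; `query_partition` and `parallel_query` are hypothetical names, and a real query would be expressed in the product's own query API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Partition = Dict[str, Dict[str, Any]]  # key -> object properties


def query_partition(partition: Partition, predicate: Callable[[Dict[str, Any]], bool]) -> List[str]:
    # Each server tests the predicate against its own objects only.
    return [key for key, obj in partition.items() if predicate(obj)]


def parallel_query(partitions: List[Partition], predicate: Callable[[Dict[str, Any]], bool]) -> List[str]:
    # One worker per partition stands in for one grid server.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        per_server = pool.map(lambda p: query_partition(p, predicate), partitions)
        # Combine the per-server key lists; sorted only to make output stable.
        return sorted(key for keys in per_server for key in keys)
```

The predicate works on object properties (for example `lambda o: o["sector"] == "energy"`), which mirrors the note that the query spec matches the data's object-oriented properties.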
• Step 2: The IMDG automatically schedules analysis operations
across all grid servers and cores.
• The analysis runs on all objects selected
by the parallel query.
• Each grid server analyzes its locally stored
objects to minimize data motion.
• Parallel execution ensures fast
completion time:
• IMDG automatically distributes
workload across servers/cores.
• Scaling the IMDG automatically
handles larger data sets.
Running the Analysis: Step 2
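A minimal sketch of Step 2, assuming the same toy model of one dictionary per grid server: each worker analyzes only the selected objects stored in its own partition, so no object crosses a partition boundary. The function names are hypothetical, and Python threads only illustrate the scheduling shape; they do not deliver the multi-core speedup a real grid would.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Set

Partition = Dict[str, Any]  # key -> stored object


def analyze_local(partition: Partition, selected: Set[str],
                  analysis: Callable[[Any], Any]) -> List[Any]:
    # Runs on one server: touch only locally stored, selected objects.
    return [analysis(obj) for key, obj in partition.items() if key in selected]


def run_on_grid(partitions: List[Partition], selected: Set[str],
                analysis: Callable[[Any], Any]) -> List[List[Any]]:
    # One worker per partition; results stay grouped by server for merging.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        return list(pool.map(lambda p: analyze_local(p, selected, analysis), partitions))
```

Keeping the results grouped by server is deliberate: it lets a later merge step combine results within each server before combining across servers.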
• File-based map/reduce must move data to memory for analysis:
• IMDG’s memory-based computation engine analyzes data in place:
IMDG Minimizes Data Motion
[Diagram: two layouts compared. Labels: D, E, M/R Server (three), File System / Database, Server Memory, Grid Server (three), In-Memory Data Grid]
• Step 3: The IMDG automatically merges all analysis results.
• The IMDG first merges all results within each grid server in parallel.
• It then merges results across all grid servers to create one combined
result.
• Efficient parallel merge
minimizes the delay in
combining all results.
• The IMDG delivers the
combined result to the
trader’s display as one
object.
Running the Analysis: Step 3
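The two-level merge in Step 3 can be sketched as follows, assuming each analysis result is a list of alerts and that merging two results means concatenating them. The function names are hypothetical; any associative merge function would fit the same shape.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List

Alerts = List[str]


def merge_alerts(left: Alerts, right: Alerts) -> Alerts:
    # Pairwise merge; concatenation is associative, so grouping does not matter.
    return left + right


def merge_within_server(results: List[Alerts]) -> Alerts:
    # First level: combine all results produced on one server.
    return reduce(merge_alerts, results, [])


def merge_across_servers(per_server_results: List[List[Alerts]]) -> Alerts:
    # The first-level merges run in parallel, one worker per server.
    with ThreadPoolExecutor(max_workers=max(1, len(per_server_results))) as pool:
        partials = list(pool.map(merge_within_server, per_server_results))
    # Second level: combine the per-server partial results into one object.
    return reduce(merge_alerts, partials, [])
```

Merging locally first means only one partial result per server has to travel over the network, which is the reason the slide gives for the short combining delay.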
Running a similar analysis algorithm (stock back-testing) within an
IMDG:
• IMDG hosted in Amazon cloud using 75 servers.
• IMDG holds 1 TB of stock history data in memory.
• IMDG handles continuous stream of updates (1.1 GB/s) while
performing real-time analysis on live data.
• Entire data set analyzed in
4.1 seconds (250 GB/s).
• IMDG scales linearly by
adding servers as
workload grows.
Benchmark Results
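The quoted throughput follows from the other two figures. The short calculation below assumes 1 TB is counted as 1024 GB, which is the reading that reproduces the 250 GB/s on the slide; the per-server figure is derived here and does not appear in the deck.

```python
def throughput_gb_per_s(data_gb: float, seconds: float) -> float:
    """Aggregate scan rate: data analyzed divided by elapsed time."""
    return data_gb / seconds


DATA_SET_GB = 1024.0     # 1 TB of stock history, assuming 1 TB = 1024 GB
ANALYSIS_SECONDS = 4.1   # time to analyze the entire data set
SERVER_COUNT = 75        # servers hosting the grid

aggregate = throughput_gb_per_s(DATA_SET_GB, ANALYSIS_SECONDS)
per_server = aggregate / SERVER_COUNT

print(f"aggregate: {aggregate:.0f} GB/s, per server: {per_server:.1f} GB/s")
```

Under that assumption the aggregate rate rounds to 250 GB/s, or about 3.3 GB/s for each of the 75 servers.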
• Typically used for very large, static, offline datasets
• Data is held on disk in a file system (HDFS) or DBMS
• Data is often copied from other disk-based storage systems to
HDFS for analysis.
Problem: Hadoop Cannot Efficiently
Perform Real-Time Analytics
Comparison of IMDGs and Hadoop
                       IMDG                                           Hadoop
Data set size          Gigabytes to terabytes                         Terabytes to petabytes
Data repository        In-memory                                      File / database
Data view              Queried object collection                      File-based key/value pairs
Development time       Low                                            High
Automatic scalability  Yes                                            Application dependent
Best use               Real-time analysis of live, memory-based data  Batch analysis of large, static datasets
I/O overhead           Low                                            High
Cluster management     Simple                                         Complex
High availability      Memory-based                                   File-based
• Survey result from Strata 2013: 93% of Hadoop users would
benefit from real-time data analytics.
• Strategy: Integrate IMDG into Hadoop.
• How:
• Stage data in IMDG for fast access.
• Thereby allow updates to data during
Hadoop execution.
• Automatically retrieve
data from HDFS as
necessary.
• Enable unchanged
Hadoop program
structure.
• Combine scalability
of Hadoop map/reduce
and IMDG.
Enabling Hadoop to Perform
Real-Time Analysis
• IMDG adds Hadoop grid record
reader for accessing key/value
pairs held in the IMDG.
• Hadoop programs optionally can
output results to IMDG with grid
record writer.
• Applications can access and update
key/value pairs as live data during
analysis.
• Grid record reader and writer
optimize access to key/value pairs
to eliminate network overhead.
Accessing IMDG Data in Hadoop
• IMDG adds wrapper for HDFS record reader to cache HDFS data
during program execution.
• Hadoop automatically retrieves data from IMDG on subsequent runs.
• Wrapper accesses IMDG to
store and retrieve data
with minimum network
overhead.
• Useful in multiple “what-if”
analyses on one data set
• Tests with the TeraSort
benchmark have
demonstrated 11X
lower access latency
than HDFS without an IMDG.
Using IMDG as an HDFS Cache
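The caching wrapper described on this slide can be sketched in plain Python. This is not the Hadoop `RecordReader` interface or the ScaleOut wrapper: `CachingRecordReader`, the `dataset_id` key, and the dictionary used as the cache are all assumptions standing in for the HDFS reader and the grid.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # one key/value pair


class CachingRecordReader:
    """Hypothetical wrapper: reads from a slow source once, then from a cache."""

    def __init__(self, dataset_id: str,
                 read_from_source: Callable[[], Iterable[Record]],
                 cache: Dict[str, List[Record]]) -> None:
        self.dataset_id = dataset_id
        self.read_from_source = read_from_source  # stands in for the HDFS reader
        self.cache = cache                        # stands in for the grid

    def records(self) -> Iterator[Record]:
        if self.dataset_id in self.cache:
            # Subsequent runs: serve records from memory, skip the source.
            yield from self.cache[self.dataset_id]
            return
        loaded: List[Record] = []
        for record in self.read_from_source():
            loaded.append(record)
            yield record
        # Cache only after a complete pass, so a partial read is never reused.
        self.cache[self.dataset_id] = loaded
```

The first run pays the full cost of reading the source while filling the cache, and each later "what-if" run over the same data set reads from memory. The calling program iterates over `records()` in both cases, which matches the goal of leaving the program structure unchanged.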
• IMDGs use in-memory storage to scale access to data for
applications which process live, fast-changing data.
• IMDGs can be deployed in the cloud and provide global data
integration across sites.
• Many applications need to
perform real-time analytics
on live data.
• IMDGs can meet this need,
delivering results in seconds
instead of minutes or hours.
• Hadoop was not designed for
real-time analytics, but…
• IMDGs can enable Hadoop to accelerate access to data.
Summary
In-Memory Data Grids for
Server Farms & Cloud Computing
www.scaleoutsoftware.com

 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Real-time analysis using an in-memory data grid - Cloud Expo 2013

  • 1. Performing Real-Time Analytics with In-Memory Data Grids
      Copyright © 2013 by ScaleOut Software, Inc.
      Cloud Expo, June 10, 2013
      Mikhail Sobolev (sobolev@scaleoutsoftware.com)
      David Brinker (daveb@scaleoutsoftware.com)
  • 2. Agenda
      ScaleOut Software, Inc.
      • What is an In-Memory Data Grid (IMDG)?
      • Top Benefits of IMDGs
      • The Need for Real-Time Analytics
      • Example: A Platform for Managing Hedging Strategies
      • Using an IMDG to Perform Real-Time Analysis
      • Benchmark Results
      • Integrating an IMDG into Hadoop
  • 3. About the Speakers
      • Dr. Mikhail Sobolev, Lead Java Architect
          • Ph.D. from Moscow Institute of Physics and Technology
          • Research and consulting focus in parallel computing
          • Responsible for development of scalable software services in Java
      • David Brinker, COO
          • 20 years software business and executive management experience
          • Mentor Graphics, Cadence, Webridge
      • Company: ScaleOut Software
          • Develops and markets IMDG products
          • Founded in September 2003
          • Offices in Bellevue, WA and Beaverton, OR
          • Eight years market experience in Windows & Linux
  • 4. ScaleOut Software Products
      • ScaleOut StateServer®
          • Flagship product
          • IMDG middleware for Windows and Linux
          • Industry-leading performance and ease of use
      • ScaleOut GeoServer® adds
          • WAN-based data replication for DR
          • Breakthrough technology for global data access
      • ScaleOut Analytics Server™ adds
          • Real-time data analysis for operational data
          • Comprehensive management tools
      • ScaleOut hServer™ adds
          • 1st step for Hadoop real-time analytics
          • Accelerates data access and execution.
      [Diagram: ScaleOut StateServer In-Memory Data Grid, shown as four Grid Service nodes]
  • 5. What is an In-Memory Data Grid?
      In-memory storage for fast updates and retrieval of live data
      • Fits in the business logic layer:
          • Stores collections of Java/.NET objects shared by multiple clients.
          • Uses create/read/update/delete and query APIs to access data.
      • Implemented across a cluster of servers or VMs:
          • Scales storage and throughput by adding servers.
          • Provides high availability in case a server fails.
  • 6. Scaling Data Access Using an IMDG
      Example: Cloud-Hosted App
      • Application runs as multiple virtual servers (VS).
      • Application instances store and retrieve LOB data from a cloud-based file system or database.
      • Applications need fast, scalable storage for live data.
      • In-memory data grid runs as multiple virtual servers to provide “elastic” in-memory storage for live data.
  • 7. Where IMDGs Are Deployed
      • As a “vertical” storage tier:
          • Runs as middleware software.
          • Adds missing storage layer to boost performance.
          • Uses out-of-process memory.
          • Avoids repeated trips to a backing store.
      • As a “horizontal” storage tier:
          • Allows data sharing among servers.
          • Scales performance & capacity.
          • Adds high availability.
          • Can be used independently of backing storage.
      [Diagram labels: Processor Cache, L2 Cache, Application Memory (“In-Process”), In-Memory Data Grid (“Out-of-Process”), Backing Storage]
  • 8. The Secret to Fast Access Time
      • IMDG incorporates a client-side in-process cache (“near cache”):
          • Transparent to the application
          • Holds recently accessed data
      • Boosts performance:
          • Eliminates repeated network data transfers & deserialization
          • Reduces access times to near “in-process” latency
          • Is automatically updated if the grid is updated
          • Supports various coherency models (coherent, polled, event-driven)
      [Diagram labels: Application Memory (“In-Process”), Client-side Cache (“In-Process”), In-Memory Data Grid (“Out-of-Process”)]
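
The near-cache behaviour on slide 8 can be illustrated with a minimal Python sketch. This is not ScaleOut's API: `Grid`, `NearCache`, and the version-number check are hypothetical names, chosen to show one possible coherency model in which the client keeps a deserialized copy and re-fetches only when the grid's copy has changed.

```python
import pickle


class Grid:
    """Stand-in for the out-of-process grid: stores serialized objects."""

    def __init__(self):
        self._store = {}      # key -> (version, serialized bytes)
        self.fetches = 0      # counts full object transfers

    def put(self, key, obj):
        version = self._store.get(key, (0, b""))[0] + 1
        self._store[key] = (version, pickle.dumps(obj))

    def version(self, key):
        # Cheap metadata check; no object data is transferred.
        return self._store[key][0]

    def get(self, key):
        self.fetches += 1
        version, blob = self._store[key]
        return version, pickle.loads(blob)


class NearCache:
    """Client-side, in-process cache holding deserialized objects."""

    def __init__(self, grid):
        self._grid = grid
        self._local = {}      # key -> (version, object)

    def read(self, key):
        cached = self._local.get(key)
        if cached is not None and cached[0] == self._grid.version(key):
            return cached[1]                 # hit: no transfer, no deserialization
        version, obj = self._grid.get(key)   # miss or stale: fetch and cache
        self._local[key] = (version, obj)
        return obj


grid = Grid()
cache = NearCache(grid)
grid.put("strategy:1", {"sector": "energy", "target": 0.25})
first = cache.read("strategy:1")    # fetched from the grid
second = cache.read("strategy:1")   # served from the near cache
grid.put("strategy:1", {"sector": "energy", "target": 0.30})
third = cache.read("strategy:1")    # stale copy detected, re-fetched
```

Only two full transfers happen for the three reads; the second read returns the already deserialized object. A polled or event-driven model would replace the per-read version check with a timer or a callback.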
  • 9. Global Data Integration
      • IMDGs enable seamless data access across on-premise sites and cloud-based deployments:
          • Automatically access remote data as needed.
          • Efficiently manage WAN bandwidth.
          • Enable full data coherency across sites.
      • Support multiple usage models:
          • Replication for DR
          • Remote access
          • Synchronized read/write
  • 10. Example: Web Farm Cloud-Bursting
      • IMDG bridges on-premise and cloud-based in-memory storage of Web session state.
      • IMDG automatically migrates session-state objects into the cloud on demand.
      • This enables seamless access to data across multiple sites.
  • 11. Top Benefits of IMDGs
      An in-memory data grid is middleware software which provides:
      1. Fast access time for fast-changing, “live” data
      2. Scalable throughput and storage capacity to match a growing workload and keep response times low
      3. High availability to prevent data loss if a grid server (or network link) fails
      4. Shared access to data across the server farm
      5. Global data access across multiple sites and the cloud
      6. And … fast data analysis for quickly and easily mining data using “map/reduce”
      [Chart: Access Latency vs. Throughput for Grid and DBMS; annotations: “Faster”, “Scales”]
  • 12. IMDGs Analyze Live Data
      • Traditional “big data” analysis platforms analyze offline data:
          • Example: Hadoop
          • Very large, static datasets
          • Data is often copied from other disk-based storage systems to a distributed file system for analysis.
      • IMDGs store and analyze online data:
          • Fast-changing, operational data
          • Data storage is memory-based.
          • Data motion is minimized for fast, continuous analysis.
  • 13. Online Systems Need Real-Time Analysis
      A few examples:
      • Equity trading: to minimize risk during a trading day
      • Ecommerce: to optimize real-time shopping activity
      • Reservations systems: to identify issues, reroute, etc.
      • Credit cards: to detect fraud in real time
      • Smart grids: to optimize power distribution & detect issues
  • 14. Example in Financial Services
      A platform for managing hedging strategies:
      • A hedge fund manages a set of hedging strategies:
          • Strategies can cover various market sectors, such as high-tech, automotive, energy, consumer, real estate, etc.
          • Each strategy contains a list of holdings and rules for managing the holdings (such as target allocations).
      • Updates to market data continuously arrive during the trading day.
      • Challenge: The hedge fund must be able to quickly update and analyze its hedging strategies and provide alerts to traders.
  • 15. The Result: Real-Time Alerts
      • Deliver a stream of alerts to traders within a few seconds.
      • Enable the trader to examine strategy details in real time.
  • 16. The Solution: Real-Time Analytics Using an IMDG
      • The IMDG holds the set of strategy objects as an in-memory collection.
      • Updates to market data continuously flow through the IMDG.
      • The IMDG performs repeated map/reduce analysis on hedging strategies every second.
      • Each analysis iteration both updates and analyzes every strategy object.
      • The IMDG collects alerts after each analysis and delivers them to the trader.
  • 17. The Analysis Algorithm
      • Analyze every selected strategy object in parallel within the IMDG:
          • Update the strategy’s positions with latest market prices.
          • Evaluate the strategy’s rules to see if a trade is needed.
          • Example: Alert if current allocation exceeds target threshold.
          • Generate an alert if holdings need to be changed.
      • Merge the results across all strategy objects to create a set of alerts.
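
The algorithm on slide 17 can be sketched in a few lines of Python as a map step per strategy followed by a merge. The `Strategy` fields, the `threshold` rule, and the ticker symbols below are illustrative assumptions, not the data model used in the talk, and the sketch runs sequentially rather than inside a grid.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Strategy:
    name: str
    holdings: dict            # symbol -> number of shares
    targets: dict             # symbol -> target fraction of portfolio value
    threshold: float = 0.05   # allowed drift above target before alerting


def analyze(strategy, prices):
    """Map step: price one strategy's positions and evaluate its rule."""
    values = {s: qty * prices[s] for s, qty in strategy.holdings.items()}
    total = sum(values.values())
    alerts = []
    for symbol, value in values.items():
        allocation = value / total
        if allocation > strategy.targets[symbol] + strategy.threshold:
            alerts.append((strategy.name, symbol, round(allocation, 3)))
    return alerts


def merge(left, right):
    """Reduce step: combine two partial alert lists."""
    return left + right


strategies = [
    Strategy("high-tech", {"AAA": 100, "BBB": 100}, {"AAA": 0.5, "BBB": 0.5}),
    Strategy("energy", {"CCC": 50, "DDD": 50}, {"CCC": 0.5, "DDD": 0.5}),
]
prices = {"AAA": 30.0, "BBB": 10.0, "CCC": 20.0, "DDD": 20.0}

alerts = reduce(merge, (analyze(s, prices) for s in strategies), [])
```

With these sample prices, only the high-tech strategy drifts past its target, so `alerts` holds a single entry for `AAA` at a 0.75 allocation.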
  • 18. Shipping Analysis Code to the IMDG
      • IMDG creates Java or .NET execution environment for analysis:
          • Spans all IMDG servers.
          • Ensures tight integration with memory-based data storage.
      • IMDG client ships jars/assemblies to IMDG servers for execution:
          • Keeps development model simple.
          • Optionally allows pre-staging for multiple runs to shorten startup time.
          • Optionally allows automatic re-staging if code changes between runs.
      • Client starts analysis:
          • Sends invocation to the IMDG.
          • IMDG returns analysis results.
  • 19. Running the Analysis
      The parallel analysis executes in three steps:
      • Step 1: The application first selects all relevant objects in the collection with a parallel query run on all grid servers.
          • Note: Query spec matches data’s object-oriented properties.
  • 20. Running the Analysis: Step 2
      • Step 2: The IMDG automatically schedules analysis operations across all grid servers and cores.
          • The analysis runs on all objects selected by the parallel query.
          • Each grid server analyzes its locally stored objects to minimize data motion.
      • Parallel execution ensures fast completion time:
          • IMDG automatically distributes workload across servers/cores.
          • Scaling the IMDG automatically handles larger data sets.
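
Step 2 amounts to running the analysis where the data lives. The sketch below models this under stated assumptions: three hypothetical "servers" are plain dictionaries, objects are placed by a CRC32 hash of their key, and worker threads stand in for the servers. The doubling in `analyze_locally` is a placeholder for the real per-object analysis.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SERVERS = 3  # hypothetical grid size


def owner(key):
    """Pick the server that stores a key (stable hash partitioning)."""
    return crc32(key.encode()) % NUM_SERVERS


def analyze_locally(partition):
    """Runs on one server and touches only the objects stored there."""
    return {key: value * 2 for key, value in partition.items()}


# Distribute the selected objects across the servers.
objects = {f"strategy:{i}": i for i in range(10)}
partitions = [{} for _ in range(NUM_SERVERS)]
for key, value in objects.items():
    partitions[owner(key)][key] = value

# Each server analyzes its own partition concurrently.
with ThreadPoolExecutor(max_workers=NUM_SERVERS) as pool:
    per_server_results = list(pool.map(analyze_locally, partitions))
```

No object crosses a partition boundary during the analysis; only the per-server results remain to be combined. Adding a server changes `NUM_SERVERS` and the placement, not the analysis code.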
  • 21. IMDG Minimizes Data Motion
      • File-based map/reduce must move data to memory for analysis.
        [Diagram labels: File System / Database, Server Memory, M/R Server, D and E blocks]
      • IMDG’s memory-based computation engine analyzes data in place.
        [Diagram labels: In-Memory Data Grid, Grid Server, D and E blocks]
  • 22. Running the Analysis: Step 3
      • Step 3: The IMDG automatically merges all analysis results.
          • The IMDG first merges all results within each grid server in parallel.
          • It then merges results across all grid servers to create one combined result.
          • Efficient parallel merge minimizes the delay in combining all results.
      • The IMDG delivers the combined result to the trader’s display as one object.
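
The two-level merge in step 3 can be sketched as follows. The slide does not say how results are combined across servers; the pairwise tree merge here is one common choice and is an assumption, as are the function names and the sample alert strings.

```python
from functools import reduce


def merge(a, b):
    """Combine two partial results (here: alert lists, kept sorted)."""
    return sorted(a + b)


def merge_within_server(core_results):
    """First level: fold the results produced by one server's cores."""
    return reduce(merge, core_results, [])


def merge_across_servers(server_results):
    """Second level: pairwise tree merge, roughly log2(n) rounds for n servers."""
    results = list(server_results)
    while len(results) > 1:
        paired = [merge(results[i], results[i + 1])
                  for i in range(0, len(results) - 1, 2)]
        if len(results) % 2:          # odd one out advances unchanged
            paired.append(results[-1])
        results = paired
    return results[0] if results else []


# Three hypothetical servers, each with per-core partial results.
grid = [
    [["alert-3"], ["alert-1"]],
    [[], ["alert-5", "alert-2"]],
    [["alert-4"]],
]
per_server = [merge_within_server(cores) for cores in grid]
combined = merge_across_servers(per_server)
```

In a real grid the first level would run on every server at the same time, and each round of the second level would merge its pairs in parallel, which is what keeps the combining delay small.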
  • 23. Benchmark Results
      Running a similar analysis algorithm (stock back-testing) within an IMDG:
      • IMDG hosted in Amazon cloud using 75 servers.
      • IMDG holds 1 TB of stock history data in memory.
      • IMDG handles continuous stream of updates (1.1 GB/s) while performing real-time analysis on live data.
      • Entire data set analyzed in 4.1 seconds (250 GB/s).
      • IMDG scales linearly by adding servers as workload grows.
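
The quoted 250 GB/s follows from the other figures on the slide. The short calculation below reproduces it and derives a per-server rate; it assumes 1 TB is counted as 1024 GB and that data is spread evenly across the 75 servers, neither of which the slide states.

```python
DATA_SET_GB = 1024        # 1 TB, taken as 1024 GB (assumption)
ANALYSIS_SECONDS = 4.1
SERVERS = 75

aggregate_gb_per_s = DATA_SET_GB / ANALYSIS_SECONDS
per_server_gb_per_s = aggregate_gb_per_s / SERVERS
per_server_data_gb = DATA_SET_GB / SERVERS

print(f"aggregate: {aggregate_gb_per_s:.0f} GB/s")
print(f"per server: {per_server_gb_per_s:.2f} GB/s over {per_server_data_gb:.1f} GB")
```

Under these assumptions each server scans about 13.7 GB at roughly 3.33 GB/s, which is consistent with analyzing data held in local memory.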
  • 24. Problem: Hadoop Cannot Efficiently Perform Real-Time Analytics
      • Typically used for very large, static, offline datasets
      • Data is held on disk in a file system (HDFS) or DBMS
      • Data is often copied from other disk-based storage systems to HDFS for analysis.
  • 25. Comparison of IMDGs and Hadoop
      Criterion | IMDG | Hadoop
      Data set size | Gigabytes to terabytes | Terabytes to petabytes
      Data repository | In-memory | File / database
      Data view | Queried object collection | File-based key/value pairs
      Development time | Low | High
      Automatic scalability | Yes | Application dependent
      Best use | Real-time analysis of live, memory-based data | Batch analysis of large, static datasets
      I/O overhead | Low | High
      Cluster management | Simple | Complex
      High availability | Memory-based | File-based
  • 26. Enabling Hadoop to Perform Real-Time Analysis
      • Survey result from Strata 2013: 93% of Hadoop users would benefit from real-time data analytics.
      • Strategy: Integrate IMDG into Hadoop.
      • How:
          • Stage data in IMDG for fast access.
          • Thereby allow updates to data during Hadoop execution.
          • Automatically retrieve data from HDFS as necessary.
          • Enable unchanged Hadoop program structure.
          • Combine scalability of Hadoop map/reduce and IMDG.
  • 27. Accessing IMDG Data in Hadoop
      • IMDG adds Hadoop grid record reader for accessing key/value pairs held in the IMDG.
      • Hadoop programs optionally can output results to IMDG with grid record writer.
      • Applications can access and update key/value pairs as live data during analysis.
      • Grid record reader and writer optimize access to key/value pairs to eliminate network overhead.
  • 28. Using IMDG as an HDFS Cache
      • IMDG adds wrapper for HDFS record reader to cache HDFS data during program execution.
      • Hadoop automatically retrieves data from IMDG on subsequent runs.
      • Wrapper accesses IMDG to store and retrieve data with minimum network overhead.
      • Useful in multiple “what-if” analyses on one data set
      • Tests with Terasort benchmark have demonstrated 11X lower access latency over HDFS without IMDG.
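
The wrapper idea on slide 28 is a read-through cache around a record reader. The Python sketch below shows the pattern only. `CachingRecordReader`, `fake_hdfs_reader`, and the dictionary used as the grid are hypothetical stand-ins, not Hadoop or ScaleOut classes.

```python
class CachingRecordReader:
    """Wraps a slow record source and stages its records in a cache."""

    def __init__(self, path, read_from_hdfs, cache):
        self._path = path
        self._read_from_hdfs = read_from_hdfs  # callable: path -> iterable of (key, value)
        self._cache = cache                    # dict standing in for the grid

    def records(self):
        if self._path in self._cache:
            # Subsequent runs: serve the records from memory.
            yield from self._cache[self._path]
            return
        # First run: read through to the file system and remember the records.
        staged = []
        for record in self._read_from_hdfs(self._path):
            staged.append(record)
            yield record
        self._cache[self._path] = staged


hdfs_reads = []


def fake_hdfs_reader(path):
    """Hypothetical stand-in for an HDFS record reader."""
    hdfs_reads.append(path)
    return [(0, "first line"), (1, "second line")]


grid_cache = {}
run1 = list(CachingRecordReader("/data/input", fake_hdfs_reader, grid_cache).records())
run2 = list(CachingRecordReader("/data/input", fake_hdfs_reader, grid_cache).records())
```

Both runs see identical records, but the file system is read only once. Because the calling code is the same in both runs, this mirrors the slide's point that repeated “what-if” analyses benefit without changes to the Hadoop program.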
  • 29. Summary
      • IMDGs use in-memory storage to scale access to data for applications which process live, fast-changing data.
      • IMDGs can be deployed in the cloud and provide global data integration across sites.
      • Many applications need to perform real-time analytics on live data.
      • IMDGs can meet this need, delivering results in seconds instead of minutes or hours.
      • Hadoop was not designed for real-time analytics, but…
      • IMDGs can enable Hadoop to accelerate access to data.
  • 30. In-Memory Data Grids for Server Farms & Cloud Computing www.scaleoutsoftware.com