Enabling Real-Time Analytics
Using Hadoop Map/Reduce
Hadoop Users Group
November 20, 2013

Bill Bain, CEO (wbain@scaleoutsoftware.com)

Copyright © 2013 by ScaleOut Software, Inc.
Agenda
• Quick Review of In-Memory Data Grids
• The Need for Real-Time Analytics: Two Use Cases
• Data-Parallel Computation on an IMDG Using Parallel Method Invocation (PMI)
• Implementing MapReduce Using PMI: ScaleOut hServer™
• Sample Use Cases
• Video Demo
• Comparison to Spark

ScaleOut Software, Inc.
About ScaleOut Software
• Develops and markets In-Memory Data Grids:
software middleware for:
• Scaling application performance and
• Performing real-time analytics using
• In-memory data storage and computing

• Dr. William Bain, Founder & CEO
• Career focused on parallel computing – Bell Labs, Intel, Microsoft
• 3 prior start-ups, last acquired by Microsoft and product now ships as
Network Load Balancing in Windows Server

• Eight years in the market; 400 customers, 9,000 servers

What is an In-Memory Data Grid?
In-memory storage for fast updates and retrieval of live data
• Fits in the business logic layer:
• Follows object-oriented view of data
(vs. relational view).
• Stores collections of Java/.NET
objects shared by multiple clients.
• Uses create/read/update/delete
and query APIs to access data.

• Implemented across a cluster of
servers or VMs:
• Scales storage and throughput
by adding servers.

• Provides high availability
in case a server fails.
Our Focus: Real-Time Analytics
Real-time (“Operational Intelligence”):
• Live data sets
• Gigabytes to terabytes
• In-memory storage
• Minutes to seconds
• Best uses:
  • Tracking live data
  • Immediately identifying trends and capturing opportunities

Batch (“Business Intelligence”):
• Static data sets
• Petabytes
• Disk storage
• Hours to minutes
• Best uses:
  • Analyzing warehoused data
  • Mining for long-term trends

[Figure: Big Data Analytics divided into Real-Time and Batch; labels shown: Analytics Server, Hadoop, IBM, Teradata, SAS, SAP, hServer]

Online Systems Need Real-Time Analysis
A few examples:
• Equity trading: to minimize risk during a trading day
• Ecommerce: to optimize real-time shopping activity
• Reservations systems: to identify issues, reroute, etc.
• Credit cards: to detect fraud in real time
• Smart grids: to optimize power distribution & detect issues

Integrate MapReduce
into IMDG for Real-Time Analytics
Benefits:
• Enables use of widely used Hadoop MapReduce APIs:
• Accelerates data access by staging data in memory.
• Eliminates batch scheduling
and data shuffling overheads
of standard Hadoop distributions.
• Analyzes and updates live data.

• Enables Hadoop
deployment in live
systems.
• Hadoop MapReduce
programs run without change.
• ScaleOut’s implementation is called
ScaleOut hServer™.
Data-Parallel Analysis Is Not New
• 1980’s: Special Purpose Hardware: “SIMD”
  (Thinking Machines Connection Machine 5)
• 1990’s: General Purpose Parallel Supercomputers: “Domain Decomposition”, “SPMD”
  (Intel iPSC/2, IBM SP1)
• 1990’s – early 2000’s: HPC on Clusters: “MPI”
  (HP blade servers)
• Since 2003: Clusters, the Cloud, and IMDGs: “MapReduce”
  (Amazon EC2, Windows Azure)

Parallel Method Invocation
• Basic, well understood model of data-parallel computation
• Implemented for use on objects hosted in IMDGs:
• Executes user’s code in parallel across the grid.
• Uses parallel query to select objects for analysis.

[Figure: the in-memory data grid runs the data-parallel analysis in two phases: Analyze Data (Eval), then Combine Results (Merge).]

Running the Analysis
The parallel analysis executes in three steps:
• Step 1: The application first selects all relevant objects in the
collection with a parallel query run on all grid servers.
• Note: Query spec matches data’s object-oriented properties.

Running the Analysis: Step 2
• Step 2: The IMDG automatically schedules analysis operations
across all grid servers and cores.
• The analysis runs on all objects selected
by the parallel query.
• Each grid server analyzes its locally stored
objects to minimize data motion.

• Parallel execution ensures fast
completion time:
• IMDG automatically distributes
workload across servers/cores.
• Scaling the IMDG automatically
handles larger data sets.

Running the Analysis: Step 3
• Step 3: The IMDG automatically merges all analysis results.
• The IMDG first merges all results within each grid server in parallel.
• It then merges results across all grid servers to create one combined
result.

• Efficient parallel merge
minimizes the delay in
combining all results.
• The IMDG delivers the
combined result to the
trader’s display as one
object.
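
The three steps above can be sketched in plain Java. This is only an illustration and does not use the ScaleOut API: the class `PmiSketch`, the `Stock` type, and the `invoke` helper are hypothetical, and each inner list stands in for the objects stored on one grid server. The merge function is assumed to be associative so that results can be combined in any grouping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

public class PmiSketch {
    /** Hypothetical stored object; not a ScaleOut class. */
    static final class Stock {
        final String ticker;
        final double price;
        final long shares;

        Stock(String ticker, double price, long shares) {
            this.ticker = ticker;
            this.price = price;
            this.shares = shares;
        }
    }

    /**
     * Simulates query -> eval -> two-level merge.
     * Each inner list stands for the objects held by one grid server.
     */
    static <T, R> Optional<R> invoke(List<List<T>> servers,
                                     Predicate<T> query,
                                     Function<T, R> eval,
                                     BinaryOperator<R> merge) {
        return servers.parallelStream()
                // Steps 1-2: each server selects and analyzes its local objects,
                // then merges its own results into one partial result.
                .map(local -> local.stream().filter(query).map(eval).reduce(merge))
                .filter(Optional::isPresent)
                .map(Optional::get)
                // Step 3: merge the per-server partial results into one value.
                .reduce(merge);
    }

    /** Sums price * shares for GOOG and ORCL over three simulated servers. */
    static double demo() {
        List<List<Stock>> servers = Arrays.asList(
                Arrays.asList(new Stock("GOOG", 10.0, 5), new Stock("MSFT", 3.0, 7)),
                Arrays.asList(new Stock("ORCL", 2.0, 10)),
                Arrays.asList(new Stock("IBM", 4.0, 1)));
        Optional<Double> total = PmiSketch.<Stock, Double>invoke(
                servers,
                s -> s.ticker.equals("GOOG") || s.ticker.equals("ORCL"),
                s -> s.price * s.shares,
                Double::sum);
        return total.orElse(0.0);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 70.0
    }
}
```

Merging inside each server first means only one partial result per server has to cross the network, which is the point of the two-level merge described in Step 3.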

Sample Performance Results for PMI
Optimizing a stock trading platform with real-time analysis:
• IMDG hosted in Amazon
cloud using 75 servers.
• IMDG holds 1 TB of stock
history data in memory.
• IMDG handles continuous
stream of updates (1.1 GB/s).
• IMDG performs real-time
analysis on live data.
• Entire data set analyzed in
4.1 seconds (250 GB/s).
• IMDG scales linearly as
workload grows.
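
As a rough check on the figures above, and assuming 1 TB is counted as 1024 GB (the slide does not say which convention it uses), the quoted scan rate and an implied per-server rate work out as:

```latex
\[
\frac{1024\ \text{GB}}{4.1\ \text{s}} \approx 250\ \text{GB/s},
\qquad
\frac{250\ \text{GB/s}}{75\ \text{servers}} \approx 3.3\ \text{GB/s per server}.
\]
```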
Implementing Real-Time MapReduce
• Goal: Run MapReduce applications from a remote workstation.
• The IMDG automatically builds an “invocation grid” of JVMs on the
grid’s servers for PMI and ships the application’s jars.
• The invocation grid can be reused to shorten startup time.

• Use PMI to implement MapReduce.

Accelerating MapReduce Execution
PMI is the foundation of fast execution time:
• Data can be input from either the IMDG or an external data source.
  • Works with any input/output format compatible with the Apache distribution.
• ScaleOut IMDG uses its data-parallel execution engine (PMI) to invoke the mappers and the reducers.
  • Eliminates batch scheduling overhead.
• Intermediate results are stored within the IMDG.
  • Minimizes data motion between the mappers and reducers.
  • Allows optional sorting.

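
To make the data flow concrete, here is a minimal single-process sketch of a word count whose intermediate results stay in memory between the map and reduce phases. It is not hServer code: the class and method names are hypothetical, a `TreeMap` stands in for the grid's intermediate storage (and supplies the optional key sorting), and the loops run sequentially where the grid would invoke mappers and reducers in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InMemoryMapReduceSketch {
    /** Mapper: splits one line into words; each word stands for the pair (word, 1). */
    static List<String> mapLine(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Reducer: sums the counts collected for one key. */
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        // Intermediate (key, values) pairs are kept in memory, grouped by key.
        // A TreeMap keeps keys sorted; a HashMap would skip the sort.
        Map<String, List<Integer>> intermediate = new TreeMap<>();
        for (String line : lines) {
            for (String word : mapLine(line)) {
                intermediate.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce phase: one reducer call per key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : intermediate.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "to do");
        System.out.println(wordCount(lines)); // prints {be=2, do=1, not=1, or=1, to=3}
    }
}
```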
Only One-Line Code Change
ScaleOut hServer subclasses the Hadoop Job class:

// This job will run using the Hadoop job tracker:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.waitForCompletion(true);
}

// This job will run using ScaleOut hServer:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The only changed line: construct an HServerJob instead of a Job.
    Job job = new HServerJob(conf, "wordcount");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.waitForCompletion(true);
}

Accessing IMDG Data for M/R
• IMDG adds grid input format for
accessing key/value pairs held in
the IMDG.
• MapReduce programs optionally
can output results to IMDG with
grid output format.
• Grid Record Reader optimizes
access to key/value pairs to
eliminate network overhead.
• Applications can access and
update key/value pairs as
operational data during analysis.

Optimized In-Memory Storage
Multiple in-memory storage
models:
• Named cache, optimized
for rich semantics:
• Property-based query

• Distributed locking
• Access from remote grids

• Named map, optimized for
efficient storage and bulk
analysis:
• Highly efficient object
storage
• Pipelined, bulk-access
mechanisms

Example: Ecommerce: Inventory Management
Fast map/reduce reconciles inventory and order systems
for an online retailer:
• Challenge: Inventory and online
order management are handled
by different applications.
• Reconciled once per day.
• Inaccurate orders reduce margins.

• Solution:
• Host SKUs in IMDG updated in real
time by order & inventory systems.
• Use PMI to reconcile in two minutes.

• Results: Real-time reconciliation ensures accurate orders.
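
One way such a reconciliation could be phrased as an eval/merge pair is sketched below. The slides do not define what reconciliation checks, so this assumes it means flagging SKUs whose open orders exceed stock on hand; the `Sku` type, its fields, and the method names are all hypothetical and not part of any ScaleOut API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InventoryReconcileSketch {
    /** Hypothetical SKU record; field names are illustrative only. */
    static final class Sku {
        final String id;
        final int onHand;
        final int openOrders;

        Sku(String id, int onHand, int openOrders) {
            this.id = id;
            this.onHand = onHand;
            this.openOrders = openOrders;
        }
    }

    /** Eval step: returns the SKU id if open orders exceed stock, else an empty list. */
    static List<String> eval(Sku sku) {
        List<String> out = new ArrayList<>();
        if (sku.openOrders > sku.onHand) {
            out.add(sku.id);
        }
        return out;
    }

    /** Merge step: combines two partial lists of oversold SKU ids. */
    static List<String> merge(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    /** Applies eval to every SKU and folds the results with merge. */
    static List<String> oversold(List<Sku> skus) {
        List<String> result = new ArrayList<>();
        for (Sku sku : skus) {
            result = merge(result, eval(sku));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Sku> skus = Arrays.asList(
                new Sku("A", 5, 7), new Sku("B", 10, 3), new Sku("C", 0, 1));
        System.out.println(oversold(skus)); // prints [A, C]
    }
}
```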
Example in Financial Services
Integrate analysis into a stock trading platform:
• The IMDG holds market data and hedging strategies.
• Updates to market data
continuously flow through
the IMDG.
• The IMDG performs
repeated map/reduce
analysis on hedging
strategies and alerts
traders in real time.
• IMDG automatically and dynamically
scales its throughput to handle new
hedging strategies by adding servers.
Demo
• Video Link

Comparison to Spark
• Spark is intended to accelerate data analysis using in-memory
computing.
• ScaleOut’s IMDG provides standard MapReduce for “live” systems.
                          Spark                       ScaleOut IMDG
New MapReduce engine      Yes                         Yes
In-memory data storage    Resilient Distr. Datasets   Distributed Objects
Load/store from HDFS      Yes                         Yes
Avoid disk access         Yes                         Yes
CRUD on live data         No                          Yes
Query on properties       No                          Yes
High availability         Rebuild on failure          Replication and failover
Extensibility             Additional operators        PMI methods
Open source               Yes                         Hybrid

Summary
• Online systems need to analyze “live” data in real-time.

• MapReduce has traditionally focused on analyzing
large, static (offline) datasets held in file systems.
• An in-memory data grid (IMDG) can accelerate
MapReduce applications, enabling real-time analytics:
• Enables the application to analyze and update live data.

• Leverages the IMDG’s load-balanced placement of data.
• Avoids batch-scheduled startup delays.
• Avoids data motion from secondary storage.

• MapReduce can be implemented using standard data-parallel computing techniques (“parallel method invocation”):
• Tightly integrates Map/Reduce engine with the IMDG.
• Accelerates Map/Reduce execution by >20X in benchmark
tests.
Accelerating Start-Up Times
• The invocation grid can be re-used across MapReduce jobs:
public static void main(String argv[]) throws Exception {
    // Configure and load the invocation grid
    InvocationGrid grid = HServerJob.getInvocationGridBuilder("myGrid")
        // Add JAR files as IG dependencies
        .addJar("main-job.jar")
        .addJar("first-library.jar")
        // Add classes as IG dependencies
        .addClass(MyMapper.class)
        .addClass(MyReducer.class)
        // Define custom JVM parameters
        .setJVMParameters("-Xms512M -Xmx1024M")
        .load();

    // Run 10 jobs on the same invocation grid
    for (int i = 0; i < 10; i++) {
        Configuration conf = new Configuration();
        // The preloaded invocation grid is passed as the parameter to the job
        Job job = new HServerJob(conf, "Job number " + i, false, grid);
        // ......Configure the job here.........
        // Run the job
        job.waitForCompletion(true);
    }

    // Unload the invocation grid when we are done
    grid.unload();
}

Targeted Use Cases
• Run continuous Hadoop on live data, while it’s being updated.
  “Capture perishable business opportunities and identify issues.”
  Examples: real-time risk analysis, credit card fraud detection, ...
• Accelerate Hadoop on static data with a one-line code change.
  “Speed up Hadoop execution by >10X for faster business insights.”
  Examples: financial modeling, process simulations, ...
• Quickly prototype Hadoop code.
  “Validate your Hadoop code before it goes into batch processing.”
  Examples: no need to install Hadoop stack, fast-turn debug and tuning, ...

The Need for Real-Time Analytics
Many Use Cases:
• Authorizations / Payment Processing / Mobile Payments
• Operational Risk Compliance
• Financial: Risk, P&L, Pricing
• Execution Rules
• Market Feed / Event Handlers
• Churn Management
• Situational Awareness
• Fraud Detection
• Real Time Tracking
• Sensor Data / SCADA
• Inventory Management
• Service Activation

Across Key Industries:
• Health Care
• Government
• Life Sciences
• IC / DoD
• Logistics
• Manufacturing
• Utilities
• Retail
• Telco
• Financial
• CPG
• Law enforcement

Problem: Hadoop Cannot Efficiently
Perform Real-Time Analytics
• Typically used for very large, static, offline datasets
• Data must be copied from disk-based storage (e.g., HDFS) into
memory for analysis.
• Hadoop Map/Reduce adds lengthy batch scheduling and data
shuffling overhead.

Hadoop Users Need
Real-Time Analytics
• ScaleOut Software conducted an informal survey at the Strata 2013 Conference (Santa Clara).
• Based on 150 responses:
• 78% of organizations generate fast-changing data.
• 60% use Hadoop and 78% plan to expand usage of Hadoop within
12 months.
• Only 42% consider Hadoop to be an effective platform for real-time analysis, but…
• 93% would benefit from real-time data analytics.
• 71% consider a 10X improvement in performance meaningful.

• Take-away: Hadoop users need real-time analytics.
Optional Caching of HDFS Data
• ScaleOut hServer adds Dataset Record Reader (wrapper) to
cache HDFS data during program execution.
• Hadoop automatically retrieves data from ScaleOut IMDG on
subsequent runs.
• Dataset Record Reader
stores and retrieves data
with minimum network
and memory overheads.
• Tests with Terasort
benchmark have
demonstrated 11X
faster access latency
over HDFS without IMDG.
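
The caching behavior described above is a read-through wrapper: the first run reads from the underlying source and stores the records, and later runs are served from the cache. The sketch below shows that pattern only; it is not the hServer Dataset Record Reader, and the class name, the `Function`-based source, and the `HashMap` standing in for the IMDG are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class CachingReaderSketch {
    // Stand-in for the in-memory data grid, keyed by input split name.
    private final Map<String, List<String>> cache = new HashMap<>();
    // Stand-in for the slower underlying reader (for example, one backed by HDFS).
    private final Function<String, List<String>> source;
    private int sourceReads = 0;

    CachingReaderSketch(Function<String, List<String>> source) {
        this.source = source;
    }

    /** Returns the records for a split, reading the source only on first access. */
    List<String> read(String split) {
        List<String> cached = cache.get(split);
        if (cached != null) {
            return cached;
        }
        sourceReads++;
        List<String> records = new ArrayList<>(source.apply(split));
        cache.put(split, records);
        return records;
    }

    /** Number of times the underlying source was actually read. */
    int sourceReads() {
        return sourceReads;
    }

    public static void main(String[] args) {
        CachingReaderSketch reader = new CachingReaderSketch(
                split -> Arrays.asList(split + "-record-1", split + "-record-2"));
        reader.read("split-0"); // first run: reads the source and fills the cache
        reader.read("split-0"); // subsequent run: served from the cache
        System.out.println(reader.sourceReads()); // prints 1
    }
}
```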
Java Example: Parallel Method Invocation
• Create method to analyze each queried stock object and another
method to pair-wise merge the results:
public class StockAnalysis implements
        Invokable<Stock, StockCalcParams, Double>
{
    // Called once per selected object: computes the value of one stock position.
    public Double eval(Stock stock, StockCalcParams param)
            throws InvokeException {
        return stock.getPrice() * stock.getTotalShares();
    }

    // Called pair-wise to combine partial results into a single total.
    public Double merge(Double first, Double second)
            throws InvokeException {
        return first + second;
    }
}

• Run a parallel method invocation on the query results:
NamedCache cache = CacheFactory.getCache("Stocks");
InvokeResult valueOfSelectedStocks =
    cache.invoke(
        StockAnalysis.class,
        Stock.class,
        or(equal("ticker", "GOOG"), equal("ticker", "ORCL")),
        new StockCalcParams());
System.out.println("The value of selected stocks is " +
    valueOfSelectedStocks.getResult());

32

ScaleOut Software, Inc.

Mais conteúdo relacionado

Mais procurados

Presentation capacity management for oracle exadata database machine v2
Presentation   capacity management for oracle exadata database machine v2Presentation   capacity management for oracle exadata database machine v2
Presentation capacity management for oracle exadata database machine v2xKinAnx
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Precisely
 
Observability with Spring-based distributed systems
Observability with Spring-based distributed systemsObservability with Spring-based distributed systems
Observability with Spring-based distributed systemsRakuten Group, Inc.
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryDataWorks Summit/Hadoop Summit
 
Disrupting Insurance with Advanced Analytics The Next Generation Carrier
Disrupting Insurance with Advanced Analytics The Next Generation CarrierDisrupting Insurance with Advanced Analytics The Next Generation Carrier
Disrupting Insurance with Advanced Analytics The Next Generation CarrierDataWorks Summit/Hadoop Summit
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentRakuten Group, Inc.
 
In memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGainIn memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGainData Con LA
 
What’s New in Assure MIMIX 10
What’s New in Assure MIMIX 10What’s New in Assure MIMIX 10
What’s New in Assure MIMIX 10Precisely
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014Amazon Web Services
 
Caching Data in OutSystems: A Tale of Gains Without Pain
Caching Data in OutSystems: A Tale of Gains Without PainCaching Data in OutSystems: A Tale of Gains Without Pain
Caching Data in OutSystems: A Tale of Gains Without PainCatarinaPereira64715
 
20 Altair PBS Professional Features in 20 minutes, 2018
20 Altair PBS Professional Features in 20 minutes, 201820 Altair PBS Professional Features in 20 minutes, 2018
20 Altair PBS Professional Features in 20 minutes, 2018Susheel Patidar
 
OpenPOWER Roadmap Toward CORAL
OpenPOWER Roadmap Toward CORALOpenPOWER Roadmap Toward CORAL
OpenPOWER Roadmap Toward CORALinside-BigData.com
 
From Data to Services at the Speed of Business
From Data to Services at the Speed of BusinessFrom Data to Services at the Speed of Business
From Data to Services at the Speed of BusinessAli Hodroj
 
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined Storage
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined StorageZero Downtime, Zero Touch Stretch Clusters from Software-Defined Storage
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined StorageDataCore Software
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Hybrid Transactional/Analytics Processing: Beyond the Big Database Hype
Hybrid Transactional/Analytics Processing: Beyond the Big Database HypeHybrid Transactional/Analytics Processing: Beyond the Big Database Hype
Hybrid Transactional/Analytics Processing: Beyond the Big Database HypeAli Hodroj
 

Mais procurados (20)

Presentation capacity management for oracle exadata database machine v2
Presentation   capacity management for oracle exadata database machine v2Presentation   capacity management for oracle exadata database machine v2
Presentation capacity management for oracle exadata database machine v2
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 
Observability with Spring-based distributed systems
Observability with Spring-based distributed systemsObservability with Spring-based distributed systems
Observability with Spring-based distributed systems
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
 
Disrupting Insurance with Advanced Analytics The Next Generation Carrier
Disrupting Insurance with Advanced Analytics The Next Generation CarrierDisrupting Insurance with Advanced Analytics The Next Generation Carrier
Disrupting Insurance with Advanced Analytics The Next Generation Carrier
 
IBM Power8 announce
IBM Power8 announceIBM Power8 announce
IBM Power8 announce
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environment
 
In memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGainIn memory computing principles by Mac Moore of GridGain
In memory computing principles by Mac Moore of GridGain
 
What’s New in Assure MIMIX 10
What’s New in Assure MIMIX 10What’s New in Assure MIMIX 10
What’s New in Assure MIMIX 10
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
 
Caching Data in OutSystems: A Tale of Gains Without Pain
Caching Data in OutSystems: A Tale of Gains Without PainCaching Data in OutSystems: A Tale of Gains Without Pain
Caching Data in OutSystems: A Tale of Gains Without Pain
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
20 Altair PBS Professional Features in 20 minutes, 2018
20 Altair PBS Professional Features in 20 minutes, 201820 Altair PBS Professional Features in 20 minutes, 2018
20 Altair PBS Professional Features in 20 minutes, 2018
 
OpenPOWER Roadmap Toward CORAL
OpenPOWER Roadmap Toward CORALOpenPOWER Roadmap Toward CORAL
OpenPOWER Roadmap Toward CORAL
 
From Data to Services at the Speed of Business
From Data to Services at the Speed of BusinessFrom Data to Services at the Speed of Business
From Data to Services at the Speed of Business
 
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined Storage
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined StorageZero Downtime, Zero Touch Stretch Clusters from Software-Defined Storage
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined Storage
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
 
Hybrid Transactional/Analytics Processing: Beyond the Big Database Hype
Hybrid Transactional/Analytics Processing: Beyond the Big Database HypeHybrid Transactional/Analytics Processing: Beyond the Big Database Hype
Hybrid Transactional/Analytics Processing: Beyond the Big Database Hype
 

Semelhante a November 2013 HUG: Real-time analytics with in-memory grid

Operational Intelligence Using Hadoop
Operational Intelligence Using HadoopOperational Intelligence Using Hadoop
Operational Intelligence Using HadoopDataWorks Summit
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Large scale virtual Machine log collector (Project-Report)
Large scale virtual Machine log collector (Project-Report)Large scale virtual Machine log collector (Project-Report)
Large scale virtual Machine log collector (Project-Report)Gaurav Bhardwaj
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceMongoDB
 
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...In-Memory Computing Summit
 
Mainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataMainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataDevOps for Enterprise Systems
 
Predicting Machine Failure App
Predicting Machine Failure AppPredicting Machine Failure App
Predicting Machine Failure AppAbhinav Bisht
 
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB
 
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...Nagios
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsAli Hodroj
 
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...MapR Technologies
 
How to Suceed in Hadoop
How to Suceed in HadoopHow to Suceed in Hadoop
How to Suceed in HadoopPrecisely
 
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...Grid Dynamics
 
NoSQL meetup July 2011
NoSQL meetup July 2011NoSQL meetup July 2011
NoSQL meetup July 2011Shay Hassidim
 
Apeman masta midih-oc2_demo_day
Apeman masta midih-oc2_demo_dayApeman masta midih-oc2_demo_day
Apeman masta midih-oc2_demo_dayMIDIH_EU
 
Serverless Architectures in Banking: OpenWhisk on IBM Bluemix at Santander
Serverless Architectures in Banking: OpenWhisk on IBM Bluemix at SantanderServerless Architectures in Banking: OpenWhisk on IBM Bluemix at Santander
Serverless Architectures in Banking: OpenWhisk on IBM Bluemix at SantanderDaniel Krook
 
Gpu digital lab english version
Gpu digital lab english versionGpu digital lab english version
Gpu digital lab english versionoleg gubanov
 
Concept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsConcept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsSeeling Cheung
 
Building a real-time, scalable and intelligent programmatic ad buying platform
Building a real-time, scalable and intelligent programmatic ad buying platformBuilding a real-time, scalable and intelligent programmatic ad buying platform
Building a real-time, scalable and intelligent programmatic ad buying platformJampp
 

Semelhante a November 2013 HUG: Real-time analytics with in-memory grid (20)

Operational Intelligence Using Hadoop
Operational Intelligence Using HadoopOperational Intelligence Using Hadoop
Operational Intelligence Using Hadoop
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Large scale virtual Machine log collector (Project-Report)
Large scale virtual Machine log collector (Project-Report)Large scale virtual Machine log collector (Project-Report)
Large scale virtual Machine log collector (Project-Report)
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
 
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
 
Mainframe Application Testing both With and Without Live Data
November 2013 HUG: Real-time analytics with in-memory grid

  • 1. Enabling Real-Time Analytics Using Hadoop Map/Reduce Hadoop Users Group November 20, 2013 Bill Bain, CEO (wbain@scaleoutsoftware.com) Copyright © 2013 by ScaleOut Software, Inc.
  • 2. Agenda • Quick Review of In-Memory Data Grids • The Need for Real-Time Analytics: Two Use Cases • Data-Parallel Computation on an IMDG Using Parallel Method Invocation (PMI) • Implementing MapReduce Using PMI: ScaleOut hServer™ • Sample Use Cases • Video Demo • Comparison to Spark
  • 3. About ScaleOut Software • Develops and markets In-Memory Data Grids: software middleware for: • Scaling application performance and • Performing real-time analytics using • In-memory data storage and computing • Dr. William Bain, Founder & CEO • Career focused on parallel computing – Bell Labs, Intel, Microsoft • 3 prior start-ups, last acquired by Microsoft and product now ships as Network Load Balancing in Windows Server • Eight years in the market; 400 customers, 9,000 servers • Sample customers:
  • 4. What is an In-Memory Data Grid? In-memory storage for fast updates and retrieval of live data • Fits in the business logic layer: • Follows object-oriented view of data (vs. relational view). • Stores collections of Java/.NET objects shared by multiple clients. • Uses create/read/update/delete and query APIs to access data. • Implemented across a cluster of servers or VMs: • Scales storage and throughput by adding servers. • Provides high availability in case a server fails.
  • 5. Our Focus: Real-Time Analytics
      Real-time (“Operational Intelligence”): live data sets; gigabytes to terabytes; in-memory storage; minutes to seconds. Best uses: • Tracking live data • Immediately identifying trends and capturing opportunities
      Batch (“Business Intelligence”): static data sets; petabytes; disk storage; hours to minutes. Best uses: • Analyzing warehoused data • Mining for long-term trends
      [Diagram: Big Data Analytics, divided into Real-Time and Batch; product labels: Analytics Server, Hadoop, IBM, Teradata, SAS, SAP, hServer]
  • 6. Online Systems Need Real-Time Analysis A few examples: • Equity trading: to minimize risk during a trading day • Ecommerce: to optimize real-time shopping activity • Reservations systems: to identify issues, reroute, etc. • Credit cards: to detect fraud in real time • Smart grids: to optimize power distribution & detect issues
  • 7. Integrate MapReduce into IMDG for Real-Time Analytics Benefits: • Enables use of widely used Hadoop MapReduce APIs: • Accelerates data access by staging data in memory. • Eliminates batch scheduling and data shuffling overheads of standard Hadoop distributions. • Analyzes and updates live data. • Enables Hadoop deployment in live systems. • Hadoop MapReduce programs run without change. • ScaleOut’s implementation is called ScaleOut hServer™.
  • 8. Data-Parallel Analysis Is Not New • 1980s: Special-Purpose Hardware: “SIMD” (pictured: Thinking Machines Connection Machine 5) • 1990s: General-Purpose Parallel Supercomputers: “Domain Decomposition”, “SPMD” (pictured: Intel iPSC/2, IBM SP1)
  • 9. Data-Parallel Analysis Is Not New • 1990s to early 2000s: HPC on Clusters: “MPI” (pictured: HP Blade Servers) • Since 2003: Clusters, the Cloud, and IMDGs: “MapReduce” (pictured: Amazon EC2, Windows Azure)
  • 10. Parallel Method Invocation • Basic, well understood model of data-parallel computation • Implemented for use on objects hosted in IMDGs: • Executes user’s code in parallel across the grid. • Uses parallel query to select objects for analysis. [Diagram: Analyze Data (Eval), then the In-Memory Data Grid runs the data-parallel analysis, then Combine Results (Merge)]
  • 11. Running the Analysis The parallel analysis executes in three steps: • Step 1: The application first selects all relevant objects in the collection with a parallel query run on all grid servers. • Note: Query spec matches data’s object-oriented properties.
  • 12. Running the Analysis: Step 2 • Step 2: The IMDG automatically schedules analysis operations across all grid servers and cores. • The analysis runs on all objects selected by the parallel query. • Each grid server analyzes its locally stored objects to minimize data motion. • Parallel execution ensures fast completion time: • IMDG automatically distributes workload across servers/cores. • Scaling the IMDG automatically handles larger data sets.
  • 13. Running the Analysis: Step 3 • Step 3: The IMDG automatically merges all analysis results. • The IMDG first merges all results within each grid server in parallel. • It then merges results across all grid servers to create one combined result. • Efficient parallel merge minimizes the delay in combining all results. • The IMDG delivers the combined result to the trader’s display as one object.
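The three steps above (parallel query, eval, merge) can be sketched in plain Java. This is only an illustration of the pattern using `java.util.stream` on a single JVM; it is not ScaleOut's PMI API. The names `PmiSketch`, `Trade`, `Summary`, `eval`, `merge`, and `analyze` are hypothetical, and the sketch assumes Java 16 or later for records.

```java
import java.util.List;

// Hypothetical single-JVM illustration of the eval/merge pattern; not ScaleOut's API.
class PmiSketch {

    // Stand-in for an object stored in the grid.
    record Trade(String symbol, double price, long volume) {}

    // Partial result produced by eval and folded together by merge.
    record Summary(long trades, long volume, double maxPrice) {
        // Identity element for merge: merging it with any summary returns that summary.
        static final Summary EMPTY = new Summary(0, 0, Double.NEGATIVE_INFINITY);

        // Eval step: analyze one object.
        static Summary eval(Trade t) {
            return new Summary(1, t.volume(), t.price());
        }

        // Merge step: associative, so partial results can be combined in any grouping.
        Summary merge(Summary other) {
            return new Summary(trades + other.trades,
                               volume + other.volume,
                               Math.max(maxPrice, other.maxPrice));
        }
    }

    // Select the matching objects, evaluate them in parallel, and merge the results.
    static Summary analyze(List<Trade> grid, String symbol) {
        return grid.parallelStream()
                .filter(t -> t.symbol().equals(symbol))   // step 1: query
                .map(Summary::eval)                       // step 2: eval
                .reduce(Summary.EMPTY, Summary::merge);   // step 3: merge
    }

    public static void main(String[] args) {
        List<Trade> grid = List.of(
                new Trade("ABC", 10.0, 100),
                new Trade("XYZ", 55.0, 300),
                new Trade("ABC", 12.5, 250));
        System.out.println(analyze(grid, "ABC"));
    }
}
```

Because `merge` is associative and `EMPTY` is its identity, partial results can be combined first within a worker and then across workers, which mirrors the two-level merge described in step 3.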
  • 14. Sample Performance Results for PMI Optimizing a stock trading platform with real-time analysis: • IMDG hosted in Amazon cloud using 75 servers. • IMDG holds 1 TB of stock history data in memory. • IMDG handles continuous stream of updates (1.1 GB/s). • IMDG performs real-time analysis on live data. • Entire data set analyzed in 4.1 seconds (250 GB/s). • IMDG scales linearly as workload grows.
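As a rough check on the figures above, the quoted aggregate rate follows from the data size and the analysis time. The per-server figure is an estimate that assumes 1 TB = 1024 GB and an even spread of data across the 75 servers; neither assumption is stated on the slide.

```latex
\[
\frac{1\ \text{TB}}{4.1\ \text{s}}
  = \frac{1024\ \text{GB}}{4.1\ \text{s}}
  \approx 250\ \text{GB/s},
\qquad
\frac{250\ \text{GB/s}}{75\ \text{servers}}
  \approx 3.3\ \text{GB/s per server}
\]
```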
  • 15. Implementing Real-Time MapReduce • Goal: Run MapReduce applications from a remote workstation. • The IMDG automatically builds an “invocation grid” of JVMs on the grid’s servers for PMI and ships the application’s jars. • The invocation grid can be reused to shorten startup time. • Use PMI to implement MapReduce.
  • 16. Accelerating MapReduce Execution PMI is the foundation of fast execution time: • Data can be input from either the IMDG or an external data source. • Works with any input/output format compatible with the Apache distribution. • ScaleOut IMDG uses its data-parallel execution engine (PMI) to invoke the mappers and the reducers. • Eliminates batch scheduling overhead. • Intermediate results are stored within the IMDG. • Minimizes data motion between the mappers and reducers. • Allows optional sorting.
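The idea of running the map and reduce phases with intermediate results held in memory can be illustrated with a word count in plain Java. This is a minimal sketch on one JVM using `java.util.stream`, not hServer or the Hadoop API; the class name `InMemoryWordCount` and the method `wordCount` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory word count; illustrates the map/shuffle/reduce phases only.
class InMemoryWordCount {

    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.parallelStream()
                // Map phase: split each line into lower-cased words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                // Shuffle and reduce phases: group by word and sum the counts.
                // All intermediate groups stay in memory; TreeMap keeps keys sorted.
                .collect(Collectors.groupingBy(
                        word -> word,
                        TreeMap::new,
                        Collectors.summingInt(word -> 1)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the cat", "The dog the");
        System.out.println(wordCount(lines));
    }
}
```

Using a `TreeMap` here corresponds to the optional sorting mentioned on the slide; swapping in `HashMap::new` would skip it.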
  • 17. Only One-Line Code Change ScaleOut hServer subclasses the Hadoop Job class:

      // This job will run using the Hadoop job tracker:
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Job job = new Job(conf, "wordcount");

          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          job.setMapperClass(Map.class);
          job.setReducerClass(Reduce.class);
          job.setInputFormatClass(TextInputFormat.class);
          job.setOutputFormatClass(TextOutputFormat.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));

          job.waitForCompletion(true);
      }

      // This job will run using ScaleOut hServer:
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Job job = new HServerJob(conf, "wordcount");

          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          job.setMapperClass(Map.class);
          job.setReducerClass(Reduce.class);
          job.setInputFormatClass(TextInputFormat.class);
          job.setOutputFormatClass(TextOutputFormat.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));

          job.waitForCompletion(true);
      }
  • 18. Accessing IMDG Data for M/R • IMDG adds grid input format for accessing key/value pairs held in the IMDG. • MapReduce programs optionally can output results to IMDG with grid output format. • Grid Record Reader optimizes access to key/value pairs to eliminate network overhead. • Applications can access and update key/value pairs as operational data during analysis.
  • 19. Optimized In-Memory Storage Multiple in-memory storage models: • Named cache, optimized for rich semantics: • Property-based query • Distributed locking • Access from remote grids • Named map, optimized for efficient storage and bulk analysis: • Highly efficient object storage • Pipelined, bulk-access mechanisms
  • 20. Example: Ecommerce: Inventory Management Fast map/reduce reconciles inventory and order systems for an online retailer: • Challenge: Inventory and online order management are handled by different applications. • Reconciled once per day. • Inaccurate orders reduce margins. • Solution: • Host SKUs in IMDG updated in real time by order & inventory systems. • Use PMI to reconcile in two minutes. • Results: Real-time reconciliation ensures accurate orders.
  • 21. Example in Financial Services Integrate analysis into a stock trading platform: • The IMDG holds market data and hedging strategies. • Updates to market data continuously flow through the IMDG. • The IMDG performs repeated map/reduce analysis on hedging strategies and alerts traders in real time. • IMDG automatically and dynamically scales its throughput to handle new hedging strategies by adding servers.
  • 23. Comparison to Spark • Spark is intended to accelerate data analysis using in-memory computing. • ScaleOut’s IMDG provides standard MapReduce for “live” systems.

      Feature                | Spark                    | ScaleOut IMDG
      New MapReduce engine   | Yes                      | Yes
      In-memory data storage | Resilient Distr. Datasets | Distributed Objects
      Load/store from HDFS   | Yes                      | Yes
      Avoid disk access      | Yes                      | Yes
      CRUD on live data      | No                       | Yes
      Query on properties    | No                       | Yes
      High availability      | Rebuild on failure       | Replication and failover
      Extensibility          | Additional operators     | PMI methods
      Open source            | Yes                      | Hybrid
  • 24. Summary • Online systems need to analyze “live” data in real time. • MapReduce has traditionally focused on analyzing large, static (offline) datasets held in file systems. • An in-memory data grid (IMDG) can accelerate MapReduce applications, enabling real-time analytics: • Enables the application to analyze and update live data. • Leverages the IMDG’s load-balanced placement of data. • Avoids batch-scheduled startup delays. • Avoids data motion from secondary storage. • MapReduce can be implemented using standard data-parallel computing techniques (“parallel method invocation”): • Tightly integrates Map/Reduce engine with the IMDG. • Accelerates Map/Reduce execution by >20X in benchmark tests.
  • 25. Accelerating Start-Up Times
      • The invocation grid can be re-used across MapReduce jobs:

      public static void main(String[] args) throws Exception {
          // Configure and load the invocation grid
          InvocationGrid grid = HServerJob.getInvocationGridBuilder("myGrid")
                  // Add JAR files as IG dependencies
                  .addJar("main-job.jar")
                  .addJar("first-library.jar")
                  // Add classes as IG dependencies
                  .addClass(MyMapper.class)
                  .addClass(MyReducer.class)
                  // Define custom JVM parameters
                  .setJVMParameters("-Xms512M -Xmx1024M")
                  .load();

          // Run 10 jobs on the same invocation grid
          for (int i = 0; i < 10; i++) {
              Configuration conf = new Configuration();
              // The preloaded invocation grid is passed as a parameter to the job
              Job job = new HServerJob(conf, "Job number " + i, false, grid);
              // ......Configure the job here.........
              // Run the job
              job.waitForCompletion(true);
          }

          // Unload the invocation grid when we are done
          grid.unload();
      }
  • 26. Targeted Use Cases
      • Run continuous Hadoop on live data while it is being updated.
          “Capture perishable business opportunities and identify issues.”
          • Real-time risk analysis
          • Credit card fraud detection
          • ...
      • Accelerate Hadoop on static data with a one-line code change.
          “Speed up Hadoop execution by >10X for faster business insights.”
          • Financial modeling
          • Process simulations
          • ...
      • Quickly prototype Hadoop code.
          “Validate your Hadoop code before it goes into batch processing.”
          • No need to install Hadoop stack
          • Fast-turn debug and tuning
          • ...
  • 27. The Need for Real-Time Analytics
      Many Use Cases:
      • Authorizations / Payment Processing / Mobile Payments
      • Operational Risk Compliance
      • Financial: Risk, P&L, Pricing
      • Execution Rules
      • Market Feed / Event Handlers
      • Churn Management
      • Situational Awareness
      • Fraud Detection
      • Real Time Tracking
      • Sensor Data / SCADA
      • Inventory Management
      • Service Activation
      Across Key Industries:
      • Health Care
      • Government
      • Life Sciences
      • IC / DoD
      • Logistics
      • Manufacturing
      • Utilities
      • Retail
      • Telco
      • Financial
      • CPG
      • Law enforcement
  • 28. Problem: Hadoop Cannot Efficiently Perform Real-Time Analytics
      • Hadoop is typically used for very large, static, offline datasets.
      • Data must be copied from disk-based storage (e.g., HDFS) into memory for analysis.
      • Hadoop Map/Reduce adds lengthy batch scheduling and data shuffling overhead.
  • 29. Hadoop Users Need Real-Time Analytics
      • ScaleOut Software conducted an informal survey at the Strata 2013 Conference (Santa Clara).
      • Based on 150 responses:
          • 78% of organizations generate fast-changing data.
          • 60% use Hadoop and 78% plan to expand usage of Hadoop within 12 months.
          • Only 42% consider Hadoop to be an effective platform for real-time analysis, but…
          • 93% would benefit from real-time data analytics.
          • 71% consider a 10X improvement in performance meaningful.
      • Take-away: Hadoop users need real-time analytics.
  • 30. Optional Caching of HDFS Data
      • ScaleOut hServer adds a Dataset Record Reader (wrapper) to cache HDFS data during program execution.
      • Hadoop automatically retrieves data from the ScaleOut IMDG on subsequent runs.
      • The Dataset Record Reader stores and retrieves data with minimum network and memory overheads.
      • Tests with the Terasort benchmark have demonstrated 11X faster access latency over HDFS without the IMDG.
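The caching behaviour described on this slide follows a read-through pattern: the first run reads from the slow source and stores the records, and later runs are served from memory. The sketch below illustrates only that pattern. `CachingRecordSource` and its methods are hypothetical names, a `HashMap` stands in for the IMDG, and none of this is the hServer Dataset Record Reader API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class CachingRecordSource {
    /** Stand-in for the IMDG: cached records keyed by input split name. */
    private final Map<String, List<String>> cache = new HashMap<>();
    /** Counts how often the slow underlying source was actually read. */
    private int underlyingReads = 0;

    /**
     * Returns the records for a split, reading the underlying source
     * (for example, a file in HDFS) only on the first request.
     */
    List<String> read(String splitName, Supplier<List<String>> underlyingReader) {
        List<String> cached = cache.get(splitName);
        if (cached != null) {
            return cached;
        }
        underlyingReads++;
        List<String> records = new ArrayList<>(underlyingReader.get());
        cache.put(splitName, records);
        return records;
    }

    int underlyingReads() {
        return underlyingReads;
    }

    public static void main(String[] args) {
        CachingRecordSource source = new CachingRecordSource();
        Supplier<List<String>> slowReader = () -> Arrays.asList("rec1", "rec2");
        source.read("split-0", slowReader); // first run: reads the underlying source
        source.read("split-0", slowReader); // second run: served from the cache
        System.out.println("Underlying reads: " + source.underlyingReads());
    }
}
```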
  • 31. Java Example: Parallel Method Invocation
      • Create a method to analyze each queried stock object and another method to pair-wise merge the results:

      public class StockAnalysis implements Invokable<Stock, StockCalcParams, Double> {

          // Analyze one queried stock object: compute its market value
          public Double eval(Stock stock, StockCalcParams param) throws InvokeException {
              return stock.getPrice() * stock.getTotalShares();
          }

          // Pair-wise merge two partial results
          public Double merge(Double first, Double second) throws InvokeException {
              return first + second;
          }
      }
  • 32. Java Example: Parallel Method Invocation
      • Run a parallel method invocation on the query results:

      NamedCache cache = CacheFactory.getCache("Stocks");
      InvokeResult valueOfSelectedStocks = cache.invoke(
              StockAnalysis.class,
              Stock.class,
              or(equal("ticker", "GOOG"), equal("ticker", "ORCL")),
              new StockCalcParams());
      System.out.println("The value of selected stocks is " + valueOfSelectedStocks.getResult());
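For readers without a grid at hand, the following local sketch mimics what the invocation above computes: select the objects matching the query, run `eval` on each, then combine the partial results two at a time. `StockRecord` and `PairwiseMergeSketch` are hypothetical stand-ins, and merging in rounds is one plausible reading of "pair-wise merge", not a statement about how the product schedules merges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the slide's Stock class. */
class StockRecord {
    final String ticker;
    final double price;
    final long totalShares;

    StockRecord(String ticker, double price, long totalShares) {
        this.ticker = ticker;
        this.price = price;
        this.totalShares = totalShares;
    }
}

class PairwiseMergeSketch {

    /** Same calculation as StockAnalysis.eval on the slide. */
    static double eval(StockRecord stock) {
        return stock.price * stock.totalShares;
    }

    /** Same calculation as StockAnalysis.merge on the slide. */
    static double merge(double first, double second) {
        return first + second;
    }

    /** Filters by ticker, evaluates each match, then merges results in pairwise rounds. */
    static double invoke(List<StockRecord> stocks, List<String> tickers) {
        List<Double> partials = new ArrayList<>();
        for (StockRecord stock : stocks) {
            if (tickers.contains(stock.ticker)) {   // the "query" step
                partials.add(eval(stock));          // the "eval" step
            }
        }
        while (partials.size() > 1) {               // each pass halves the list
            List<Double> next = new ArrayList<>();
            for (int i = 0; i + 1 < partials.size(); i += 2) {
                next.add(merge(partials.get(i), partials.get(i + 1)));
            }
            if (partials.size() % 2 == 1) {         // carry an unpaired result forward
                next.add(partials.get(partials.size() - 1));
            }
            partials = next;
        }
        return partials.isEmpty() ? 0.0 : partials.get(0);
    }

    public static void main(String[] args) {
        List<StockRecord> stocks = Arrays.asList(
                new StockRecord("GOOG", 10.0, 3),
                new StockRecord("MSFT", 100.0, 1),
                new StockRecord("ORCL", 2.5, 4));
        System.out.println("The value of selected stocks is "
                + invoke(stocks, Arrays.asList("GOOG", "ORCL")));
    }
}
```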