SlideShare uma empresa Scribd logo
1 de 51
Baixar para ler offline
Distributed Heterogeneous
Mixture Learning on Spark
Masato Asahara and Ryohei Fujimaki
NEC Data Science Research Labs.
Jun/08/2016 @Spark Summit 2016
2 © NEC Corporation 2016
Who we are?
▌Masato Asahara (Ph.D.)
▌Researcher, NEC Data Science Research Laboratory
 Masato received his Ph.D. degree from Keio University, and has worked at
NEC for 6 years in the research field of distributed computing systems and
computing resource management technologies.
 Masato is currently leading developments of Spark-based machine learning
and data analytics systems, particularly using NEC’s Heterogeneous Mixture
Learning technology.
▌Ryohei Fujimaki (Ph.D.)
▌Research Fellow, NEC Data Science Research Laboratory
 Ryohei is responsible for cores to NEC’s leading predictive and prescriptive
analytics solutions, including "heterogeneous mixture learning” and
“predictive optimization” technologies. In addition to technology R&D,
Ryohei is also involved with co-developing cutting-edge advanced analytics
solutions with NEC’s global business clients and partners in the North
American and APAC regions.
 Ryohei received his Ph.D. degree from the University of Tokyo, and became
the youngest research fellow ever in NEC Corporation’s 115-year history.
3 © NEC Corporation 2016
Agenda
HML
4 © NEC Corporation 2016
Agenda
CPU
12
SEP
11
SEP
10
SEP
HML
5 © NEC Corporation 2016
Agenda
HML
NEC’s Predictive Analytics and
Heterogeneous Mixture Learning
7 © NEC Corporation 2016
Enterprise Applications of HML
Driver Risk
Assessment
Inventory
Optimization
Churn
Retention
Predictive
Maintenance
Product Price
Optimization
Sales
Optimization
Energy/Water
Operation Mgmt.
8 © NEC Corporation 2016
NEC’s Heterogeneous Mixture Learning (HML)
NEC’s machine learning that automatically derives “accurate” and
“transparent” formulas behind Big Data.
Sun
Sat
Y=a1x1+ ・・・ +anxn
Y=a1x1+ ・・・+knx3
Y=b1x2+ ・・・+hnxn
Y=c1x4+ ・・・+fnxn
Mon
Sun
Explore Massive Formulas
Transparent
data segmentation
and predictive formulas
Heterogeneous
mixture data
9 © NEC Corporation 2016
The HML Model
Gender
Height
tallshort
age
smoke
food
weight
sports
sports
tea/coffee
beer/wine
The health risk of a
tall man can be
predicted by age,
alcohol and smoke!
10 © NEC Corporation 2016
HML (Heterogeneous Mixture Learning)
11 © NEC Corporation 2016
HML (Heterogeneous Mixture Learning)
12 © NEC Corporation 2016
HML Algorithm
Segment data by a rule-based tree
Feature
Selection
Pruning
Data
Segmentation
Training data
13 © NEC Corporation 2016
HML Algorithm
Select features and fit predictive models for data segments
Feature
Selection
Pruning
Data
Segmentation
Training data
14 © NEC Corporation 2016
Enterprise Applications of HML
Driver Risk
Assessment
Inventory
Optimization
Churn
Retention
Predictive
Maintenance
Energy/Water
Operation Mgmt.
Sales
Optimization
Product Price
Optimization
15 © NEC Corporation 2016
Driver Risk
Assessment
Inventory
Optimization
Churn
Retention
Predictive
Maintenanc
e
Energy/Water
Operation Mgmt.
Product Price
Optimization
Demand for Big Data Analysis
24(hour)×365(days)×3(year)×1000(shops)
~26,000,000 training samples
Sales
Optimization
16 © NEC Corporation 2016
Driver Risk
Assessment
Inventory
Optimization
Churn
Retention
Predictive
Maintenance
Energy/Water
Demand Forecasting
Sales
Forecasting
Product Price
Optimization
Demand for Big Data Analysis
5 million(customers)×12(months)
=60,000,000 training samples
17 © NEC Corporation 2016
Driver Risk
Assessment
Inventory
Optimization
Churn
Retention
Predictive
Maintenanc
e
Energy/Water
Operation Mgmt.
Product Price
Optimization
Demand for Big Data Analysis
Sales
Optimization
Distributed HML on Spark
19 © NEC Corporation 2016
Why Spark, not Hadoop?
Because Spark’s in-memory architecture can run HML faster
Feature
Selection
Pruning
Data
Segmentation
CPU CPU CPU
Data segmentation Feature selection
~
Pruning
CPU CPU
Data segmentation Feature selection
20 © NEC Corporation 2016
Data Scalability powered by Spark
Treat unlimited scale of training data by adding executors
executor executor executor
CPU CPU CPU
driver
HDFS
Add executors,
and get
more memory
and CPU power
Training data
21 © NEC Corporation 2016
Distributed HML Engine: Architecture
HDFS
Distributed HML
Core (Scala)
YARN
invoke
HML
model
Distributed HML
Interface (Python)
Matrix
computation
libraries
(Breeze, BLAS)
Data Scientist
22 © NEC Corporation 2016
3 Technical Key Points to Fast Run ML on Spark
CPU
12
SEP
11
SEP
10
SEP
23 © NEC Corporation 2016
3 Technical Key Points to Fast Run ML on Spark
CPU
12
SEP
11
SEP
10
SEP
24 © NEC Corporation 2016
Challenge to Avoid Data Shuffling
executor executor executor
driver
Model:
10KB~10MB
HML Model Merge Algorithm
Training data:
TB~
executor executor executor
driver
Naïve Design Distributed HML
25 © NEC Corporation 2016
Execution Flow of Distributed HML
Executors build a rule-based tree from their local data in parallel
Driver
Executor
Feature
Selection
Pruning
Data
Segmentation
26 © NEC Corporation 2016
Execution Flow of Distributed HML
Driver merges the trees
Driver
Executor
Feature
Selection
Pruning
Data
Segmentation
HML Segment Merge Algorithm
27 © NEC Corporation 2016
Execution Flow of Distributed HML
Driver broadcasts the merged tree to executors
Driver
Executor
Feature
Selection
Pruning
Data
Segmentation
28 © NEC Corporation 2016
Execution Flow of Distributed HML
Executors perform feature selection with their local data in parallel
Driver
Executor
Feature
Selection
Pruning
Data
Segmentation
Sat
Sat
Sat
Sat
Sat
29 © NEC Corporation 2016
Execution Flow of Distributed HML
Driver merges the results of feature selection
Driver
Executor
Feature
Selection
Pruning
Data
Segmentation
Sat
Sat
Sat
Sat
Sat
HML Feature
Merge Algorithm
30 © NEC Corporation 2016
Execution Flow of Distributed HML
Driver merges the results of feature selection
Driver
Executor
Feature
Selection
Pruning
Data
Segmentation
Sat
HML Feature
Merge Algorithm
31 © NEC Corporation 2016
Execution Flow of Distributed HML
Driver broadcasts the merged results of feature selection to executors
Driver
Executor
Feature
Selection
Pruning
Data
Segmentation
Sat
SatSatSat
32 © NEC Corporation 2016
3 Technical Key Points to Fast Run ML on Spark
CPU
12
SEP
11
SEP
10
SEP
33 © NEC Corporation 2016
Wait Time for Other Executors Delays Execution
Machine learning is likely to cause unbalanced computation load
Executor
Sat
Sat
Time
Executor Executor
34 © NEC Corporation 2016
Balance Computational Effort for Each Executor
Executor optimizes all predictive formulas with equally-divided training data
Executor
Sat
Sat
Time
Executor Executor
35 © NEC Corporation 2016
Balance Computational Effort for Each Executor
Executor optimizes all predictive formulas with equally-divided training data
Executor
Sat
Sat
Time
Executor Executor
Sat Sat
Sat Sat
Completion time reduced!
36 © NEC Corporation 2016
3 Technical Key Points to Fast Run ML on Spark
1TB
CPU
12
SEP
11
SEP
10
SEP
37 © NEC Corporation 2016
Machine Learning Consists of Many Matrix Computations
Feature
Selection
Pruning
Data
Segmentation
Basic Matrix Computation
Matrix Inversion
Combinatorial Optimization
…
CPU
38 © NEC Corporation 2016
RDD Design for Leveraging High-Speed Libraries
Distributed HML’s RDD Design
Matrix object1
element
…
Straightforward RDD Design
Training sample1
partition
Training sample2
CPU
CPU
Matrix computation libraries
(Breeze, BLAS)
~Training sample1
Training sample2
…
seq(Training sample1,
Training sample2, …)
mapPartition
Matrix object1
…
Training sample1
Training sample2
map
Iterator
Matrix
object
creation
element
39 © NEC Corporation 2016
Performance Degradation by Long-Chained RDD
Long-chained RDD operations cause high-cost recalculations
Feature
Selection
Pruning
Data
Segmentation RDD.map(TreeMapFunc)
.map(FeatureSelectionMapFunc)
.reduce(TreeReduceFunc)
.reduce(FeatureSelectionReduceFunc)
.map(TreeMapFunc)
.map(FeatureSelectionMapFunc)
.reduce(TreeReduceFunc)
.reduce(FeatureSelectionReduceFunc)
.map(PruningMapFunc)
40 © NEC Corporation 2016
Performance Degradation by Long-Chained RDD
Long-chained RDD operations cause high-cost recalculations
Feature
Selection
Pruning
Data
Segmentation RDD.map(TreeMapFunc)
.map(FeatureSelectionMapFunc)
.reduce(TreeReduceFunc)
.reduce(FeatureSelectionReduceFunc)
.map(TreeMapFunc)
.map(FeatureSelectionMapFunc)
.reduce(TreeReduceFunc)
.reduce(FeatureSelectionReduceFunc)
GC
.map(TreeMapFunc)
CPU
.map(PruningMapFunc)
41 © NEC Corporation 2016
Performance Degradation by Long-Chained RDD
Cut long-chained operations periodically by checkpoint()
Feature
Selection
Pruning
Data
Segmentation RDD.map(TreeMapFunc)
.map(FeatureSelectionMapFunc)
.reduce(TreeReduceFunc)
.reduce(FeatureSelectionReduceFunc)
.map(TreeMapFunc)
.map(FeatureSelectionMapFunc)
.reduce(TreeReduceFunc)
.reduce(FeatureSelectionReduceFunc)
GC
.map(TreeMapFunc)
savedRDD
.checkpoint()
=
CPU
~
.map(PruningMapFunc)
Benchmark Performance Evaluations
43 © NEC Corporation 2016
Eval 1: Prediction Error (vs. Spark MLlib algorithms)
Distributed HML achieves low error competitive to a complex model
data # samples Distributed
HML
Logistic
Regression
Decision
Tree
Random
Forests
gas sensor
array (CO)*
4,208,261 0.542 0.597 0.587 0.576
household
power
consumption
*
2,075,259 0.524 0.531 0.529 0.655
HIGGS* 11,000,000 0.335 0.358 0.337 0.317
HEPMASS* 7,000,000 0.156 0.163 0.167 0.175
* UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/)
1st
1st
1st
2nd
44 © NEC Corporation 2016
Eval 2: Performance Scalability (vs. Spark MLlib algos)
Distributed HML is competitive with Spark MLlib implementations.
HIGGS* (11M samples) HEPMASS* (7M samples)# CPU cores
Distributed
HML
Logistic
Regression
Distributed
HML
* UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/)
Logistic
Regression
Evaluation in Real Case
46 © NEC Corporation 2016
ATM Cash Balance Prediction
Much cash  High
inventory cost
Little cash  High risk of
out of stock
Around 20,000 ATMs
47 © NEC Corporation 2016
Training Speed
2 hours9 days
▌Summary of data
# ATMs: around 20,000 (in Japan)
# training samples: around 10M
▌Cluster spec (10 nodes)
 # CPU cores: 128
 Memory: 2.5TB
 Spark 1.6.1,
Hadoop 2.7.1 (HDP 2.3.2)
* Run w/ 1 CPU core and 256GB memory
110x
Speed up
Serial HML* Distributed HML
48 © NEC Corporation 2016
Prediction Error
▌Summary of data
# ATMs: around 20,000 (in Japan)
# training samples: around 23M
▌Cluster spec (10 nodes)
 # CPU cores: 128
 Memory: 2.5TB
 Spark 1.6.1,
Hadoop 2.7.1 (HDP 2.3.2)
Average Error:
17% lower
Serial
HML
Distributed HMLSerial
HML
Serial
HML
vs
Winner: Distributed HML
49 © NEC Corporation 2016
Summary
CPU
12
SEP
11
SEP
10
SEP
50 © NEC Corporation 2016
Summary
Distributed Heterogeneous Mixture Learning On Spark

Mais conteúdo relacionado

Mais procurados

Democratizing Data
Democratizing DataDemocratizing Data
Democratizing DataDatabricks
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentDatabricks
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Spark Summit
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopDataWorks Summit
 
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastDatabricks
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaDatabricks
 
Building Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous DataBuilding Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous DataDatabricks
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 
Operationalizing Big Data Pipelines At Scale
Operationalizing Big Data Pipelines At ScaleOperationalizing Big Data Pipelines At Scale
Operationalizing Big Data Pipelines At ScaleDatabricks
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowDatabricks
 
Serverless data pipelines gcp
Serverless data pipelines gcpServerless data pipelines gcp
Serverless data pipelines gcpCatherine Kimani
 
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkScaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkDatabricks
 
What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0Databricks
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Databricks
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenDatabricks
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkDatabricks
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Databricks
 

Mais procurados (20)

Democratizing Data
Democratizing DataDemocratizing Data
Democratizing Data
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
 
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
 
Building Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous DataBuilding Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous Data
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
Big Data Heterogeneous Mixture Learning on Spark
Big Data Heterogeneous Mixture Learning on SparkBig Data Heterogeneous Mixture Learning on Spark
Big Data Heterogeneous Mixture Learning on Spark
 
Operationalizing Big Data Pipelines At Scale
Operationalizing Big Data Pipelines At ScaleOperationalizing Big Data Pipelines At Scale
Operationalizing Big Data Pipelines At Scale
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
 
Serverless data pipelines gcp
Serverless data pipelines gcpServerless data pipelines gcp
Serverless data pipelines gcp
 
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkScaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache Spark
 
What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
 

Destaque

[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...Insight Technology, Inc.
 
Moestuinier Wim Lybaert kan ook winst oogsten
Moestuinier Wim Lybaert kan ook winst oogstenMoestuinier Wim Lybaert kan ook winst oogsten
Moestuinier Wim Lybaert kan ook winst oogstenThierry Debels
 
Baker HIMSS Staffers Final
Baker HIMSS Staffers FinalBaker HIMSS Staffers Final
Baker HIMSS Staffers Finalbakerdb
 
Ven World Irish Folk Music
Ven World Irish Folk MusicVen World Irish Folk Music
Ven World Irish Folk Musicjglasshouse
 
Future-of-wearable-computing
Future-of-wearable-computingFuture-of-wearable-computing
Future-of-wearable-computingHessel van Tuinen
 
The Brand Is Dead: Long Live the Brand
The Brand Is Dead: Long Live the BrandThe Brand Is Dead: Long Live the Brand
The Brand Is Dead: Long Live the BrandMatthew Thompson
 
If U Dont Want To Be Ill - Speak
If U Dont Want To Be Ill - SpeakIf U Dont Want To Be Ill - Speak
If U Dont Want To Be Ill - Speakryrota
 
Small and medium-sized busineses' funding
Small and medium-sized busineses' fundingSmall and medium-sized busineses' funding
Small and medium-sized busineses' fundingFrozovsky
 
Digitalis Design - Mobile App Design (hun)
Digitalis Design - Mobile App Design (hun)Digitalis Design - Mobile App Design (hun)
Digitalis Design - Mobile App Design (hun)Milan Korsos
 
Sabbath School Lesson - March 6-12
Sabbath School Lesson - March 6-12Sabbath School Lesson - March 6-12
Sabbath School Lesson - March 6-12ryrota
 
Taklimat r trg draf 4 tkini
Taklimat r trg draf 4 tkiniTaklimat r trg draf 4 tkini
Taklimat r trg draf 4 tkinicukasuam
 
Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012
Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012
Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012crebusproject
 
Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]
Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]
Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]Food Insight
 
A Magical Miniature World By Vyacheslav Mishchenko
A Magical Miniature World By Vyacheslav MishchenkoA Magical Miniature World By Vyacheslav Mishchenko
A Magical Miniature World By Vyacheslav Mishchenkoguimera
 
Science fiction in films and how we design the future - Matthew McGriskin
Science fiction in films and how we design the future - Matthew McGriskinScience fiction in films and how we design the future - Matthew McGriskin
Science fiction in films and how we design the future - Matthew McGriskinUXPA UK
 

Destaque (20)

[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
 
Moestuinier Wim Lybaert kan ook winst oogsten
Moestuinier Wim Lybaert kan ook winst oogstenMoestuinier Wim Lybaert kan ook winst oogsten
Moestuinier Wim Lybaert kan ook winst oogsten
 
Baker HIMSS Staffers Final
Baker HIMSS Staffers FinalBaker HIMSS Staffers Final
Baker HIMSS Staffers Final
 
Ven World Irish Folk Music
Ven World Irish Folk MusicVen World Irish Folk Music
Ven World Irish Folk Music
 
Future-of-wearable-computing
Future-of-wearable-computingFuture-of-wearable-computing
Future-of-wearable-computing
 
The Brand Is Dead: Long Live the Brand
The Brand Is Dead: Long Live the BrandThe Brand Is Dead: Long Live the Brand
The Brand Is Dead: Long Live the Brand
 
If U Dont Want To Be Ill - Speak
If U Dont Want To Be Ill - SpeakIf U Dont Want To Be Ill - Speak
If U Dont Want To Be Ill - Speak
 
Enterprise 2.0
Enterprise 2.0Enterprise 2.0
Enterprise 2.0
 
Small and medium-sized busineses' funding
Small and medium-sized busineses' fundingSmall and medium-sized busineses' funding
Small and medium-sized busineses' funding
 
Digitalis Design - Mobile App Design (hun)
Digitalis Design - Mobile App Design (hun)Digitalis Design - Mobile App Design (hun)
Digitalis Design - Mobile App Design (hun)
 
Sabbath School Lesson - March 6-12
Sabbath School Lesson - March 6-12Sabbath School Lesson - March 6-12
Sabbath School Lesson - March 6-12
 
FOCUS quotes
FOCUS quotesFOCUS quotes
FOCUS quotes
 
2012 Taiwan UX Summit 專題演講(四)簡報
2012 Taiwan UX Summit 專題演講(四)簡報2012 Taiwan UX Summit 專題演講(四)簡報
2012 Taiwan UX Summit 專題演講(四)簡報
 
Taklimat r trg draf 4 tkini
Taklimat r trg draf 4 tkiniTaklimat r trg draf 4 tkini
Taklimat r trg draf 4 tkini
 
Temporitmika 5 3
Temporitmika 5 3Temporitmika 5 3
Temporitmika 5 3
 
Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012
Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012
Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012
 
Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]
Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]
Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]
 
A Magical Miniature World By Vyacheslav Mishchenko
A Magical Miniature World By Vyacheslav MishchenkoA Magical Miniature World By Vyacheslav Mishchenko
A Magical Miniature World By Vyacheslav Mishchenko
 
Kotoba te
Kotoba teKotoba te
Kotoba te
 
Science fiction in films and how we design the future - Matthew McGriskin
Science fiction in films and how we design the future - Matthew McGriskinScience fiction in films and how we design the future - Matthew McGriskin
Science fiction in films and how we design the future - Matthew McGriskin
 

Semelhante a Distributed Heterogeneous Mixture Learning On Spark

Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...DataWorks Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform Seldon
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Ahsan Javed Awan
 
IBM - Craig Bender
IBM - Craig BenderIBM - Craig Bender
IBM - Craig BenderIDGnederland
 
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaData meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaSandesh Rao
 
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
OpenACC and Open Hackathons Monthly Highlights May  2023.pdfOpenACC and Open Hackathons Monthly Highlights May  2023.pdf
OpenACC and Open Hackathons Monthly Highlights May 2023.pdfOpenACC
 
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsEsther Vasiete
 
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsTrends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsIgor José F. Freitas
 
Resume_Mahadevan_new (2)
Resume_Mahadevan_new (2)Resume_Mahadevan_new (2)
Resume_Mahadevan_new (2)Mahadevan N
 
Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017Clarisse Hedglin
 
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDeploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDatabricks
 
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...Fujitsu India
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkSingleStore
 
CH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptxCH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptxHafizSaifullah4
 
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...Srivatsan Ramanujam
 

Semelhante a Distributed Heterogeneous Mixture Learning On Spark (20)

Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
 
IBM - Craig Bender
IBM - Craig BenderIBM - Craig Bender
IBM - Craig Bender
 
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaData meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow India
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Meetup Spark UDF performance
Meetup Spark UDF performanceMeetup Spark UDF performance
Meetup Spark UDF performance
 
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
OpenACC and Open Hackathons Monthly Highlights May  2023.pdfOpenACC and Open Hackathons Monthly Highlights May  2023.pdf
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
 
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
 
S&OP as a Service.pdf
S&OP as a Service.pdfS&OP as a Service.pdf
S&OP as a Service.pdf
 
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsTrends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systems
 
Resume_Mahadevan_new (2)
Resume_Mahadevan_new (2)Resume_Mahadevan_new (2)
Resume_Mahadevan_new (2)
 
Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017
 
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDeploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
 
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
 
CH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptxCH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptx
 
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
 

Mais de Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

Mais de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Último

convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 

Último (20)

convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 

Distributed Heterogeneous Mixture Learning On Spark

  • 1. Distributed Heterogeneous Mixture Learning on Spark Masato Asahara and Ryohei Fujimaki NEC Data Science Research Labs. Jun/08/2016 @Spark Summit 2016
  • 2. 2 © NEC Corporation 2016 Who we are? ▌Masato Asahara (Ph.D.) ▌Researcher, NEC Data Science Research Laboratory  Masato received his Ph.D. degree from Keio University, and has worked at NEC for 6 years in the research field of distributed computing systems and computing resource management technologies.  Masato is currently leading developments of Spark-based machine learning and data analytics systems, particularly using NEC’s Heterogeneous Mixture Learning technology. ▌Ryohei Fujimaki (Ph.D.) ▌Research Fellow, NEC Data Science Research Laboratory  Ryohei is responsible for cores to NEC’s leading predictive and prescriptive analytics solutions, including "heterogeneous mixture learning” and “predictive optimization” technologies. In addition to technology R&D, Ryohei is also involved with co-developing cutting-edge advanced analytics solutions with NEC’s global business clients and partners in the North American and APAC regions.  Ryohei received his Ph.D. degree from the University of Tokyo, and became the youngest research fellow ever in NEC Corporation’s 115-year history.
  • 3. 3 © NEC Corporation 2016 Agenda HML
  • 4. 4 © NEC Corporation 2016 Agenda CPU 12 SEP 11 SEP 10 SEP HML
  • 5. 5 © NEC Corporation 2016 Agenda HML
  • 6. NEC’s Predictive Analytics and Heterogeneous Mixture Learning
  • 7. 7 © NEC Corporation 2016 Enterprise Applications of HML Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenance Product Price Optimization Sales Optimization Energy/Water Operation Mgmt.
  • 8. 8 © NEC Corporation 2016 NEC’s Heterogeneous Mixture Learning (HML) NEC’s machine learning that automatically derives “accurate” and “transparent” formulas behind Big Data. Sun Sat Y=a1x1+ ・・・ +anxn Y=a1x1+ ・・・+knx3 Y=b1x2+ ・・・+hnxn Y=c1x4+ ・・・+fnxn Mon Sun Explore Massive Formulas Transparent data segmentation and predictive formulas Heterogeneous mixture data
  • 9. 9 © NEC Corporation 2016 The HML Model Gender Height tallshort age smoke food weight sports sports tea/coffee beer/wine The health risk of a tall man can be predicted by age, alcohol and smoke!
  • 10. 10 © NEC Corporation 2016 HML (Heterogeneous Mixture Learning)
  • 11. 11 © NEC Corporation 2016 HML (Heterogeneous Mixture Learning)
  • 12. 12 © NEC Corporation 2016 HML Algorithm Segment data by a rule-based tree Feature Selection Pruning Data Segmentation Training data
  • 13. 13 © NEC Corporation 2016 HML Algorithm Select features and fit predictive models for data segments Feature Selection Pruning Data Segmentation Training data
  • 14. 14 © NEC Corporation 2016 Enterprise Applications of HML Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenance Energy/Water Operation Mgmt. Sales Optimization Product Price Optimization
  • 15. 15 © NEC Corporation 2016 Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenanc e Energy/Water Operation Mgmt. Product Price Optimization Demand for Big Data Analysis 24(hour)×365(days)×3(year)×1000(shops) ~26,000,000 training samples Sales Optimization
  • 16. 16 © NEC Corporation 2016 Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenance Energy/Water Demand Forecasting Sales Forecasting Product Price Optimization Demand for Big Data Analysis 5 million(customers)×12(months) =60,000,000 training samples
  • 17. 17 © NEC Corporation 2016 Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenanc e Energy/Water Operation Mgmt. Product Price Optimization Demand for Big Data Analysis Sales Optimization
  • 19. 19 © NEC Corporation 2016 Why Spark, not Hadoop? Because Spark’s in-memory architecture can run HML faster Feature Selection Pruning Data Segmentation CPU CPU CPU Data segmentation Feature selection ~ Pruning CPU CPU Data segmentation Feature selection
  • 20. 20 © NEC Corporation 2016 Data Scalability powered by Spark Treat unlimited scale of training data by adding executors executor executor executor CPU CPU CPU driver HDFS Add executors, and get more memory and CPU power Training data
  • 21. 21 © NEC Corporation 2016 Distributed HML Engine: Architecture HDFS Distributed HML Core (Scala) YARN invoke HML model Distributed HML Interface (Python) Matrix computation libraries (Breeze, BLAS) Data Scientist
  • 22. 22 © NEC Corporation 2016 3 Technical Key Points to Fast Run ML on Spark CPU 12 SEP 11 SEP 10 SEP
  • 23. 23 © NEC Corporation 2016 3 Technical Key Points to Fast Run ML on Spark CPU 12 SEP 11 SEP 10 SEP
  • 24. 24 © NEC Corporation 2016 Challenge to Avoid Data Shuffling executor executor executor driver Model: 10KB~10MB HML Model Merge Algorithm Training data: TB~ executor executor executor driver Naïve Design Distributed HML
  • 25. 25 © NEC Corporation 2016 Execution Flow of Distributed HML Executors build a rule-based tree from their local data in parallel Driver Executor Feature Selection Pruning Data Segmentation
  • 26. 26 © NEC Corporation 2016 Execution Flow of Distributed HML Driver merges the trees Driver Executor Feature Selection Pruning Data Segmentation HML Segment Merge Algorithm
  • 27. 27 © NEC Corporation 2016 Execution Flow of Distributed HML Driver broadcasts the merged tree to executors Driver Executor Feature Selection Pruning Data Segmentation
  • 28. 28 © NEC Corporation 2016 Execution Flow of Distributed HML Executors perform feature selection with their local data in parallel Driver Executor Feature Selection Pruning Data Segmentation Sat Sat Sat Sat Sat
  • 29. 29 © NEC Corporation 2016 Execution Flow of Distributed HML Driver merges the results of feature selection Driver Executor Feature Selection Pruning Data Segmentation Sat Sat Sat Sat Sat HML Feature Merge Algorithm
  • 30. 30 © NEC Corporation 2016 Execution Flow of Distributed HML Driver merges the results of feature selection Driver Executor Feature Selection Pruning Data Segmentation Sat HML Feature Merge Algorithm
  • 31. 31 © NEC Corporation 2016 Execution Flow of Distributed HML Driver broadcasts the merged results of feature selection to executors Driver Executor Feature Selection Pruning Data Segmentation Sat SatSatSat
  • 32. 32 © NEC Corporation 2016 3 Technical Key Points to Fast Run ML on Spark CPU 12 SEP 11 SEP 10 SEP
  • 33. 33 © NEC Corporation 2016 Wait Time for Other Executors Delays Execution Machine learning is likely to cause unbalanced computation load Executor Sat Sat Time Executor Executor
  • 34. 34 © NEC Corporation 2016 Balance Computational Effort for Each Executor Executor optimizes all predictive formulas with equally-divided training data Executor Sat Sat Time Executor Executor
  • 35. 35 © NEC Corporation 2016 Balance Computational Effort for Each Executor Executor optimizes all predictive formulas with equally-divided training data Executor Sat Sat Time Executor Executor Sat Sat Sat Sat Completion time reduced!
  • 36. 36 © NEC Corporation 2016 3 Technical Key Points to Fast Run ML on Spark 1TB CPU 12 SEP 11 SEP 10 SEP
  • 37. 37 © NEC Corporation 2016 Machine Learning Consists of Many Matrix Computations Feature Selection Pruning Data Segmentation Basic Matrix Computation Matrix Inversion Combinatorial Optimization … CPU
  • 38. 38 © NEC Corporation 2016 RDD Design for Leveraging High-Speed Libraries Distributed HML’s RDD Design Matrix object1 element … Straightforward RDD Design Training sample1 partition Training sample2 CPU CPU Matrix computation libraries (Breeze, BLAS) ~Training sample1 Training sample2 … seq(Training sample1, Training sample2, …) mapPartition Matrix object1 … Training sample1 Training sample2 map Iterator Matrix object creation element
  • 39. 39 © NEC Corporation 2016 Performance Degradation by Long-Chained RDD Long-chained RDD operations cause high-cost recalculations Feature Selection Pruning Data Segmentation RDD.map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(PruningMapFunc)
  • 40. 40 © NEC Corporation 2016 Performance Degradation by Long-Chained RDD Long-chained RDD operations cause high-cost recalculations Feature Selection Pruning Data Segmentation RDD.map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) GC .map(TreeMapFunc) CPU .map(PruningMapFunc)
  • 41. 41 © NEC Corporation 2016 Performance Degradation by Long-Chained RDD Cut long-chained operations periodically by checkpoint() Feature Selection Pruning Data Segmentation RDD.map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) GC .map(TreeMapFunc) savedRDD .checkpoint() = CPU ~ .map(PruningMapFunc)
  • 43. 43 © NEC Corporation 2016 Eval 1: Prediction Error (vs. Spark MLlib algorithms) Distributed HML achieves low error competitive to a complex model data # samples Distributed HML Logistic Regression Decision Tree Random Forests gas sensor array (CO)* 4,208,261 0.542 0.597 0.587 0.576 household power consumption * 2,075,259 0.524 0.531 0.529 0.655 HIGGS* 11,000,000 0.335 0.358 0.337 0.317 HEPMASS* 7,000,000 0.156 0.163 0.167 0.175 * UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) 1st 1st 1st 2nd
  • 44. 44 © NEC Corporation 2016 Eval 2: Performance Scalability (vs. Spark MLlib algos) Distributed HML is competitive with Spark MLlib implementations. HIGGS* (11M samples) HEPMASS* (7M samples)# CPU cores Distributed HML Logistic Regression Distributed HML * UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) Logistic Regression
  • 46. 46 © NEC Corporation 2016 ATM Cash Balance Prediction Much cash  High inventory cost Little cash  High risk of out of stock Around 20,000 ATMs
  • 47. 47 © NEC Corporation 2016 Training Speed 2 hours9 days ▌Summary of data # ATMs: around 20,000 (in Japan) # training samples: around 10M ▌Cluster spec (10 nodes)  # CPU cores: 128  Memory: 2.5TB  Spark 1.6.1, Hadoop 2.7.1 (HDP 2.3.2) * Run w/ 1 CPU core and 256GB memory 110x Speed up Serial HML* Distributed HML
  • 48. 48 © NEC Corporation 2016 Prediction Error ▌Summary of data # ATMs: around 20,000 (in Japan) # training samples: around 23M ▌Cluster spec (10 nodes)  # CPU cores: 128  Memory: 2.5TB  Spark 1.6.1, Hadoop 2.7.1 (HDP 2.3.2) Average Error: 17% lower Serial HML Distributed HMLSerial HML Serial HML vs Winner: Distributed HML
  • 49. 49 © NEC Corporation 2016 Summary CPU 12 SEP 11 SEP 10 SEP
  • 50. 50 © NEC Corporation 2016 Summary