Enviar pesquisa
Carregar
Distributed Heterogeneous Mixture Learning On Spark
•
2 gostaram
•
1,379 visualizações
Spark Summit
Seguir
Spark Summit 2016 talk by Masato Asahara (NEC) and Ryohei Fujimaki (NEC)
Leia menos
Leia mais
Dados e análise
Denunciar
Compartilhar
Denunciar
Compartilhar
1 de 51
Baixar agora
Baixar para ler offline
Recomendados
Distributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On Spark
Spark Summit
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Spark Summit
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark Summit
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
Stavros Kontopoulos
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Databricks
Harnessing Spark Catalyst for Custom Data Payloads
Harnessing Spark Catalyst for Custom Data Payloads
Simeon Fitch
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
Databricks
Recomendados
Distributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On Spark
Spark Summit
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Spark Summit
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark Summit
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
Stavros Kontopoulos
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Databricks
Harnessing Spark Catalyst for Custom Data Payloads
Harnessing Spark Catalyst for Custom Data Payloads
Simeon Fitch
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
Databricks
Democratizing Data
Democratizing Data
Databricks
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
Databricks
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Spark Summit
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
DataWorks Summit
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
Databricks
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
Databricks
Building Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous Data
Databricks
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
Databricks
Big Data Heterogeneous Mixture Learning on Spark
Big Data Heterogeneous Mixture Learning on Spark
DataWorks Summit/Hadoop Summit
Operationalizing Big Data Pipelines At Scale
Operationalizing Big Data Pipelines At Scale
Databricks
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
Databricks
Serverless data pipelines gcp
Serverless data pipelines gcp
Catherine Kimani
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache Spark
Databricks
What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0
Databricks
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Databricks
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
Databricks
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
Databricks
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Databricks
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Databricks
[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
Insight Technology, Inc.
Moestuinier Wim Lybaert kan ook winst oogsten
Moestuinier Wim Lybaert kan ook winst oogsten
Thierry Debels
Mais conteúdo relacionado
Mais procurados
Democratizing Data
Democratizing Data
Databricks
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
Databricks
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Spark Summit
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
DataWorks Summit
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
Databricks
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
Databricks
Building Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous Data
Databricks
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
Databricks
Big Data Heterogeneous Mixture Learning on Spark
Big Data Heterogeneous Mixture Learning on Spark
DataWorks Summit/Hadoop Summit
Operationalizing Big Data Pipelines At Scale
Operationalizing Big Data Pipelines At Scale
Databricks
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
Databricks
Serverless data pipelines gcp
Serverless data pipelines gcp
Catherine Kimani
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache Spark
Databricks
What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0
Databricks
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Databricks
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
Databricks
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
Databricks
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Databricks
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Databricks
Mais procurados
(20)
Democratizing Data
Democratizing Data
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
Building Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous Data
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
Big Data Heterogeneous Mixture Learning on Spark
Big Data Heterogeneous Mixture Learning on Spark
Operationalizing Big Data Pipelines At Scale
Operationalizing Big Data Pipelines At Scale
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
Serverless data pipelines gcp
Serverless data pipelines gcp
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache Spark
What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Destaque
[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
Insight Technology, Inc.
Moestuinier Wim Lybaert kan ook winst oogsten
Moestuinier Wim Lybaert kan ook winst oogsten
Thierry Debels
Baker HIMSS Staffers Final
Baker HIMSS Staffers Final
bakerdb
Ven World Irish Folk Music
Ven World Irish Folk Music
jglasshouse
Future-of-wearable-computing
Future-of-wearable-computing
Hessel van Tuinen
The Brand Is Dead: Long Live the Brand
The Brand Is Dead: Long Live the Brand
Matthew Thompson
If U Dont Want To Be Ill - Speak
If U Dont Want To Be Ill - Speak
ryrota
Enterprise 2.0
Enterprise 2.0
Kyle Mathews
Small and medium-sized busineses' funding
Small and medium-sized busineses' funding
Frozovsky
Digitalis Design - Mobile App Design (hun)
Digitalis Design - Mobile App Design (hun)
Milan Korsos
Sabbath School Lesson - March 6-12
Sabbath School Lesson - March 6-12
ryrota
FOCUS quotes
FOCUS quotes
Anna Zubarev
2012 Taiwan UX Summit 專題演講(四)簡報
2012 Taiwan UX Summit 專題演講(四)簡報
UXTW(Taiwan User Experience Professional Association)
Taklimat r trg draf 4 tkini
Taklimat r trg draf 4 tkini
cukasuam
Temporitmika 5 3
Temporitmika 5 3
Владимир Белоконь
Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012
Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012
crebusproject
Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]
Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]
Food Insight
A Magical Miniature World By Vyacheslav Mishchenko
A Magical Miniature World By Vyacheslav Mishchenko
guimera
Kotoba te
Kotoba te
ラッキー ね
Science fiction in films and how we design the future - Matthew McGriskin
Science fiction in films and how we design the future - Matthew McGriskin
UXPA UK
Destaque
(20)
[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
[db tech showcase Tokyo 2016] B31: Spark Summit 2016@SFに参加してきたので最新事例などを紹介しつつデ...
Moestuinier Wim Lybaert kan ook winst oogsten
Moestuinier Wim Lybaert kan ook winst oogsten
Baker HIMSS Staffers Final
Baker HIMSS Staffers Final
Ven World Irish Folk Music
Ven World Irish Folk Music
Future-of-wearable-computing
Future-of-wearable-computing
The Brand Is Dead: Long Live the Brand
The Brand Is Dead: Long Live the Brand
If U Dont Want To Be Ill - Speak
If U Dont Want To Be Ill - Speak
Enterprise 2.0
Enterprise 2.0
Small and medium-sized busineses' funding
Small and medium-sized busineses' funding
Digitalis Design - Mobile App Design (hun)
Digitalis Design - Mobile App Design (hun)
Sabbath School Lesson - March 6-12
Sabbath School Lesson - March 6-12
FOCUS quotes
FOCUS quotes
2012 Taiwan UX Summit 專題演講(四)簡報
2012 Taiwan UX Summit 專題演講(四)簡報
Taklimat r trg draf 4 tkini
Taklimat r trg draf 4 tkini
Temporitmika 5 3
Temporitmika 5 3
Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012
Entrepreneurial opportunities nicoleta litoiu upb_crebus 2012
Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]
Yogurt: Pick This Multipurpose Snack and Meal [INFOGRAPHIC]
A Magical Miniature World By Vyacheslav Mishchenko
A Magical Miniature World By Vyacheslav Mishchenko
Kotoba te
Kotoba te
Science fiction in films and how we design the future - Matthew McGriskin
Science fiction in films and how we design the future - Matthew McGriskin
Semelhante a Distributed Heterogeneous Mixture Learning On Spark
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
DataWorks Summit
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
Databricks
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
Databricks
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
Seldon
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Ahsan Javed Awan
IBM - Craig Bender
IBM - Craig Bender
IDGnederland
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow India
Sandesh Rao
Data Science Crash Course
Data Science Crash Course
DataWorks Summit
Meetup Spark UDF performance
Meetup Spark UDF performance
Guilherme Braccialli
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
OpenACC
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Esther Vasiete
S&OP as a Service.pdf
S&OP as a Service.pdf
David Barbieri Kennedy
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systems
Igor José F. Freitas
Resume_Mahadevan_new (2)
Resume_Mahadevan_new (2)
Mahadevan N
Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017
Clarisse Hedglin
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Databricks
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
Fujitsu India
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
SingleStore
CH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptx
HafizSaifullah4
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
Srivatsan Ramanujam
Semelhante a Distributed Heterogeneous Mixture Learning On Spark
(20)
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
IBM - Craig Bender
IBM - Craig Bender
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow India
Data Science Crash Course
Data Science Crash Course
Meetup Spark UDF performance
Meetup Spark UDF performance
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
S&OP as a Service.pdf
S&OP as a Service.pdf
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systems
Resume_Mahadevan_new (2)
Resume_Mahadevan_new (2)
Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
K5.Fujitsu World Tour 2016-Winning with NetApp in Digital Transformation Age,...
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
CH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptx
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
Mais de Spark Summit
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Spark Summit
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spark Summit
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spark Summit
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Spark Summit
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Spark Summit
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
Mais de Spark Summit
(20)
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Último
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
SubhamKumar3239
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Boston Institute of Analytics
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Boston Institute of Analytics
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
Susanna-Assunta Sansone
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
Cathrine Wilhelmsen
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
deepakthakur548787
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Boston Institute of Analytics
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
Seán Kennedy
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
Boston Institute of Analytics
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
HaritikaChhatwal1
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
Amil Baba Dawood bangali
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
blazblazml
Learn How Data Science Changes Our World
Learn How Data Science Changes Our World
Eduminds Learning
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
Tasha Penwell
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
TecnoIncentive
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
17djon017
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
KarteekMane1
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
aleedritatuxx
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Boston Institute of Analytics
Último
(20)
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
Learn How Data Science Changes Our World
Learn How Data Science Changes Our World
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Distributed Heterogeneous Mixture Learning On Spark
1.
Distributed Heterogeneous Mixture Learning
on Spark Masato Asahara and Ryohei Fujimaki NEC Data Science Research Labs. Jun/08/2016 @Spark Summit 2016
2.
2 © NEC
Corporation 2016 Who we are? ▌Masato Asahara (Ph.D.) ▌Researcher, NEC Data Science Research Laboratory Masato received his Ph.D. degree from Keio University, and has worked at NEC for 6 years in the research field of distributed computing systems and computing resource management technologies. Masato is currently leading developments of Spark-based machine learning and data analytics systems, particularly using NEC’s Heterogeneous Mixture Learning technology. ▌Ryohei Fujimaki (Ph.D.) ▌Research Fellow, NEC Data Science Research Laboratory Ryohei is responsible for cores to NEC’s leading predictive and prescriptive analytics solutions, including "heterogeneous mixture learning” and “predictive optimization” technologies. In addition to technology R&D, Ryohei is also involved with co-developing cutting-edge advanced analytics solutions with NEC’s global business clients and partners in the North American and APAC regions. Ryohei received his Ph.D. degree from the University of Tokyo, and became the youngest research fellow ever in NEC Corporation’s 115-year history.
3.
3 © NEC
Corporation 2016 Agenda HML
4.
4 © NEC
Corporation 2016 Agenda CPU 12 SEP 11 SEP 10 SEP HML
5.
5 © NEC
Corporation 2016 Agenda HML
6.
NEC’s Predictive Analytics
and Heterogeneous Mixture Learning
7.
7 © NEC
Corporation 2016 Enterprise Applications of HML Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenance Product Price Optimization Sales Optimization Energy/Water Operation Mgmt.
8.
8 © NEC
Corporation 2016 NEC’s Heterogeneous Mixture Learning (HML) NEC’s machine learning that automatically derives “accurate” and “transparent” formulas behind Big Data. Sun Sat Y=a1x1+ ・・・ +anxn Y=a1x1+ ・・・+knx3 Y=b1x2+ ・・・+hnxn Y=c1x4+ ・・・+fnxn Mon Sun Explore Massive Formulas Transparent data segmentation and predictive formulas Heterogeneous mixture data
9.
9 © NEC
Corporation 2016 The HML Model Gender Height tallshort age smoke food weight sports sports tea/coffee beer/wine The health risk of a tall man can be predicted by age, alcohol and smoke!
10.
10 © NEC
Corporation 2016 HML (Heterogeneous Mixture Learning)
11.
11 © NEC
Corporation 2016 HML (Heterogeneous Mixture Learning)
12.
12 © NEC
Corporation 2016 HML Algorithm Segment data by a rule-based tree Feature Selection Pruning Data Segmentation Training data
13.
13 © NEC
Corporation 2016 HML Algorithm Select features and fit predictive models for data segments Feature Selection Pruning Data Segmentation Training data
14.
14 © NEC
Corporation 2016 Enterprise Applications of HML Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenance Energy/Water Operation Mgmt. Sales Optimization Product Price Optimization
15.
15 © NEC
Corporation 2016 Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenanc e Energy/Water Operation Mgmt. Product Price Optimization Demand for Big Data Analysis 24(hour)×365(days)×3(year)×1000(shops) ~26,000,000 training samples Sales Optimization
16.
16 © NEC
Corporation 2016 Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenance Energy/Water Demand Forecasting Sales Forecasting Product Price Optimization Demand for Big Data Analysis 5 million(customers)×12(months) =60,000,000 training samples
17.
17 © NEC
Corporation 2016 Driver Risk Assessment Inventory Optimization Churn Retention Predictive Maintenanc e Energy/Water Operation Mgmt. Product Price Optimization Demand for Big Data Analysis Sales Optimization
18.
Distributed HML on
Spark
19.
19 © NEC
Corporation 2016 Why Spark, not Hadoop? Because Spark’s in-memory architecture can run HML faster Feature Selection Pruning Data Segmentation CPU CPU CPU Data segmentation Feature selection ~ Pruning CPU CPU Data segmentation Feature selection
20.
20 © NEC
Corporation 2016 Data Scalability powered by Spark Treat unlimited scale of training data by adding executors executor executor executor CPU CPU CPU driver HDFS Add executors, and get more memory and CPU power Training data
21.
21 © NEC
Corporation 2016 Distributed HML Engine: Architecture HDFS Distributed HML Core (Scala) YARN invoke HML model Distributed HML Interface (Python) Matrix computation libraries (Breeze, BLAS) Data Scientist
22.
22 © NEC
Corporation 2016 3 Technical Key Points to Fast Run ML on Spark CPU 12 SEP 11 SEP 10 SEP
23.
23 © NEC
Corporation 2016 3 Technical Key Points to Fast Run ML on Spark CPU 12 SEP 11 SEP 10 SEP
24.
24 © NEC
Corporation 2016 Challenge to Avoid Data Shuffling executor executor executor driver Model: 10KB~10MB HML Model Merge Algorithm Training data: TB~ executor executor executor driver Naïve Design Distributed HML
25.
25 © NEC
Corporation 2016 Execution Flow of Distributed HML Executors build a rule-based tree from their local data in parallel Driver Executor Feature Selection Pruning Data Segmentation
26.
26 © NEC
Corporation 2016 Execution Flow of Distributed HML Driver merges the trees Driver Executor Feature Selection Pruning Data Segmentation HML Segment Merge Algorithm
27.
27 © NEC
Corporation 2016 Execution Flow of Distributed HML Driver broadcasts the merged tree to executors Driver Executor Feature Selection Pruning Data Segmentation
28.
28 © NEC
Corporation 2016 Execution Flow of Distributed HML Executors perform feature selection with their local data in parallel Driver Executor Feature Selection Pruning Data Segmentation Sat Sat Sat Sat Sat
29.
29 © NEC
Corporation 2016 Execution Flow of Distributed HML Driver merges the results of feature selection Driver Executor Feature Selection Pruning Data Segmentation Sat Sat Sat Sat Sat HML Feature Merge Algorithm
30.
30 © NEC
Corporation 2016 Execution Flow of Distributed HML Driver merges the results of feature selection Driver Executor Feature Selection Pruning Data Segmentation Sat HML Feature Merge Algorithm
31.
31 © NEC
Corporation 2016 Execution Flow of Distributed HML Driver broadcasts the merged results of feature selection to executors Driver Executor Feature Selection Pruning Data Segmentation Sat SatSatSat
32.
32 © NEC
Corporation 2016 3 Technical Key Points to Fast Run ML on Spark CPU 12 SEP 11 SEP 10 SEP
33.
33 © NEC
Corporation 2016 Wait Time for Other Executors Delays Execution Machine learning is likely to cause unbalanced computation load Executor Sat Sat Time Executor Executor
34.
34 © NEC
Corporation 2016 Balance Computational Effort for Each Executor Executor optimizes all predictive formulas with equally-divided training data Executor Sat Sat Time Executor Executor
35.
35 © NEC
Corporation 2016 Balance Computational Effort for Each Executor Executor optimizes all predictive formulas with equally-divided training data Executor Sat Sat Time Executor Executor Sat Sat Sat Sat Completion time reduced!
36.
36 © NEC
Corporation 2016 3 Technical Key Points to Fast Run ML on Spark 1TB CPU 12 SEP 11 SEP 10 SEP
37.
37 © NEC
Corporation 2016 Machine Learning Consists of Many Matrix Computations Feature Selection Pruning Data Segmentation Basic Matrix Computation Matrix Inversion Combinatorial Optimization … CPU
38.
38 © NEC
Corporation 2016 RDD Design for Leveraging High-Speed Libraries Distributed HML’s RDD Design Matrix object1 element … Straightforward RDD Design Training sample1 partition Training sample2 CPU CPU Matrix computation libraries (Breeze, BLAS) ~Training sample1 Training sample2 … seq(Training sample1, Training sample2, …) mapPartition Matrix object1 … Training sample1 Training sample2 map Iterator Matrix object creation element
39.
39 © NEC
Corporation 2016 Performance Degradation by Long-Chained RDD Long-chained RDD operations cause high-cost recalculations Feature Selection Pruning Data Segmentation RDD.map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(PruningMapFunc)
40.
40 © NEC
Corporation 2016 Performance Degradation by Long-Chained RDD Long-chained RDD operations cause high-cost recalculations Feature Selection Pruning Data Segmentation RDD.map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) GC .map(TreeMapFunc) CPU .map(PruningMapFunc)
41.
41 © NEC
Corporation 2016 Performance Degradation by Long-Chained RDD Cut long-chained operations periodically by checkpoint() Feature Selection Pruning Data Segmentation RDD.map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) .map(TreeMapFunc) .map(FeatureSelectionMapFunc) .reduce(TreeReduceFunc) .reduce(FeatureSelectionReduceFunc) GC .map(TreeMapFunc) savedRDD .checkpoint() = CPU ~ .map(PruningMapFunc)
42.
Benchmark Performance Evaluations
43.
43 © NEC
Corporation 2016 Eval 1: Prediction Error (vs. Spark MLlib algorithms) Distributed HML achieves low error competitive to a complex model data # samples Distributed HML Logistic Regression Decision Tree Random Forests gas sensor array (CO)* 4,208,261 0.542 0.597 0.587 0.576 household power consumption * 2,075,259 0.524 0.531 0.529 0.655 HIGGS* 11,000,000 0.335 0.358 0.337 0.317 HEPMASS* 7,000,000 0.156 0.163 0.167 0.175 * UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) 1st 1st 1st 2nd
44.
44 © NEC
Corporation 2016 Eval 2: Performance Scalability (vs. Spark MLlib algos) Distributed HML is competitive with Spark MLlib implementations. HIGGS* (11M samples) HEPMASS* (7M samples)# CPU cores Distributed HML Logistic Regression Distributed HML * UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) Logistic Regression
45.
Evaluation in Real
Case
46.
46 © NEC
Corporation 2016 ATM Cash Balance Prediction Much cash High inventory cost Little cash High risk of out of stock Around 20,000 ATMs
47.
47 © NEC
Corporation 2016 Training Speed 2 hours9 days ▌Summary of data # ATMs: around 20,000 (in Japan) # training samples: around 10M ▌Cluster spec (10 nodes) # CPU cores: 128 Memory: 2.5TB Spark 1.6.1, Hadoop 2.7.1 (HDP 2.3.2) * Run w/ 1 CPU core and 256GB memory 110x Speed up Serial HML* Distributed HML
48.
48 © NEC
Corporation 2016 Prediction Error ▌Summary of data # ATMs: around 20,000 (in Japan) # training samples: around 23M ▌Cluster spec (10 nodes) # CPU cores: 128 Memory: 2.5TB Spark 1.6.1, Hadoop 2.7.1 (HDP 2.3.2) Average Error: 17% lower Serial HML Distributed HMLSerial HML Serial HML vs Winner: Distributed HML
49.
49 © NEC
Corporation 2016 Summary CPU 12 SEP 11 SEP 10 SEP
50.
50 © NEC
Corporation 2016 Summary
Baixar agora