SlideShare uma empresa Scribd logo
1 de 46
Baixar para ler offline
Scaling Tensorflow to 100s of GPUs
with Spark and Hops Hadoop
Global AI Conference, Santa Clara, January 18th 2018
Hops
Jim Dowling
Associate Prof @ KTH
Senior Researcher @ RISE SICS
CEO @ Logical Clocks AB
AI Hierarchy of Needs
2
DDL
(Distributed
Deep Learning)
Deep Learning,
RL, Automated ML
A/B Testing, Experimentation, ML
B.I. Analytics, Metrics, Aggregates,
Features, Training/Test Data
Reliable Data Pipelines, ETL, Unstructured and
Structured Data Storage, Real-Time Data Ingestion
[Adapted from https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007?gi=7e13a696e469 ]
AI Hierarchy of Needs
3[Adapted from https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007?gi=7e13a696e469 ]
DDL
(Distributed
Deep Learning)
Deep Learning,
RL, Automated ML
A/B Testing, Experimentation, ML
B.I. Analytics, Metrics, Aggregates,
Features, Training/Test Data
Reliable Data Pipelines, ETL, Unstructured and
Structured Data Storage, Real-Time Data Ingestion
Analytics
Prediction
AI Hierarchy of Needs
4
DDL
(Distributed
Deep Learning)
Deep Learning,
RL, Automated ML
A/B Testing, Experimentation, ML
B.I. Analytics, Metrics, Aggregates,
Features, Training/Test Data
Reliable Data Pipelines, ETL, Unstructured and
Structured Data Storage, Real-Time Data Ingestion
Hops
[Adapted from https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007?gi=7e13a696e469 ]
More Data means Better Predictions
Prediction
Performance
Traditional AI
Deep Neural Nets
Amount Labelled Data
Hand-crafted
can outperform
1980s1990s2000s 2010s 2020s?
What about More Compute?
“Methods that scale with computation
are the future of AI”*
- Rich Sutton (A Founding Father of Reinforcement Learning)
2018-01-19 6/46
* https://www.youtube.com/watch?v=EeMCEQa85tw
More Compute should mean Faster Training
Training
Performance
Single-Host
Distributed
Available Compute
20152016 2017 2018?
Reduce DNN Training Time from 2 weeks to 1 hour
2018-01-19 8/46
In 2017, Facebook
reduced training
time on ImageNet
for a CNN from 2
weeks to 1 hour
by scaling out to
256 GPUs using
Ring-AllReduce on
Caffe2.
https://arxiv.org/abs/1706.02677
DNN Training Time and Researcher Productivity
9
•Distributed Deep Learning
-Interactive analysis!
-Instant gratification!
•Single Host Deep Learning
• Suffer from Google-Envy
“My Model’s Training.”
Training
Distributed Training: Theory and Practice
Image from @hardmaru on Twitter.
10
Distributed Algorithms are not all Created Equal
Training
Performance
Parameter Servers
AllReduce
Available Compute
Ring-AllReduce vs Parameter Server(s)
2018-01-19 13/46
GPU 0
GPU 1
GPU 2
GPU 3
send
send
send
send
recv
recv
recv
recv GPU 1 GPU 2 GPU 3 GPU 4
Param Server(s)
Network Bandwidth is the Bottleneck for Distributed Training
AllReduce outperforms Parameter Servers
2018-01-19 14/46
*https://github.com/uber/horovod
16 servers with 4 P100 GPUs (64 GPUs) each connected by ROCE-capable 25 Gbit/s network
(synthetic data). Speed below is images processed per second.*
For Bigger Models, Parameter Servers don’t scale
Multiple GPUs on a Single Server
2018-01-19 15/46
NVLink vs PCI-E Single Root Complex
2018-01-19 16/46On Single-Host (dist. Training), the Bus can be the Bottleneck
[Images from: https://www.microway.com/product/octoputer-4u-10-gpu-server-single-root-complex/ ]
NVLink – 80 GB/s PCI-E – 16 GB/s
Scale: Remove Bus and Net B/W Bottlenecks
2018-01-19 17/46
Only one slow worker or bus or n/w link is needed to bottleneck DNN training time.
Ring-AllReduce
The Cloud is full of Bottlenecks….
Training
Performance
Public Cloud (10 GbE)
Infiniband On-Premise
Available Compute
Deep Learning Hierarchy of Scale
2018-01-19 19/46
DDL
AllReduce
on GPU Servers
DDL with GPU Servers
and Parameter Servers
Parallel Experiments on GPU Servers
Single GPU
Many GPUs on a Single GPU Server
Days/Hours
Days
Weeks
Minutes
Training Time for ImageNet
Hours
Lots of good GPUs > A few great GPUs
Hops
100 x Nvidia 1080Ti (DeepLearning11)
8 x Nvidia P/V100 (DGX-1)
VS
Both top (100 GPUs) and bottom (8 GPUs) cost the same: $150K (2017).
Consumer GPU Server $15K (10 x 1080Ti)
2018-01-19 https://www.oreilly.com/ideas/distributed-tensorflow 21/46
Cluster of Commodity GPU Servers
#EUai8
22
InfiniBan
Max 1-2 GPU Servers per Rack (2-4 KW per server)
#EUai8
TensorFlow Spark Platforms
•TensorFlow-on-Spark
•Deep Learning Pipelines
•Horovod
•Hops
23
Hops – Running Parallel Experiments
def model_fn(learning_rate, dropout):
import tensorflow as tf
from hops import tensorboard, hdfs, devices
…..
from hops import tflauncher
args_dict = {'learning_rate': [0.001], 'dropout': [0.5]}
tflauncher.launch(spark, model_fn, args_dict)
24
Launch TF jobs as Mappers in Spark
“Pure” TensorFlow code
in the Executor
Hops – Parallel Experiments
25
#EUai8
def model_fn(learning_rate, dropout):
…..
from hops import tflauncher
args_dict = {'learning_rate': [0.001, 0.005, 0.01],
'dropout': [0.5, 0.6]}
tflauncher.launch(spark, model_fn, args_dict)
Launches 6 Executors with with a different Hyperparameter
combination. Each Executor can have 1-N GPUs.
Hops AllReduce/Horovod/TensorFlow
27
#EUai8
import horovod.tensorflow as hvd
def conv_model(feature, target, mode)
…..
def main(_):
hvd.init()
opt = hvd.DistributedOptimizer(opt)
if hvd.local_rank()==0:
hooks = [hvd.BroadcastGlobalVariablesHook(0), ..]
…..
else:
hooks = [hvd.BroadcastGlobalVariablesHook(0), ..]
…..
from hops import allreduce
allreduce.launch(spark, 'hdfs:///Projects/…/all_reduce.ipynb')
“Pure” TensorFlow code
TensorFlow and Hops Hadoop
2018-01-19 28/46
Don’t do this: Different Clusters for Big Data and ML
29
Hops: Single ML and Big Data Cluster
30/70
IT
DataLake
GPUs Compute
Kafka
Data EngineeringData Science
Project1 ProjectN
Elasticsearch
HopsFS: Next Generation HDFS*
16x
Throughput
FasterBigger
*https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi
37x
Number of files
Scale Challenge Winner (2017)
31
Size Matters: Improving the Performance of Small Files in HDFS. Salman Niazi, Seif Haridi, Jim Dowling. Poster, EuroSys 2017.
`
HopsFS now stores Small Files in the DB
GPUs supported as a Resource in Hops 2.8.2*
33
Hops is the only Hadoop distribution to support GPUs-as-a-Resource.
*Robin Andersson, GPU Integration for Deep Learning on YARN, MSc Thesis, 2017
GPU Resource Requests in Hops
34
HopsYARN
4 GPUs on any host
10 GPUs on 1 host
100 GPUs on 10 hosts with ‘Infiniband’
20 GPUs on 2 hosts with ‘Infiniband_P100’
HopsFS
Mix of commodity GPUs and more
powerful GPUs good for (1) parallel
experiments and (2) distributed training
Hopsworks Data Platform
35
Develop Train Test Serve
MySQL Cluster
Hive
InfluxDB
ElasticSearch
KafkaProjects,Datasets,Users
HopsFS / YARN
Spark, Flink, Tensorflow
Jupyter, Zeppelin
Jobs, Kibana, Grafana
REST
API
Hopsworks
Python is a First-Class Citizen in Hopsworks
www.hops.io 36
Custom Python Environments with Conda
Python libraries are usable by Spark/Tensorflow
37
What is Hopsworks used for?
2018-01-19 38/46
HopsFS
YARN
Public Cloud or On-Premise
Parquet
ETL Workloads
39
Hive
Hopsworks
Jobs trigger
HopsFS
YARN
Public Cloud or On-Premise
Parquet
Business Intelligence Workloads
40
Hive
Jupyter/Zeppelin
or Jobs
Kibana
reports
Zeppelin
HopsFS
YARN
Grafana/
InfluxDB
Elastic/
Kibana
Public Cloud or On-Premise
Parquet
Data Src
Batch Analytics
Kafka
…...MySQL
Streaming Analytics in Hopsworks
41
Hive
HopsFS
YARN
FeatureStore
Tensorflow
Serving
Public Cloud or On-Premise
Tensorboard
TensorFlow in Hopsworks
42
Experiments
Kafka
Hive
One Click Deployment of TensorFlow Models
Hops API
•Java/Scala library
- Secure Streaming Analytics with Kafka/Spark/Flink
• SSL/TLS certs, Avro Schema, Endpoints for Kafka/Hopsworks/etc
•Python Library
- Managing tensorboard, Load/save models in HopsFS
- Distributed Tensorflow in Python
- Parameter sweeps for parallel experiments
TensorFlow-as-a-Service in RISE SICS ICE
• Hops
• Spark/Flink/Kafka/TensorFlow/Hadoop-as-a-service
www.hops.site
• RISE SICS ICE
• 250 kW Datacenter, ~400 servers
• Research and test environment
https://www.sics.se/projects/sics-ice-data-center-in-lulea 45
Summary
•Distribution can make Deep Learning practitioners
more productive.
https://www.oreilly.com/ideas/distributed-tensorflow
•Hopsworks is a new Data Platform built on HopsFS
with first-class support for Python and Deep
Learning / ML
- Tensorflow / Spark
The Team
Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman
Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias
Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso,
Robin Andersson, August Bonds, Filotas Siskos, Mahmoud Hamed.
Active:
Alumni:
Vasileios Giannokostas, Johan Svedlund Nordström,Rizvi Hasan, Paul Mälzer, Bram
Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto
Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro,
Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos
Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid
Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Fanti Machmount Al
Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, ArunaKumari Yedurupaka,
Tobias Johansson , Roberto Bampi.
www.hops.io
@hopshadoop
Thank You.
Follow us: @hopshadoop
Star us: http://github.com/hopshadoop/hopsworks
Join us: http://www.hops.io
Thank You.
Follow us: @hopshadoop
Star us: http://github.com/hopshadoop/hopsworks
Join us: http://www.hops.io
Hops

Mais conteúdo relacionado

Mais procurados

Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and DatasetKazuaki Ishizaki
 
Java Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsBrendan Gregg
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUsSri Ambati
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseKazuaki Ishizaki
 
Analyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodAnalyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodBrendan Gregg
 
MeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisMeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisBrendan Gregg
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance ToolsBrendan Gregg
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangDatabricks
 
CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUShohei Hido
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizationsBrendan Gregg
 
TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesKoan-Sin Tan
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySparkSpark Summit
 
DTrace Topics: Introduction
DTrace Topics: IntroductionDTrace Topics: Introduction
DTrace Topics: IntroductionBrendan Gregg
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to LinuxBrendan Gregg
 
What Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaWhat Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaBrendan Gregg
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
JavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsJavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsBrendan Gregg
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersKazuaki Ishizaki
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to RootsBrendan Gregg
 

Mais procurados (20)

Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and Dataset
 
Java Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame Graphs
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to Use
 
Analyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodAnalyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE Method
 
MeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisMeetBSD2014 Performance Analysis
MeetBSD2014 Performance Analysis
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
 
CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPU
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizations
 
TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU Delegates
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 
DTrace Topics: Introduction
DTrace Topics: IntroductionDTrace Topics: Introduction
DTrace Topics: Introduction
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to Linux
 
What Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaWhat Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versa
 
PostgreSQL with OpenCL
PostgreSQL with OpenCLPostgreSQL with OpenCL
PostgreSQL with OpenCL
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
JavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsJavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame Graphs
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java Programmers
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
 

Semelhante a Scaling TensorFlow with Hops, Global AI Conference Santa Clara

Berlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsBerlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsJim Dowling
 
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDistributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDatabricks
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Jim Dowling
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkDatabricks
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Jason Dai
 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningSergey Karayev
 
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsTensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsStijn Decubber
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleJim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkDatabricks
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache SparkAmir Sedighi
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceData Works MD
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Jen Aman
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJim Dowling
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120Hyoungjun Kim
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSDatabricks
 
Apache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning PlatformApache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning PlatformWangda Tan
 

Semelhante a Scaling TensorFlow with Hops, Global AI Conference Santa Clara (20)

Berlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsBerlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on Hops
 
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDistributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep Learning
 
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsTensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache Spark
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocks
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
Apache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning PlatformApache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning Platform
 

Mais de Jim Dowling

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfJim Dowling
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfJim Dowling
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdfJim Dowling
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Jim Dowling
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022Jim Dowling
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Jim Dowling
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Jim Dowling
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigmJim Dowling
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money LaunderingJim Dowling
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingJim Dowling
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityJim Dowling
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020Jim Dowling
 
The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines Jim Dowling
 
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyAsynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyJim Dowling
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Jim Dowling
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019Jim Dowling
 

Mais de Jim Dowling (20)

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdf
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdf
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigm
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money Laundering
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowling
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020
 
The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines
 
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyAsynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
 

Último

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Último (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Scaling TensorFlow with Hops, Global AI Conference Santa Clara

  • 1. Scaling Tensorflow to 100s of GPUs with Spark and Hops Hadoop Global AI Conference, Santa Clara, January 18th 2018 Hops Jim Dowling Associate Prof @ KTH Senior Researcher @ RISE SICS CEO @ Logical Clocks AB
  • 2. AI Hierarchy of Needs 2 DDL (Distributed Deep Learning) Deep Learning, RL, Automated ML A/B Testing, Experimentation, ML B.I. Analytics, Metrics, Aggregates, Features, Training/Test Data Reliable Data Pipelines, ETL, Unstructured and Structured Data Storage, Real-Time Data Ingestion [Adapted from https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007?gi=7e13a696e469 ]
  • 3. AI Hierarchy of Needs 3[Adapted from https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007?gi=7e13a696e469 ] DDL (Distributed Deep Learning) Deep Learning, RL, Automated ML A/B Testing, Experimentation, ML B.I. Analytics, Metrics, Aggregates, Features, Training/Test Data Reliable Data Pipelines, ETL, Unstructured and Structured Data Storage, Real-Time Data Ingestion Analytics Prediction
  • 4. AI Hierarchy of Needs 4 DDL (Distributed Deep Learning) Deep Learning, RL, Automated ML A/B Testing, Experimentation, ML B.I. Analytics, Metrics, Aggregates, Features, Training/Test Data Reliable Data Pipelines, ETL, Unstructured and Structured Data Storage, Real-Time Data Ingestion Hops [Adapted from https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007?gi=7e13a696e469 ]
  • 5. More Data means Better Predictions Prediction Performance Traditional AI Deep Neural Nets Amount Labelled Data Hand-crafted can outperform 1980s1990s2000s 2010s 2020s?
  • 6. What about More Compute? “Methods that scale with computation are the future of AI”* - Rich Sutton (A Founding Father of Reinforcement Learning) 2018-01-19 6/46 * https://www.youtube.com/watch?v=EeMCEQa85tw
  • 7. More Compute should mean Faster Training Training Performance Single-Host Distributed Available Compute 20152016 2017 2018?
  • 8. Reduce DNN Training Time from 2 weeks to 1 hour 2018-01-19 8/46 In 2017, Facebook reduced training time on ImageNet for a CNN from 2 weeks to 1 hour by scaling out to 256 GPUs using Ring-AllReduce on Caffe2. https://arxiv.org/abs/1706.02677
  • 9. DNN Training Time and Researcher Productivity 9 •Distributed Deep Learning -Interactive analysis! -Instant gratification! •Single Host Deep Learning • Suffer from Google-Envy “My Model’s Training.” Training
  • 10. Distributed Training: Theory and Practice Image from @hardmaru on Twitter. 10
  • 11. Distributed Algorithms are not all Created Equal Training Performance Parameter Servers AllReduce Available Compute
  • 12. Ring-AllReduce vs Parameter Server(s) 2018-01-19 13/46 GPU 0 GPU 1 GPU 2 GPU 3 send send send send recv recv recv recv GPU 1 GPU 2 GPU 3 GPU 4 Param Server(s) Network Bandwidth is the Bottleneck for Distributed Training
  • 13. AllReduce outperforms Parameter Servers 2018-01-19 14/46 *https://github.com/uber/horovod 16 servers with 4 P100 GPUs (64 GPUs) each connected by ROCE-capable 25 Gbit/s network (synthetic data). Speed below is images processed per second.* For Bigger Models, Parameter Servers don’t scale
  • 14. Multiple GPUs on a Single Server 2018-01-19 15/46
  • 15. NVLink vs PCI-E Single Root Complex 2018-01-19 16/46On Single-Host (dist. Training), the Bus can be the Bottleneck [Images from: https://www.microway.com/product/octoputer-4u-10-gpu-server-single-root-complex/ ] NVLink – 80 GB/s PCI-E – 16 GB/s
  • 16. Scale: Remove Bus and Net B/W Bottlenecks 2018-01-19 17/46 Only one slow worker or bus or n/w link is needed to bottleneck DNN training time. Ring-AllReduce
  • 17. The Cloud is full of Bottlenecks…. Training Performance Public Cloud (10 GbE) Infiniband On-Premise Available Compute
  • 18. Deep Learning Hierarchy of Scale 2018-01-19 19/46 DDL AllReduce on GPU Servers DDL with GPU Servers and Parameter Servers Parallel Experiments on GPU Servers Single GPU Many GPUs on a Single GPU Server Days/Hours Days Weeks Minutes Training Time for ImageNet Hours
  • 19. Lots of good GPUs > A few great GPUs Hops 100 x Nvidia 1080Ti (DeepLearning11) 8 x Nvidia P/V100 (DGX-1) VS Both top (100 GPUs) and bottom (8 GPUs) cost the same: $150K (2017).
  • 20. Consumer GPU Server $15K (10 x 1080Ti) 2018-01-19 https://www.oreilly.com/ideas/distributed-tensorflow 21/46
  • 21. Cluster of Commodity GPU Servers #EUai8 22 InfiniBan Max 1-2 GPU Servers per Rack (2-4 KW per server)
  • 22. #EUai8 TensorFlow Spark Platforms •TensorFlow-on-Spark •Deep Learning Pipelines •Horovod •Hops 23
  • 23. Hops – Running Parallel Experiments def model_fn(learning_rate, dropout): import tensorflow as tf from hops import tensorboard, hdfs, devices ….. from hops import tflauncher args_dict = {'learning_rate': [0.001], 'dropout': [0.5]} tflauncher.launch(spark, model_fn, args_dict) 24 Launch TF jobs as Mappers in Spark “Pure” TensorFlow code in the Executor
  • 24. Hops – Parallel Experiments 25 #EUai8 def model_fn(learning_rate, dropout): ….. from hops import tflauncher args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]} tflauncher.launch(spark, model_fn, args_dict) Launches 6 Executors with with a different Hyperparameter combination. Each Executor can have 1-N GPUs.
  • 25. Hops AllReduce/Horovod/TensorFlow 27 #EUai8 import horovod.tensorflow as hvd def conv_model(feature, target, mode) ….. def main(_): hvd.init() opt = hvd.DistributedOptimizer(opt) if hvd.local_rank()==0: hooks = [hvd.BroadcastGlobalVariablesHook(0), ..] ….. else: hooks = [hvd.BroadcastGlobalVariablesHook(0), ..] ….. from hops import allreduce allreduce.launch(spark, 'hdfs:///Projects/…/all_reduce.ipynb') “Pure” TensorFlow code
  • 26. TensorFlow and Hops Hadoop 2018-01-19 28/46
  • 27. Don’t do this: Different Clusters for Big Data and ML 29
  • 28. Hops: Single ML and Big Data Cluster 30/70 IT DataLake GPUs Compute Kafka Data EngineeringData Science Project1 ProjectN Elasticsearch
  • 29. HopsFS: Next Generation HDFS* 16x Throughput FasterBigger *https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi 37x Number of files Scale Challenge Winner (2017) 31
  • 30. Size Matters: Improving the Performance of Small Files in HDFS. Salman Niazi, Seif Haridi, Jim Dowling. Poster, EuroSys 2017. ` HopsFS now stores Small Files in the DB
  • 31. GPUs supported as a Resource in Hops 2.8.2* 33 Hops is the only Hadoop distribution to support GPUs-as-a-Resource. *Robin Andersson, GPU Integration for Deep Learning on YARN, MSc Thesis, 2017
  • 32. GPU Resource Requests in Hops 34 HopsYARN 4 GPUs on any host 10 GPUs on 1 host 100 GPUs on 10 hosts with ‘Infiniband’ 20 GPUs on 2 hosts with ‘Infiniband_P100’ HopsFS Mix of commodity GPUs and more powerful GPUs good for (1) parallel experiments and (2) distributed training
  • 33. Hopsworks Data Platform 35 Develop Train Test Serve MySQL Cluster Hive InfluxDB ElasticSearch KafkaProjects,Datasets,Users HopsFS / YARN Spark, Flink, Tensorflow Jupyter, Zeppelin Jobs, Kibana, Grafana REST API Hopsworks
  • 34. Python is a First-Class Citizen in Hopsworks www.hops.io 36
  • 35. Custom Python Environments with Conda Python libraries are usable by Spark/Tensorflow 37
  • 36. What is Hopsworks used for? 2018-01-19 38/46
  • 37. HopsFS YARN Public Cloud or On-Premise Parquet ETL Workloads 39 Hive Hopsworks Jobs trigger
  • 38. HopsFS YARN Public Cloud or On-Premise Parquet Business Intelligence Workloads 40 Hive Jupyter/Zeppelin or Jobs Kibana reports Zeppelin
  • 39. HopsFS YARN Grafana/ InfluxDB Elastic/ Kibana Public Cloud or On-Premise Parquet Data Src Batch Analytics Kafka …...MySQL Streaming Analytics in Hopsworks 41 Hive
  • 40. HopsFS YARN FeatureStore Tensorflow Serving Public Cloud or On-Premise Tensorboard TensorFlow in Hopsworks 42 Experiments Kafka Hive
  • 41. One Click Deployment of TensorFlow Models
  • 42. Hops API •Java/Scala library - Secure Streaming Analytics with Kafka/Spark/Flink • SSL/TLS certs, Avro Schema, Endpoints for Kafka/Hopsworks/etc •Python Library - Managing tensorboard, Load/save models in HopsFS - Distributed Tensorflow in Python - Parameter sweeps for parallel experiments
  • 43. TensorFlow-as-a-Service in RISE SICS ICE • Hops • Spark/Flink/Kafka/TensorFlow/Hadoop-as-a-service www.hops.site • RISE SICS ICE • 250 kW Datacenter, ~400 servers • Research and test environment https://www.sics.se/projects/sics-ice-data-center-in-lulea 45
  • 44. Summary •Distribution can make Deep Learning practitioners more productive. https://www.oreilly.com/ideas/distributed-tensorflow •Hopsworks is a new Data Platform built on HopsFS with first-class support for Python and Deep Learning / ML - Tensorflow / Spark
  • 45. The Team Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson, August Bonds, Filotas Siskos, Mahmoud Hamed. Active: Alumni: Vasileios Giannokostas, Johan Svedlund Nordström,Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, ArunaKumari Yedurupaka, Tobias Johansson , Roberto Bampi. www.hops.io @hopshadoop
  • 46. Thank You. Follow us: @hopshadoop Star us: http://github.com/hopshadoop/hopsworks Join us: http://www.hops.io Thank You. Follow us: @hopshadoop Star us: http://github.com/hopshadoop/hopsworks Join us: http://www.hops.io Hops