HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019

Published in: Technology

  1. AMD/ROCm for Hopsworks. HopsML 4th Meetup, Stockholm, June 4th, 2019. Speakers: jim_dowling (CEO @ Logical Clocks, Assoc Prof @ KTH), robzor92
  2. Great Hedge of India. ©2018 Logical Clocks AB. All Rights Reserved.
     • The East India Company was one of the industrial world’s first monopolies.
     • They assembled a thorny hedge (not a wall!) spanning India.
     • You paid customs duty to bring salt over the wall (sorry, hedge).
     In 2019, Nvidia GeForce graphics cards are not allowed to be used in a data center. Monopolies are not good for deep learning! [Image from Wikipedia]
  3. Nvidia™ 2080Ti vs AMD Radeon™ VII ResNet-50 Benchmark. ©2019 Logical Clocks AB. All Rights Reserved.
     Nvidia™ 2080Ti
     • Memory: 11 GB
     • Software: TensorFlow 1.12, CUDA 10.0.130, cuDNN 7.4.1
     • Model: ResNet-50; dataset: ImageNet (synthetic)
     • FP32 total images/sec: ~322
     • FP16 total images/sec: ~560
     • Sources: https://lambdalabs.com/blog/2080-ti-deep-learning-benchmarks/ and https://www.phoronix.com/scan.php?page=article&item=nvidia-rtx2080ti-tensorflow&num=2
     AMD Radeon™ VII
     • Memory: 16 GB
     • Software: TensorFlow 1.13.1, ROCm 2.3
     • Model: ResNet-50; dataset: ImageNet (synthetic)
     • FP32 total images/sec: ~302
     • FP16 total images/sec: ~415
     • Source: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/173
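To make the comparison on this slide easier to read, the sketch below computes the relative throughput of the two cards and the FP16-over-FP32 gain for each. The numbers are the approximate figures from the slide; the function names are illustrative, and the two runs used different TensorFlow versions, so the ratios are only indicative.

```python
# Approximate throughput (images/sec) copied from the slide.
results = {
    "Nvidia 2080Ti": {"fp32": 322, "fp16": 560},
    "AMD Radeon VII": {"fp32": 302, "fp16": 415},
}


def relative_throughput(results, baseline, other):
    """Ratio other/baseline for each precision (hypothetical helper)."""
    return {p: results[other][p] / results[baseline][p] for p in results[baseline]}


def fp16_speedup(results):
    """FP16 throughput divided by FP32 throughput, per GPU."""
    return {gpu: r["fp16"] / r["fp32"] for gpu, r in results.items()}


ratios = relative_throughput(results, "Nvidia 2080Ti", "AMD Radeon VII")
speedups = fp16_speedup(results)

for precision, ratio in ratios.items():
    print(f"Radeon VII vs 2080Ti ({precision}): {ratio:.2f}x")
for gpu, speedup in speedups.items():
    print(f"{gpu} FP16 over FP32: {speedup:.2f}x")
```

On these figures the Radeon VII reaches roughly 94% of the 2080Ti's FP32 throughput and roughly 74% of its FP16 throughput.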
  4. AMD ML Software Strategy: Open Source Foundation for Machine Learning
     • Spark / Machine Learning Apps
     • Frameworks: latest machine learning frameworks
     • Data Platform Tools: Docker and Kubernetes support
     • Middleware and Libraries: optimized math and communication libraries (MIOpen, BLAS, FFT, RNG, RCCL, Eigen)
     • ROCm: fully open source ROCm platform (OpenMP, HIP, OpenCL™, Python), up-streamed for Linux kernel distributions
     • Devices: GPU, CPU, APU, DLA
  5. Distro: Upstream Linux Kernel Support
     • Linux Kernel 4.17
     • 700+ upstream ROCm driver commits since the 4.12 kernel
     • https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver
  6. Languages: Multiple Programming Options
     • Programming models: OpenMP, Python, OpenCL, HIP
     • Compiler: LLVM -> AMDGCN compiler -> AMDGPU code
     • LLVM: https://llvm.org/docs/AMDGPUUsage.html
     • CLANG HIP: https://clang.llvm.org/doxygen/HIP_8h_source.html
  7. ROCm Distributed Training
     • RCCL: optimized collective communication operations library
     • Easy MPI integration
     • Support for InfiniBand and RoCE high-speed network fabrics
     • ROCm-enabled UCX
     • ROCm w/ ROCnRDMA
     [Chart: ResNet-50 multi-GPU scaling (PCIe, CPU parameter server). 1 GPU: 1.00x, 2 GPUs: 1.99x, 4 GPUs: 3.98x, 8 GPUs: 7.64x]
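The scaling chart on this slide can be restated as parallel efficiency, that is, speedup divided by GPU count. The sketch below does that arithmetic on the four data points from the slide; the helper name is illustrative.

```python
# Speedups read off the slide's ResNet-50 scaling chart, keyed by GPU count.
speedups = {1: 1.00, 2: 1.99, 4: 3.98, 8: 7.64}


def scaling_efficiency(speedups):
    """Parallel efficiency = measured speedup / number of GPUs."""
    return {gpus: speedup / gpus for gpus, speedup in speedups.items()}


efficiency = scaling_efficiency(speedups)
for gpus, eff in efficiency.items():
    print(f"{gpus} GPU(s): {eff:.1%} efficiency")
```

By this measure, scaling stays at about 99.5% efficiency up to 4 GPUs and drops to about 95.5% at 8 GPUs.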
  8. ROCm -> Spark / TensorFlow
     • Spark / TensorFlow applications run unchanged on ROCm
     • Hopsworks runs Spark / TensorFlow on YARN and Conda
  9. YARN support for ROCm in Hops
     A Container is a CGroup that isolates CPU, memory, and GPU resources and has a conda environment and TLS certs.
     [Diagram: a Resource Manager and three Node Managers; containers run the Driver and the Executors]
  10. Distributed Deep Learning in Hopsworks
      [Diagram: a Driver and Executors 1..N, each with its own conda_env; HopsFS (HDFS) holds TensorBoard, Models, Experiments, Training Data, and Logs]
  11. Hyperparameter Optimization
      from hops import experiment

      # RUNS ON THE EXECUTORS
      def train(lr, dropout):
          def input_fn():
              # return dataset
          optimizer = …
          model = …
          model.add(Conv2D(…))
          model.compile(…)
          model.fit(…)
          model.evaluate(…)

      # RUNS ON THE DRIVER
      HParams = {'lr': [0.001, 0.0001], 'dropout': [0.25, 0.5, 0.75]}
      experiment.grid_search(train, HParams)

      https://github.com/logicalclocks/hops-examples
      [Diagram: same Driver / Executors / HopsFS layout as slide 10]
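As an illustration of what a grid search over the slide's dictionary means, the sketch below expands it into one trial per combination (2 learning rates x 3 dropout values = 6 trials). This is not the hops library's implementation: `expand_grid` and the stand-in `train` are hypothetical, and the trials run sequentially here, whereas the slide indicates they run on the executors.

```python
from itertools import product


def expand_grid(hparams):
    """Return one dict per combination of hyperparameter values."""
    names = list(hparams)
    value_lists = [hparams[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def train(lr, dropout):
    # Stand-in for the real training function: it only echoes its inputs.
    return {"lr": lr, "dropout": dropout}


hparams = {"lr": [0.001, 0.0001], "dropout": [0.25, 0.5, 0.75]}
trials = expand_grid(hparams)

# Run every trial; a real system would distribute these across workers.
results = [train(**trial) for trial in trials]
print(f"{len(trials)} trials")
```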
  12. Distributed Training
      from hops import experiment

      # RUNS ON THE EXECUTORS
      def train():
          def input_fn():
              # return dataset
          model = …
          optimizer = …
          model.compile(…)
          # Slide shorthand: the RunConfig is set up with CollectiveAllReduceStrategy
          rc = tf.estimator.RunConfig('CollectiveAllReduceStrategy')
          keras_estimator = tf.keras.estimator.model_to_estimator(….)
          tf.estimator.train_and_evaluate(keras_estimator, input_fn)

      # RUNS ON THE DRIVER
      experiment.collective_all_reduce(train)

      https://github.com/logicalclocks/hops-examples
      [Diagram: same Driver / Executors / HopsFS layout as slide 10]
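For readers unfamiliar with collective all-reduce, the sketch below shows only its semantics: every worker contributes its gradients and every worker receives the same averaged result. It is a plain-Python illustration with a hypothetical `all_reduce_mean` helper, not how TensorFlow or RCCL implement it; production libraries typically use ring or tree algorithms to spread the communication.

```python
def all_reduce_mean(worker_grads):
    """Average gradients element-wise and hand every worker the same result."""
    num_workers = len(worker_grads)
    # zip(*...) groups the i-th gradient component from every worker.
    summed = [sum(components) for components in zip(*worker_grads)]
    mean = [total / num_workers for total in summed]
    # Each worker gets its own copy of the averaged gradients.
    return [list(mean) for _ in range(num_workers)]


# Three workers, each holding a two-element gradient vector.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reduced = all_reduce_mean(grads)
print(reduced[0])
```

Because every worker ends up with identical gradients, each can apply the same update locally without a central parameter server.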
  13. Horizontally Scalable ML Pipelines with Hopsworks
      [Diagram: Raw Data and Event Data flow through Ingest, Data Prep, Feature Store, Experiment/Train, Deploy, and Serving, with Monitor; stages are orchestrated by Airflow and backed by HopsFS, logs, and a Metadata Store]
  14. ML Pipelines of Jupyter Notebooks
      • End-to-End ML Pipeline (Airflow): Select Features, File Format; Feature Engineering; Experiment, Train Model; Validate & Deploy Model
      • Feature Backfill Pipeline
      • Training and Deployment Pipeline
      • Feature Store
  15. Online Model Serving and Monitoring
      Request / response over <<HTTPS>> through Hopsworks:
      1. Access Control
      2. Build Feature Vector (Feature Store)
      3. Make Prediction (Model Server on Kubernetes, Model Serving Images)
      4. Log Prediction (Data Lake)
      Monitor: link predictions with outcomes to measure model performance.
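The four numbered steps on this slide can be sketched as a single request handler. Everything below is a hypothetical stand-in (a set for access control, a dict for the feature store, a plain function for the model, a list for the prediction log); it illustrates the order of the steps and is not the Hopsworks serving API.

```python
def serve(request, allowed_users, feature_store, model, prediction_log):
    """Handle one prediction request following the slide's four steps."""
    # 1. Access control
    if request["user"] not in allowed_users:
        raise PermissionError(f"user {request['user']!r} is not authorized")
    # 2. Build the feature vector from the feature store
    features = feature_store[request["entity_id"]]
    # 3. Make the prediction
    prediction = model(features)
    # 4. Log the prediction so it can later be linked with the outcome
    prediction_log.append(
        {"entity_id": request["entity_id"], "features": features, "prediction": prediction}
    )
    return prediction


allowed_users = {"alice"}
feature_store = {"customer-1": [1.0, 2.0]}
prediction_log = []

# The "model" here is just a sum over the features.
result = serve(
    {"user": "alice", "entity_id": "customer-1"},
    allowed_users,
    feature_store,
    sum,
    prediction_log,
)
print(result, len(prediction_log))
```

Logging the features alongside the prediction is what makes the monitoring step possible: once the real outcome arrives, it can be joined to the logged entry by entity.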
  16. Summary
      • Hopsworks now supports both Nvidia (CUDA) and AMD (ROCm) GPUs
        - Hopsworks 0.10+
      • New AMD GPUs will challenge Nvidia’s hegemony in DL
        - Vega R7
        - Navi architecture GPUs coming in July (RX 5700)
          • 1.25x performance per clock and 1.5x performance per watt
          • GDDR6 memory and PCIe 4.0 support
      https://databricks.com/session/rocm-and-distributed-deep-learning-on-spark-and-tensorflow
  17. Try it Out!
      1. Register for an account at: www.hops.site
      @logicalclocks, www.logicalclocks.com, robzor92
