Introducing Cloudera Data Science Workbench for HDP 2.12.19

Cloudera’s Data Science Workbench (CDSW) is available for Hortonworks Data Platform (HDP) clusters for secure, collaborative data science at scale. During this webinar, we provide an introductory tour of CDSW and a demonstration of a machine learning workflow using CDSW on HDP.

Published in: Technology

  1. MACHINE LEARNING IN THE ENTERPRISE. Presenters: Saumitra Buragohain (Vice President, Product Management); Vidya Raman (Director, Product Management); Michael Gregory (Machine Learning Field Engineering Lead)
  2. © Cloudera, Inc. All rights reserved. The Industry’s First Enterprise Data Cloud: From the Edge to AI. Data Engineering; Data Warehouse; IoT, Ingest & Streaming; AI, ML & Data Science; Enterprise Data Cloud
  3. OUR APPROACH TO MACHINE LEARNING: Modern enterprise platform, tools and expert guidance to help you unlock business value with ML/AI. • Agile platform to build, train, and deploy many scalable ML applications • Enterprise data science tools to accelerate team productivity • Expert guidance, services & training to fast track value & scale
  4. PLATFORM
  5. HDP MARKET DRIVERS. BIGGER: Infinitely scalable (billions of files, exabytes); low TCO (less storage overhead). TRUSTED: Data swamp -> data lake. SMARTER: Deep learning frameworks (TensorFlow, Caffe); GPU pooling/isolation. FASTER: Faster time to deployment (containerized micro-services; App 1, App 2). HYBRID: S3, ADLS/WASB, GCS with truly incremental replication.
  6. DATA PLATFORM: HDP. Enterprise Grade Data Management: Hybrid, Secure and Scalable ● On-Premises and Multi-Cloud ● Enterprise Security & Data Lineage ● Choice of Deep Learning Support (Co-locate TensorFlow/GPU/Data) ● Dockerized for Packaging/Isolation
  7. THE CHALLENGE: Balance these needs. DATA SCIENCE: • Access to granular data • Flexibility • Preferred open source tools • Elastic provisioning • Compute • Storage • Reproducible research • Path to production. DATA MANAGEMENT: • Security • Governance • Standards • Low maintenance • Low cost • Self-service access
  8. CLOUDERA DATA SCIENCE WORKBENCH
  9. A PLATFORM FOR MACHINE LEARNING • Open platform • Complete lifecycle • Team collaboration • Enterprise ready • Runs anywhere. Diagram layers: APPS: Solutions | Use cases. TOOLS: Self-service. ALGORITHMS: Open source ecosystem. COMPUTE: Local | Spark | Hive. DEPLOYMENT: Research | Production. Infrastructure: Cloud, On-premises; ADLS, S3, HDFS, Kudu. SHARED CONTEXT: Catalog | Security | Governance.
  10. THE TYPICAL SOLUTION: “If I can’t use my favorite tools, I’ll…” • Copy data to my laptop • Copy data to a data science appliance • Copy data to a cloud service. Why this is a problem: • Complicates security • Breaks data governance • Adds latency to process • Makes collaboration more difficult • Complicates model management and deployment • Creates infrastructure silos
  11. CLOUDERA DATA SCIENCE WORKBENCH: Accelerate Machine Learning from Research to Production. For data scientists: • Experiment faster: use R, Python, or Scala with on-demand compute and secure CDH data access • Work together: share reproducible research with your whole team • Deploy with confidence: get to production repeatably and without recoding. For IT professionals: • Bring data science to the data: give your data science team more freedom while reducing the risk and cost of silos • Secure by default: leverage common security and governance across workloads • Run anywhere: on-premises or in the cloud
  12. A MODERN DATA SCIENCE ARCHITECTURE: Containerized environments with scalable, on-demand compute. • Built with Docker and Kubernetes • Isolated, reproducible user environments • Supports both big and small data • Local Python, R, Scala runtimes • Schedule & share GPU resources • Run Spark, Impala, and other CDH services • Secure and governed by default • Easy, audited access to Kerberized clusters • Leverages Ranger for Security and Atlas for Governance. Diagram labels: HDP, Ambari, gateway node(s), HDP nodes (Hive, HDFS, ...), CDSW Master, CDSW Engines.
  13. ACCELERATED DEEP LEARNING WITH GPUS: Multi-tenant GPU support on-premises or cloud. • Extend CDSW to deep learning • Schedule & share GPU resources • Train on GPUs, deploy on CPUs • Works on-premises or cloud. Diagram labels: CDSW (GPU, CPU): single-node training; HDP (CPU): distributed training, scoring; GPU available on HDP 3.x. “Our data scientists want GPUs, but we need multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
  14. CLOUDERA DATA SCIENCE WORKBENCH: Accelerate and simplify machine learning from research to production. ANALYZE DATA: • Explore data securely and share insights with the team. TRAIN MODELS: • Run, track, and compare reproducible experiments. DEPLOY APIs: • Deploy and monitor models as APIs to serve predictions. MANAGE SHARED RESOURCES: • Provide a secure, collaborative, self-service platform for your data science teams
  15. INTRODUCING EXPERIMENTS: Versioned model training runs for evaluation and reproducibility. Data scientists can now... • Create a snapshot of model code, dependencies, and configuration necessary to train the model • Build and execute the training run in an isolated container • Track specified model metrics, performance, and model artifacts • Inspect, compare, or deploy prior models
  16. INTRODUCING MODELS: Machine learning models as one-click microservices (REST APIs). 1. Choose file, e.g. score.py 2. Choose function, e.g. forecast 3. Choose resources 4. Deploy! Running model containers also have access to CDH for data lookups. Example score.py from the slide:

          import pickle
          f = open('model.pk', 'rb')
          model = pickle.load(f)
          def forecast(data):
              return model.predict(data)
  17. MODEL MANAGEMENT: View, test, monitor, and update models by team or project
  18. DEMO
  19. CLOUDERA DATA SCIENCE WORKBENCH 1.5 on HDP 3.1.0 & 2.6.5: Accelerate and simplify machine learning from research to production in the HDP install base
  20. CDSW TRAINING • Self-paced video training • Learn end-to-end ML workflows in CDSW • Code examples • Python and R tracks • Customers can register at university.cloudera.com
  21. THANK YOU
  22. DISCLAIMER: The information in this document is proprietary to Cloudera. No part of this document may be reproduced, copied or transmitted in any form for any purpose without the express prior written permission of Cloudera. This document is a preliminary version and not subject to your license agreement or any other agreement with Cloudera. This document contains only intended strategies, developments and functionalities of Cloudera products and is not intended to be binding upon Cloudera to any particular course of business, product strategy and/or development. Please note that this document is subject to change and may be changed by Cloudera at any time without notice. Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement. Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or consequential damages that may result from the use of these materials. The limitation shall not apply in cases of gross negligence.
  23. MACHINE LEARNING AT CLOUDERA: Our philosophy. We empower our customers to run their business on data with an open platform: ● Your data ● Open algorithms ● Running anywhere. We accelerate enterprise data science.
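
The "choose file, choose function" steps on slide 16 can be illustrated with a minimal, self-contained sketch of what a `score.py` might look like. Everything beyond the slide's `model.pk`, `pickle.load`, and `forecast` is an assumption made for illustration: the `LinearModel` class, the `train_and_save` helper, the `{"x": [...]}` request shape, and the temporary file location are hypothetical stand-ins, not part of CDSW or of the original deck. The sketch also assumes the deployed function receives the decoded JSON request as a dictionary and returns a JSON-serializable value.

```python
import json
import os
import pickle
import tempfile

# Hypothetical location for the serialized model (the slide uses 'model.pk').
MODEL_PATH = os.path.join(tempfile.mkdtemp(), "model.pk")


class LinearModel:
    """Hypothetical stand-in for a trained model exposing predict()."""

    def __init__(self, params):
        self.slope = params["slope"]
        self.intercept = params["intercept"]

    def predict(self, xs):
        # Apply y = slope * x + intercept to each input value.
        return [self.slope * x + self.intercept for x in xs]


def train_and_save(path):
    """Pretend training step: store fixed coefficients as plain data."""
    with open(path, "wb") as f:
        pickle.dump({"slope": 2.0, "intercept": 1.0}, f)


train_and_save(MODEL_PATH)

# Load the model once at import time, as on the slide, so that every
# call to forecast() reuses the same in-memory object.
with open(MODEL_PATH, "rb") as f:
    model = LinearModel(pickle.load(f))


def forecast(data):
    """Entry-point function; `data` is assumed to be a dict like {"x": [...]}."""
    return {"prediction": model.predict(data["x"])}


result = forecast({"x": [0.0, 1.5]})
print(json.dumps(result))
```

Loading the model at module level rather than inside `forecast` mirrors the slide's layout: the deserialization cost is paid once per container start instead of once per request.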
