DPGD Microsoft Hyderabad 22nd Sept 2018

This introduction to machine learning in the Azure Databricks environment gives you a fair idea of how to kick-start ML with the best tools and libraries available. It covers:
Why machine learning
Life cycle
Why Azure Databricks
Available ML options in Azure Databricks
MLflow
H2O.ai

Published in: Data & Analytics

  1. INTRODUCTION
     • SOFTWARE ENGINEER WORKING WITH MICROSOFT
     • 12 YEARS OF EXPERIENCE WORKING ON SQL, DWH, BI, AZURE AND BIG DATA TECHNOLOGIES LIKE SPARK, DATABRICKS, HDINSIGHT, IOT ETC.
     • CATCH ME ON LINKEDIN OR VIA MY BLOG
     • HTTPS://WWW.LINKEDIN.COM/IN/PRAMODSINGLA
     • HTTPS://PRAMODSINGLA.COM/
  2. TOPICS
     • Why machine learning
     • Life cycle of machine learning
     • Challenges
     • Why Azure Databricks
     • ML using Azure Databricks
     • Set up Databricks Runtime ML
     • MLflow
     • H2O
     • Demo
     • Available machine learning options in Azure Databricks
  3. WHAT AND WHY MACHINE LEARNING
  4. DATA SCIENCE LIFE CYCLE
  5. CHALLENGES
     • Collecting, cleaning and processing data (80% of the time spent)
     • Problem formulation
     • Choosing algorithms and parameters
     • Speed (GPUs)
     • Myriad tools and libraries
     • Hard to track experiments
     • Hard to reproduce results
     • Hard to deploy ML
  6. WHY AZURE DATABRICKS
  7. AVAILABLE MACHINE LEARNING OPTIONS IN AZURE DATABRICKS
     • DATABRICKS RUNTIME ML
     • APACHE SPARK MLLIB
     • THIRD-PARTY ML LIBRARY SUPPORT, E.G. H2O
     • MLFLOW
  8. DATABRICKS RUNTIME ML PRELOADED LIBRARIES (SAMPLE CLUSTER CONFIG ON NEXT SLIDE)
     Category: Libraries
     • Distributed Deep Learning: distributed training with Horovod and Spark; distributed TensorFlow and Keras prediction
     • Deep Learning: Keras; TensorFlow; GPU libraries
     • XGBoost: XGBoost4j
     • Other machine learning libraries: numpy, scikit-learn, scipy
  9. MLFLOW
     • AN OPEN SOURCE PLATFORM FOR MANAGING THE END-TO-END MACHINE LEARNING LIFECYCLE
     • ORGANIZED INTO THREE COMPONENTS: TRACKING, PROJECTS, AND MODELS
     • TRACKING: API AND UI FOR LOGGING PARAMETERS, DATA, CODE AND RESULTS; TRACKING SERVER
     • PROJECTS: CAPTURE THE WHOLE ENVIRONMENT, INCLUDING DEPENDENT LIBRARIES (INDEPENDENT OF PLATFORM AND DATA SCIENTIST). A PROJECT IS SIMPLY A DIRECTORY WITH CODE OR A GIT REPOSITORY, AND USES A DESCRIPTOR FILE OR SIMPLY CONVENTION TO SPECIFY ITS DEPENDENCIES AND HOW TO RUN THE CODE
     • MODELS: MANAGING AND DEPLOYING MODELS FROM A VARIETY OF ML LIBRARIES TO A VARIETY OF MODEL SERVING AND INFERENCE PLATFORMS
  10. H2O SPARKLING WATER
     • BEST ML LIBRARIES FOR SPARK
     • LEADERS IN MAGIC QUADRANT
     • SUPPORTS BOTH SUPERVISED AND UNSUPERVISED MODELS
     • E.G. RANDOM FOREST, GLM, GBM, XGBOOST, GLRM, WORD2VEC AND MANY MORE
  11. MLFLOW AND H2O
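
The tracking idea from the MLflow slide (log the parameters and results of every run so that experiments can be compared and reproduced) can be illustrated with a small standard-library sketch. This is a toy stand-in, not MLflow's API: the names `ToyRun`, `best_run` and `demo`, the JSON file layout, and the made-up "loss" formula are all assumptions introduced here for illustration only.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


class ToyRun:
    """One tracked run, saved as a JSON file (hypothetical helper, not MLflow)."""

    def __init__(self, root: Path):
        self.run_id = uuid.uuid4().hex
        self.path = root / f"{self.run_id}.json"
        self.record = {
            "run_id": self.run_id,
            "start_time": time.time(),
            "params": {},
            "metrics": {},
        }

    def log_param(self, key, value):
        # Parameters are the inputs of a run; store them as strings.
        self.record["params"][key] = str(value)

    def log_metric(self, key, value):
        # Metrics are numeric results; keep every logged value as a history.
        self.record["metrics"].setdefault(key, []).append(float(value))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Persist the record when the run ends, even if the run failed.
        self.record["status"] = "FAILED" if exc_type else "FINISHED"
        self.path.write_text(json.dumps(self.record, indent=2))
        return False  # do not swallow exceptions


def best_run(root: Path, metric: str):
    """Return the saved run whose last logged value of `metric` is lowest."""
    runs = [json.loads(p.read_text()) for p in root.glob("*.json")]
    runs = [r for r in runs if metric in r["metrics"]]
    return min(runs, key=lambda r: r["metrics"][metric][-1])


def demo():
    """Track three runs in a temporary directory and return the best one."""
    root = Path(tempfile.mkdtemp())
    for alpha in (0.1, 0.5, 1.0):
        with ToyRun(root) as run:
            run.log_param("alpha", alpha)
            # Made-up stand-in for a training loss, smallest at alpha = 0.5.
            run.log_metric("loss", (alpha - 0.5) ** 2)
    return best_run(root, "loss")
```

Calling `demo()` returns the stored record of the run with `alpha` set to 0.5. Because every run leaves its parameters and results on disk, the comparison can be repeated later without retraining, which is the problem the "hard to track experiments" and "hard to reproduce results" challenges describe.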
