Anúncio
Anúncio

Mais conteúdo relacionado

Apresentações para você(20)

Similar a Infrastructure Agnostic Machine Learning Workload Deployment(20)

Anúncio

Mais de Databricks(20)

Último(20)

Anúncio

Infrastructure Agnostic Machine Learning Workload Deployment

  1. Infrastructure Agnostic Machine Learning Workload Deployment Abi Akogun Data Science Consultant (MavenCode) Charles Adetiloye ML Platforms Engineer (MavenCode)
  2. About MavenCode MavenCode is an Artificial Intelligence Solutions company located in Dallas, Texas - We do training, product development, and consulting services in the following areas: ● Provisioning Scalable Data Processing Pipelines on Cloud Infrastructure ● Development & Deployment of Machine Learning and Artificial Intelligence Platforms ● Streaming and Big Data Analytics Edge-IoT and Sensors
  3. About The Presenters Charles Adetiloye is an ML Platforms Engineer at MavenCode. He has well over 15 years of experience building large-scale, distributed applications. He has extensive experience working and consulting with several companies implementing production grade ML and AI platforms twitter.com/cadetiloye Abiodun Akogun is a Machine Learning and Data Science Consultant at Mavencode. He has extensive experience building and deploying large-scale Machine Learning Applications in different industries that include Healthcare, Finance, Telecommunications, and Insurance. He has experience solving several business problems using Data Analytics, Sentiment Analysis, Topic Modelling, Named Entity Recognition(N.E.R), Opinion Mining, Data Mining, Time Series, Spatial Statistics and Marketing Analytics twitter.com/akogz
  4. Agenda ▪ Overview of Machine Learning Model Deployment Workflow ▪ Various Approaches to model training, management, and serving in the Cloud ▪ Deploying Machine Learning Workloads in the Cloud ▪ Implementing Feature Storage backend for ML model training ▪ Running Spark Workloads for ML training on Kubernetes with Kubeflow
  5. Overview of Machine Learning Deployment Workflow Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing
  6. Machine Learning Workload Deployment Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Google Cloud AWS Azure On Prem
  7. Machine Learning Deployment Effort Data Verification Configuration Feature Extraction Data Validation Machine Resource Management Serving Infrastructure Monitoring Analysis Tool Machine Learning Code Data Preparation + Storage Efficient Compute Resource Management
  8. Overview of Machine Learning Deployment Workflow Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing 32% 10% 36% 2% 4% 16%
  9. A Typical Machine Learning Developer Workflow Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Azure Storage Google Storage AWS S3 Storage Raw Data Transformation Processed Data Storage Compute 1 2 Google Cloud AI AWS Sage Maker Azure ML Data Scientist / ML Engineers works on pulling or processing data first before starting ML training on a Managed Cloud Service Raw Data Processing and Transformation Pipeline Cloud Training Platforms
  10. What Enterprise Machine Learning Workflow In the Cloud Looks Like! Data Sourcing Pre Processing Feature Engineering Azure Storage Google Storage AWS S3 Storage Raw Data Transformation Processed Data Storage Compute 1 2 Team A Team B Team C Team D Google Cloud AI AWS SageMaker AWS SageMaker Azure ML Running ML workflow across the enterprise with multiple teams using different Cloud Provider technology stacks
  11. Implementing Machine Learning solutions in the cloud comes at a cost, with cost of Compute and Storage on top of the list.
  12. If we plan to be Cloud Neutral, can we abstract our ● Machine Learning Compute Workload→Kubernetes? ● Machine Storage → Feature Store?
  13. Google Cloud AI AWS Sage Maker Azure ML A Typical Machine Learning Developer Workflow Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Azure Storage Google Storage AWS S3 Storage Data Source Transformation Processed Data Storage Compute 1 2
  14. Towards A Cloud Neutral ML Deployment Environment Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Storage Compute 1 2 Feature Store Kubernetes
  15. Why the need for Cloud Agnostic Deployment Infrastructure?
  16. ● Makes it easier to migrate workloads in a Hybrid Cloud Environment ● We are not tied to particular Cloud Infrastructure technology stack ● It’s easier to Implement best practice patterns and solutions ● Your team will have a common base denominator for all Enterprise ML workload ● Easy to control cost, manage utilization and forecast demand
  17. Cloud Agnostic Machine Learning Development Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Storage Compute 1 2 Feature Store Kubernetes Azure Storage Google Storage AWS S3 Storage
  18. What’s Feature Store All about? A Feature is a measurable observable attribute that is part of the input to a Machine Learning Model. Model Training X1 X2 X3 Xn [Feature Vector] Model
  19. What’s Feature Store All about? Model Training X1 X2 X3 Xn [Feature Vector] Model Model 1 Features are derived from ● Raw Datastore ● Streaming Datasource ● Aggregates of Raw Inputs ● Windows (mins, hourly, daily, weekly)
  20. Features Change Over time! Model Training X1 X2 X3 Xn X1 X2 X3 Xn X1 X2 X3 Xn Time
  21. Machine Learning Feature Store ● Makes it easy to operationalize our ML workload, most importantly Data Management and Storage for Model training ● Features can be shared easily amon teams running different Model training pipelines ● We can get to version of datasets and track changes easily ● Consistency in Feature input attributes between Model Training and Serving
  22. ● Offline Feature Store → Batching Training ● Online Feature Store → Inferencing / Serving Types Of Feature Store
  23. Implementing Offline Feature Storage with Apache Hudi Azure Storage Google Storage AWS S3 Storage Streaming Source Batch Job Operations Datasource with Streaming sources like MQTT, Kafka, Pubsub etc Batch Operations on Databases, FileStorage, Distributed Storage etc Feature Store Workflow Scheduling Orchestration with Kubeflow Pipelines or Airflow Dags on Kubernetes Feature Store Implementation on any of the Major Cloud Storage
  24. ● A need for a Unified Platform where new data can be made available in addition to historical data within minutes. ● The need for a quick computation (or derivation ) of Feature vectors in other to make them available for our model input. ● Incremental Versioning of our Feature collections so that we can time-travel and use a particular set of features for Model training. ● Our Hudi dataset can be stored in Azure, Google Cloud, AWS cloud storage layer. ● Easy to implement all our code and everything we need to do with Spark and PySpark Why did we use Apache Hudi?
  25. Getting Data into Hudi Feature Store with Kubeflow Pipeline import kfp from kfp import components KafkaDatastreamer_op = kfp.components.create_component_from_func(KafkaDatastreamer,base_image="python:3.7.1”) ValidatorOnSchema_op = kfp.components.create_component_from_func(ValidatorOnSchema,base_image="python:3.7.1") PreProcessor_op = kfp.components.create_component_from_func(PreProcessor,base_image="python:3.7.1") HudiTableWriter_op= kfp.components.create_component_from_func(HudiTableWriter, base_image="mavencode.io/spark:v3.1.1")
  26. The Hudi Data Store writer Configure the Spark Session with the packages needed to run hudi and avro Hudi configuration Options Writing the data into our Hudi data store in the right format
  27. Cloud Agnostic Machine Learning Development Data Sourcing Pre Processing Feature Engineering Model Training / Evaluation Model Scoring /Management Model Inferencing Storage Compute 1 2 Feature Store Kubernetes Cloud Native ML Workload Deployment with Operators on Kubeflow Cloud Native ML Training Deployment ● Containerized Workload ● Scalable + Can Run in Distributed Mode ● Efficient Compute Utilization ● Language Agnostic!
  28. Machine Learning Operators with Kubeflow on Kubernetes ● An Machine Learning Operator helps the deployment monitoring and management a model training life- cycle ● Some ML Operators found in Kubeflow are: ○ TF-operator → Tensorflow Job ○ Pytorch-operator → Pytorch Job ○ Xgboost-operator → Xgboost Job ○ Spark-operator → Spark and Spark ML Jobs
  29. Cloud Agnostic Machine Learning Development MLOps Model Training and Deployment Platform Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook Kubeflow Jupyter NoteBook Namespace Namespace Namespace Namespace Auto-Scalable CPU Node Pool Auto-Scalable GPU Node Pool Spark Operator Spark Operator TensorFlow Operator Tensorflow Operator Cloud Infrastructure Layer Running Auto Scaling Node Pools Running Kubernetes Machine Learning Operators running with Kubeflow Feature Store
  30. Using Spark Operator for Training ML Steps PySpark ML Code Containerize the Python Code Create SparkApplication Kubernetes YAML Deployment Apply Deployment to Kubernetes
  31. Spark Operator on Kubernetes API Scheduler OR OR OR Spark Driver Executors
  32. Elastic Compute Resource ML Jobs API Scheduler OR OR OR kubectl apply -f ...
  33. Deployment Configuration YAML Spark Application Config that describes the job and the namespace where the job will run Container that will run our Spark ML Code Spark Drive and Executor Configuration
  34. Connecting to Feature Store with Kubeflow Pipeline
  35. Cost comparison with Managed Cloud service on AWS 30% 100% 15s 66s Compute Utilization Cost Compute Startup Uptime Team Agility & Productivity 6x Productivity Managed Services Running on AWS Kubeflow + S3 Feast Storage ML workload
  36. Summary ● Implementing a Cloud neutral ML deployment approach simplifies most of the complexities in a Multi-Cloud environment ● After the initial hump, learning curve and the overall team efficiency improves significantly ● Teams is not locked in to a particular Cloud Infrastructure stack ● Easy to control cost and forecast future capacity demands
  37. THANK YOU!
  38. Thank You! If you are interested in learning more about how to run your Machine Learning Workloads on any Cloud Infrastructure or Onprem reach out to us Drop us a mail hello@mavencode.com Visit Us Online https://www.mavencode.com Follow Us https://www.twitter.com/mavencode
Anúncio