Enabling support for data processing, data analytics, and machine learning workloads in Kubernetes has been one of the goals of the open source community. During this online meetup we discussed the growing use of Kubernetes for data science and machine learning workloads. We examined how new Kubernetes extensibility features, such as custom resources and custom controllers, are used to integrate applications and frameworks. Apache Spark 2.3's native Kubernetes support is the latest indication of this growing trend. We demoed several data science workloads running on Kubernetes clusters set up by our Kublr platform.
Learn more at kublr.com/how-it-works/kublr-platform
3. Common ML Challenges and Approaches
Common ML challenges:
• Computer vision, natural language processing, speech recognition, prediction, anomaly detection
ML approaches:
• Supervised learning
• Classification, Regression
• Unsupervised learning
• Clustering
4. Typical ML Challenges
1. Data source – DB / file storage; batches / streaming
2. Data preparation – data cleansing; data transformation
3. Modelling – model training; optimization
4. Model serving – A/B testing; inferencing
5. Analysis – results exploration and interpretation
5. Why Use Kubernetes for ML?
Architecture – separation of concerns (dev, ops, infra), useful abstractions; universality
Pluggable and extensible – k8s is a set of open source microservices
Scalability and HA – autoscaling, resource management, self-healing
Container based – isolation, lightweight, few (if any) limitations on applications
Cloud and OS agnostic – Kubernetes + containers
Shared compute – RBAC, Limits, Quotas
On-demand – cloud support, autoscaling, reproducible applications
Frameworks – broad ecosystem of ML framework integrations backed by a great community
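The shared-compute point above (RBAC, limits, quotas) can be sketched with a namespace-level ResourceQuota; the namespace name and all values below are illustrative assumptions, not recommendations:

```yaml
# Hypothetical quota for a data-science team namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ml-team-quota      # assumed name
  namespace: ml-team       # assumed namespace
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    limits.cpu: "32"
    limits.memory: 128Gi
```

With such a quota in place, pods in the namespace are rejected once their aggregate requests or limits would exceed the stated caps, which is what makes a cluster safely shareable between teams.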
6. Kubernetes and Kublr for ML
• Infrastructure abstraction and scheduling
• DevOps and operational layer: monitoring and logging,
observability, HA
• Auto-scaling: HPA and cluster auto-scaler
• Kubernetes operators
• Storage (HDFS, Rook/Ceph)
• Custom resources and GPU
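As a sketch of the GPU point above, a pod can request GPUs through the `nvidia.com/gpu` extended resource exposed by the NVIDIA device plugin; the pod name and image are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training        # hypothetical name
spec:
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu   # assumed image
    resources:
      limits:
        nvidia.com/gpu: 1   # extended resource from the NVIDIA device plugin
```

The scheduler then places the pod only on a node that advertises a free GPU.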
7. Kubernetes as an Orchestration Platform
Kubernetes
• Infrastructure abstraction
• Orchestration
• Network
• Configuration
• Service discovery
• Ingress
• Persistence
[Diagram: a master node runs the K8s master components (etcd, scheduler, API server, controller) and stores K8s metadata; worker nodes run kubelet, Docker, and infrastructure and application containers holding app data; K8s node components (overlay network, discovery, connectivity) link the nodes together.]
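As one minimal sketch of the abstractions listed above (orchestration, configuration, service discovery), a Deployment plus a Service is the basic pattern; all names and the image are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scoring-api          # hypothetical app
spec:
  replicas: 2
  selector:
    matchLabels: {app: scoring-api}
  template:
    metadata:
      labels: {app: scoring-api}
    spec:
      containers:
      - name: scoring-api
        image: example/scoring-api:1.0   # assumed image
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service                # stable DNS name = service discovery
metadata:
  name: scoring-api
spec:
  selector: {app: scoring-api}
  ports:
  - port: 80
    targetPort: 8080
```

Kubernetes keeps two replicas running (orchestration, self-healing) and other pods reach them at a stable name regardless of which nodes they land on.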
8. Kublr as Operations and DevOps Layer
[Diagram: Kublr exposes an API and UI to manage K8s clusters (PoC, dev, prod) in the cloud and in the data center, providing log collection, operations, monitoring, authn and authz (SSO, federation), audit, an image repo, infrastructure management, and backup & DR.]
• Security
• Multiple environments
• Hybrid support
• Infrastructure
• Operations
• Monitoring and logs
• Backup and DR
• Container image management
9. Horizontal Pod Autoscaler
• Cooldown/delay
• Rolling update
• Multiple metrics
• Custom metrics
[Diagram: the HPA reads metrics and scales a Deployment from Pod 1 to Pod N.]
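The multiple-metrics and custom-metrics bullets above can be sketched as an HPA manifest; the `autoscaling/v2beta1` API matches the Kubernetes versions of that period, and the target Deployment, replica counts, and the custom per-pod metric are assumptions:

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: scoring-api-hpa      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: scoring-api        # assumed Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource           # built-in CPU metric
    resource:
      name: cpu
      targetAverageUtilization: 70
  - type: Pods               # custom metric, assumed to be exported by the app
    pods:
      metricName: requests_per_second
      targetAverageValue: "100"
```

When several metrics are listed, the HPA computes a desired replica count per metric and applies the largest.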
10. Cluster Autoscaler
• Multiple node groups
• AWS, Azure, GCE
• Cool-down period
• Scheduling rules compliance
[Diagram: the Cluster Autoscaler compares resources requested by pods with resources provided by nodes and scales node groups 1 … M, each from Node 1 to Node N, under the master's scheduling.]
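The requested-vs-provided comparison above is what drives scale-up: a pod whose resource requests no existing node can satisfy stays Pending, and the Cluster Autoscaler adds a node to a matching group. A sketch (name, image, and sizes are assumptions):

```yaml
# If no node can satisfy these requests, the pod stays Pending
# and the Cluster Autoscaler provisions a new node for it.
apiVersion: v1
kind: Pod
metadata:
  name: big-training-job     # hypothetical name
spec:
  containers:
  - name: trainer
    image: example/trainer:1.0   # assumed image
    resources:
      requests:
        cpu: "8"
        memory: 32Gi
```

Scale-down works in reverse: nodes whose pods fit elsewhere are drained and removed after the cool-down period.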
12. Storage: HDFS and Hadoop
Hadoop/HDFS
• Scheduling tasks close to the data
• Reliable storage
• Established tool stack for data science and ML
Kubernetes
• Infrastructure management and recovery
• Underlying storage management
• Portability, hybrid support
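For the underlying-storage point above, workloads typically consume storage through a PersistentVolumeClaim backed by a storage class; the claim name, class name, and size below are assumptions (the class would be created by, e.g., the Rook operator):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data        # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: rook-ceph-block   # assumed Rook/Ceph storage class
  resources:
    requests:
      storage: 100Gi
```

The pod then mounts the claim by name, staying independent of whether Ceph, a cloud disk, or something else provides the volume.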
15. Major ML Stacks Compatible w/ Kubernetes
• Kubernetes TensorFlow operator (tf-operator): github.com/kubeflow/kubeflow
• Spark 2.3.0 with native Kubernetes support
• In-house solutions (package models in Docker containers, run them on cloud or on-prem Kubernetes)
• beam.apache.org
• Other relatively new open source solutions
• Cloud and other vendor solutions
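For the Spark 2.3.0 entry above, jobs are submitted with `spark-submit` pointing at the Kubernetes API server, which launches driver and executor pods from a container image; the server address and image name below are placeholders:

```shell
# Sketch of a Spark-on-Kubernetes submission (Spark 2.3 example job).
spark-submit \
  --master k8s://https://<api-server-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```

The `local://` scheme tells Spark the jar is already inside the container image rather than on the submitting machine.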
16. Kubeflow
• Simplify scaling and deployment of machine learning applications
• Integrate additional ML tooling over time
• Train and serve TensorFlow models in different environments
• Use Jupyter notebooks to manage TensorFlow training jobs
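The training workflow above is driven by a TFJob custom resource handled by Kubeflow's tf-operator; a minimal sketch, assuming a hypothetical job name and training image:

```yaml
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train          # hypothetical name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      template:
        spec:
          containers:
          - name: tensorflow           # container name expected by the operator
            image: example/mnist:1.0   # assumed training image
```

The operator creates the worker pods, wires up the TF_CONFIG cluster topology for distributed training, and tracks job completion.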