Scaling AI and machine learning projects poses challenges around collaboration, data access, and deploying models into production. Containers and Kubernetes help address these challenges by providing a self-service platform where data scientists can access tools, frameworks, and compute resources, enabling rapid iteration and sharing of work. Kubernetes provides resource management and workload scheduling across hybrid cloud environments. OpenShift is an enterprise distribution of Kubernetes that adds services for continuous integration/delivery and automation. Open Data Hub is an open source community project and reference architecture for building AI platforms on OpenShift and Kubernetes.
Scaling AI/ML with Containers and Kubernetes
1. Scaling AI and Machine Learning with Containers and Kubernetes
Global Big Data Conference
Boston, Oct 1-3, 2019
Tushar Katarki
OpenShift Product Manager - Lead for AI/ML
Red Hat
2. Outline
● Scaling challenges in AI/ML
● Addressing the challenges with containers, kubernetes and more
● Open Data Hub - A community open source project
● Putting it together:
○ Self-service cloud like experience
○ From Experimentation to production continuously with CI/CD
● Summary
● Resources
3. Scaling Challenges
● Unable to easily share and collaborate, iteratively and rapidly
● Access to data is bespoke, manual and time consuming
● No on-demand access to ML tools, frameworks and compute infrastructure
● Models remain prototypes and do not go into production
● Reproducing, tracking and explaining results of AI/ML is hard
IMPACT
● Speed, efficiency and productivity of teams suffer
● Frustration and lack of satisfaction
● The promise of AI/ML to the business is not redeemed
5. (Diagram: self-service portal to select ML frameworks and data access → perform ML modelling → inferencing → deployment in production)
As a Data Scientist, I want a "self-service, cloud-like" experience for my Machine Learning projects, where I can access a rich set of modelling frameworks, data, and computational resources, share and collaborate with colleagues, and deliver my work into production with speed, agility and repeatability to drive business value!
7. Look no further... we have done this with application software development and delivery:
● Agile
● Cloud
● Microservices
● Containers
● Kubernetes
● CI/CD
How do we bring this to the world of AI?
Source: http://www1.semi.org/en/semi-arizona-forum-artificial-intelligence-machine-learning-deep-learning-applications-0
8. Containers
Containers are the basic units that make AI/ML programs shareable and portable across the hybrid cloud.
● Choice: containers can package any of your ML frameworks and tools
● Sharing: container images can be shared and iterated on in flexible ways
● Immutable & portable: containerize once and run anywhere with integrity
● Versioning: incremental changes are tracked
● Fast & efficient: containers are just Linux processes
● Security: process isolation and resource control
9. Kubernetes
Foundation of the AI platform for the hybrid cloud. Kubernetes centralizes compute resources and provides a cloud experience across the data center, cloud and edge:
● Resource management for compute resources
● Workload scheduling and management
● Multi-tenancy and quota enforcement
● Networking and storage abstractions
Kubernetes is the de facto container platform for the hybrid cloud.
10. Self-service, Automation, CI/CD
Boosts speed, efficiency and productivity.
● JupyterHub and Jupyter Notebooks running on Kubernetes form the basis for self-service
● Source-to-Image (S2I) automatically converts a notebook into a container image that is ready to be deployed
● Kubernetes Operators provide automation and lifecycle management for the containers
● CI/CD makes rapid, incremental and iterative change possible; open source technologies such as Argo, Tekton, Jenkins and Spinnaker in conjunction with Kubernetes make this happen
● 'Serverless' technologies such as Knative will enable AI/ML users to spend more time developing their models
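The notebook-to-image step above hinges on one convenient fact: a Jupyter notebook (.ipynb) is plain JSON, so its code cells can be extracted with nothing but the standard library. The sketch below illustrates roughly what an S2I-style builder does before baking the result into a container image; the `notebook_to_script` helper and the demo notebook are hypothetical, not part of the actual S2I tooling.

```python
import json

def notebook_to_script(notebook_json: str) -> str:
    """Extract the code cells from a Jupyter notebook (.ipynb is plain
    JSON) and concatenate them into a Python script - roughly the first
    step an S2I-style notebook builder performs."""
    nb = json.loads(notebook_json)
    cells = [
        "".join(cell["source"])
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"  # skip markdown cells
    ]
    return "\n\n".join(cells)

# A minimal, hypothetical notebook with one markdown and one code cell.
demo_nb = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Train a model"]},
        {"cell_type": "code", "source": ["x = 2\n", "print(x * 21)"]},
    ],
    "nbformat": 4,
})

print(notebook_to_script(demo_nb))
```

The resulting script is what gets layered onto a base image containing the ML frameworks, giving the deployable container the slide describes.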
11. Data Engineering
Easy, self-service and repeatable.
● Data sources: Kubernetes Persistent Volumes and S3 object storage make access to storage easy and standardized
● Data pipes: Kubernetes networking and a service mesh provide secure, high-bandwidth, low-latency data connectivity
● Data streaming and manipulation: tools such as Spark, Kafka and Presto can run natively and be accessed as a service
● Data governance: with open source technologies like Open Policy Agent (OPA)
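The "data pipe" idea above is essentially pub/sub decoupling: producers stream records onto a channel and consumers transform them independently. A minimal in-process sketch of that pattern, using only the Python standard library as a stand-in for what a Kafka topic on the cluster would provide:

```python
import queue
import threading

# In-process stand-in for a pub/sub data pipe: a producer streams
# records onto a queue and a consumer transforms them - the same
# decoupling a Kafka topic gives between data sources and ML jobs.
SENTINEL = None  # marks end of stream

def producer(q: queue.Queue, records):
    for rec in records:
        q.put(rec)
    q.put(SENTINEL)

def consumer(q: queue.Queue, out: list):
    while True:
        rec = q.get()
        if rec is SENTINEL:
            break
        out.append(rec * 2)  # stand-in for feature extraction

q = queue.Queue(maxsize=8)   # bounded, so a slow consumer backpressures
results: list = []
t = threading.Thread(target=consumer, args=(q, results))
t.start()
producer(q, [1, 2, 3])
t.join()
print(results)  # → [2, 4, 6]
```

With Kafka the queue becomes a durable, replicated topic, so producers and consumers can also be scaled and restarted independently.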
12. Deploying into production
To deliver business value and redeem the promise of AI in the enterprise.
● Containerize models and expose the service with a REST API using the microservices pattern; a service mesh (such as Istio) makes this easy
● Incorporate models into data pipeline jobs (batch or real-time) with tools such as Spark, Kafka and Argo
● Deliver models into existing application workflows as binaries: PMML, ONNX, Pickle
● Monitor model performance and drift with open source tools native to Kubernetes: Prometheus and Grafana
● Use CI/CD to drive continuous change and improvement in production
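The first and third bullets combine naturally: a serialized model loaded into a small HTTP service is the core of the microservices pattern. Below is a standard-library-only sketch, assuming a toy `DoublingModel` in place of a real trained artifact; in a cluster deployment the routing, TLS and load balancing would come from the service mesh rather than this bare server.

```python
import json
import pickle
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class DoublingModel:
    """Hypothetical stand-in for a trained model artifact."""
    def predict(self, xs):
        return [2 * x for x in xs]

# Round-trip through pickle, as if the binary were loaded from storage
# (the deck also lists PMML and ONNX as interchange formats).
model = pickle.loads(pickle.dumps(DoublingModel()))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        xs = json.loads(body)["instances"]
        payload = json.dumps({"predictions": model.predict(xs)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # ephemeral port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Call the service the way an application in the mesh would.
req = Request(
    f"http://127.0.0.1:{server.server_port}",
    data=json.dumps({"instances": [1, 2, 3]}).encode(),
    headers={"Content-Type": "application/json"},
)
resp = json.loads(urlopen(req).read())
print(resp)  # → {'predictions': [2, 4, 6]}
server.shutdown()
```

Containerizing exactly this kind of service is what lets Kubernetes scale and load-balance the model like any other microservice.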
13. OpenShift - Enterprise Distro of Kubernetes
Run ANY CONTAINER on ANY INFRASTRUCTURE: laptop, datacenter, OpenStack, Amazon Web Services, Microsoft Azure, Google Cloud.
● Application lifecycle management
● Container orchestration and management (Kubernetes)
● Enterprise container host
14. OpenShift Abstraction Layers
From CaaS to PaaS: the best IT Ops experience and the best developer experience, with automated operations via Operators.
● Cluster services: metrics, chargeback, registry, logging
● Application services: middleware, service mesh, functions, ISV
● Developer services: dev tools, automated builds, CI/CD, IDE
● Kubernetes
● Red Hat Enterprise Linux or Red Hat CoreOS
15. OpenShift Architecture for AI/ML
(Architecture diagram: masters providing API/authentication, a data store, a scheduler and health/scaling manage a fleet of RHEL nodes running containers; a service layer and routing layer expose workloads; persistent storage and an image registry back the cluster; SCM (Git), CI/CD and existing automation toolsets integrate from outside; runs on physical, virtual, private, public and hybrid infrastructure.)
What this gives the data scientist:
● Deploy ML on any cloud
● Expose ML as services, load balanced and scalable
● Compute resources on-demand
● Best of SDLC
● ML in production
16. Open Data Hub Community Project
● Meta-project that includes the best of open source AI projects
● Derives from Red Hat's internal data science and AI platform
● Serves as a reference architecture for AI on OpenShift
● Growing ecosystem of data science tools and ISVs
Covers the full workflow: data acquisition & preparation → ML model selection, training, testing → ML model deployment in the application development process.
17. Open Data Hub v0.4
Now available on opendatahub.io. Components include:
● Spark: unified analytics engine for large-scale data; runs on Kubernetes
● JupyterHub: multi-user Jupyter, used for data science and research
● Prometheus: monitoring and alerting toolkit; records numeric time series data; used to diagnose problems
● Grafana: analytics platform for all metrics; query, visualize and alert on metrics
● Seldon: deploying machine learning models on Kubernetes; exposes models via REST and gRPC; full model lifecycle management
● Ceph: distributed object store with an S3 interface
● Kafka: distributed event streaming; pub/sub messaging
All deployed and managed by the Open Data Hub Operator.
18. Open Data Hub Operator
The Open Data Hub Operator deploys Open Data Hub and manages its lifecycle.
20. A self-service cloud-like experience
(Diagram: the data scientist self-services through JupyterHub for model test & iteration, with access to data and on-demand compute resources - CPUs, GPUs, memory, NVMe - until the model is deployed into production.)
21. From experimentation to production with CI/CD
The data scientist checks work in to the source repo; Source-to-Image builds the notebook into a container; the notebook container is deployed for model test, iteration and integration; models are then promoted and served into production as services, with continuous monitoring and change management closing the loop.
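The "continuous monitoring" step is where drift detection lives: comparing live inputs against the training distribution and flagging when they diverge. A toy sketch of that check, using only the standard library; the `drifted` function and the three-standard-deviation threshold are illustrative choices, and in a real deployment the signal would be exported as a Prometheus metric and alerted on in Grafana, as the deck notes.

```python
import statistics

def drifted(train: list, live: list, threshold: float = 3.0) -> bool:
    """Flag drift when the live window's mean shifts away from the
    training mean by more than `threshold` training standard deviations.
    A deliberately simple univariate check for illustration."""
    mu = statistics.fmean(train)
    sigma = statistics.stdev(train)
    return abs(statistics.fmean(live) - mu) > threshold * sigma

# Hypothetical feature values captured at training time.
train = [10.0, 10.5, 9.8, 10.2, 9.9, 10.1]

print(drifted(train, [10.0, 10.3, 9.7]))   # stable window: no drift
print(drifted(train, [25.0, 26.1, 24.4]))  # shifted window: drift
```

When the check fires, the CI/CD pipeline from this slide is what makes the remediation cheap: retrain, rebuild the container, and promote the new model through the same automated path.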
22. Summary
● Containers and Kubernetes are foundational to scaling AI
● Also think about: managing data pipelines, automation and CI/CD, and deploying models into production
● OpenShift: an enterprise Kubernetes distro that builds on Red Hat Enterprise Linux and adds services for CI/CD and automation on top
● Open Data Hub: an open source community project and reference architecture for scaling AI/ML
23. Resources
OpenShift developer preview: try.openshift.com
Open Data Hub: https://opendatahub.io/
Contacts:
Tushar Katarki: tkatarki@redhat.com
LinkedIn: https://www.linkedin.com/in/katarki/
Upcoming:
OpenShift Commons Gathering on AI/ML in San Francisco
KubeCon, Nov 20th 2019 in San Diego - customer case study for scaling AI/ML with Kubernetes