End-to-End Machine Learning Pipeline with Docker Enterprise and Kubeflow
Categorizing Docker Hub Public Images
Roberto Hashioka / @rhashioka
Software Engineer, Docker
Amn Rahman / @amnrahman
Data Engineer, Docker
Agenda
● Problem Statement
● Architecture and Workflow
● Docker Enterprise Features
● Demo
● Challenges and Next Steps
Use Case - Docker Hub
● Current state:
○ More than 4.6M unlabeled public repositories on Hub
○ No way to search Hub images by category
○ Limited user experience due to lack of context (e.g. backend-frameworks -> databases)
● Proposed solution:
○ Automate Docker Hub image categorization (send a description -> return suggested categories/topics)
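The proposed service has a simple contract: a repository description goes in, ranked category suggestions come out. A minimal sketch of that contract, assuming a hypothetical category/keyword table and keyword-overlap scoring as a stand-in for the trained model (the slides do not specify either):

```python
from dataclasses import dataclass
import re

# Hypothetical category -> keyword map; a trained classifier would replace this.
CATEGORY_KEYWORDS = {
    "databases": {"postgres", "mysql", "redis", "mongodb", "database"},
    "web-servers": {"nginx", "apache", "httpd", "proxy"},
    "languages": {"python", "node", "golang", "java", "ruby"},
}

@dataclass(frozen=True)
class Suggestion:
    category: str
    score: float  # fraction of the category's keywords found in the description

def suggest_categories(description: str, top_k: int = 3) -> list[Suggestion]:
    """Return up to top_k categories ranked by keyword overlap (highest first)."""
    tokens = set(re.findall(r"[a-z0-9]+", description.lower()))
    scored = []
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits:
            scored.append(Suggestion(category, hits / len(keywords)))
    # Highest score first; ties broken by category name so output is deterministic.
    scored.sort(key=lambda s: (-s.score, s.category))
    return scored[:top_k]

print(suggest_categories("Official Redis image with a Python client"))
```

Keeping the request/response shape this small means the placeholder scorer can later be swapped for a real model without changing callers.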
Problem Definition
● How can models created by data scientists be reliably deployed to production?
● How can the creation and maintenance of end-to-end ML pipelines be made easier?
● How can ML pipeline workflows be automated?
Project / Solution Scope
Demonstrate the use of Kubeflow on Docker Enterprise to train and deploy an ML model.
Personas
DEVOPS ENGINEERS
• Provide and maintain infrastructure services for other teams
• Monitoring and configuration management
• Deployment automation and security
DATA ENGINEERS
• Develop, test, and maintain data pipelines
• Improve data reliability, efficiency, and quality
• Discover opportunities for data acquisition
DATA SCIENTISTS
• Answer industry and business questions
• Prepare data for use in predictive modeling
• Build ML models to improve existing product workflows and UX
DATAOPS ENGINEERS
What is Kubeflow?
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
Source: github.com/kubeflow/kubeflow
Architecture
[Diagram: Kubeflow components (Jupyter Hub with Notebook 1, Notebook 2, ...; TFJob/TensorFlow with Model 1, Model 2, ...; TensorFlow Hub; Katib; Argo; Ambassador; Seldon) plus Prometheus and Grafana, running on Kubernetes + Docker Engine (Docker Enterprise platform) across Cloud VM, Bare Metal, and Docker for Desktop]
Seldon-Core Architecture
[Diagram: data scientists and engineers deliver models through an Argo Job (CI/CD) and Docker Trusted Registry to the Kubernetes API on the Docker Enterprise platform; the Seldon operator manages 1..N deployment graphs, each a service orchestrator in front of models 1..N; business applications call them over REST or gRPC through Ambassador (API gateway)]
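Seldon Core's Python wrapper serves a plain class that exposes a predict(self, X, features_names) method, and handles the REST or gRPC plumbing itself. The sketch below assumes that convention; the class name, its class_names column list, and the keyword rule inside predict are hypothetical stand-ins for the real trained model, which would normally be loaded in __init__.

```python
# DockerHubClassifier.py: hypothetical module for Seldon Core's Python wrapper,
# which by convention expects a class with the same name as the module.
class DockerHubClassifier:
    def __init__(self):
        # A real deployment would load a serialized model here;
        # this hypothetical keyword table stands in for it.
        self.keywords = {
            "databases": ("postgres", "mysql", "redis"),
            "web-servers": ("nginx", "apache", "httpd"),
        }
        # Column labels for the scores returned by predict().
        self.class_names = sorted(self.keywords)

    def predict(self, X, features_names=None):
        """Return one score row per description, columns ordered by class_names."""
        rows = []
        for description in X:
            text = str(description).lower()
            rows.append([
                float(any(word in text for word in self.keywords[name]))
                for name in self.class_names
            ])
        return rows
```

Once deployed, a client would typically send descriptions in a Seldon REST payload such as {"data": {"ndarray": ["Nginx reverse proxy"]}} and receive the score rows back.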
[Diagram: Development Process. Developer Machine (Docker for Desktop), Version Control, Development CI/CD, Docker Trusted Registry, Non-Production Environments and Production Environments running Docker EE in Datacenter 1 and Datacenter 2, Customers]
Docker Content Trust
$ docker trust sign dev/dockerhubclassifier:v3
Signing and pushing trust metadata for dev/dockerhubclassifier:v3
The push refers to a repository [docker.io/dev/dockerhubclassifier]
...
v3: digest: sha256:74d4bfa917d55d53c7df3d2ab20a8d926874d61c3da5ef6de15dd2654fc467c4 size: 1357
Signing and pushing trust metadata
Enter passphrase for delegation key with ID 27d42a8:
Successfully signed dev/dockerhubclassifier:v3
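Signing can also be enforced without calling docker trust sign by hand: with the DOCKER_CONTENT_TRUST=1 environment variable set, the Docker CLI signs tags on push and refuses to pull unsigned tags. The helper below is a hypothetical CI sketch of wrapping that; only the environment variable and the docker push/pull commands are real, and the image name comes from the signing transcript above.

```python
import os
import subprocess

def trusted_docker(action: str, image: str, run: bool = False):
    """Build (and optionally run) a docker push/pull with content trust enforced.

    Hypothetical CI helper: only 'push' and 'pull' are accepted.
    """
    if action not in ("push", "pull"):
        raise ValueError(f"unsupported action: {action}")
    argv = ["docker", action, image]
    # Copy the current environment and enable content trust for this call only.
    env = dict(os.environ, DOCKER_CONTENT_TRUST="1")
    if run:
        # check=True raises CalledProcessError if signing or verification fails.
        subprocess.run(argv, env=env, check=True)
    return argv, env

argv, env = trusted_docker("push", "dev/dockerhubclassifier:v3")
print(" ".join(argv), env["DOCKER_CONTENT_TRUST"])
```

Scoping the variable to the subprocess keeps other docker calls in the same job unaffected.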
Image Scanning
Image Promotion
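Docker Trusted Registry can promote an image from one repository to another when it meets a policy, for example a limit on vulnerabilities reported by image scanning. The function below sketches only that decision logic; the scan-summary fields, severity thresholds, tag naming rule, and the prod/dockerhubclassifier target are all hypothetical, not DTR's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScanSummary:
    # Hypothetical per-image scan summary: vulnerability counts by severity.
    critical: int
    major: int
    minor: int

def promotion_target(tag: str, scan: ScanSummary,
                     max_critical: int = 0, max_major: int = 2) -> Optional[str]:
    """Return the repository a tag should be promoted to, or None to hold it back."""
    # Only release-style tags such as 'v3' are candidates (assumed naming rule).
    if not (tag.startswith("v") and tag[1:].isdigit()):
        return None
    if scan.critical > max_critical or scan.major > max_major:
        return None
    return f"prod/dockerhubclassifier:{tag}"
```

Minor findings are deliberately not gating here; where to draw that line is a policy choice for each team.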
1. Data Collection: collected tagged content from other sources on the internet and joined it with Hub data.
2. Exploratory Data Analysis: Pandas data frames, distribution of categories, observing correlations, etc., using several Python packages.
3. Data Cleaning and Transformation: reducing categories, stop-word removal, label encoding, vectorization, and general transformations to feed into classifiers.
4. Model Building: splitting, training, predicting, and tuning.
5. Deploying Model to Production: serving, monitoring, and logging at scale.
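The cleaning and transformation step can be illustrated in plain Python rather than the libraries the project used. This sketch assumes a hypothetical stop-word list and category merge map (the real ones are not in the slides) and shows stop-word removal, category reduction, label encoding, and bag-of-words vectorization:

```python
import re
from collections import Counter

# Hypothetical stop words and fine-grained -> broad category merges.
STOP_WORDS = {"a", "an", "and", "for", "image", "the", "with", "docker"}
CATEGORY_MERGE = {"postgresql": "databases", "mysql": "databases",
                  "nginx": "web-servers", "apache": "web-servers"}

def clean(description: str) -> list[str]:
    """Lowercase, tokenize on alphanumerics, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def reduce_category(raw: str) -> str:
    """Collapse fine-grained tags into broader categories (unknown tags kept)."""
    return CATEGORY_MERGE.get(raw.lower(), raw.lower())

def encode_labels(labels: list[str]) -> tuple[list[int], list[str]]:
    """Map each label to an integer id; ids follow sorted label order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels], classes

def vectorize(docs: list[list[str]]) -> tuple[list[list[int]], list[str]]:
    """Bag-of-words count vectors over a sorted vocabulary."""
    vocab = sorted({t for doc in docs for t in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] for t in vocab])
    return vectors, vocab

descriptions = ["The official PostgreSQL image", "Nginx web server with SSL"]
raw_labels = ["PostgreSQL", "nginx"]
docs = [clean(d) for d in descriptions]
y, classes = encode_labels([reduce_category(label) for label in raw_labels])
X, vocab = vectorize(docs)
```

Reducing categories before encoding keeps the label space small, which matters when many Hub tags occur only a handful of times.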
Model Selection / Walkthrough of Jupyter Notebook
DEMO
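The slides list the model-building steps (splitting, training, predicting, tuning) but do not name the chosen model. As a hedged illustration of model selection, this sketch compares a multinomial Naive Bayes classifier (a common text baseline, chosen here only for illustration) against a majority-class baseline on a held-out split; the token lists and labels are made up.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model (Laplace smoothing) on token lists."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    vocab = {token for doc in docs for token in doc}
    return class_counts, token_counts, vocab, alpha

def predict_nb(model, doc):
    """Return the label with the highest log-posterior for one token list."""
    class_counts, token_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted so ties resolve deterministically
        score = math.log(class_counts[label] / total_docs)
        denom = sum(token_counts[label].values()) + alpha * len(vocab)
        for token in doc:
            if token in vocab:  # tokens unseen in training carry no evidence
                score += math.log((token_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(predict, docs, labels):
    """Fraction of docs whose predicted label matches the true label."""
    return sum(predict(doc) == label for doc, label in zip(docs, labels)) / len(labels)

# Made-up, already-cleaned training and held-out data.
train_docs = [["postgres", "sql"], ["mysql", "sql"], ["redis", "cache"],
              ["nginx", "proxy"], ["apache", "httpd"]]
train_y = ["databases", "databases", "databases", "web-servers", "web-servers"]
test_docs = [["sql", "server"], ["nginx", "httpd"]]
test_y = ["databases", "web-servers"]

nb_model = train_nb(train_docs, train_y)
majority = Counter(train_y).most_common(1)[0][0]
candidates = {
    "majority-baseline": lambda doc: majority,
    "naive-bayes": lambda doc: predict_nb(nb_model, doc),
}
scores = {name: accuracy(fn, test_docs, test_y) for name, fn in candidates.items()}
best_model = max(scores, key=scores.get)
print(scores, best_model)
```

Always scoring a trivial baseline alongside real candidates makes it obvious when a "good" accuracy only reflects class imbalance.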
Kubeflow: Composability, Portability and Scalability
Argo Workflow on Docker for Desktop and Docker Enterprise
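An Argo Workflow expresses the pipeline as a DAG of container steps. The sketch below generates such a manifest from Python; the stage names and container images are hypothetical (the slides do not list them), and the interactive exploratory-analysis stage is left out because it runs in notebooks.

```python
import json

# Hypothetical stage names and images; the project's real images are not in the slides.
STAGES = [
    ("collect-data", "example/hub-ml-collect:latest"),
    ("clean-transform", "example/hub-ml-clean:latest"),
    ("train-model", "example/hub-ml-train:latest"),
    ("deploy-model", "example/hub-ml-deploy:latest"),
]

def build_workflow(stages):
    """Build an Argo Workflow manifest that runs the stages as a linear DAG."""
    tasks, templates = [], []
    previous = None
    for name, image in stages:
        task = {"name": name, "template": name}
        if previous:
            task["dependencies"] = [previous]  # each stage waits for the one before
        tasks.append(task)
        # Container template; with no command set, the image's entrypoint runs.
        templates.append({"name": name, "container": {"image": image}})
        previous = name
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "dockerhub-classifier-"},
        "spec": {
            "entrypoint": "pipeline",
            "templates": [{"name": "pipeline", "dag": {"tasks": tasks}}] + templates,
        },
    }

print(json.dumps(build_workflow(STAGES), indent=2))
```

Because the manifest uses generateName, it would be submitted with kubectl create -f (which accepts JSON) rather than kubectl apply, and the same file should work unchanged on Docker for Desktop and Docker Enterprise clusters that run the Argo controller.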

Model Performance Monitoring
DEMO
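When repository owners choose labels after seeing the suggestions, each choice is a feedback signal. One hedged way to monitor model performance in production is a rolling hit rate: the share of recent repositories where at least one chosen label was among the suggestions. The class, metric, window size, and alert threshold below are hypothetical choices, not the project's monitoring setup.

```python
from collections import deque

class SuggestionHitRate:
    """Rolling share of labeled repos whose chosen labels overlap the suggestions."""

    def __init__(self, window: int = 100, alert_below: float = 0.5):
        self.results = deque(maxlen=window)  # oldest result drops out when full
        self.alert_below = alert_below

    def record(self, suggested: list[str], chosen: list[str]) -> None:
        """Store a hit if any owner-chosen label was among the suggestions."""
        self.results.append(bool(set(suggested) & set(chosen)))

    @property
    def rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

    def needs_attention(self) -> bool:
        """True once there is data and the rolling hit rate falls below the threshold."""
        return bool(self.results) and self.rate < self.alert_below
```

A bounded window reacts to drift (for example, new kinds of images appearing on Hub) instead of averaging it away over the model's whole lifetime.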
Repo owner Mike
Mike decides to be a good person and label his repo.
With ML, Mike can see the most relevant categories based on the repo's content
and then chooses the appropriate labels.
Challenges / Next Steps
- Simpler workflow with Kubeflow Pipelines
- Stepping stone for future ML projects
- Canary deployments and A/B testing with Seldon + Istio
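Canary deployments and A/B tests both split traffic between model versions by weight; in the planned setup that routing would be handled by Istio through Seldon. The sketch below only illustrates the idea in plain Python, using deterministic hash-based bucketing so a given repository always sees the same model version (unlike a per-request random split). The function name, keys, and weights are hypothetical, and this is not how Istio or Seldon implement routing.

```python
import hashlib

def assign_variant(request_key: str, weights: dict[str, int]) -> str:
    """Deterministically map a key (e.g. a repository name) to a variant by weight.

    weights are integer percentages that must sum to 100,
    e.g. {"stable": 90, "canary": 10}.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    # Stable hash so the same key always lands in the same bucket (0-99).
    bucket = int(hashlib.sha256(request_key.encode()).hexdigest(), 16) % 100
    upper = 0
    for variant in sorted(weights):  # sorted so bucket ranges don't depend on dict order
        upper += weights[variant]
        if bucket < upper:
            return variant
    raise AssertionError("unreachable when weights sum to 100")
```

Raising the canary weight in steps (10, 50, 100) while watching the monitoring metrics gives a gradual rollout; sticky assignment keeps each repository owner's experience consistent during an A/B test.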
Contributions to the Community
● End-to-End Machine Learning Pipeline with Docker for Desktop and Kubeflow
○ https://github.com/dockersamples/docker-hub-ml-project
Resources
- https://github.com/kubeflow/kubeflow
- https://www.docker.com/products/docker-desktop
- https://github.com/dockersamples/docker-hub-ml-project
THANK YOU!
