FullStack Developers Israel
CONTINUOUS OPERATIONS
DEEP LEARNING | HAGGAI PHILIP ZAGURY
Tikal Knowledge
TIKAL INTRO
WHO WE ARE ?
▸ Tikal helps ISVs in Israel & abroad with their technological
challenges.
▸ Our Engineers are Fullstack Developers with expertise in
Android, DevOps, Java, JS, Python, ML
▸ We are passionate about technology and specialise in
OpenSource technologies.
▸ Our Tech and Group leaders help establish & enhance
existing software teams with innovative & creative
thinking.
https://www.meetup.com/full-stack-developer-il/
SELF INTRODUCTION
▸ My open-thinking, open-techniques ideology is driven by
Open Source technologies and the collaborative manner
defining my M.O.
▸ My solution-driven approach is strongly based on
hands-on work and a deep understanding of operating
systems, application stacks and software languages,
networking, cloud in general and, today, more and more
cloud-native solutions.
▸ Technologies:
▸ Linux { just pick a flavour …}
▸ *Scripting
▸ Git
▸ Python/Go
▸ Cloud { public/private/hybrid }
▸ Docker
▸ Kubernetes

HAGGAI PHILIP ZAGURY - DEVOPS ARCHITECT AND GROUP TECH LEAD
THE STORY …
MACHINE LEARNING | CONTINUOUS OPERATIONS
WE NEED “CI/CD” FOR OUR MODEL TRAINING …
▸ What he didn’t say is …
▸ In-browser training
▸ Backend training
▸ Tensorflow training
▸ Tensorflow serving
▸ Storage [ for raw data & model ] …
THE LEARNING CURVE
A RELATIVELY SIMPLE USE CASE …
[Diagram, six numbered steps: CLIENT, SERVER, TensorFlow training server and object store. Annotations: serve frontend app, collect images, train, infer. Arrows: Upload Images, Serve Model, Get trained Model, Enrich Model with new data, Upload Images, Serve Protobuf.]
A CLASSIC APP
[Diagram, steps numbered 1, 2, 5 and 6: CLIENT, SERVER and object store. Annotations: serve frontend app, collect images, train, infer. Arrows: Upload Images, Serve Model, Get trained Model, Upload Images.]
MODEL TRAINING …
‣ If you're using a pre-trained model, it's no different
from using a backend / an API endpoint!
‣ Training processes are complex and require
Infrastructure as a Service, on demand
‣ Scalability
‣ Faster time to market vs. faster results
‣ Scaling costs …
STAGE #1
‣ python train_model.py

Total data size: 332
Train X: (298, 7, 7, 256)
Train Y: (298, 2)
Test X: (34, 7, 7, 256)
Test Y: (34, 2)
Train on 298 samples, validate on 34 samples
Epoch 1/10
298/298 [==============================] - 1s 3ms/step - loss: 0.5061 - acc: 0.7651 - val_loss: 0.2331 - val_acc: 0.9118
Epoch 2/10
298/298 [==============================] - 0s 1ms/step - loss: 0.1361 - acc: 0.9765 - val_loss: 0.0763 - val_acc: 1.0000
Epoch 3/10
298/298 [==============================] - 0s 1ms/step - loss: 0.0471 - acc: 0.9966 - val_loss: 0.0365 - val_acc: 1.0000
Epoch 4/10
298/298 [==============================] - 0s 1ms/step - loss: 0.0172 - acc: 1.0000 - val_loss: 0.0196 - val_acc: 1.0000
Epoch 5/10
298/298 [==============================] - 0s 1ms/step - loss: 0.0123 - acc: 1.0000 - val_loss: 0.0113 - val_acc: 1.0000
Epoch 6/10
298/298 [==============================] - 0s 1ms/step - loss: 0.0065 - acc: 1.0000 - val_loss: 0.0080 - val_acc: …
STAGE #2 - DOCKERIZE & PARAMETERIZE …
‣ docker run -v "${PWD}/data:/opt/data" tikal/webcam-controller-model:latest
CONTINUOUS INTEGRATION
‣ A Jenkins pipeline
‣ Build - get sample data / updated data
‣ Deploy model to CPU/GPU
‣ Train and record results
‣ Promote - upload the new model for the “space invaders”
microservice backend
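The "train and record results" and "promote" steps can be sketched in a few lines of Python. This is only an illustration, not the pipeline from the talk: it assumes the training output looks like the Keras progress lines shown in Stage #1, and the helper names (`parse_epoch_metrics`, `should_promote`) and the promotion rule (higher validation accuracy wins) are hypothetical.

```python
import re

# Matches the metrics Keras prints on each epoch line, e.g. "val_acc: 1.0000".
METRIC_RE = re.compile(r"\b(val_loss|val_acc|loss|acc): (\d+\.\d+)")


def parse_epoch_metrics(line):
    """Return the metrics found in one Keras progress line as a dict of floats."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


def should_promote(candidate, current, min_gain=0.0):
    """Hypothetical rule: promote only if validation accuracy improves."""
    return candidate["val_acc"] > current["val_acc"] + min_gain


# A line copied from the Stage #1 training output.
log_line = ("298/298 [==============================] - 0s 1ms/step - "
            "loss: 0.1361 - acc: 0.9765 - val_loss: 0.0763 - val_acc: 1.0000")

candidate = parse_epoch_metrics(log_line)
current = {"val_acc": 0.9118}  # assumed metric of the model currently served
promote = should_promote(candidate, current)
```

In a Jenkins job, a check like this could gate the upload step, so that a model is only pushed to the backend when it beats the one already in use.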
THE GAME IS JUST A MEANS TO AN END …
[Diagram: TensorFlow training box fed by flags: # epochs, lr, more flags]
flags = tf.app.flags
flags.DEFINE_float("lr", 0.0001, "Learning Rate")
flags.DEFINE_string("units", "((50, 0.2), (40, 0.1))", "Configuration of hidden un…
                    "Expected: tuple of tuple pairs. Each pair represent one hidde…
                    "For instance: "((100, 0.2), (50, 0.3))" will create dense h…
                    "dropout layer with rate of 0.2. Afterwards, it will create de…
                    "dropout layer with rate of 0.3. If you wish to have hidden la…
                    "second value. Example: "((100,), (50, 0.3))"")
flags.DEFINE_integer("epochs", 10, "Number of epochs")
flags.DEFINE_float("batch_frac", 0.3, "The fraction of training examples to consid…
                   "For instance, 0.1 will divide the training to 10 batches")
flags.DEFINE_boolean("draw_plot", False, "Whether to draw a plot at the end")
flags.DEFINE_boolean("export_js", False, "Whether to export to a tensorflow.js mode…
FLAGS = flags.FLAGS
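The `units` flag packs the hidden-layer layout into a string. A minimal sketch of how such a string could be decoded is shown here. It is not the code from the demo repository: `parse_units` is a hypothetical helper, and reading a one-element pair such as `(100,)` as "no dropout" is an assumption based on the truncated help text.

```python
import ast


def parse_units(spec):
    """Turn the "units" flag string into a list of (units, dropout) pairs.

    A pair with a single value, e.g. (100,), is treated as a dense layer
    without dropout (an assumption; the help text on the slide is cut off).
    """
    layers = []
    for entry in ast.literal_eval(spec):
        units = int(entry[0])
        dropout = float(entry[1]) if len(entry) > 1 else None
        layers.append((units, dropout))
    return layers


# The flag's default value and the example given in its help string.
default_layers = parse_units("((50, 0.2), (40, 0.1))")
no_dropout_first = parse_units("((100,), (50, 0.3))")
```

`ast.literal_eval` only evaluates Python literals, so a flag value passed in from a CI job cannot execute arbitrary code the way `eval` would.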
‣ We need to train our model with different parameters
to reach the optimal model parameters …
SCALING / MULTIPLEXING … TENSORFLOW SUPPORTS MULTI-PART / DISTRIBUTED FLOWS
‣ Running the same model with different parameters in order
to choose the most efficient vs. most accurate vs. most
cost-effective pipeline!
‣ Most efficient # of epochs / params
https://www.tensorflow.org/performance/datasets_performance
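Running the same model with different parameters amounts to expanding a parameter grid into one training run per combination. The sketch below is an illustration under assumptions: `training_commands` is a hypothetical helper, the grid values are made up, and only the flag names `lr` and `epochs` and the script name `train_model.py` come from the slides.

```python
import itertools


def training_commands(grid, script="train_model.py"):
    """Yield one command line (as an argument list) per parameter combination."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        args = ["python", script]
        for name, value in zip(names, values):
            args.append("--{}={}".format(name, value))
        yield args


# Illustrative grid; the values are not taken from the talk.
grid = {"lr": [0.0001, 0.001], "epochs": [10, 20]}
commands = list(training_commands(grid))
```

Each argument list could be handed to `subprocess.run`, to a `docker run` wrapper, or to whatever orchestrator schedules the jobs, which keeps the grid definition separate from where the training actually executes.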
A/B TESTING / CANARY RELEASES ?!
[Diagram: a Storage Provider holding MODEL VER 1.0, MODEL VER 1.7 and MODEL VER 2.0, with traffic shares of 60%, 30% and 10%. Label: Collect in-browser training.]
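A canary-style split like the one on this slide can be sketched as a weighted random choice. The pairing of versions to shares below (1.0 gets 60%, 1.7 gets 30%, 2.0 gets 10%) is an assumption read off the order of the labels, and `pick_model_version` is a hypothetical helper, not part of any serving tool named in the talk.

```python
import random

# Assumed pairing of model versions and traffic shares (percent).
TRAFFIC_SPLIT = {"1.0": 60, "1.7": 30, "2.0": 10}


def pick_model_version(rng, split=TRAFFIC_SPLIT):
    """Pick the model version for one request, weighted by its traffic share."""
    versions = list(split)
    weights = [split[version] for version in versions]
    return rng.choices(versions, weights=weights, k=1)[0]


# Simulate 10,000 requests with a seeded generator and measure the split.
rng = random.Random(42)
sample = [pick_model_version(rng) for _ in range(10000)]
share = {version: sample.count(version) / len(sample) for version in TRAFFIC_SPLIT}
```

In practice the split would usually live in the routing layer rather than in application code; the sketch only shows the arithmetic behind the percentages.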
TRANSLATION …
▸ A flexible training model
▸ Parameterized flow
▸ Model Testing
▸ Promotion mechanism
▸ Data Import and preprocessing
▸ Post Processing
REQUIREMENTS DRIVEN
SOLUTION(S)
OPTIONS - AWS ML
▸ Use custom DL AMIs [ we used
them to get started … ]
OPTIONS - GCP ML/DL
▸ Assume you develop in the
cloud / on the cloud
▸ Consume CPUs / GPUs / TPUs
constantly
▸ Adjust your workflow to
Google Patterns (which isn’t
a bad thing …)
OPTIONS - GCP ML/DL
▸ TPU lock-in?
▸ Wouldn’t it be nice to
benchmark TPU & GPU on
another provider ?!
OPTIONS - AZURE ML/DL
IT’S ALL ABOUT THE PIPELINE / WORKFLOW
‣ You might be able to make this work …
‣ But !
THERE’S A PATTERN HERE …
[Diagram with six numbered steps. Boxes: IDE, Model Serving, Model Storage, Parameter injection, Parameterized training, Training Orchestrator.]
STAGE #3 - ADJUST OUR DOCKERIZED APP TO MY VENDOR …
‣ docker run -v "${PWD}/data:/opt/data" tikal/webcam-controller-model:latest
DO I CARE ABOUT VENDOR LOCK-IN ?! - LET’S TALK MULTI-CLOUD
[Diagram: the TensorFlow training container on my laptop and in the cloud]
I need CPU / GPU / TPU
Adjust / wrap our code to suit the vendor
IT’S NOT ONLY A MATTER OF VENDOR LOCK-IN! - IT’S MULTI-CLOUD
[Diagram: my laptop and the cloud, with CPU, GPU and TPU targets; one annotation reads “Only in Google ATM”]
I need CPU / GPU / TPU
OPERATORS
TF [TENSORFLOW] OPERATOR
STAGE #4 - WRAP CODE TO SUPPORT WORKER | ADMIN | PS OPERATOR PATTERN
‣ docker run -v "${PWD}/data:/opt/data" tikal/webcam-controller-model:latest
ML/DL AS A SERVICE - ON YOUR INFRASTRUCTURE
‣ Package model
‣ Package configuration
PRE PACKAGE MODELS FOR TRAINING / SERVING
‣ Apply to Kubernetes via ksonnet
MODEL TRAINING
[Diagram labels: DevEnv; Push Tensorflow container to registry; Create tfjob; Store Results]
https://www.slideshare.net/barbarafusinska/hassle-free-scalable-machine-learning-learning-with-kubeflow
https://codelabs.developers.google.com/codelabs/kubeflow-introduction/index.html?index=..%2F..%2Fio2018#2
MODEL SERVING
[Diagram labels: DevEnv; Push Application container to registry; Deploy app to K8s; Use Results; Use & Improve model]
Consume / use the model in local development or in the cloud
MODEL TRAINING & SERVING
[Diagram with numbered steps 1-6. Labels: DevEnv; Push Tensorflow container to registry; Train model in Kubeflow; Store Results; Push Application container to registry; Deploy app to K8s; Use Results; Use & Improve model]
Consume / use the model in local development or in the cloud
A/B TESTING
[Diagram with numbered steps 1-7. Labels: DevEnv; Push Tensorflow container to registry; Train model in Kubeflow; Store Results; Push Application container to registry; Deploy app to K8s; Use Results; Use & Improve model; Use Ambassador for A/B testing (step 7)]
Consume / use the model in local development or in the cloud
A ONE STOP SHOP FOR EVERYTHING …
On Prem / Cloud
“PaaS” on K8s
▸ Job
▸ Cron Job
▸ POD
▸ Replica sets (multi-step / distributed)
TFJOB CRD - CUSTOM RESOURCE DEFINITION
hagzag@model-tarining 👉 kubectl get tfjob
NAME AGE
wcm 1d
OUR IMAGE IN KUBEFLOW …
…
  clusterName: "minikube"
  creationTimestamp: 2018-06-23T07:31:54Z
  generation: 1
  labels:
    app.kubernetes.io/deploy-manager: ksonnet
  name: wcm
  namespace: wcm
  resourceVersion: "94971"
  selfLink: /apis/kubeflow.org/v1alpha1/namespaces/wcm/tfjobs/wcm
  uid: 80ab9472-76b7-11e8-be6d-0800279cc216
spec:
  RuntimeId: werb
  replicaSpecs:
  - replicas: 3
    template:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - image: tikal/webcam-controller-model:latest
          name: tensorflow
          resources: {}
        restartPolicy: OnFailure
    tfPort: 2222
    tfReplicaType: WORKER
  - replicas: 2
    template:
…
‣ Next step is to wrap our model with some Operator / TF data
so Kubeflow can display it …
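The TFJob resource above can also be assembled programmatically, for example from a CI job. The sketch below mirrors only the fields visible on the slide (`replicas`, `template`, `tfPort`, `tfReplicaType`, the image and the `kubeflow.org/v1alpha1` group from the `selfLink`). The helper names are hypothetical, `kind: TFJob` is assumed from the CRD name, and the second replica spec is left out because it is cut off on the slide.

```python
import json


def replica_spec(image, replicas, replica_type, tf_port=2222):
    """Build one replicaSpecs entry shaped like the one shown on the slide."""
    return {
        "replicas": replicas,
        "template": {
            "spec": {
                "containers": [{"image": image, "name": "tensorflow"}],
                "restartPolicy": "OnFailure",
            }
        },
        "tfPort": tf_port,
        "tfReplicaType": replica_type,
    }


def tfjob_manifest(name, namespace, image, replicas):
    """Build a TFJob manifest; `replicas` is a list of (type, count) pairs."""
    return {
        "apiVersion": "kubeflow.org/v1alpha1",
        "kind": "TFJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicaSpecs": [
                replica_spec(image, count, replica_type)
                for replica_type, count in replicas
            ]
        },
    }


manifest = tfjob_manifest(
    "wcm", "wcm", "tikal/webcam-controller-model:latest", [("WORKER", 3)]
)
manifest_json = json.dumps(manifest, indent=2)
```

Because JSON is a subset of YAML, the resulting document could be passed to `kubectl apply -f -`, assuming the cluster runs a Kubeflow release that still serves the v1alpha1 TFJob API.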
USE S3 AND TENSORBOARD …
‣ Reuse training results and display them in your
common TensorFlow tooling.
WANT MORE
‣ Demo model -> https://github.com/tikalk/webcam-controller-model
‣ Kubeflow - the main “engine” kubeflow.io
‣ It also supports other tools …
https://github.com/dwhitena/kubeflow_pachyderm
‣ https://github.com/SeldonIO/seldon-core
EVEN MORE
Preprocess | ingest data
Serve
Train
Store
FullStack Developers Israel

Mais conteúdo relacionado

Mais procurados

Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
Chris Fregly
 

Mais procurados (20)

The 2nd half. Scaling to the next^2
The 2nd half. Scaling to the next^2The 2nd half. Scaling to the next^2
The 2nd half. Scaling to the next^2
 
Terraform 101
Terraform 101Terraform 101
Terraform 101
 
Measure and Increase Developer Productivity with Help of Serverless at JCON 2...
Measure and Increase Developer Productivity with Help of Serverless at JCON 2...Measure and Increase Developer Productivity with Help of Serverless at JCON 2...
Measure and Increase Developer Productivity with Help of Serverless at JCON 2...
 
Developing Microservices with Apache Camel, by Claus Ibsen
Developing Microservices with Apache Camel, by Claus IbsenDeveloping Microservices with Apache Camel, by Claus Ibsen
Developing Microservices with Apache Camel, by Claus Ibsen
 
Serverless, Tekton, and Argo CD: How to craft modern CI/CD workflows | DevNat...
Serverless, Tekton, and Argo CD: How to craft modern CI/CD workflows | DevNat...Serverless, Tekton, and Argo CD: How to craft modern CI/CD workflows | DevNat...
Serverless, Tekton, and Argo CD: How to craft modern CI/CD workflows | DevNat...
 
Spinnaker Summit 2019: Where are we heading? The Future of Continuous Delivery
Spinnaker Summit 2019: Where are we heading? The Future of Continuous DeliverySpinnaker Summit 2019: Where are we heading? The Future of Continuous Delivery
Spinnaker Summit 2019: Where are we heading? The Future of Continuous Delivery
 
Apache Camel workshop at BarcelonaJUG in January 2014
Apache Camel workshop at BarcelonaJUG in January 2014Apache Camel workshop at BarcelonaJUG in January 2014
Apache Camel workshop at BarcelonaJUG in January 2014
 
Automated Serverless Pipelines with #GitOps on Codefresh
Automated Serverless Pipelines with #GitOps on CodefreshAutomated Serverless Pipelines with #GitOps on Codefresh
Automated Serverless Pipelines with #GitOps on Codefresh
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Using Terraform to manage the configuration of a Cisco ACI fabric.
Using Terraform to manage the configuration of a Cisco ACI fabric.Using Terraform to manage the configuration of a Cisco ACI fabric.
Using Terraform to manage the configuration of a Cisco ACI fabric.
 
Using Rally for OpenStack certification at Scale
Using Rally for OpenStack certification at ScaleUsing Rally for OpenStack certification at Scale
Using Rally for OpenStack certification at Scale
 
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
 
KNATIVE - DEPLOY, AND MANAGE MODERN CONTAINER-BASED SERVERLESS WORKLOADS
KNATIVE - DEPLOY, AND MANAGE MODERN CONTAINER-BASED SERVERLESS WORKLOADSKNATIVE - DEPLOY, AND MANAGE MODERN CONTAINER-BASED SERVERLESS WORKLOADS
KNATIVE - DEPLOY, AND MANAGE MODERN CONTAINER-BASED SERVERLESS WORKLOADS
 
Ich brauche einen Abstraktions-Layer für meine Cloud
Ich brauche einen Abstraktions-Layer für meine CloudIch brauche einen Abstraktions-Layer für meine Cloud
Ich brauche einen Abstraktions-Layer für meine Cloud
 
CloudStack EU user group - Trillian
CloudStack EU user group - TrillianCloudStack EU user group - Trillian
CloudStack EU user group - Trillian
 
Spring Boot to Quarkus: A real app migration experience | DevNation Tech Talk
Spring Boot to Quarkus: A real app migration experience | DevNation Tech TalkSpring Boot to Quarkus: A real app migration experience | DevNation Tech Talk
Spring Boot to Quarkus: A real app migration experience | DevNation Tech Talk
 
HKG15-204: OpenStack: 3rd party testing and performance benchmarking
HKG15-204: OpenStack: 3rd party testing and performance benchmarkingHKG15-204: OpenStack: 3rd party testing and performance benchmarking
HKG15-204: OpenStack: 3rd party testing and performance benchmarking
 
CloudStack usage service
CloudStack usage serviceCloudStack usage service
CloudStack usage service
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
Giles Sirett: Introduction and CloudStack news
Giles Sirett: Introduction and CloudStack news   Giles Sirett: Introduction and CloudStack news
Giles Sirett: Introduction and CloudStack news
 

Semelhante a Deep Learning - Continuous Operations

Semelhante a Deep Learning - Continuous Operations (20)

Machine Learning - Continuous operations
Machine Learning - Continuous operationsMachine Learning - Continuous operations
Machine Learning - Continuous operations
 
Chicago DevOps Meetup Nov2019
Chicago DevOps Meetup Nov2019Chicago DevOps Meetup Nov2019
Chicago DevOps Meetup Nov2019
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
 
MySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRsMySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRs
 
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AILudwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AI
 
Julien Simon "Scaling ML from 0 to millions of users"
Julien Simon "Scaling ML from 0 to millions of users"Julien Simon "Scaling ML from 0 to millions of users"
Julien Simon "Scaling ML from 0 to millions of users"
 
Cloud native development without the toil
Cloud native development without the toilCloud native development without the toil
Cloud native development without the toil
 
GOTOpia 2/2021 "Cloud Native Development Without the Toil: An Overview of Pra...
GOTOpia 2/2021 "Cloud Native Development Without the Toil: An Overview of Pra...GOTOpia 2/2021 "Cloud Native Development Without the Toil: An Overview of Pra...
GOTOpia 2/2021 "Cloud Native Development Without the Toil: An Overview of Pra...
 
Scaling ml @ careem (oreilly ai conf)
Scaling ml @ careem (oreilly ai conf)Scaling ml @ careem (oreilly ai conf)
Scaling ml @ careem (oreilly ai conf)
 
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
 
JAX London 2021: Jumpstart Your Cloud Native Development: An Overview of Prac...
JAX London 2021: Jumpstart Your Cloud Native Development: An Overview of Prac...JAX London 2021: Jumpstart Your Cloud Native Development: An Overview of Prac...
JAX London 2021: Jumpstart Your Cloud Native Development: An Overview of Prac...
 
Introduction to DevOps
Introduction to DevOpsIntroduction to DevOps
Introduction to DevOps
 
Framework For Automation Testing Practice Sharing
Framework For Automation Testing Practice SharingFramework For Automation Testing Practice Sharing
Framework For Automation Testing Practice Sharing
 
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
 
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AIOptimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
 
Helm intro
Helm introHelm intro
Helm intro
 
Sherlock holmes for dba’s
Sherlock holmes for dba’sSherlock holmes for dba’s
Sherlock holmes for dba’s
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917
 
How EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
How EVERFI Moved from No Automation to Continuous Test Generation in 9 MonthsHow EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
How EVERFI Moved from No Automation to Continuous Test Generation in 9 Months
 
Transforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOpsTransforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOps
 

Mais de Haggai Philip Zagury

Tce automation-d4-110102123012-phpapp01
Tce automation-d4-110102123012-phpapp01Tce automation-d4-110102123012-phpapp01
Tce automation-d4-110102123012-phpapp01
Haggai Philip Zagury
 

Mais de Haggai Philip Zagury (10)

DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
 
Kube Security Shifting left | Scanners & OPA
Kube Security Shifting left | Scanners & OPAKube Security Shifting left | Scanners & OPA
Kube Security Shifting left | Scanners & OPA
 
TechRadarCon 2022 | Have you built your platform yet ?
TechRadarCon 2022 | Have you built your platform yet ?TechRadarCon 2022 | Have you built your platform yet ?
TechRadarCon 2022 | Have you built your platform yet ?
 
Gitlab, GitOps & ArgoCD
Gitlab, GitOps & ArgoCDGitlab, GitOps & ArgoCD
Gitlab, GitOps & ArgoCD
 
DevEx | there’s no place like k3s
DevEx | there’s no place like k3sDevEx | there’s no place like k3s
DevEx | there’s no place like k3s
 
Auth experience - vol 1.0
Auth experience  - vol 1.0Auth experience  - vol 1.0
Auth experience - vol 1.0
 
Auth experience
Auth experienceAuth experience
Auth experience
 
Whats all the FaaS About
Whats all the FaaS AboutWhats all the FaaS About
Whats all the FaaS About
 
Git internals
Git internalsGit internals
Git internals
 
Tce automation-d4-110102123012-phpapp01
Tce automation-d4-110102123012-phpapp01Tce automation-d4-110102123012-phpapp01
Tce automation-d4-110102123012-phpapp01
 

Último

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 

Último (20)

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 

Deep Learning - Continuous Operations

  • 1. FullStack Developers Israel CONTINUOUS OPERATIONS DEEP LEARNING | HAGGAI PHILIP ZAGURY
  • 2. Tikal Knowledge TIKAL INTRO WHO WE ARE ? ▸ Tikal helps ISVs in Israel & abroad with their technological challenges. ▸ Our engineers are full-stack developers with expertise in Android, DevOps, Java, JS, Python, ML ▸ We are passionate about technology and specialise in open-source technologies. ▸ Our tech and group leaders help establish & enhance existing software teams with innovative & creative thinking. https://www.meetup.com/full-stack-developer-il/
  • 3. SELF INTRODUCTION ▸ My open thinking and open techniques ideology is driven by open-source technologies and the collaborative manner defining my M.O. ▸ My solution-driven approach is strongly based on a hands-on, deep understanding of operating systems, application stacks and software languages, networking, cloud in general and, today, more and more cloud-native solutions. ▸ Technologies: ▸ Linux { just pick a flavour …} ▸ *Scripting ▸ Git ▸ Python/Go ▸ Cloud { public/private/hybrid } ▸ Docker ▸ Kubernetes
 HAGGAI PHILIP ZAGURY - DEVOPS ARCHITECT AND GROUP TECH LEAD
  • 4. FullStack Developers Israel THE STORY … MACHINE LEARNING | CONTINUOUS OPERATIONS
  • 5. WE NEED “CI/CD” FOR OUR MODEL TRAINING … ▸ What he didn’t say is … ▸ In-browser training ▸ Backend training ▸ TensorFlow training ▸ TensorFlow serving ▸ Storage [ for raw data & model ] …
  • 6. THE LEARNING CURVE
  • 7. A RELATIVELY SIMPLE USE CASE … [Diagram, six numbered steps: a client (serve frontend app, collect images, train, infer), a server, a TensorFlow training server and an object store; arrows labelled “Upload images”, “Serve model”, “Get trained model”, “Enrich model with new data”, “Upload images”, “Serve protobuf”]
  • 8. A CLASSIC APP [Diagram, steps 1, 2, 5 and 6 of the previous slide: a client (serve frontend app, collect images, train, infer), a server and an object store; arrows labelled “Upload images”, “Serve model”, “Get trained model”, “Upload images”]
  • 9. MODEL TRAINING … ‣ If you’re using a pre-trained model, it’s no different from using a backend / an API endpoint! ‣ Training processes are complex and require Infrastructure as a Service, on demand ‣ Scalability ‣ Faster time to market vs. faster results ‣ Scaling costs …
  • 10. STAGE #1 ‣ python train_model.py [TensorFlow training output:]
      Total data size: 332
      Train X: (298, 7, 7, 256)
      Train Y: (298, 2)
      Test X: (34, 7, 7, 256)
      Test Y: (34, 2)
      Train on 298 samples, validate on 34 samples
      Epoch 1/10
      298/298 [==============================] - 1s 3ms/step - loss: 0.5061 - acc: 0.7651 - val_loss: 0.2331 - val_acc: 0.9118
      Epoch 2/10
      298/298 [==============================] - 0s 1ms/step - loss: 0.1361 - acc: 0.9765 - val_loss: 0.0763 - val_acc: 1.0000
      Epoch 3/10
      298/298 [==============================] - 0s 1ms/step - loss: 0.0471 - acc: 0.9966 - val_loss: 0.0365 - val_acc: 1.0000
      Epoch 4/10
      298/298 [==============================] - 0s 1ms/step - loss: 0.0172 - acc: 1.0000 - val_loss: 0.0196 - val_acc: 1.0000
      Epoch 5/10
      298/298 [==============================] - 0s 1ms/step - loss: 0.0123 - acc: 1.0000 - val_loss: 0.0113 - val_acc: 1.0000
      Epoch 6/10
      298/298 [==============================] - 0s 1ms/step - loss: 0.0065 - acc: 1.0000 - val_loss: 0.0080 - val_acc: …
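A minimal sketch of how the per-epoch numbers in that output could be turned into structured results, which is what a later "train and record results" step needs. The regular expression only assumes the Keras-style line format shown on the slide; the names `EPOCH_LINE`, `parse_epochs` and `SAMPLE_LOG` are hypothetical and not part of the demo repository.

```python
import re

# Matches the per-epoch summary line in the format shown on the slide
# (hypothetical helper, not taken from the webcam-controller-model repo).
EPOCH_LINE = re.compile(
    r"loss: (?P<loss>[\d.]+) - acc: (?P<acc>[\d.]+) - "
    r"val_loss: (?P<val_loss>[\d.]+) - val_acc: (?P<val_acc>[\d.]+)"
)


def parse_epochs(log_text):
    """Return one dict of float metrics per completed epoch, in log order."""
    return [
        {name: float(value) for name, value in match.groupdict().items()}
        for match in EPOCH_LINE.finditer(log_text)
    ]


# Two epochs copied from the training output on the slide.
SAMPLE_LOG = """\
Epoch 1/10
298/298 [==============================] - 1s 3ms/step - loss: 0.5061 - acc: 0.7651 - val_loss: 0.2331 - val_acc: 0.9118
Epoch 2/10
298/298 [==============================] - 0s 1ms/step - loss: 0.1361 - acc: 0.9765 - val_loss: 0.0763 - val_acc: 1.0000
"""

metrics = parse_epochs(SAMPLE_LOG)
print(metrics[-1])
```

Lines that are cut off before `val_acc` (like the last one on the slide) simply do not match and are skipped.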
  • 11. STAGE #2 - DOCKERIZE & PARAMETERIZE … ‣ docker run -v "${PWD}/data:/opt/data" tikal/webcam-controller-model:latest [TensorFlow training output: same as in Stage #1, shown up to “Epoch 6/10”]
  • 12. CONTINUOUS INTEGRATION ‣ A Jenkins pipeline ‣ Build - get sample data / updated data ‣ Deploy model to CPU/GPU ‣ Train and record results ‣ Promote - upload the new model for the “space invaders” microservice backend
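The four stages above can be sketched as plain Python, independent of Jenkins. This is an illustration under stated assumptions, not the pipeline used in the talk: the `fetch_data` and `train` callables are hypothetical stand-ins, the deploy stage is only a placeholder, and promoting when validation accuracy beats the current best is an assumed criterion the slides do not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineRun:
    """What one pipeline execution did and produced."""
    stages: List[str] = field(default_factory=list)
    results: Dict[str, float] = field(default_factory=dict)
    promoted: bool = False


def run_pipeline(
    fetch_data: Callable[[], list],
    train: Callable[[list], Dict[str, float]],
    current_best_val_acc: float,
) -> PipelineRun:
    run = PipelineRun()

    # Build: get sample data / updated data.
    data = fetch_data()
    run.stages.append("build")

    # Deploy: placeholder only; a real pipeline would schedule the
    # training image on a CPU or GPU node here.
    run.stages.append("deploy")

    # Train and record results.
    run.results = train(data)
    run.stages.append("train")

    # Promote: assumed rule, promote only if validation accuracy improved.
    if run.results["val_acc"] > current_best_val_acc:
        run.promoted = True
        run.stages.append("promote")
    return run


# Usage with stub callables standing in for real data fetch and training.
run = run_pipeline(
    fetch_data=lambda: ["img-1", "img-2"],
    train=lambda data: {"val_acc": 0.95, "samples": float(len(data))},
    current_best_val_acc=0.90,
)
print(run.stages, run.promoted)
```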
  • 13. THE GAME IS JUST A MEANS TO AN END … [Diagram: TensorFlow training boxes fed with “# epochs”, “lr” and “more flags”]
      flags = tf.app.flags
      flags.DEFINE_float("lr", 0.0001, "Learning Rate")
      flags.DEFINE_string("units", "((50, 0.2), (40, 0.1))", "Configuration of hidden un…
                          "Expected: tuple of tuple pairs. Each pair represent one hidde…
                          "For instance: "((100, 0.2), (50, 0.3))" will create dense h…
                          "dropout layer with rate of 0.2. Afterwards, it will create de…
                          "dropout layer with rate of 0.3. If you wish to have hidden la…
                          "second value. Example: "((100,), (50, 0.3))"")
      flags.DEFINE_integer("epochs", 10, "Number of epochs")
      flags.DEFINE_float("batch_frac", 0.3, "The fraction of training examples to consid…
                         "For instance, 0.1 will divide the training to 10 batches")
      flags.DEFINE_boolean("draw_plot", False, "Whether to draw a plot at the end")
      flags.DEFINE_boolean("export_js", False, "Whether to export to a tenorflow.js mode…
      FLAGS = flags.FLAGS
    ‣ We need to train our model with different parameters to reach the optimal model parameters …
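A short sketch of what "train with different parameters" can look like in practice: generating one command line per combination of the `lr` and `epochs` flags defined above. It assumes these flags belong to the `train_model.py` script from Stage #1, which the slides suggest but do not state; `sweep_commands` and the chosen values are hypothetical.

```python
import itertools
import sys


def sweep_commands(learning_rates, epoch_counts, script="train_model.py"):
    """Build one argv list per (lr, epochs) combination.

    Assumes the script accepts --lr and --epochs, as the flag
    definitions on the slide suggest.
    """
    return [
        [sys.executable, script, f"--lr={lr}", f"--epochs={epochs}"]
        for lr, epochs in itertools.product(learning_rates, epoch_counts)
    ]


commands = sweep_commands([0.0001, 0.001], [10, 20])
for cmd in commands:
    print(" ".join(cmd))
```

Each argv list can be handed to `subprocess.run`, or translated into the arguments of a `docker run` or a Kubernetes job, so the same sweep works on a laptop and on a cluster.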
  • 14. SCALING / MULTIPLEXING … TENSORFLOW SUPPORTS MULTI-PART / DISTRIBUTED FLOWS ‣ Running the same model with different parameters in order to choose between the most efficient, the most accurate and the most cost-effective pipeline! ‣ Most efficient # of epochs / params https://www.tensorflow.org/performance/datasets_performance
  • 15. A/B TESTING / CANARY RELEASES ?! [Diagram: model versions 1.0, 1.7 and 2.0 served from a storage provider with a 60% / 30% / 10% traffic split; collect in-browser training]
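A minimal sketch of the weighted split on this slide: each client is mapped deterministically to one model version so that it keeps seeing the same model across requests. Pairing 60% with version 1.0, 30% with 1.7 and 10% with 2.0 is an assumption based on the order of the labels; `TRAFFIC_SPLIT` and `pick_model_version` are hypothetical names.

```python
import hashlib

# (model version, percent of traffic). The pairing is assumed from the
# order of the labels on the slide.
TRAFFIC_SPLIT = [("1.0", 60), ("1.7", 30), ("2.0", 10)]


def pick_model_version(client_id, split=TRAFFIC_SPLIT):
    """Map a client id to a model version, stable across calls."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    upper = 0
    for version, percent in split:
        upper += percent
        if bucket < upper:
            return version
    raise ValueError("traffic split must sum to 100")


# Count how 10,000 synthetic client ids are distributed.
counts = {version: 0 for version, _ in TRAFFIC_SPLIT}
for i in range(10_000):
    counts[pick_model_version(f"client-{i}")] += 1
print(counts)
```

Hashing the client id rather than drawing a random number keeps the assignment sticky without storing any state. Later in the deck this routing is delegated to Ambassador rather than written by hand.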
  • 16. TRANSLATION … ▸ A flexible training model ▸ Parameterized flow ▸ Model testing ▸ Promotion mechanism ▸ Data import and preprocessing ▸ Post-processing
  • 18. OPTIONS - AWS ML ▸ Use custom DL AMIs [ we used them to get started … ]
  • 19. OPTIONS - AWS ML ▸ Use custom DL AMIs [ we used them to get started … ]
  • 20. OPTIONS - AWS ML
  • 21. OPTIONS - GCP ML/DL ▸ Assume you develop in the cloud / on the cloud ▸ Consume CPUs / GPUs / TPUs constantly ▸ Adjust your workflow to Google patterns (which isn’t a bad thing …)
  • 22. OPTIONS - GCP ML/DL ▸ TPU lock-in? ▸ Wouldn’t it be nice to benchmark TPU & GPU on another provider?!
  • 23. OPTIONS - AZURE ML/DL
  • 24. IT’S ALL ABOUT THE PIPELINE / WORKFLOW
  • 25. IT’S ALL ABOUT THE PIPELINE / WORKFLOW ‣ You might be able to make this work … ‣ But!
  • 26. THERE’S A PATTERN HERE … [Diagram, six numbered steps: IDE, parameter injection, parameterized training, training orchestrator, model storage, model serving]
  • 27. STAGE #3 - ADJUST OUR DOCKERIZED APP TO MY VENDOR … ‣ docker run -v "${PWD}/data:/opt/data" tikal/webcam-controller-model:latest [TensorFlow training output: same as in Stage #1, shown up to “Epoch 6/10”]
  • 28. DO I CARE ABOUT VENDOR LOCK-IN ?! - LET’S TALK MULTI-CLOUD [Diagram: TensorFlow training on my laptop and in the cloud; “I need CPU / GPU / TPU”; “Adjust / wrap our code to suit the vendor”]
  • 29. IT’S NOT ONLY A MATTER OF VENDOR LOCK-IN! - IT’S MULTI-CLOUD [Diagram: my laptop and the cloud; “I need CPU / GPU / TPU”; “Only in Google ATM” shown alongside CPU, GPU and TPU]
  • 31. TF [TENSORFLOW] OPERATOR
  • 32. STAGE #4 - WRAP CODE TO SUPPORT WORKER | ADMIN | PS OPERATOR PATTERN ‣ docker run -v "${PWD}/data:/opt/data" tikal/webcam-controller-model:latest [TensorFlow training output: same as in Stage #1, shown up to “Epoch 6/10”]
  • 33. ML/DL AS A SERVICE - ON YOUR INFRASTRUCTURE ‣ Package model ‣ Package configuration
  • 34. PRE PACKAGE MODELS FOR TRAINING / SERVING ‣ Apply to Kubernetes via ksonnet
  • 35. MODEL TRAINING [Diagram: DevEnv; push TensorFlow container to registry; create tfjob; store results] Sources: https://www.slideshare.net/barbarafusinska/hassle-free-scalable-machine-learning-learning-with-kubeflow and https://codelabs.developers.google.com/codelabs/kubeflow-introduction/index.html?index=..%2F..%2Fio2018#2
  • 36. MODEL SERVING [Diagram: DevEnv; consume / use model in local development or in the cloud; push application container to registry; deploy app to K8s; use results; use & improve model]
  • 37. MODEL TRAINING & SERVING [Diagram, numbered steps 1 to 6: DevEnv; push TensorFlow container to registry; train model in Kubeflow; store results; consume / use model in local development or in the cloud; push application container to registry; deploy app to K8s; use results; use & improve model]
  • 38. A/B TESTING [Diagram, numbered steps 1 to 7: DevEnv; push TensorFlow container to registry; train model in Kubeflow; store results; consume / use model in local development or in the cloud; push application container to registry; deploy app to K8s; use results; use & improve model; step 7: use Ambassador for A/B testing]
  • 39. A ONE STOP SHOP FOR EVERYTHING … On-prem / cloud “PaaS” on K8s ▸ Job ▸ CronJob ▸ Pod ▸ ReplicaSets (multi-step / distributed)
  • 40. TFJOB CRD - CUSTOM RESOURCE DEFINITION
      hagzag@model-tarining 👉 kubectl get tfjob
      NAME   AGE
      wcm    1d
  • 41. OUR IMAGE IN KUBEFLOW …
      …
        clusterName: "minikube"
        creationTimestamp: 2018-06-23T07:31:54Z
        generation: 1
        labels:
          app.kubernetes.io/deploy-manager: ksonnet
        name: wcm
        namespace: wcm
        resourceVersion: "94971"
        selfLink: /apis/kubeflow.org/v1alpha1/namespaces/wcm/tfjobs/wcm
        uid: 80ab9472-76b7-11e8-be6d-0800279cc216
      spec:
        RuntimeId: werb
        replicaSpecs:
        - replicas: 3
          template:
            metadata:
              creationTimestamp: null
            spec:
              containers:
              - image: tikal/webcam-controller-model:latest
                name: tensorflow
                resources: {}
              restartPolicy: OnFailure
          tfPort: 2222
          tfReplicaType: WORKER
        - replicas: 2
          template:
      …
    ‣ Next step is to wrap our model with some Operator / TF data so Kubeflow can display it …
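For readers who want to reproduce a resource like the one above, here is a hedged sketch that builds the worker part of such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. The field names are copied from the slide; `kind: TFJob` and the `apiVersion` are inferred from `kubectl get tfjob` and the `selfLink`, and the `v1alpha1` API shown in the talk has since been superseded, so treat this as an illustration of the shape rather than a manifest for a current Kubeflow release. The second replica spec is cut off on the slide and is deliberately left out.

```python
import json


def tfjob_manifest(name, namespace, image, workers):
    """Build a v1alpha1-style TFJob with a single WORKER replica spec.

    Field names follow the slide; apiVersion and kind are inferred
    from its selfLink and the `kubectl get tfjob` output.
    """
    return {
        "apiVersion": "kubeflow.org/v1alpha1",
        "kind": "TFJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicaSpecs": [
                {
                    "replicas": workers,
                    "tfReplicaType": "WORKER",
                    "tfPort": 2222,
                    "template": {
                        "spec": {
                            "containers": [
                                {"name": "tensorflow", "image": image}
                            ],
                            "restartPolicy": "OnFailure",
                        }
                    },
                }
            ]
        },
    }


manifest = tfjob_manifest(
    name="wcm",
    namespace="wcm",
    image="tikal/webcam-controller-model:latest",
    workers=3,
)
print(json.dumps(manifest, indent=2))
```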
  • 42. USE S3 AND TENSORBOARD … ‣ Reuse training results and display them in your usual TensorFlow tooling.
  • 43. WANT MORE ‣ Demo model -> https://github.com/tikalk/webcam-controller-model ‣ Kubeflow - the main “engine” kubeflow.io ‣ It also supports other tools … https://github.com/dwhitena/kubeflow_pachyderm ‣ https://github.com/SeldonIO/seldon-core
  • 44. EVEN MORE [Diagram: preprocess | ingest data, train, store, serve]