Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU
RSVP Here: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow.
Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most widely used pipeline orchestration framework in machine learning.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
The link will be sent a few hours before the start of the workshop.
Only registered users will receive the link.
If you do not receive the link a few hours before the start of the workshop, please send your Eventbrite registration confirmation to support@pipeline.ai for help.
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Set Up ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Reproduce Model Training with TFX Metadata Store and Pachyderm
12. Deploy the Model to Production with TensorFlow Serving and Istio
13. Save and Download your Workspace
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
Video: https://youtu.be/AaBqhGEwxXI
GitHub: https://github.com/PipelineAI/pipeline
2. Who Am I? (@cfregly)
Founder @ PipelineAI
Real-time Machine Learning and AI in Production
Former Databricks, Netflix
Apache Spark Contributor
O’Reilly Author: High Performance TensorFlow in Production
Meetup Organizer: Advanced Spark and TensorFlow Meetup
3. Advanced Spark and TensorFlow Meetup (Global, Monthly Events)
https://meetup.com/Advanced-Spark-and-TensorFlow-Meetup
4. Upcoming Full-Day Workshop on Saturday, November 2, 2019!
https://pipeline.ai @cfregly @PipelineAI
5. Who are you?
1 OK with Command Line?
2 OK with Python?
3 OK with Linear Algebra?
4 OK with Docker?
5 OK with Jupyter Notebook?
9. Note #1 of 10
IGNORE WARNINGS & ERRORS
Everything will be OK!
10. Note #2 of 10
THERE IS A LOT OF MATERIAL HERE
Many opportunities to explore on your own.
(Don’t upload sensitive data)
11. Note #3 of 10
YOU HAVE YOUR OWN INSTANCE
16 CPU, 104 GB RAM, 200GB SSD
12. Note #4 of 10
DATASETS
Chicago Taxi Dataset
(and various others)
13. Note #5 of 10
SOME NOTEBOOKS TAKE MINUTES
Please be patient.
(We are using large datasets)
14. Note #6 of 10
QUESTIONS?
Post questions to Zoom chat or Q&A.
(Antje and I will answer soon)
15. Note #7 of 10
KUBEFLOW IS NOT A SILVER BULLET
There are still gaps in the pipeline.
(But gaps are getting smaller)
16. Note #8 of 10
THIS IS NOT CLOUD DEPENDENT*
*Except for 2 small exceptions…
Patches are underway.
17. Note #9 of 10
PRIMARILY TENSORFLOW 1.x
TF 2.x is not fully supported by TFX
(We have a section on TF 2)
18. Note #10 of 10
SHUTDOWN EACH NOTEBOOK AFTER
We are using complex browser voodoo.
19. Why TFX and Why KubeFlow?
Improve Training/Serving Consistency
Unify Disparate Systems
Manage Pipeline Complexity
Improve Portability
Wrangle Large Datasets
Improve Model Quality
Manage Versions
[Diagram: pipeline stages spread across Systems 1-6: Data Ingestion, Data Analysis, Data Transform, Data Validation, Data Splitting, Ad-Hoc Training, Build Model, Model Validation, Training At Scale, Distributed Training, Configure, Composability, Serving, Logging, Monitoring, Roll-out]
20. Agenda
1 Set Up Environment with Kubernetes
2 TensorFlow Extended (TFX)
3 ML Pipelines with Airflow and KubeFlow
4 Hyper-Parameter Tuning with KubeFlow
5 Deploy Notebook with Kubernetes
33. 2.0 TFX Components
2.1 TFX Internals
2.2 TFX Libraries
2.3 TFX Components
[Pipeline diagram: Feature Load → Feature Analyze → Feature Transform → Model Train → Model Evaluate → Model Deploy → Reproduce Training]
34. 2.1 TFX Internals
Driver/Publisher
Moves data to/from Metadata Store
Executor
Runs the Actual Processing Code
Metadata Store
Artifact, execution, and lineage Info
Track inputs & outputs of all components
Stores training run including inputs & outputs
Analysis, validation, and versioning results
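The division of labor among driver, executor, and publisher can be sketched in plain Python. This is an illustrative toy, not TFX's actual classes: `MetadataStore`, `run_component`, and the dictionary-based records are hypothetical names chosen for the example, and the in-memory store stands in for a real ML Metadata backend.

```python
"""Toy driver / executor / publisher flow around a metadata store.

All names here are hypothetical; real TFX components use ML Metadata.
"""


class MetadataStore:
    """In-memory stand-in that records artifacts and executions."""

    def __init__(self):
        self.artifacts = {}    # artifact name -> value
        self.executions = []   # one record per component run (lineage)

    def get(self, name):
        return self.artifacts[name]

    def publish(self, component, inputs, outputs):
        self.artifacts.update(outputs)
        self.executions.append({
            "component": component,
            "inputs": sorted(inputs),
            "outputs": sorted(outputs),
        })


def run_component(store, name, input_names, executor):
    # Driver: resolve the component's inputs from the metadata store.
    inputs = {key: store.get(key) for key in input_names}
    # Executor: run the actual processing code.
    outputs = executor(inputs)
    # Publisher: write outputs and lineage back to the metadata store.
    store.publish(name, inputs, outputs)
    return outputs


store = MetadataStore()
store.publish("ExampleGen", {}, {"examples": [1.0, 2.0, 3.0, 6.0]})

run_component(
    store, "StatisticsGen", ["examples"],
    lambda inp: {"mean": sum(inp["examples"]) / len(inp["examples"])},
)

print(store.get("mean"))
print([e["component"] for e in store.executions])
```

Because every run is recorded with its inputs and outputs, the store can later answer which component produced an artifact and from what.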
35. 2.2 TFX Libraries
TFX Components Use These:
2.2.1 TensorFlow Data Validation (TFDV)
2.2.2 TensorFlow Transform (TFT)
2.2.3 TensorFlow Model Analysis (TFMA)
2.2.4 TensorFlow Metadata (TFMD) + ML Metadata (MLMD)
36. 2.2.1 TFX Libraries - TFDV
TensorFlow Data Validation (TFDV)
Find Missing, Redundant & Important Features
Identify Features with Unusually-Large Scale
`infer_schema()` Generates Schema
Describe Feature Ranges
Detect Data Drift
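Drift detection can be illustrated without TFDV itself. The sketch below compares the category frequencies of one feature across two data snapshots using the L-infinity distance, that is, the largest absolute change in any category's share. The function names, the sample `payment_type` values, and the 0.2 threshold are assumptions made for the example, not TFDV's API.

```python
from collections import Counter


def frequencies(values):
    """Return each category's share of the values."""
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}


def l_infinity_drift(baseline, current):
    """Largest absolute change in any category's share between snapshots."""
    base, cur = frequencies(baseline), frequencies(current)
    categories = set(base) | set(cur)
    return max(abs(base.get(c, 0.0) - cur.get(c, 0.0)) for c in categories)


# Hypothetical payment_type values from two days of taxi trips.
day_1 = ["cash"] * 50 + ["card"] * 50
day_2 = ["cash"] * 20 + ["card"] * 70 + ["mobile"] * 10

drift = l_infinity_drift(day_1, day_2)
THRESHOLD = 0.2  # assumed tolerance; tune per feature
print(f"drift={drift:.2f}, drifted={drift > THRESHOLD}")
```

Here the share of cash payments falls from 0.5 to 0.2, so that category dominates the distance and the feature is flagged.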
[Figure: Uniformly Distributed Data vs. Non-Uniformly Distributed Data]
38. 2.2.2 TFX Libraries - TFT
TensorFlow Transform (TFT)
Preprocess `tf.Example` data with TensorFlow
Useful for data that requires a full pass
Normalize all inputs by mean and std dev
Create vocabulary of strings → integers over all data
Bucketize features based on entire data distribution
Outputs a TensorFlow graph
Re-used across both training and serving
Uses Apache Beam (local mode) for Parallel Analysis
Can also use distributed mode
`preprocessing_fn(inputs)`: Primary Fn to Implement
import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    x = inputs['x']
    y = inputs['y']
    s = inputs['s']
    # Analyzers such as tft.mean() make a full pass over the dataset.
    x_centered = x - tft.mean(x)
    y_normalized = tft.scale_to_0_1(y)
    # Map each string to an integer id from a vocabulary built over all data.
    s_integerized = tft.compute_and_apply_vocabulary(s)
    x_centered_times_y_normalized = x_centered * y_normalized
    return {
        'x_centered': x_centered,
        'y_normalized': y_normalized,
        'x_centered_times_y_normalized': x_centered_times_y_normalized,
        's_integerized': s_integerized
    }
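Why these transformations need a full pass can be seen in a plain-Python sketch of the same three features. The analyze step scans the whole dataset once to compute the mean, the min/max, and the vocabulary; the apply step then uses those constants row by row, at training time and again at serving time. `analyze` and `apply_transform` are hypothetical helpers, not part of TFT, and the vocabulary ordering (most frequent first, ties alphabetical) and the -1 id for unseen strings are assumptions for the example.

```python
from collections import Counter


def analyze(rows):
    """Full pass over the data: compute the constants the transform needs."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    counts = Counter(r["s"] for r in rows)
    # Most frequent string gets id 0; ties are broken alphabetically.
    vocab = sorted(counts, key=lambda s: (-counts[s], s))
    return {
        "x_mean": sum(xs) / len(xs),
        "y_min": min(ys),
        "y_max": max(ys),
        "vocab": {s: i for i, s in enumerate(vocab)},
    }


def apply_transform(row, stats):
    """Per-row transform; identical code runs in training and serving.

    Assumes y is not constant, so y_max - y_min is non-zero.
    """
    x_centered = row["x"] - stats["x_mean"]
    y_normalized = (row["y"] - stats["y_min"]) / (stats["y_max"] - stats["y_min"])
    return {
        "x_centered": x_centered,
        "y_normalized": y_normalized,
        "x_centered_times_y_normalized": x_centered * y_normalized,
        "s_integerized": stats["vocab"].get(row["s"], -1),  # -1 = unseen string
    }


rows = [
    {"x": 1.0, "y": 10.0, "s": "hello"},
    {"x": 2.0, "y": 20.0, "s": "world"},
    {"x": 3.0, "y": 30.0, "s": "hello"},
]
stats = analyze(rows)
transformed = [apply_transform(r, stats) for r in rows]
print(transformed[0])
```

Reusing `stats` at serving time is what keeps the two code paths consistent: a serving request is transformed with the constants learned from the training data, not recomputed from the request.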
41. 2.2.3 TFX Libraries - TFMA
TensorFlow Model Analysis (TFMA)
Analyze Model on Different Slices of Dataset
Track Metrics Over Time (“Next Day Eval”)
`EvalSavedModel` Contains Slicing Info
TFMA Pipeline: Read, Extract, Evaluate, Write
e.g., Ensure Model Works Fairly Across All Users
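Slicing is easiest to see on a tiny example. The sketch below computes accuracy overall and per value of one feature in plain Python; it illustrates the idea only and does not use TFMA. The `accuracy_by_slice` helper and the four evaluation records are made up for the example.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key):
    """Return accuracy per value of slice_key, plus the overall accuracy."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        for key in ("overall", rec[slice_key]):
            totals[key] += 1
            hits[key] += int(rec["label"] == rec["prediction"])
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical evaluation records, sliced by trip_start_hour.
records = [
    {"trip_start_hour": 8, "label": 1, "prediction": 1},
    {"trip_start_hour": 8, "label": 0, "prediction": 0},
    {"trip_start_hour": 23, "label": 1, "prediction": 0},
    {"trip_start_hour": 23, "label": 0, "prediction": 0},
]

metrics = accuracy_by_slice(records, "trip_start_hour")
print(metrics)
```

An overall accuracy of 0.75 hides the fact that the model is right only half the time for late-night trips, which is the kind of gap per-slice analysis is meant to surface.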
43. 2.2.4 TFX Libraries - Metadata
TensorFlow Metadata (TFMD)
ML Metadata (MLMD)
Record and Retrieve Experiment Metadata
Artifact, Execution, and Lineage Info
Track Inputs / Outputs of All TFX Components
Stores Training Run Info
Analysis and Validation Results
Model Versioning Info
45. 2.3.1 ExampleGen
Load Training Data Into TFX Pipeline
Supports External Data Sources
Supports CSV and TFRecord Formats
Converts Data to tf.Example
Note: TFX Pipelines require tf.Example (?!)
Difficult to use non-TF models like XGBoost
import os
from tfx.utils.dsl_utils import csv_input
from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen

# base_dir is assumed to be defined earlier in the pipeline file.
examples = csv_input(os.path.join(base_dir, 'data/simple'))
example_gen = CsvExampleGen(input_base=examples)
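What "converts data to tf.Example" means can be shown with a small stand-in. A `tf.Example` stores each feature as one of three list types: `bytes_list`, `float_list`, or `int64_list`. The sketch below maps CSV cells onto dictionaries shaped that way using only the standard library. The helper names, the column names, and the per-cell type guessing are assumptions for illustration and do not reproduce how CsvExampleGen decides column types.

```python
import csv
import io


def to_feature(value):
    """Map a CSV cell to one of tf.Example's three list types."""
    try:
        return {"int64_list": [int(value)]}
    except ValueError:
        pass
    try:
        return {"float_list": [float(value)]}
    except ValueError:
        return {"bytes_list": [value.encode("utf-8")]}


def csv_to_examples(text):
    """Convert CSV text with a header row into example-like dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: to_feature(cell) for name, cell in row.items()}
            for row in reader]


# Hypothetical two-row extract with made-up column names.
sample = "trip_miles,fare,company\n3,12.5,Flash Cab\n7,20.0,Blue Ribbon\n"
examples = csv_to_examples(sample)
print(examples[0])
```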
47. 2.3.3 SchemaGen
Schema Needed by Some TFX Components
Data Types, Value Ranges, Optional, Required
Consumes Data from StatisticsGen
Schema used by TFDV, TFT, TFMA Libraries
Uses TFDV Library to infer schema
Best effort and basic
Human should verify
feature {
  name: "age"
  value_count {
    min: 1
    max: 1
  }
  type: FLOAT
  presence {
    min_fraction: 1
    min_count: 1
  }
}

from tfx import components
infer_schema = components.SchemaGen(
    stats=compute_training_stats.outputs.output)
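The "best effort, human should verify" nature of schema inference is easier to appreciate with a toy version. The sketch below infers a type and a presence fraction per feature from training rows, then checks new rows against that schema. It is a plain-Python illustration, not the TFDV implementation; the `infer_schema` and `validate` helpers here, the `MIXED` marker, and the sample rows are all assumptions for the example.

```python
def infer_schema(rows):
    """Best-effort schema: type and presence fraction per feature."""
    names = {name for row in rows for name in row}
    schema = {}
    for name in sorted(names):
        values = [row[name] for row in rows if row.get(name) is not None]
        kinds = {type(v).__name__ for v in values}
        schema[name] = {
            # Mixed types are flagged so that a human can decide.
            "type": kinds.pop() if len(kinds) == 1 else "MIXED",
            "min_fraction": len(values) / len(rows),
        }
    return schema


def validate(rows, schema):
    """Return a list of anomalies found in rows, judged against schema."""
    anomalies = []
    for name, spec in schema.items():
        values = [row[name] for row in rows if row.get(name) is not None]
        if len(values) / len(rows) < spec["min_fraction"]:
            anomalies.append(f"{name}: present in too few rows")
        if any(type(v).__name__ != spec["type"] for v in values):
            anomalies.append(f"{name}: unexpected type")
    return anomalies


train = [{"age": 34.0, "city": "Chicago"}, {"age": 51.0, "city": "Evanston"}]
serving = [{"age": 29.0, "city": "Chicago"}, {"age": None, "city": 60601}]

schema = infer_schema(train)
print(schema)
print(validate(serving, schema))
```

Two clean training rows are enough to infer that `age` is always present, which is exactly the sort of over-tight rule a person should review before the schema is used to gate data.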
49. 2.3.5 Transform
Uses Data from ExampleGen & SchemaGen
Transformations Become Part of TF Graph (!!)
Helps Avoid Training/Serving Skew
Uses TFT Library for Transformations
Transformations Require Full Pass Thru Dataset
Global Reduction Across All Batches
Create Word Embeddings, Normalize, PCA
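def preprocessing_fn(inputs):
    """Callback invoked by the Transform component.

    inputs:  map from feature keys to raw, not-yet-transformed features
    returns: map from string feature keys to transformed feature operations
    """
    outputs = {}
    # Populate outputs with transformed features here.
    return outputs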
50. 2.3.6 Trainer
Trains / Validates tf.Examples from Transform
Uses schema.proto from SchemaGen
Produces SavedModel and EvalSavedModel
Uses Core TensorFlow Python API
Works with TensorFlow 1.x Estimator API
TensorFlow 2.0 Keras Support Coming Soon
from tfx import components
trainer = components.Trainer(
    module_file=taxi_pipeline_utils,
    train_files=transform_training.outputs.output,
    eval_files=transform_eval.outputs.output,
    schema=infer_schema.outputs.output,
    tf_transform_dir=transform_training.outputs.output,
    train_steps=10000,
    eval_steps=5000)
51. 2.3.7 Evaluator
Uses EvalSavedModel from Trainer
Writes Analysis Results to ML Metadata Store
Uses TFMA Library for Analysis
TFMA Uses Apache Beam to Scale Analysis
from tfx import components
import tensorflow_model_analysis as tfma

# Evaluate on the whole dataset and on each trip_start_hour slice.
taxi_eval_spec = [
    tfma.SingleSliceSpec(),
    tfma.SingleSliceSpec(columns=['trip_start_hour'])
]
model_analyzer = components.Evaluator(
    examples=example_gen.outputs.eval_examples,
    eval_spec=taxi_eval_spec,
    model_exports=trainer.outputs.output)
52. 2.3.8 ModelValidator
Validate Models from Trainer
Uses Data from SchemaGen & StatisticsGen
Compares New Models to Baseline
Baseline == current model in production
New Model is Good if Meets/Exceeds Metrics
If Good, Notify Pusher to Deploy New Model
Simulate “Next Day Evaluation” On New Data
from tfx import components
import tensorflow_model_analysis as tfma

taxi_mv_spec = [tfma.SingleSliceSpec()]
model_validator = components.ModelValidator(
    examples=example_gen.outputs.output,
    model=trainer.outputs.output)
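The "meets or exceeds the baseline" rule can be written down as a small function. This is a sketch of the decision only, under the assumption that both models were evaluated on the same data; `is_blessed`, the metric names, and the numbers are hypothetical and do not come from the ModelValidator source.

```python
def is_blessed(candidate, baseline, higher_is_better=("accuracy", "auc")):
    """Bless the candidate only if it meets or exceeds every baseline metric."""
    for metric, baseline_value in baseline.items():
        candidate_value = candidate[metric]
        if metric in higher_is_better:
            if candidate_value < baseline_value:
                return False
        elif candidate_value > baseline_value:  # lower is better, e.g. loss
            return False
    return True


# Hypothetical metrics computed on the same held-out data.
production = {"accuracy": 0.91, "auc": 0.95, "loss": 0.30}
candidate_a = {"accuracy": 0.93, "auc": 0.95, "loss": 0.27}
candidate_b = {"accuracy": 0.94, "auc": 0.93, "loss": 0.25}

print(is_blessed(candidate_a, production))
print(is_blessed(candidate_b, production))
```

Candidate B has better accuracy and loss but a lower AUC than production, so under this strict rule it is not blessed. Whether a single regression should block a release is a policy choice for the team.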
53. 2.3.9 Model Pusher (Deployer)
Push Good Model to Deployment Target
Uses Trained SavedModel
Writes Version Data to Metadata Store
Write to FileSystem or TensorFlow Hub
from tfx import components
pusher = components.Pusher(
    model_export=trainer.outputs.output,
    model_blessing=model_validator.outputs.blessing,
    serving_model_dir=serving_model_dir)
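A file-system push can be pictured as copying the blessed model into a new numbered subdirectory, which is the layout TensorFlow Serving expects under its model base path. The sketch below shows that step with the standard library only. `push_model`, the timestamp-based default version, and the placeholder file are assumptions for the example, not the Pusher component's code.

```python
import shutil
import tempfile
import time
from pathlib import Path


def push_model(model_dir, serving_model_dir, blessed, version=None):
    """Copy a blessed model into a new numeric version directory."""
    if not blessed:
        return None  # Leave the currently served model untouched.
    version = int(time.time()) if version is None else version
    target = Path(serving_model_dir) / str(version)
    shutil.copytree(model_dir, target)
    return target


# Demo with temporary directories standing in for real model storage.
workspace = Path(tempfile.mkdtemp())
trained = workspace / "trained_model"
trained.mkdir()
(trained / "saved_model.pb").write_bytes(b"placeholder")
serving = workspace / "serving"

skipped = push_model(trained, serving, blessed=False)
pushed = push_model(trained, serving, blessed=True, version=1)
print(skipped, pushed)
```

Writing each push to a fresh version directory keeps earlier versions on disk, so rolling back is a matter of pointing serving at a previous number.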
54. 2.3.10 Slack Component (!!)
Runs After ModelValidator
Adds Human-in-the-Loop Step to Pipeline
TFX Sends Message to Slack with Model URI
Asks Human to Review the New Model
Respond ‘LGTM’, ‘approve’, ‘decline’, ‘reject’
Requires Slack API Setup / Integration
# Shell, before starting the pipeline:
#   export SLACK_BOT_TOKEN={your_token}
import os

# SlackComponent comes from the example linked below.
_channel_id = 'my-channel-id'
_slack_token = os.environ['SLACK_BOT_TOKEN']
slack_validator = SlackComponent(
    model_export=trainer.outputs.output,
    model_blessing=model_validator.outputs.blessing,
    slack_token=_slack_token,
    channel_id=_channel_id,
    timeout_sec=3600)

https://github.com/tensorflow/tfx/tree/master/tfx/examples/custom_components/slack/slack_component
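The human-in-the-loop step boils down to mapping a reviewer's reply onto an approve or reject decision. The sketch below shows that mapping for the four keywords listed on the slide. It takes replies as a plain list instead of talking to Slack, and `review_decision` and the handling of unrecognised replies are assumptions for the example, not the linked component's logic.

```python
APPROVE = {"lgtm", "approve"}
REJECT = {"decline", "reject"}


def review_decision(replies):
    """Return True/False for the first recognised reply, or None if none is."""
    for reply in replies:
        word = reply.strip().lower()
        if word in APPROVE:
            return True
        if word in REJECT:
            return False
    return None  # No decision yet; the caller decides how to handle a timeout.


print(review_decision(["looks interesting", "LGTM"]))
print(review_decision(["Reject"]))
print(review_decision(["thinking..."]))
```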
56. 3.0 ML Pipelines with Airflow and KubeFlow
3.1 Airflow
3.2 KubeFlow
67. 4.0 Hyper-Parameter Tuning
Experiment
Single Optimization Run
Single Objective Function Across Runs
Contains Many Trials
Trial
List of Param Values
Suggestion
Optimization Algorithm
Job
Evaluates a Trial
Calculates Objective
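The four terms above fit together as shown in the following plain-Python sketch, which uses random search as the suggestion algorithm. It does not call KubeFlow: the function names, the search space, and the made-up loss surface standing in for model training are all assumptions for the example.

```python
import random


def suggest(space, rng):
    """Suggestion: random search picks one value per parameter."""
    return {name: rng.choice(values) for name, values in space.items()}


def run_job(trial):
    """Job: evaluate one trial and return the objective (lower is better).

    A made-up loss surface stands in for real model training.
    """
    return (trial["learning_rate"] - 0.01) ** 2 + (trial["layers"] - 3) ** 2


def run_experiment(space, num_trials, seed=0):
    """Experiment: many trials scored against a single objective."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_trials):
        trial = suggest(space, rng)  # Trial: one list of parameter values.
        results.append((run_job(trial), trial))
    return min(results, key=lambda pair: pair[0])


space = {"learning_rate": [0.001, 0.01, 0.1], "layers": [2, 3, 4]}
best_objective, best_trial = run_experiment(space, num_trials=20)
print(best_objective, best_trial)
```

In a cluster setting each call to `run_job` would be a separate training job, which is what lets trials run in parallel.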
70. 5.0 Deploy Notebook as Job
5.1 Wrap Model in a Docker Image
5.2 Deploy Job to Kubernetes
71. 5.1 Create Docker Image
72. 5.2 Deploy Notebook as Job
75. Agenda
1 Set Up Environment with Kubernetes
2 TensorFlow Extended (TFX)
3 ML Pipelines with TFX, Airflow, and KubeFlow
4 Hyper-Parameter Tuning with TFX and KubeFlow
5 Deploy Notebook with Kubernetes