TITLE: "Creating and Deploying Models with Jupyter, Keras/TensorFlow 2.0 & RedisAI"
SPEAKER: Chris Fregly, Founder and CEO, PipelineAI, a Real-Time Machine Learning and Artificial Intelligence Startup based in San Francisco.
Linkedin: https://linkedin.com/in/cfregly/
ABSTRACT:
In this session, you will learn how to create a TensorFlow 2.0 model using Keras in Jupyter notebooks. You will also learn how to access models and interact with them using RedisAI.
Meetup: https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/261460514/
GitHub: https://github.com/PipelineAI/notebooks
Slides: https://www.slideshare.net/cfregly/pytorch-tensorflow-redisai-streams-advanced-spark-and-tensorflow-meetup-may-25-2019
Video: https://youtu.be/6RYdyU71Fl8
2. Acknowledgements
• [tensor]werk: Luca Antiga, Sherin Thomas, Rick Izzo, Pietro Rota
• RedisLabs: Guy Korland, Itamar Haber, Pieter Cailliau, Meir Shpilraien, Mark Nunberg, Ariel Madar
• Orobix: Manuela Bazzana, Daniele Ciriello, Lisa Lozza, Simone Manini, Alessandro Re
• Everyone!
3. Who Am I? (@cfregly)
• Founder @ PipelineAI: real-time machine learning and AI in production
• Former Databricks, Netflix
• Apache Spark contributor
• O’Reilly author: High Performance TensorFlow in Production
4. Who are you?
1. New to the Bay Area?
2. Familiar with Redis Modules?
3. Familiar with TensorFlow? PyTorch? Others?
4. Familiar with Kubernetes? KubeFlow?
5. Used to work at MapR?!
6. Wants to hire Antje Barth?
5. Agenda
1. Challenges of Deep Learning in Production: deep learning and challenges for production
2. Introducing RedisAI: API, architecture, operation
3. Roadmap: what’s next
4. Demo: live notebook
7. Model Servers
• Python code (Flask, Tornado, …)
• Execution services from cloud providers
  • AWS SageMaker
  • Google Cloud ML Server
• Runtimes
  • PipelineAI / Kubernetes
  • KubeFlow
  • TensorFlow Serving
  • NVIDIA TensorRT
  • MXNet Model Server
• Bespoke solutions (C++, Java, …)
8. Model Serving Infrastructure
• Must fit the technology stack: not just about languages, but about semantics, scalability, guarantees
• Run anywhere, at any size
• Composable building blocks
• Limit the number of moving parts
• Limit the amount of moving data
• Must make the best use of resources
9. Agenda
1. Challenges of Deep Learning in Production: deep learning and challenges for production
2. Introducing RedisAI: API, architecture, ops, client libs
3. Roadmap: what’s next
4. Demo: live notebook
10. RedisAI
• A model server inside of Redis
• Implemented as a Redis Module
• New data type: Tensor

[Diagram: Tensor In → TorchScript (e.g. def addsq(a, b): return (a + b)**2) dispatched to CPU / GPU0 / GPU1 / … → Tensor Out]
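The "Tensor In / Tensor Out" flow happens through RedisAI's tensor commands. As a minimal sketch, the helper below builds the argument list for an AI.TENSORSET call in its VALUES form (the helper name `tensorset_args` is an illustration, not part of any client library):

```python
def tensorset_args(key, dtype, shape, values):
    """Build the argument list for AI.TENSORSET using the VALUES form.

    RedisAI also accepts a BLOB form (raw bytes), which is preferable
    for large tensors; VALUES is shown here for readability.
    """
    return ["AI.TENSORSET", key, dtype,
            *[str(d) for d in shape],        # tensor shape, e.g. 2 2
            "VALUES", *[str(v) for v in values]]

# A 2x2 FLOAT tensor:
print(tensorset_args("foo", "FLOAT", (2, 2), [1, 2, 3, 4]))
# ['AI.TENSORSET', 'foo', 'FLOAT', '2', '2', 'VALUES', '1', '2', '3', '4']
```

The resulting tensor can then be read back with AI.TENSORGET, or fed to a model or script by key.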
11. Advantages of RedisAI Today
• Keeps the data local
• Runs everywhere Redis runs
• Runs multi-backend
• Stays language-independent
• Optimizes use of resources
• Keeps models hot
• HA with Sentinel, clustering

Examples: https://github.com/RedisAI/redisai-examples
18. API: Script
• AI.SCRIPTSET
• AI.SCRIPTRUN

AI.SCRIPTSET myadd2 GPU < addtwo.txt
AI.SCRIPTRUN myadd2 addtwo INPUTS foo OUTPUTS bar

addtwo.txt:
def addtwo(a, b):
    return a + b
19. Scripts?
• SCRIPT is a TorchScript interpreter
• Python-like syntax for tensor ops
• Works on CPU and GPU
• Vast library of tensor operations
• Allows computations to be prescribed directly (without exporting from a Python env, etc.)
• Pre-processing, post-processing (but not only)

Ref: https://pytorch.org/docs/stable/jit.html

[Diagram: in_key → SCRIPT → PT MODEL / TF MODEL → SCRIPT → out_key]
20. Architecture
• Tensors: framework-agnostic
• Queue + processing thread
• Backpressure
• Redis stays responsive
• Models are kept hot in memory
• Client blocks
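The queue + processing thread + backpressure design can be sketched in a few lines of Python. This is an illustration of the pattern, not RedisAI's actual implementation (which is in C); all names here are hypothetical:

```python
import queue
import threading

# A bounded queue provides backpressure: when the worker falls behind,
# producers block instead of exhausting memory.
requests = queue.Queue(maxsize=4)
results = {}

def worker():
    # Single processing thread: the event loop (Redis) stays responsive
    # while model runs happen here.
    while True:
        item = requests.get()
        if item is None:  # shutdown sentinel
            break
        req_id, payload = item
        results[req_id] = payload * 2  # stand-in for a MODELRUN
        requests.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

for i in range(8):
    requests.put((i, i))  # blocks whenever more than 4 requests are pending
requests.join()           # the client blocks until its work is done
requests.put(None)
t.join()
print(results[3])  # 6
```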
21. Persistence
• RDB supported
• Append-Only File (AOF) coming soon
• Tensors are serialized (meta + blob)
• Models are serialized back into protobuf
• Scripts are serialized as strings
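The "meta + blob" idea for tensors can be illustrated with a hypothetical layout: a small header carrying dtype and shape, followed by the raw bytes. The actual RDB encoding differs; this only sketches the concept:

```python
import struct

def serialize_tensor(dtype_code, shape, blob):
    # Hypothetical meta + blob layout: dtype code, ndim, each dimension
    # as a little-endian int64, then the raw tensor bytes.
    meta = struct.pack(f"<BB{len(shape)}q", dtype_code, len(shape), *shape)
    return meta + blob

def deserialize_tensor(buf):
    dtype_code, ndim = struct.unpack_from("<BB", buf)
    shape = struct.unpack_from(f"<{ndim}q", buf, 2)
    blob = buf[2 + 8 * ndim:]
    return dtype_code, shape, blob

payload = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)      # a 2x2 FLOAT blob
buf = serialize_tensor(0, (2, 2), payload)
print(deserialize_tensor(buf) == (0, (2, 2), payload))  # True
```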
22. Replication
• Master-replica supported for all data types
• Right now, run commands are replicated too (post-conf: replication of results of computations where appropriate)
• Cluster supported; caveat: sharding models and scripts
• For the moment, use hash tags: {foo}resnet50 {foo}input1234
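Hash tags work because, per the Redis Cluster specification, only the non-empty substring between the first `{` and the following `}` is hashed when computing a key's slot, so {foo}resnet50 and {foo}input1234 both land on the shard for "foo". A sketch of that rule (the helper name `hash_tag` is illustrative):

```python
def hash_tag(key):
    # Redis Cluster rule: if the key contains a non-empty substring between
    # the first '{' and the first '}' after it, only that substring is
    # hashed when choosing the slot; otherwise the whole key is hashed.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # tag must be non-empty
            return key[start + 1:end]
    return key

# Both keys hash on "foo", so model and input stay on the same shard:
print(hash_tag("{foo}resnet50"), hash_tag("{foo}input1234"))  # foo foo
```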
23. RedisAI Client Libs
• NOTE: any Redis client works right now
• JRedisAI: https://github.com/RedisAI/JRedisAI
• redisai-py: https://github.com/RedisAI/redisai-py
• Coming up: NodeJS, Go, … (community?)
25. Agenda
1. Challenges of Deep Learning in Production: deep learning and challenges for production
2. Introducing RedisAI: API, architecture, operation
3. Roadmap: what’s next
4. Demo: live notebook
29. Roadmap: DAGRUNASYNC

AI.DAGRUNASYNC
    SCRIPTRUN preproc normalize img ~in~
    MODELRUN resnet50 INPUTS ~in~ OUTPUTS ~out~
    SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label
=> 12634

AI.DAGRUNINFO 12634
=> RUNNING [... more details …]

AI.DAGRUNINFO 12634
=> DONE

• AI.DAGRUNASYNC is non-blocking and returns an ID
• The ID can be used to later retrieve status + keyspace notification
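The submit-then-poll pattern behind AI.DAGRUNASYNC / AI.DAGRUNINFO can be sketched in-process. This is only an illustration of the client-visible behavior (submit returns an ID immediately, status flips to DONE later); all names and the ID scheme are hypothetical:

```python
import itertools
import threading

_ids = itertools.count(12634)   # run IDs, starting at the slide's example
_status = {}
_threads = {}

def dagrun_async(dag):
    # Non-blocking: register the run, kick off a background worker,
    # and return the ID immediately.
    run_id = next(_ids)
    _status[run_id] = "RUNNING"

    def run():
        for step in dag:   # stand-ins for SCRIPTRUN / MODELRUN steps
            step()
        _status[run_id] = "DONE"

    t = threading.Thread(target=run)
    _threads[run_id] = t
    t.start()
    return run_id

def dagrun_info(run_id):
    return _status[run_id]

run_id = dagrun_async([lambda: None])
_threads[run_id].join()  # a real client would poll AI.DAGRUNINFO instead,
                         # or subscribe to a keyspace notification
print(run_id, dagrun_info(run_id))  # 12634 DONE
```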
30. Roadmap: Streams
• I stream, you stream, we all stream for Redis Streams!

AI.DAGRUN
    SCRIPTRUN preproc normalize instream ~in~
    MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
    SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS outstream

AI.DAGRUNASYNC
    SCRIPTRUN preproc normalize instream ~in~
    MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
    SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS outstream
31. Roadmap: ONNX Runtime
• Runtime from Microsoft
• ONNX
  • Exchange format for neural networks
  • Export from many frameworks (MXNet, CNTK, …)
• ONNX-ML
  • ONNX for machine learning models (RandomForest, SVM, k-means, etc.)
  • Export from scikit-learn
32. Roadmap: Auto-batching
• Transparent batching of requests
  • Queue gets rearranged according to other analogous requests in the queue and time to response
  • Only if different clients or async
• Time-to-response (TTR)
  • ~Predictable in DL graphs
  • Can reject a request if TTR > estimated time of queue + run

[Diagram: analyze queue → concat requests A and B along 0-th dim → run batch; run if TTR < 800ms, reject otherwise]
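The admission logic sketched on this slide, batch requests while the estimated time-to-response stays under the budget and reject the rest, could look like the following. This is a hypothetical sketch; `schedule` and its parameters are assumptions, not RedisAI's actual scheduler:

```python
def schedule(pending, run_estimate_ms, ttr_budget_ms):
    """Concatenate compatible requests along the 0-th dimension while the
    estimated time-to-response (queue so far + one run) fits the budget;
    reject the rest. `run_estimate_ms` is the (assumed ~predictable)
    per-run model time."""
    batch, rejected = [], []
    queued_ms = 0
    for request in pending:         # each request: a list of rows (0-th dim)
        ttr = queued_ms + run_estimate_ms
        if ttr <= ttr_budget_ms:
            batch.extend(request)   # concat along the 0-th dim
            queued_ms += run_estimate_ms
        else:
            rejected.append(request)
    return batch, rejected

a = [[1.0, 2.0]]
b = [[3.0, 4.0]]
c = [[5.0, 6.0]]
batch, rejected = schedule([a, b, c], run_estimate_ms=300, ttr_budget_ms=800)
print(batch)     # [[1.0, 2.0], [3.0, 4.0]]  -- a and b fit under 800ms
print(rejected)  # [[[5.0, 6.0]]]            -- c's estimated TTR is 900ms
```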
33. Roadmap: Lazy Loading
• Lazy loading of backends: only load what you need (especially on GPU)

AI.CONFIG LOADBACKEND TF CPU
AI.CONFIG LOADBACKEND TF GPU
AI.CONFIG LOADBACKEND TORCH CPU
AI.CONFIG LOADBACKEND TORCH GPU
...
34. Roadmap: Monitoring
• Monitoring with AI.INFO: more granular information on running operations, commands, memory

AI.INFO
AI.INFO MODEL key
AI.INFO SCRIPT key
...
35. Agenda
1. Challenges of Deep Learning in Production: deep learning and challenges for production
2. Introducing RedisAI: API, architecture, operation
3. Roadmap: what’s next
4. Demo: live notebook
36. Demo: PipelineAI + Redis + PyTorch + TensorFlow
• End-to-End Deep Learning from Research to Production
• Any Cloud, Any Framework, Any Hardware
• Free Community Edition: https://community.pipeline.ai