This fourth meetup presents good practices and tips for deploying a recommender system in production. We will cover a wide range of the day-to-day work of machine learning engineers and DevOps: from test-driven development to continuous integration and cloud architecture design. We will see how machine learning, and recommender systems in particular, differ from traditional software development, how this impacts deployment pipelines, and what tools you can use to solve these problems.
9. Recommender Systems – Previous Meetups
(1) We started with the dataset.
(2) Then we trained models.
(3) We selected the best one.
(4) And now, we want to deploy the model to production!
10. Recommender Systems Datasets – Previous Meetups
Explicit feedback (users’ ratings) vs implicit feedback (users’ clicks):

                    Explicit feedback                  Implicit feedback
Example domains     Movies, TV shows, Music            Marketplaces, Businesses
Example data type   Like/Dislike, Stars                Clicks, Play-time, Purchases
Complexity          Clean, Costly, Easy to interpret   Dirty, Cheap, Difficult to interpret
13. Recommender Systems Models – Previous Meetups
[Diagram: example of the model we want to deploy, built from user embeddings, item embeddings, and model weights]
14. Recommender Systems Models – This Meetup
[Diagram: the model combines Alice’s user embedding, the Star Wars item embedding, and the model weights to predict a 4.5/5 rating]
15. Software vs ML
Traditional Software         Machine Learning Software
Stateless                    Stateful
Explicit specifications      No specifications
Rule-based logic from code   Model-based logic from data
18. Key Objectives
Highly scalable, highly available
Large number of users and items
No downtime
Continuous user updates
New user/item interactions (e.g. ratings, clicks, watch)
Frequent item updates and new items
New items are added continuously
Non-trivial model
Can’t be built in e.g. SQL or ElasticSearch
21. Assumptions
Not Too Big
Can be trained on a big enough ML server
No need for distributed ML
Less than 1TB of data, so less than 1B items
Model Updates Can Be Delayed
You do not need to update the entire model several times per hour in production
(Adding new items in real time may be supported though)
Recommendations Are Not Stored
You do not need to save recommendations
26. Cloud Infrastructure – Plan
Users Embeddings Storage
Items Embeddings Storage
Model Weights Storage
27. Users Embeddings Storage
User Embeddings
Binary blob <1KB
Keep changing
Need only one per user request
Key-Value Store
Fetch the user embeddings from the cloud at each request
An atomic key-value store like Redis will handle concurrency for free
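A minimal sketch of this pattern, using a plain dict in place of a real Redis client and a hypothetical packing scheme for the <1KB embedding blob (with redis-py you would call `r.set` / `r.get` on the same binary values):

```python
import struct

# Stand-in for a Redis client: with redis-py you would use r.set / r.get
# on the same binary blobs.
kv_store = {}

def save_user_embedding(user_id, embedding):
    # pack the float32 embedding into a compact binary blob (<1KB)
    blob = struct.pack(f"{len(embedding)}f", *embedding)
    kv_store[f"user_emb:{user_id}"] = blob

def load_user_embedding(user_id):
    blob = kv_store[f"user_emb:{user_id}"]
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

save_user_embedding("alice", [0.1, -0.5, 0.25, 1.0])
print(load_user_embedding("alice"))
```

The key prefix `user_emb:` and the float32 layout are illustrative choices, not a prescribed schema.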
28. Items Embeddings – The Big Issue
Network Problem
If the items data is stored in a database like SQL,
and the model is too complex to be expressed in SQL:
then you need to fetch 100% of the items data from the DB to your compute instance
...at each request!
Rule of Thumb
1M items * 1KB each → 1GB total data
(1B items * 1KB each → 1TB total data)
30. Network Issue – Solution
[Diagram: a candidates-generation step pre-selects items from the Items DB, the model scores these candidates using the Users DB, and a Top-K step turns “What should I watch?” into “Star Wars!”]
31. Items Candidate Generation
Goal: pre-select thousand(s) of items for your model, without needing to see all embeddings
Model-Free
E.g. kNN item-item with pre-computed kNN tables
Easy with an ML-ready DB like Spark
Doable in ElasticSearch or even SQL
Model-Based
E.g. linear matrix factorization, then a smart implementation of Top-K
The model has to be monotonic (w.r.t. items dimensions), otherwise you can’t rely on a pre-computed index
Ref: ElasticSearch with word2vec embeddings http://bit.ly/2Wciike (WIP without efficient Top-K [issue#42326])
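A minimal sketch of the model-free variant, assuming a hypothetical pre-computed item-item kNN table: the candidates are the union of the neighbors of the user’s recently seen items.

```python
# Hypothetical pre-computed item-item kNN table (item -> nearest items),
# e.g. built offline in Spark and served from a DB.
KNN_TABLE = {
    "star_wars": ["empire_strikes_back", "return_of_the_jedi", "dune"],
    "amelie": ["delicatessen", "the_artist"],
    "dune": ["blade_runner", "star_wars"],
}

def generate_candidates(recent_items, max_candidates=1000):
    """Union of the kNN lists of recently seen items, excluding those items."""
    seen = set(recent_items)
    candidates = []
    for item in recent_items:
        for neighbor in KNN_TABLE.get(item, []):
            if neighbor not in seen and neighbor not in candidates:
                candidates.append(neighbor)
    return candidates[:max_candidates]

print(generate_candidates(["star_wars", "dune"]))
# ['empire_strikes_back', 'return_of_the_jedi', 'blade_runner']
```

The model then only has to score these thousand(s) of candidates instead of all items.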
35. Throughput Problem and Solution
Throughput Problem
You can’t get thousands of arbitrary items embeddings at each request
The physical limit is DB CPU, not network speed
Hosting a local Redis co-located on the same physical machine as your process does not help
Throughput Solution (DevOps Nightmares!)
🎃🧛🕷 You need to keep items embeddings in memory of your processes 🕷🧛🎃
37. Items Embeddings Storage
Cloud Storage
Read once when spawning your processes
Fully updated every time you deploy a new model
<1M items → static file storage works well (e.g. AWS S3, Google Storage)
>1B items → big data storage (AWS RedShift, Google BigQuery, Hadoop/HDFS), updating will be “interesting”
In-Memory Replica
Loading pre-fork enables copy-on-write → can share read-only data with many processes for free
Otherwise shared memory, but with concurrency issues
>1B items → cannot load everything at init, and requires cache-like mechanisms
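A minimal sketch of the pre-fork copy-on-write pattern (POSIX only; the dict of lists is a hypothetical stand-in for a large read-only numpy array): the table is loaded once at module level, before workers are forked, so all workers share the same physical memory pages as long as nobody writes to them.

```python
import multiprocessing as mp

# Loaded ONCE, pre-fork: forked workers share these pages copy-on-write.
ITEM_EMBEDDINGS = {f"item_{i}": [float(i)] * 4 for i in range(100_000)}

def worker(item_id, out_queue):
    # The forked worker only READS the shared table: no pages are copied.
    out_queue.put(sum(ITEM_EMBEDDINGS[item_id]))

ctx = mp.get_context("fork")  # 'fork' enables copy-on-write (POSIX only)
q = ctx.Queue()
procs = [ctx.Process(target=worker, args=(i, q)) for i in ("item_1", "item_2")]
for p in procs:
    p.start()
scores = sorted(q.get() for _ in procs)
for p in procs:
    p.join()
print(scores)  # [4.0, 8.0]
```

This is exactly why writable item updates are painful: a single write to the table would start copying pages in every worker.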
39. Model Weights Storage
Cloud Storage
Model weights are typically <1MB
Model weights are not updated often
Static file storage works well
In-Memory Replica
Small enough that any strategy will work (duplication, copy-on-write, shared memory, …)
43. Infrastructure
[Diagram: request flow through the serving architecture]
1. User request: “What should I watch?”
2. Fetch the user embeddings from Redis
3. kNN candidates generation against the items kNN DB
4. Look up the candidates’ item embeddings in RAM (replicated from S3)
5. Model predictions using the model weights in RAM (replicated from S3)
6. Top K selection
7. Answer: “Star Wars!”
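The seven steps above can be sketched end to end (all names and stores are hypothetical stand-ins: a dict for Redis, in-process dicts for the S3-replicated embeddings and weights, and a toy dot-product model):

```python
# Hypothetical stand-ins for the real stores in the diagram.
USER_EMBS_REDIS = {"alice": [1.0, 0.0]}             # step 2: Redis
ITEM_EMBS_RAM = {                                   # step 4: RAM, from S3
    "star_wars": [0.9, 0.1],
    "amelie": [0.1, 0.9],
    "dune": [0.7, 0.3],
}
MODEL_WEIGHTS_RAM = [1.0, 1.0]                      # step 5: RAM, from S3
KNN_DB = {"alice": ["star_wars", "amelie", "dune"]} # step 3: items kNN DB

def recommend(user_id, top_k=1):
    user_emb = USER_EMBS_REDIS[user_id]             # 2: fetch user embedding
    candidates = KNN_DB[user_id]                    # 3: kNN candidates
    scores = {}
    for item in candidates:
        item_emb = ITEM_EMBS_RAM[item]              # 4: in-memory lookup
        # 5: toy model: weighted dot product of user and item embeddings
        scores[item] = sum(w * u * i for w, u, i
                           in zip(MODEL_WEIGHTS_RAM, user_emb, item_emb))
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]  # 6: Top K
    return top                                      # 7: answer

print(recommend("alice"))  # ['star_wars']
```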
50. Online Updates – Plan
Users Embeddings Update
Items Embeddings Update
Model Weights Update
51. User Updates – The Problem
Goal
Update user embeddings on new user/item interactions
Critical for “session-based” recommendations (e.g. anonymous browsing on a retail website)
Technical Issue
To update the user embeddings you usually need all of the user’s interactions
Some users may have interacted with thousands of items (e.g. listening history on Spotify)
53. User Updates – The Solution
Have two kinds of user updates, a quick one and a slow one:
Quick “Single-Step” Update
Do one step of online update given only the data for the new interaction
e.g. one step of stochastic gradient descent
Slow “Full” Update
Periodically schedule user updates from scratch, happening in the background
Requires a scheduler or a technology for background tasks
(hello “Discover Weekly” 👋)
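The quick “single-step” update can be sketched as one SGD step on a squared-error loss over a toy dot-product model (a hypothetical setup: only the new interaction is needed, not the user’s full history):

```python
def sgd_single_step(user_emb, item_emb, rating, lr=0.1):
    """One SGD step on (rating - <user_emb, item_emb>)^2, updating only user_emb."""
    pred = sum(u * i for u, i in zip(user_emb, item_emb))
    error = rating - pred
    # gradient of the squared error w.r.t. user_emb is -2 * error * item_emb
    return [u + lr * 2 * error * i for u, i in zip(user_emb, item_emb)]

user_emb = [0.0, 0.0]
item_emb = [1.0, 0.5]
# Alice rates the item 4.5/5: one cheap online update, no full retraining
user_emb = sgd_single_step(user_emb, item_emb, rating=4.5)
print(user_emb)  # ~[0.9, 0.45]
```

The slow “full” update then periodically recomputes the embedding from the whole history in the background.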
56. User Updates – Background Tasks
Scheduling / Task Queue Technologies
● Cron
● Celery, RabbitMQ, Redis Pub/Sub
Sharing Memory
Executing the user update requires loading all item embeddings in memory again
● Either you do not share memory, and do not update too often
● Or you do the update in your main workers
● Or your background workers are forks of your main workers (e.g. uwsgi spooling)
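A minimal stand-in for the task-queue technologies above, using only the standard library (with Celery or RabbitMQ the queue and consumer would be managed services, but the shape is the same: requests enqueue user ids, a background worker runs the slow full update):

```python
import queue
import threading

task_queue = queue.Queue()
updated = []

def full_user_update(user_id):
    # Placeholder for the slow "from scratch" re-embedding of one user.
    updated.append(user_id)

def worker():
    # Background consumer: with Celery this would be a task worker.
    while True:
        user_id = task_queue.get()
        if user_id is None:  # sentinel to stop
            break
        full_user_update(user_id)
        task_queue.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()
for uid in ["alice", "bob"]:
    task_queue.put(uid)   # enqueue slow updates without blocking requests
task_queue.put(None)
t.join()
print(updated)  # ['alice', 'bob']
```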
57. Item Updates
New Items (Cold Start)
New items don’t have any user interactions
Collaborative-Filtering models do not support adding new items
Content-Based models do support adding new items
Update Items from New Interactions
SVD-based Matrix Factorization supports restricted forms of online-update
Otherwise heuristics might work, such as a single step of gradient descent
The benefits are typically not worth the DevOps trouble
58. Item Updates
Update Cloud Storage
Typically replace the entire file, or use rsync for smarter in-place updates
Update In-Memory Replica
Not possible with “copy-on-write sharing” → re-deploy all your workers
Doable with shared memory, but the small benefits might not be worth the DevOps trouble
59. Model Updates
Goal
Update the model weights based on all new user/item interactions
Deploy new model weights without downtime
Real-Time?
As for items embeddings, everything is much easier if model weights are read-only
There are only small benefits to real-time updates of the model weights
You probably want to retrain your model from scratch once in a while and re-deploy
(hello again “Discover Weekly” 👋)
63. Testing and CI/CD – Plan
Unit-Testing and Advanced Tests
Versioning
Orchestration
64. Unit-Tests for ML
Difference with Traditional Software
No way to programmatically express the specifications of ML software
Cf. “Software 2.0” by Andrej Karpathy http://bit.ly/2Ni1apj
Small Specific Unit-Tests
Make unit-tests absurdly easy to pass. If they fail, you must be sure you have a bug
Test your code, not the generalization ability of your model:
● well-known algorithm? → there are maths and proofs
● heuristic? → expected to fail, this shouldn’t make CI fail
E.g. test that your model can successfully overfit train data when you remove all regularization
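A minimal sketch of such an overfitting test (a hypothetical one-parameter model trained by gradient descent; with regularization removed it must drive the train error to ~0, otherwise the training code has a bug):

```python
def fit_scale(xs, ys, steps=500, lr=0.05, l2=0.0):
    """Fit y ~ w * x by gradient descent on squared error + l2 * w^2."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n + 2 * l2 * w
        w -= lr * grad
    return w

def test_can_overfit_train_data():
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # perfectly linear data
    w = fit_scale(xs, ys, l2=0.0)              # all regularization removed
    train_mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    # absurdly easy to pass: if this fails, you must be sure there is a bug
    assert train_mse < 1e-6, train_mse

test_can_overfit_train_data()
print("ok")
```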
67. Advanced Tests for ML
Goal
Test the generalization ability of your models
Synthetic Datasets > Real Datasets
Full control of the assumptions
You can make them easy enough so that if the test fails, something is wrong
(disclaimer: this talk is for ML Engineers & Ops 👷♀ not ML Researchers 👩🎓)
Compare Against Baselines
Hard to find good performance thresholds for the tests to pass
Easy to test that your model performs better than simple baselines
70. Example: Model Test
def test_model(self):
    """Test that the model performs better than a constant baseline."""
    # generate sample data
    dataset = self._sample_synthetic_dataset()
    # train/valid split
    train_data, valid_data = self._trainvalid_split(dataset)
    # fit our model
    model = self._get_model()
    model.fit(train_data)
    model_valid_cost = self._get_valid_scores(model, valid_data)
    # fit a dummy baseline model
    cst_model = ConstantByUserModel()
    cst_model.fit(train_data)
    cst_model_valid_cost = self._get_valid_scores(cst_model, valid_data)
    # test that our model is better
    assert model_valid_cost < cst_model_valid_cost
71. Versioning – Databases vs ML Data Objects
Database                                           ML Data Objects
Persistent                                         Ephemeral
Slow incremental updates                           Frequent drastic changes from scratch
Foreign keys and locks to prevent corrupted data   Object dependencies not expressed programmatically
72. Versioning
ML Objects Identifiers
ML objects keep changing and depend on each other (e.g. datasets, models weights, items weights)
Store unique identifiers for all ML objects (data hash or unique id)
Keep track of identifiers of dependencies to prevent corrupted data
Versioning
Version ML objects to keep track of architecture changes
Match code version against data version
e.g. weights for a 3-layer neural net must never be loaded by code expecting a 2-layer architecture
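A minimal sketch of such identifiers (a hypothetical scheme: a content hash over the serialized object plus an architecture version plus the identifiers of its dependencies, so stale code/data combinations are detectable):

```python
import hashlib
import json

def ml_object_id(payload, *, version, dependencies=()):
    """Unique id = hash of content + architecture version + dependency ids."""
    h = hashlib.sha256()
    h.update(json.dumps(payload, sort_keys=True).encode())
    h.update(str(version).encode())
    for dep_id in sorted(dependencies):
        h.update(dep_id.encode())
    return h.hexdigest()[:12]

dataset_id = ml_object_id({"n_items": 1000}, version="v1")
weights_id = ml_object_id({"layers": 3}, version="v1",
                          dependencies=[dataset_id])

# Changing an upstream object or the architecture changes every downstream
# id, so mismatched code/data combinations can be rejected at load time.
assert weights_id != ml_object_id({"layers": 2}, version="v1",
                                  dependencies=[dataset_id])
print(dataset_id, weights_id)
```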
74. Orchestration
Manual Workflow
At first, execute your workflow manually or with CI/CD triggers
E.g. bash scripts, python scripts, Gitlab triggers, cron jobs
Dedicated Workflow Orchestrator and Jobs DAGs
When you have dozens of ML jobs that depend on each other, you will end up making mistakes:
● making the pipeline fail
● or worse: deploying corrupted data
Workflow Management System: Airflow, Luigi, Oozie, ...
WMS tutorial: http://bit.ly/2MUM0r9
75. Microservices and Kubernetes
Microservices
● All microservices must be safe to replicate/delete for scaling up/down to adjust for variable load
→ the source of truth for data cannot be a microservice
→ it has to be managed in the cloud
● Fine-tune liveness probes (and readiness probes) to allow slow init of containers
[new in Kubernetes 1.16: startup probes]
● Memory is expensive
→ split microservices in a way you can still share memory between workers and avoid duplication
● Microservices for RecSys require lots of memory and compute
→ favor a few big nodes over many small ones
79. Microservices and Kubernetes
Model Split-Brain Problem
● No-downtime updates require blue/green or canary deployments
● Your architecture has to serve two models at once
● And you should not expect to update both the code and the data at the same time
89. ML Data File Format
Cross-language
HDF5 has great integration in python/pandas
Protobuf has great integration in Go, but is very slow in python (as of today)
In Python: pickle+gzip
Can pickle numpy arrays efficiently
Only uses the standard library
Faster to read/write than HDF5
Alternative: a high-level library like sklearn’s joblib
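A minimal sketch of the pickle+gzip approach, using only the standard library (an `array` of floats stands in for a numpy array here):

```python
import array
import gzip
import os
import pickle
import tempfile

def save_ml_object(path, obj):
    # gzip compresses redundant embedding data well; pickle serializes
    # numpy arrays efficiently (an array.array stands in for one here)
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_ml_object(path):
    with gzip.open(path, "rb") as f:
        return pickle.load(f)

embeddings = {"star_wars": array.array("f", [0.9, 0.1, 0.25])}
path = os.path.join(tempfile.gettempdir(), "item_embs.pkl.gz")
save_ml_object(path, embeddings)
restored = load_ml_object(path)
print(restored["star_wars"][2])  # 0.25
```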
90. Cloud Storage – Experiments Results
NoSQL ML Experiments Management Comparison
Comparison with code samples: http://bit.ly/2JtZg3M
              Storage      Language          Scaling   UI
MLflow        file-based   Python, R, Java   Small     internal
DVC           file-based   Shell             Small     no
Sacred        MongoDB      Python            Large     external
TensorBoard   file-based   TF, PyTorch       Medium    great
91. Cloud Storage – Large ML Files
Static Storage for Saved Models and Datasets
File-based static storage
● either cloud (AWS S3, Google Storage, etc.)
● or NAS virtual file system
Limited metadata support (e.g. tags)
Requires strict file naming conventions
92. Tools
Not much 🤷♂
In Python
● Use numpy structured arrays http://bit.ly/2BP1hU3
● With Intel CPUs, use intel-numpy-mkl https://dockr.ly/321u2Yp
● Use scipy sparse matrices http://bit.ly/36bJwMt
● Pandas (but may be unnecessarily slow vs pure numpy)
● Stay tuned! github.com/Crossing-Minds
93. Real-Time Deployment – Conclusion
● Replicate items embeddings and model weights in memory
● Re-deploy to update (read-only for easy memory sharing)
● User embeddings can be managed in a cloud key-value store
● Proper identification, versioning, and naming convention for ML data files
● Manual workflow, then managed by DAGs of jobs