The document describes the process of developing and productionising a recommendation engine for BBC Sounds. It discusses:
1) The initial challenge of replacing an outsourced recommendation engine and prototyping a new one using factorisation machines. Qualitative user tests showed improved recommendations over the external provider.
2) Productionising the system on Google Cloud Platform, with Apache Airflow for workflow orchestration, Apache Beam for efficient data processing, and precomputed recommendations served at 1500 requests/second with low latency.
3) Initial A/B tests found a 59% increase in interactions overall and a 103% increase for under-35s with the new recommendation engine. Ongoing work includes optimising costs and API performance.
From Idea to Production: BBC's Recommender Engine
1. From an idea to production
Recommender for BBC Sounds
Tatiana Al-Chueyr
Principal Data Engineer at Datalab
MLOps London, 28 September 2021 @tati_alchueyr
7. @tati_alchueyr
BBC Datalab
Vision
For the BBC to be a leader in Machine Learning that
delights audiences and prioritises the needs of
individuals and society over corporations and states.
Mission
To develop and deploy Machine Learning at BBC scale
so that teams can tailor services to individuals whilst
upholding our editorial values.
8. @tati_alchueyr
BBC Datalab: the hummingbirds squad
The knowledge in this presentation is the result of lots of teamwork within one squad of a larger team and an even broader organisation.
Squad team members, current and previous: Darren Mundy, David Hollands, Richard Bownes, Marc Oppenheimer, Bettina Hermant, Tatiana Al-Chueyr, Jana Eggink
20. @tati_alchueyr
The prototype: 1-2 months of work
● Collected data (quick-and-dirty™ scripts)
● Compared existing Python Factorisation Machines libraries (winner: LightFM)
● Trained the model and predicted recommendations (quick-and-dirty™ scripts)
● Implemented a qualitative experiment tool
● Recruited volunteers to join the qualitative experiment
● Ran the qualitative experiment, comparing:
○ External provider recommendations
○ Our own Factorisation Machines-powered recommendations
21. @tati_alchueyr
Qualitative experiment: how
Who
● ~30 test users recruited
○ Internal BBC employees
○ Under 35
How
● Two sets of 9 recommendations each:
○ External provider
○ Internal factorisation machines
● Users, without knowing the origin of the recommendations, had to:
○ choose “the best”, “both”, or “neither”
○ explain why
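A minimal sketch of how such blind judgements could be tallied. The labels, helper name and data are invented for illustration; the talk does not describe the actual analysis tooling:

```python
# Tally blind side-by-side judgements: each session records which set the
# participant preferred ("external" or "internal"), or "both"/"neither".
from collections import Counter

def summarise(judgements):
    tally = Counter(judgements)
    total = sum(tally.values())
    # share of sessions where the internal recs were judged at least as good
    at_least_as_good = (tally["internal"] + tally["both"]) / total
    return tally, at_least_as_good

tally, share = summarise(["internal", "both", "external", "internal"])
print(tally, share)
```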
24. @tati_alchueyr
Productionising machine learning
The many components of a production ML system; the ML Code is only a small box surrounded by:
Configuration, Data Collection and Transformation, Feature Extraction, Data Verification, Machine Resource Management, Serving Infrastructure, Monitoring, Process Management Tools, Analysis Tools
Image copied from a presentation by Googler @mpyeager
25. @tati_alchueyr
Tech stack
● Google Cloud Platform
● Python as our main programming language
○ LightFM library for the model: very handy Dataset class and multi-threaded processing
● Apache Airflow (Composer) for workflow orchestration
● Apache Beam (Dataflow) for parallelisable data processing
● Redis to store our pre-computed recommendations
● Kubernetes for running the API and CPU-intensive tasks
● Terraform for infrastructure as code
27. @tati_alchueyr
Machine learning workflow
Input:
● User activity data
● Content metadata
Processing:
● Machine Learning model training
● Predict recommendations
● Business rules, part I (non-personalised):
○ Recency
○ Availability
○ Excluded masterbrands
○ Excluded genres
● Business rules, part II (personalised):
○ Already-seen items
○ Local radio (if not consumed previously)
○ Specific language (if not consumed previously)
○ Episode picking from a series
○ Diversification (1 episode per brand/series)
Output:
● Recommendations
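The two rule stages can be sketched as plain filter functions. The item fields, the recency threshold and the subset of rules shown here are assumptions for illustration, not the production implementation:

```python
# Sketch of the two-stage business-rule filtering over candidate items.
from datetime import datetime, timedelta

def non_personalised_rules(items, excluded_masterbrands, excluded_genres,
                           max_age_days=30):
    """Part I: catalogue-wide rules, independent of the user."""
    now = datetime.utcnow()
    return [
        it for it in items
        if it["available"]                                          # availability
        and now - it["published"] <= timedelta(days=max_age_days)   # recency
        and it["masterbrand"] not in excluded_masterbrands
        and it["genre"] not in excluded_genres
    ]

def personalised_rules(items, seen_ids):
    """Part II (partial): drop already-seen items, keep 1 episode per brand."""
    picked, seen_brands = [], set()
    for it in items:                      # items arrive ranked by the model
        if it["id"] in seen_ids or it["brand"] in seen_brands:
            continue
        seen_brands.add(it["brand"])      # diversification
        picked.append(it)
    return picked
```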
28. @tati_alchueyr
Steps to be done in the workflows, before the API
(Same workflow diagram as slide 27.)
31. @tati_alchueyr
Recommendation API: load performance
Goal: 1500 requests/s with p95 responses < 60 ms

                               On the fly   Precomputed   Precomputed
Concurrent load test (req/s)       50            50           1500
Success percentage               63.88%        100%          100%
p50 latency (success)           323.78 ms     1.68 ms       4.75 ms
p95 latency (success)           939.28 ms     3.21 ms      57.53 ms
p99 latency (success)           979.24 ms     4.51 ms      97.49 ms
Max successful requests/s          23            50           1500

Machine type: c2-standard-8, Python 3.7, Sanic workers: 7, prediction threads: 1, vCPU cores: 7, memory: 15 Gi, deployment replicas: 1
32. @tati_alchueyr
Strategies to serve recommendations
A. On the fly: the API holds the model, user activity and content metadata, and predicts and applies rules at request time.
B. Precompute: the API retrieves pre-computed recommendations from a cache.
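Strategy B can be sketched as follows. A plain dict stands in for Redis so the sketch is self-contained; the key scheme and helper names are hypothetical:

```python
# Sketch of the precompute strategy: a workflow writes ranked recommendations
# per user, and the API only does a key lookup at request time.
import json

cache = {}  # stand-in for Redis; in production: redis.Redis(...).set/.get

def precompute(user_id, ranked_items):
    """Called by the batch workflow after rules are applied."""
    cache[f"recs:{user_id}"] = json.dumps(ranked_items)

def serve(user_id, limit=9):
    """Called by the API: O(1) lookup, no model inference on the request path."""
    raw = cache.get(f"recs:{user_id}")
    return json.loads(raw)[:limit] if raw else []
```

Moving inference off the request path is what turns the ~320 ms p50 of on-the-fly serving into a few milliseconds in the load tests above.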
33. @tati_alchueyr
Steps to be done in the workflows, before the API
(Same workflow diagram as slide 27, with the output now being precomputed recommendations.)
39. @tati_alchueyr
Limitation of Apache Airflow
● Good for orchestrating tasks
● Not good for processing data within an Airflow worker
○ Separation of concerns: orchestration versus runtime data processing
42. @tati_alchueyr
Limitation of Apache Airflow
Issue: depending on the volume of data, a single PythonOperator task which usually takes 10 min could take almost 3 h!
Consequences: overall delay; blocked worker.
43. @tati_alchueyr
Limitation of Apache Airflow
Time estimations (in seconds) to predict recommendations using a c2-standard-30 instance (30 vCPU and 120 GB RAM)
44. @tati_alchueyr
Limitation of Apache Airflow
2 h to predict recommendations for 10k users. What about 5 million users, or more?
45. @tati_alchueyr
Limitation of Apache Airflow: solutions
Delegating processing to other services:
● Tasks which scale vertically (better hardware)
○ Airflow Compute Engine (Virtual Machine) Operator (GceInstanceStartOperator)
○ Airflow Kubernetes Pod Operator (GKEPodOperator)
● Tasks which scale horizontally (can be split and distributed across multiple nodes)
○ Airflow Dataflow Operator (Google Dataflow, Apache Beam)
○ Airflow Dataproc Operator (Google Dataproc, Apache Spark & Hadoop)
50. @tati_alchueyr
Apache Beam: overview of Dataflow job
Parallel processing “effortlessly”
Image from the book “Google Cloud Platform In Action” by JJ Geewax, Chapter 20
53. @tati_alchueyr
Adoption of Apache Beam & Dataflow
“Serverless” parallel processing of 41,258,135 items (27.32 GB) with
Python in 1min 24s using 10 default workers
54. @tati_alchueyr
s/PythonOperator/DataflowOperator
Replacing a PythonOperator in Cloud Composer with a DataflowOperator running a Beam pipeline within Dataflow reduced computation time by almost one order of magnitude:

Document type          PythonOperator   DataflowOperator   Performance gain
episode                    60 min             6 min              90%
availability episode       12 min             5 min              58%
61. @tati_alchueyr
Overall architecture
DATA SOURCES
● Availability, metadata and episode data to support a snapshot view of Sounds content
● Sounds user activity data for content consumption (signed-in only)
DATA PLATFORM
● Dedicated stream of UAS (encrypted) into a Sounds data lake
ML PLATFORM
● Dedicated Sounds metadata snapshot built intra-day
● User activity features processed intra-day
● Both sources merged into a feature set to support the re-training of the Sounds recommender
RECOMMENDER
● Build cluster for intra-day build/training of the model
● Dedicated service for pre-computing recommendations
● Serving layer that exposes the recommender to Unirecs; Unirecs serves recommendations to end users
63. @tati_alchueyr
Initial A/B test results
● +59% increase in interactions with the Recommended for You rail
● +103% increase in interactions for under-35s
66. @tati_alchueyr
Choosing the level of abstraction
Customisation x easiness of adoption, from most custom to most managed:
● None: build custom workflows and train your own models using Keras, Sklearn, PyTorch and others
● Open source / proprietary ready-to-use tools (e.g. TFX)
● Cloud built-in proprietary tools