As a data-driven company, we use machine learning algorithms and A/B tests to drive all of the content recommendations for our members. To improve the quality of our personalized recommendations, we first try an idea offline using historical data. Ideas that improve our offline metrics are then pushed to A/B tests, which are measured through statistically significant improvements in core metrics such as member engagement, satisfaction, and retention. At the heart of such offline analyses is historical fact data used to generate the features required by the machine learning model, for example a member's viewing history or the videos in their My List.
Building a fact store at an ever-evolving Netflix scale is non-trivial. We must capture enough fact data to cover the stratification needs of various experiments and guarantee that the data we serve is temporally accurate. In this talk, we will present the key requirements, the evolution of our fact store design, its implementation, its scale, and our learnings.
We will also take a deep dive into fact vs. feature logging, design tradeoffs, infrastructure performance, reliability, and the query API for the store. We use Spark and Scala extensively, along with a variety of compression techniques, to store and retrieve data efficiently.
3. Recommendations at Netflix
● Personalized Homepage for each member
○ Goal: Quickly help members find content they'd like to watch
○ Risk: Member may lose interest and abandon the service
○ Challenge: Recommendations at Scale
5. Experimentation Cycle @ Netflix
Design a new experiment to test out different ideas.
[Diagram: the experimentation cycle. In the offline experiment phase we design the experiment, collect label data, run offline feature generation, and do model training and model validation against metrics; ideas that win offline move to the online system for online A/B testing.]
6. ML Feature Engineering - Architectural View
[Diagram: in the online system, microservices feed online feature generation, which produces features for online scoring; in the offline experiment path, logged facts feed offline feature generation, which produces features for model training. Both paths use shared feature encoders, and trained models are deployed back to the online system.]
7. What is a Fact?
● Fact
  ○ Input data for feature encoders; used to construct a feature (see the sketch below)
  ○ Examples: viewing history of a member, My List of a member
● Historical version of a fact
  ○ Rewindable - state of the world at that time
● Temporal
  ○ Facts are temporal, i.e. they change with time
  ○ Each online scoring service uses the latest value of a fact
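To make the fact/feature distinction concrete, here is a minimal Scala sketch; the fact type, encoder trait, and feature names are hypothetical and not taken from the deck.

```scala
// Minimal sketch (hypothetical types, not the deck's actual interfaces): a fact is raw
// input data; a feature encoder turns facts into model features.
case class ViewingHistoryFact(memberId: Long, videoIds: Seq[Long], logTime: Long)

trait FeatureEncoder[F] {
  def encode(fact: F): Map[String, Double]
}

// Example encoder deriving simple counting features from the viewing-history fact.
object ViewingHistoryEncoder extends FeatureEncoder[ViewingHistoryFact] {
  def encode(fact: ViewingHistoryFact): Map[String, Double] = Map(
    "videos_watched"          -> fact.videoIds.size.toDouble,
    "distinct_videos_watched" -> fact.videoIds.distinct.size.toDouble
  )
}
```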
9. Fact Logging - Pull Architecture
Pull: capture snapshots of key facts from the fact microservices for stratified member sets.
● Daily snapshots of key facts
● Storage
  ○ S3 & Parquet
● API to access the data
  ○ RDD & DataFrames (a snapshot-read sketch follows this list)
● Cons
  ○ Lacks temporal accuracy
  ○ Load on microservices
  ○ Missing experiment-specific facts
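For the pull path, a snapshot read might look like the following; the bucket name, path layout, and partition column are assumptions for illustration, not the actual scheme.

```scala
// Minimal sketch (hypothetical bucket and path layout): reading one day's snapshot of a
// fact from S3 as Parquet, via both the DataFrame and RDD access paths the slide lists.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("fact-snapshot-read").getOrCreate()

// DataFrame access to a single daily snapshot partition.
val viewingHistory = spark.read
  .parquet("s3://fact-snapshots/viewing_history/dateint=20180601/")

// The same data as an RDD of rows, for code paths that predate Spark SQL.
val viewingHistoryRdd = viewingHistory.rdd

viewingHistory.printSchema()
```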
10. Fact Logging - Push Architecture
● Compute engines themselves control what to log
● Stratification
● Temporal accuracy
[Diagram: compute services embed the Fact Logger and push facts into the Fact Store; a Fact Fetcher and Fact Transformer serve those facts to ML workflows for feature generation and model training.]
11. Fact Logger
[Diagram: the Fact Logger is a library embedded in both precompute and live compute services; it writes to the fact stream, the base fact tables, and stratification data.]
● Library
● Facts
  ○ User related
  ○ Video related
  ○ Computation specific
● Serialization
● Stratification Service
● Fact Stream (a logging sketch follows this list)
● Storage
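The deck lists the logger's responsibilities without showing its interface; the sketch below is an assumed shape of a fact-logger call, with every type and method name hypothetical: consult the stratification service, then push the serialized fact onto the fact stream.

```scala
// Minimal sketch (all names hypothetical): the shape of a fact-logger library call.
// A compute service decides what to log; the logger consults the stratification
// service, then pushes the serialized fact onto the fact stream.
trait StratificationService {
  def shouldLog(memberId: Long): Boolean // e.g. member is in a stratified sample
}

trait FactStream {
  def publish(key: String, payload: Array[Byte]): Unit // e.g. a Kafka topic
}

class FactLogger(strat: StratificationService, stream: FactStream) {
  // Serialization is kept abstract here; the real library would use a shared codec.
  def log(memberId: Long, factType: String, serializedFact: Array[Byte]): Unit =
    if (strat.shouldLog(memberId))
      stream.publish(s"$factType/$memberId", serializedFact)
}
```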
12. Fact Logging - Scalability
● 5-10x increase in data through Kafka
● SLA impact; cost increase
● Compression - 70% decrease (a compression sketch follows this list)
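One way to get a size reduction of the kind quoted is to compress the serialized payload before it goes onto the stream; the deck does not name its codecs, so the GZIP choice below is an assumption.

```scala
// Minimal sketch (the deck does not name its codecs; GZIP is assumed here):
// compressing a serialized fact payload before publishing, and decompressing on read.
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

def compress(payload: Array[Byte]): Array[Byte] = {
  val bytes = new ByteArrayOutputStream()
  val gzip  = new GZIPOutputStream(bytes)
  gzip.write(payload)
  gzip.close()
  bytes.toByteArray
}

def decompress(payload: Array[Byte]): Array[Byte] = {
  val gzip = new GZIPInputStream(new ByteArrayInputStream(payload))
  val out  = new ByteArrayOutputStream()
  val buf  = new Array[Byte](4096)
  var n = gzip.read(buf)
  while (n != -1) { out.write(buf, 0, n); n = gzip.read(buf) }
  gzip.close()
  out.toByteArray
}
```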
13. Storage & Access
Fact Store
Fact Transformer
Deduplication
Precompute Live Compute
Fact Logger
● Pipeline load
○ Repeated facts
● Aggressive or not
○ Loss threshold
Conditional push
● Spark Job
○ Fact pointers
○ SLA
#DevSAIS11
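A deduplication pass of the kind described could be written as a Spark job along the following lines; the schema (memberId, factType, payload, logTime), the paths, and the pointer representation are assumptions for illustration.

```scala
// Minimal sketch (hypothetical schema and layout): a Spark job that deduplicates
// repeated facts. Within each (memberId, factType), only the first occurrence of a
// payload keeps the value; later occurrences become pointers to that first logTime.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("fact-dedup").getOrCreate()

// expected columns: memberId, factType, payload, logTime
val facts = spark.read.parquet("s3://fact-store/raw/dateint=20180601/")

val firstSeen = Window
  .partitionBy("memberId", "factType", "payload")
  .orderBy("logTime")

val deduped = facts
  .withColumn("firstLogTime", first("logTime").over(firstSeen))
  .withColumn("isPointer", col("logTime") =!= col("firstLogTime"))
  // pointers carry no payload, only the logTime of the row that does
  .withColumn("payload", when(col("isPointer"), lit(null)).otherwise(col("payload")))

deduped.write.mode("overwrite").parquet("s3://fact-store/deduped/dateint=20180601/")
```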
14. API Lookback
For each member, the lookback index stores, per fact type, either an inline value or a pointer to a value logged earlier:

Member ID   My List    View History   Thumbs
122312      Value      Pointer        Value
254637      Pointer    Pointer        Pointer
Member n    Pointer    Value          Pointer

[Diagram: pointers resolve into per-fact-type value tables (My List, View History, Thumbs), each split into partitions 1..m of values written at earlier times (log_time - x, log_time - y, log_time - z).]
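Resolving a lookback entry then amounts to returning the inline value or chasing the pointer into the partitioned value tables; the types below are hypothetical.

```scala
// Minimal sketch (hypothetical types): resolving a lookback entry that is either an
// inline value or a pointer to a value logged earlier in a partitioned fact table.
sealed trait FactEntry
case class InlineValue(payload: Array[Byte])                        extends FactEntry
case class Pointer(factType: String, partition: Int, logTime: Long) extends FactEntry

trait FactValueTable {
  // looks up the payload written at logTime in the given partition of a fact type
  def fetch(factType: String, partition: Int, logTime: Long): Array[Byte]
}

def resolve(entry: FactEntry, table: FactValueTable): Array[Byte] = entry match {
  case InlineValue(payload)          => payload
  case Pointer(factType, part, time) => table.fetch(factType, part, time)
}
```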
15. Storage & Access
Fact Store
Fact Transformer
Read API
Precompute Live Compute
Fact Logger
● Query performance
○ Slow moving facts
● Point query
○ Connector
● Query time reduction
○ Hours to minutes
Deduplication
Conditional push
Write
Read
#DevSAIS11
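A point-query Read API of this kind might expose something like the call below, letting feature-generation jobs fetch only the members and fact types they need as of a given log time; the interface, column names, and path layout are assumptions, not the actual API.

```scala
// Minimal sketch (hypothetical interface): a point-query Read API over the fact store.
// Feature-generation jobs ask for specific members, a fact type, and an as-of time
// instead of scanning full snapshots, which is where the hours-to-minutes win comes from.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

class FactStoreReadApi(spark: SparkSession, basePath: String) {
  def read(memberIds: Seq[Long], factType: String, asOfLogTime: Long): DataFrame =
    spark.read.parquet(s"$basePath/$factType/")
      .where(col("memberId").isin(memberIds: _*) && col("logTime") <= asOfLogTime)
      // keep only the most recent fact per member at or before the as-of time
      .groupBy("memberId")
      .agg(max(struct(col("logTime"), col("payload"))).as("latest"))
      .select(col("memberId"), col("latest.logTime"), col("latest.payload"))
}
```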
16. Performance: Storage
• Partitioning scheme (a write-layout sketch follows this list)
  – Noisy neighbor
• Storage format
  – Exploratory vs production
• Fast & slow lane
  – Lookback limit
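As a sketch of the partitioning idea, writing fact-store output partitioned by fact type and date keeps a single high-volume fact type (the "noisy neighbor") from crowding the files every other reader has to scan; the layout below is illustrative, not the actual scheme.

```scala
// Minimal sketch (hypothetical layout): partitioning fact-store output by fact type
// and date so readers can prune to only the fact types and days they need.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("fact-store-write").getOrCreate()

// expected columns include factType and dateint alongside the fact payload
val deduped = spark.read.parquet("s3://fact-store/deduped/")

deduped.write
  .partitionBy("factType", "dateint") // readers prune to the fact types they need
  .mode("append")
  .parquet("s3://fact-store/partitioned/")
```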
17. Performance: Spark reads
• Bloom filters
  – Reduce scan
• Cache access
  – EVCache, Spectator
• MapPartitions vs UDF (a mapPartitions sketch follows this list)
  – Eager vs lazy
  – SPARK-11438, SPARK-11469, SPARK-20586
[Diagram: the application calls the ML library, which reads through the Read API.]
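The MapPartitions-vs-UDF point is about where per-row work and per-partition setup happen; the sketch below, with a stand-in fetch function, shows the lazy-iterator pattern that pays setup once per partition and avoids materializing rows eagerly.

```scala
// Minimal sketch (hypothetical fetch logic): mapPartitions lets per-partition setup
// (a cache client, a bloom filter) be built once, while the rows flow through the
// iterator lazily instead of being paid for row by row inside a UDF.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mapPartitions-vs-udf").getOrCreate()
import spark.implicits._

val memberIds = (1L to 100000L).toDS()

val resolved = memberIds.mapPartitions { ids =>
  // expensive per-partition setup happens once here (e.g. open a cache connection)
  val factFetcher: Long => String = id => s"facts-for-$id" // hypothetical stand-in
  // keep the result an iterator: mapping lazily avoids materializing the partition
  ids.map(id => (id, factFetcher(id)))
}

resolved.show(5)
```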
18. Future Work
• Structured with schema evolution
– Best of both (POJO & Spark SQL), Iceberg
• Streaming vs Batch
– Multiple lanes, accountability, independent scale
• Duplication
– Storage vs Runtime cost