More Data Science with Less Engineering: Machine Learning Infrastructure at Netflix
1. More Data Science with Less Engineering
ML Infrastructure at Netflix
SF BIG ANALYTICS
SEPTEMBER 2019
2. Architecture
“The model always depends on the previous day’s model, so it needs to be processed for consecutive days regardless when the upstream data updates.”
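The dependency described in this quote can be sketched as a catch-up loop: each day's model is built from the previous day's, so missed days have to be replayed in order rather than skipped. This is only a minimal illustration. The function names, the dictionary used as a "model", and the training stand-in are all hypothetical and are not taken from the talk.

```python
from datetime import date, timedelta
from typing import List


def days_to_process(last_trained: date, today: date) -> List[date]:
    """Return every day after last_trained up to and including today, in order."""
    gap = (today - last_trained).days
    return [last_trained + timedelta(days=i) for i in range(1, gap + 1)]


def train_one_day(prev_model: dict, day: date) -> dict:
    # Hypothetical stand-in for real training: the new "model" only records
    # which day it covers and the chain of days it was built from.
    return {"day": day, "history": prev_model["history"] + [day]}


def catch_up(model: dict, today: date) -> dict:
    """Replay each missing day in sequence, feeding every result into the next run."""
    for day in days_to_process(model["day"], today):
        model = train_one_day(model, day)
    return model
```

Because every run consumes the output of the one before it, the loop cannot be parallelized across days, and a single failed day blocks all later ones until it is rerun.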
4. Data Access
“Any ideas why this query takes 20-30 mins to run on Presto, and it seems to be similarly slow on Spark too?”
5. Scalability
“I’m going to start training a set of thousands of models. I have many countries, and I need to estimate a hundred models per country.”
“We’re training 28 separate models and for each model, we want to run a hyperparameter search in parallel.”
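The fan-out pattern in these quotes can be sketched with Python's standard library: build one task per (model, hyperparameter) pair, run the tasks concurrently, and keep the best result per model. This is a toy sketch under stated assumptions. The search grid, the scoring function, and the use of threads are illustrative only. Real training would more likely run as separate processes or cluster jobs, and nothing here reflects how the speakers' tooling does it.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

MODEL_IDS = range(28)               # 28 separate models, as in the quote
LEARNING_RATES = [0.01, 0.1, 1.0]   # hypothetical search grid


def evaluate(task):
    """Score one (model, hyperparameter) pair; lower loss is better."""
    model_id, lr = task
    # Hypothetical toy loss that is smallest at lr == 0.1.
    loss = (lr - 0.1) ** 2
    return model_id, lr, loss


def search(max_workers=8):
    """Run the full grid concurrently and keep the best setting per model."""
    tasks = list(product(MODEL_IDS, LEARNING_RATES))
    best = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for model_id, lr, loss in pool.map(evaluate, tasks):
            if model_id not in best or loss < best[model_id][1]:
                best[model_id] = (lr, loss)
    return best
```

Calling `search()` returns a dictionary keyed by model id, with the chosen learning rate and its loss for each model. Since the tasks are independent, the same structure scales from 28 models to the "hundred models per country" case by enlarging the task list.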
6. Model Operations
“This shouldn’t be happening. Interestingly, sounds like the Friday run was successful even though Thursday and Saturday both failed.”
“My first guess is that we’re increasingly hitting out of memory errors as our member count rises.”
8. Diagram: layers of the ML infrastructure stack, as listed on the slide
Model Development
Feature Engineering
Model Operations
Versioning
Architecture
Job Scheduler
Compute Resources
Data Warehouse
Axis labels: “How much data scientist cares” and “How much infrastructure is needed”
18. Addressing real pain points + fanatic user support = ❤
“The team's bias towards helpfulness in the Slack channel (even with questions that are definitely user error) makes Metaflow the best developer product at Netflix.”