10 More Lessons
Learned from building real-life Machine Learning Systems
Xavier Amatriain (@xamat) 10/13/2015
10 more lessons learned from building Machine Learning systems - MLConf
Presentation delivered at MLConf SF 2015

  1. 10 More Lessons Learned from building real-life Machine Learning Systems
     Xavier Amatriain (@xamat), 10/13/2015
  2. Machine Learning @Quora
  3. Our Mission: “To share and grow the world’s knowledge”
     ● Millions of questions & answers
     ● Millions of users
     ● Thousands of topics
     ● ...
  4. What we care about
     ● Demand
     ● Quality
     ● Relevance
  5. Lots of data relations
  6. ML Applications @ Quora
     ● Answer ranking
     ● Feed ranking
     ● Topic recommendations
     ● User recommendations
     ● Email digest
     ● Ask2Answer
     ● Duplicate Questions
     ● Related Questions
     ● Spam/moderation
     ● Trending now
     ● ...
  7. Models
     ● Logistic Regression
     ● Elastic Nets
     ● Gradient Boosted Decision Trees
     ● Random Forests
     ● (Deep) Neural Networks
     ● LambdaMART
     ● Matrix Factorization
     ● LDA
     ● ...
  8. 10 More Lessons Learned from implementing real-life ML systems
  9. 1. Implicit signals beat explicit ones (almost always)
  10. Implicit vs. Explicit
      ● Many have acknowledged that implicit feedback is more useful
      ● Is implicit feedback really always more useful?
      ● If so, why?
  11. Implicit vs. Explicit
      ● Implicit data is (usually):
        ○ Denser, and available for all users
        ○ More representative of user behavior (vs. user reflection)
        ○ More related to the final objective function
        ○ Better correlated with AB test results
      ● E.g. rating vs. watching
  12. Implicit vs. Explicit
      ● However:
        ○ Direct implicit feedback does not always correlate well with long-term retention
        ○ E.g. clickbait
      ● Solution:
        ○ Combine different forms of implicit + explicit feedback to better represent the long-term goal
  13. 2. Your model will learn what you teach it to learn
  14. Training a model
      ● Model will learn according to:
        ○ Training data (e.g. implicit and explicit)
        ○ Target function (e.g. probability of user reading an answer)
        ○ Metric (e.g. precision vs. recall)
      ● Example 1 (made up):
        ○ Optimize probability of a user going to the cinema to watch a movie and rate it “highly” by using purchase history and previous ratings. Use NDCG of the ranking as final metric using only movies rated 4 or higher as positives.
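
The made-up example on slide 14 evaluates a ranking with NDCG, counting only movies rated 4 or higher as positives. A minimal sketch of that metric, assuming binary gains and the common log2 position discount (the function name and the sample ratings are hypothetical, not from the talk):

```python
import math

def ndcg_binary(ranked_ratings, threshold=4):
    """NDCG where an item is a positive only if its rating >= threshold.

    ranked_ratings: true ratings listed in the order the model ranked the items.
    """
    gains = [1.0 if r >= threshold else 0.0 for r in ranked_ratings]
    # Discounted cumulative gain of the model's ordering.
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    # Ideal DCG: all positives placed at the top.
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ratings, in model-ranked order.
perfect = ndcg_binary([5, 4, 3, 2])   # both positives ranked first
flawed = ndcg_binary([2, 5, 3, 4])    # positives pushed down the list
```

Changing the threshold changes what the model is rewarded for, which is the point of the lesson: the metric is part of what you teach the model.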
  15. Example 2 - Quora’s feed
      ● Training data = implicit + explicit
      ● Target function: value of showing a story to a user ~ weighted sum of actions: $v = \sum_a v_a \, \mathbb{1}\{y_a = 1\}$
        ○ Predict probabilities for each action, then compute the expected value: $v_{pred} = E[V \mid x] = \sum_a v_a \, p(a \mid x)$
      ● Metric: any ranking metric
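
The expected-value target on slide 15 can be sketched in a few lines. This is an illustration under stated assumptions, not Quora's implementation: the action names, the weights v_a, and the per-story probabilities p(a | x) are all hypothetical, and in practice the probabilities would come from one trained model per action.

```python
def expected_value(action_values, action_probs):
    """v_pred = sum over actions a of v_a * p(a | x)."""
    return sum(action_values[a] * action_probs[a] for a in action_values)

# Hypothetical action weights v_a; the talk does not give actual values.
ACTION_VALUES = {"click": 1.0, "upvote": 5.0, "share": 10.0, "hide": -20.0}

def rank_stories(stories):
    """Sort (story_id, action_probs) pairs by predicted value, highest first."""
    return sorted(
        stories,
        key=lambda s: expected_value(ACTION_VALUES, s[1]),
        reverse=True,
    )

# Hypothetical predicted probabilities p(a | x) for two stories.
stories = [
    ("clickbait", {"click": 0.9, "upvote": 0.01, "share": 0.0, "hide": 0.05}),
    ("in_depth", {"click": 0.3, "upvote": 0.1, "share": 0.02, "hide": 0.0}),
]
ranking = [story_id for story_id, _ in rank_stories(stories)]
```

With these made-up weights the frequently clicked but often hidden story ranks below the one that earns upvotes and shares, which is how a weighted sum of actions can counteract clickbait.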
  16. 3. Supervised plus (not vs.) Unsupervised Learning
  17. Supervised/Unsupervised Learning
      ● Unsupervised learning as dimensionality reduction
      ● Unsupervised learning as feature engineering
      ● The “magic” behind combining unsupervised/supervised learning
        ○ E.g. 1: clustering + kNN
        ○ E.g. 2: Matrix Factorization
          ■ MF can be interpreted as
            ● Unsupervised:
              ○ Dimensionality reduction a la PCA
              ○ Clustering (e.g. NMF)
            ● Supervised:
              ○ Labeled targets ~ regression
  18. Supervised/Unsupervised Learning
      ● One of the “tricks” in Deep Learning is how it combines unsupervised/supervised learning
        ○ E.g. stacked autoencoders
        ○ E.g. training of convolutional nets
  19. 4. Everything is an ensemble
  20. Ensembles
      ● Netflix Prize was won by an ensemble
        ○ Initially Bellkor was using GBDTs
        ○ BigChaos introduced ANN-based ensemble
      ● Most practical applications of ML run an ensemble
        ○ Why wouldn’t you?
        ○ At least as good as the best of your methods
        ○ Can add completely different approaches (e.g. CF and content-based)
        ○ You can use many different models at the ensemble layer: LR, GBDTs, RFs, ANNs...
  21. Ensembles & Feature Engineering
      ● Ensembles are the way to turn any model into a feature!
      ● E.g. Don’t know if the way to go is to use Factorization Machines, Tensor Factorization, or RNNs?
        ○ Treat each model as a “feature”
        ○ Feed them into an ensemble
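
A minimal sketch of "treat each model as a feature": the scores of two base models become the inputs of a small ensemble layer. For brevity the ensemble layer here is a single blend weight chosen by grid search, a simpler stand-in for the LR/GBDT/RF/ANN layers the slides mention. All names and numbers are hypothetical.

```python
def blend(scores_a, scores_b, w):
    """Linear blend of two models' scores: w * a + (1 - w) * b."""
    return [w * a + (1 - w) * b for a, b in zip(scores_a, scores_b)]

def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def fit_blend_weight(scores_a, scores_b, targets, steps=100):
    """Pick the blend weight in [0, 1] with the lowest error on held-out data."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mse(blend(scores_a, scores_b, w), targets))

# Hypothetical held-out labels and base-model predictions.
targets = [1.0, 0.0, 1.0, 0.0]
model_a = [0.9, 0.4, 0.6, 0.1]
model_b = [0.6, 0.1, 0.9, 0.4]

w = fit_blend_weight(model_a, model_b, targets)
ensemble_mse = mse(blend(model_a, model_b, w), targets)
```

Because w = 0 and w = 1 are among the candidates, the blend can never do worse than the better base model on the data used to fit it, which mirrors the "at least as good as the best of your methods" argument. Whether that carries over to unseen data depends on fitting the ensemble layer on data the base models were not trained on.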
  22. The Master Algorithm? It definitely is an ensemble!
  23. 5. The output of your model will be the input of another one (and other design problems)
  24. Outputs will be inputs
      ● Ensembles turn any model into a feature
        ○ That’s great!
        ○ That can be a mess!
      ● Make sure the output of your model is ready to accept data dependencies
        ○ E.g. can you easily change the distribution of the value without affecting all other models depending on it?
      ● Avoid feedback loops
      ● Can you treat your ML infrastructure as you would your software one?
  25. ML vs Software
      ● Can you treat your ML infrastructure as you would your software one?
        ○ Yes and No
      ● You should apply best Software Engineering practices (e.g. encapsulation, abstraction, cohesion, low coupling…)
      ● However, Design Patterns for Machine Learning software are not well known/documented
  26. 6. The pains & gains of Feature Engineering
  27. Feature Engineering
      ● Main properties of a well-behaved ML feature
        ○ Reusable
        ○ Transformable
        ○ Interpretable
        ○ Reliable
      ● Reusability: You should be able to reuse features in different models, applications, and teams
      ● Transformability: Besides directly reusing a feature, it should be easy to use a transformation of it (e.g. log(f), max(f), ∑ f_t over a time window…)
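
The transformations named on slide 27 (log(f), max(f), ∑ f_t over a time window) could look like the sketch below. The helper names, the 7-day window, and the upvote data are hypothetical, and log(1 + f) is used in place of log(f) so that zero counts stay defined.

```python
import math

def log_transform(value):
    """log(1 + f): a variant of log(f) that is defined for zero counts."""
    return math.log1p(value)

def window_sum(events, now, window):
    """Sum of f_t over events whose timestamp t lies in (now - window, now]."""
    return sum(v for t, v in events if now - window < t <= now)

def window_max(events, now, window):
    """Max of f_t over the same window; 0.0 if the window is empty."""
    values = [v for t, v in events if now - window < t <= now]
    return max(values) if values else 0.0

# Hypothetical (day, upvotes) observations for one answer.
upvotes = [(1, 3), (5, 10), (9, 2), (10, 4)]

features = {
    "upvotes_7d_sum": window_sum(upvotes, now=10, window=7),
    "upvotes_7d_max": window_max(upvotes, now=10, window=7),
    "upvotes_7d_log": log_transform(window_sum(upvotes, now=10, window=7)),
}
```

Keeping the raw signal and its transformations as separate small functions is one way to get the reusability and transformability properties listed above: a new model can compose them without touching the feature's source.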
  28. Feature Engineering
      ● Main properties of a well-behaved ML feature
        ○ Reusable
        ○ Transformable
        ○ Interpretable
        ○ Reliable
      ● Interpretability: In order to do any of the previous, you need to be able to understand the meaning of features and interpret their values.
      ● Reliability: It should be easy to monitor and detect bugs/issues in features
  29. Feature Engineering Example - Quora Answer Ranking
      What is a good Quora answer?
      • truthful
      • reusable
      • provides explanation
      • well formatted
      • ...
  30. Feature Engineering Example - Quora Answer Ranking
      How are those dimensions translated into features?
      • Features that relate to the answer quality itself
      • Interaction features (upvotes/downvotes, clicks, comments…)
      • User features (e.g. expertise in topic)
  31. 7. The two faces of your ML infrastructure
  32. Machine Learning Infrastructure
      ● Whenever you develop any ML infrastructure, you need to target two different modes:
        ○ Mode 1: ML experimentation
          ■ Flexibility
          ■ Easy-to-use
          ■ Reusability
        ○ Mode 2: ML production
          ■ All of the above + performance & scalability
      ● Ideally you want the two modes to be as similar as possible
      ● How to combine them?
  33. Machine Learning Infrastructure: Experimentation & Production
      ● Option 1:
        ○ Favor experimentation and only invest in productionizing once something shows results
        ○ E.g. Have ML researchers use R and then ask Engineers to implement things in production when they work
      ● Option 2:
        ○ Favor production and have “researchers” struggle to figure out how to run experiments
        ○ E.g. Implement highly optimized C++ code and have ML researchers experiment only through data available in logs/DB
  35. Machine Learning Infrastructure: Experimentation & Production
      ● Good intermediate options:
        ○ Have ML “researchers” experiment on IPython Notebooks using Python tools (scikit-learn, Theano…). Use the same tools in production whenever possible; implement optimized versions only when needed.
        ○ Implement abstraction layers on top of optimized implementations so they can be accessed from regular/friendly experimentation tools
  36. 8. Why you should care about answering questions (about your model)
  37. Model debuggability
      ● Value of a model = value it brings to the product
      ● Product owners/stakeholders have expectations of the product
      ● It is important to be able to answer questions about why something failed
      ● Bridge the gap between product design and ML algorithms
      ● Model debuggability is so important that it can determine:
        ○ Particular model to use
        ○ Features to rely on
        ○ Implementation of tools
  38. Model debuggability
      ● E.g. Why am I seeing or not seeing this on my homepage feed?
  39. 9. You don’t need to distribute your ML algorithm
  40. Distributing ML
      ● Most of what people do in practice can fit into a multi-core machine
        ○ Smart data sampling
        ○ Offline schemes
        ○ Efficient parallel code
      ● Dangers of “easy” distributed approaches such as Hadoop/Spark
      ● Do you care about costs? How about latencies?
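
The slides do not say which sampling scheme is meant by "smart data sampling". As one possible illustration, reservoir sampling (Algorithm R) draws a fixed-size uniform sample from a stream in a single pass and constant memory, so a large log can be reduced to something that fits on one machine. The function name, seed, and sizes below are hypothetical.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                sample[j] = item
    return sample

# Hypothetical use: keep 1,000 training rows out of a 100,000-row stream.
sample = reservoir_sample(range(100_000), k=1000)
```

Uniform sampling is only the simplest option; for skewed problems a stratified or importance-weighted scheme may preserve more signal per row.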
  41. Distributing ML
      ● Example of optimizing computations to fit them into one machine
        ○ Spark implementation: 6 hours, 15 machines
        ○ Developer time: 4 days
        ○ C++ implementation: 10 minutes, 1 machine
      ● Most practical applications of Big Data can fit into a (multicore) implementation
  42. 10. The untold story of Data Science and (not vs.) ML engineering
  43. Data Scientists and ML Engineers
      ● We all know the definition of a Data Scientist
      ● Where do Data Scientists fit in an organization?
        ○ Many companies are struggling with this
      ● Valuable to have strong DS who can bring value from the data
      ● Strong DS with solid engineering skills are unicorns and finding them is not scalable
        ○ DS need engineers to bring things to production
        ○ Engineers have too much on their plate to be willing to “productionize” cool DS projects
  44. The data-driven ML innovation funnel
      [Figure: funnel with stages Data Research; ML Exploration - Product Design; AB Testing]
  45. Data Scientists and ML Engineers
      ● Solution:
        ○ (1) Define different parts of the innovation funnel
          ■ Part 1. Data research & hypothesis building -> Data Science
          ■ Part 2. ML solution building & implementation -> ML Engineering
          ■ Part 3. Online experimentation, AB Testing analysis -> Data Science
        ○ (2) Broaden the definition of ML Engineers to include everyone from coding experts with high-level ML knowledge to ML experts with good software skills
      [Figure: funnel with stages Data Research (Data Science); ML Solution (ML Engineering); AB Testing (Data Science)]
  46. Conclusions
  47. ● Make sure you teach your model what you want it to learn
      ● Ensembles and the combination of supervised/unsupervised techniques are key in many ML applications
      ● Important to focus on feature engineering
      ● Be thoughtful about
        ○ your ML infrastructure/tools
        ○ how you organize your teams
