10 more lessons learned from building Machine Learning systems

91.413 visualizações

Publicada em

Presentation at #mlconf 2015 in San Francisco

6 comentários
478 gostaram
Estatísticas
Notas
  • For data visualization,data analytics,data intelligence and ERP Tools, online training with job placements, register at http://www.todaycourses.com
       Responder 
    Tem certeza que deseja  Sim  Não
    Insira sua mensagem aqui
  • Python Machine Learning --- http://amzn.to/1Zg8QHd
       Responder 
    Tem certeza que deseja  Sim  Não
    Insira sua mensagem aqui
  • Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies (MIT Press) --- http://amzn.to/1nZc3NI
       Responder 
    Tem certeza que deseja  Sim  Não
    Insira sua mensagem aqui
  • Machine Learning: The Art and Science of Algorithms that Make Sense of Data --- http://amzn.to/1R7qOtL
       Responder 
    Tem certeza que deseja  Sim  Não
    Insira sua mensagem aqui
  • Machine learning will make a great future for Biotechnology
       Responder 
    Tem certeza que deseja  Sim  Não
    Insira sua mensagem aqui
Sem downloads
Visualizações
Visualizações totais
91.413
No SlideShare
0
A partir de incorporações
0
Número de incorporações
15.781
Ações
Compartilhamentos
0
Downloads
924
Comentários
6
Gostaram
478
Incorporações 0
Nenhuma incorporação

Nenhuma nota no slide

10 more lessons learned from building Machine Learning systems

  1. 10MoreLessons Learned from building real-life Machine Learning Systems Xavier Amatriain (@xamat) 10/13/2015
  2. Machine Learning @Quora
  3. Our Mission “To share and grow the world’s knowledge” ● Millions of questions & answers ● Millions of users ● Thousands of topics ● ...
  4. Demand What we care about Quality Relevance
  5. Lots of data relations
  6. ML Applications @ Quora ● Answer ranking ● Feed ranking ● Topic recommendations ● User recommendations ● Email digest ● Ask2Answer ● Duplicate Questions ● Related Questions ● Spam/moderation ● Trending now ● ...
  7. Models ● Logistic Regression ● Elastic Nets ● Gradient Boosted Decision Trees ● Random Forests ● (Deep) Neural Networks ● LambdaMART ● Matrix Factorization ● LDA ● ...
  8. 10MoreLessons Learned from implementing real-life ML systems
  9. 1.Implicitsignalsbeat explicitones (almostalways)
  10. Implicit vs. Explicit ● Many have acknowledged that implicit feedback is more useful ● Is implicit feedback really always more useful? ● If so, why?
  11. ● Implicit data is (usually): ○ More dense, and available for all users ○ Better representative of user behavior vs. user reflection ○ More related to final objective function ○ Better correlated with AB test results ● E.g. Rating vs watching Implicit vs. Explicit
  12. ● However ○ It is not always the case that direct implicit feedback correlates well with long-term retention ○ E.g. clickbait ● Solution: ○ Combine different forms of implicit + explicit to better represent long-term goal Implicit vs. Explicit
  13. 2.YourModelwilllearn whatyouteachittolearn
  14. Training a model ● Model will learn according to: ○ Training data (e.g. implicit and explicit) ○ Target function (e.g. probability of user reading an answer) ○ Metric (e.g. precision vs. recall) ● Example 1 (made up): ○ Optimize probability of a user going to the cinema to watch a movie and rate it “highly” by using purchase history and previous ratings. Use NDCG of the ranking as final metric using only movies rated 4 or higher as positives.
  15. Example 2 - Quora’s feed ● Training data = implicit + explicit ● Target function: Value of showing a story to a user ~ weighted sum of actions: v = ∑a va 1{ya = 1} ○ predict probabilities for each action, then compute expected value: v_pred = E[ V | x ] = ∑a va p(a | x) ● Metric: any ranking metric
  16. 3.Supervisedvs.plus UnsupervisedLearning
  17. Supervised/Unsupervised Learning ● Unsupervised learning as dimensionality reduction ● Unsupervised learning as feature engineering ● The “magic” behind combining unsupervised/supervised learning ○ E.g.1 clustering + knn ○ E.g.2 Matrix Factorization ■ MF can be interpreted as ● Unsupervised: ○ Dimensionality Reduction a la PCA ○ Clustering (e.g. NMF) ● Supervised ○ Labeled targets ~ regression
  18. Supervised/Unsupervised Learning ● One of the “tricks” in Deep Learning is how it combines unsupervised/supervised learning ○ E.g. Stacked Autoencoders ○ E.g. training of convolutional nets
  19. 4.Everythingisanensemble
  20. Ensembles ● Netflix Prize was won by an ensemble ○ Initially Bellkor was using GDBTs ○ BigChaos introduced ANN-based ensemble ● Most practical applications of ML run an ensemble ○ Why wouldn’t you? ○ At least as good as the best of your methods ○ Can add completely different approaches (e. g. CF and content-based) ○ You can use many different models at the ensemble layer: LR, GDBTs, RFs, ANNs...
  21. Ensembles & Feature Engineering ● Ensembles are the way to turn any model into a feature! ● E.g. Don’t know if the way to go is to use Factorization Machines, Tensor Factorization, or RNNs? ○ Treat each model as a “feature” ○ Feed them into an ensemble
  22. The Master Algorithm? It definitely is an ensemble!
  23. 5.Theoutputofyourmodel willbetheinputofanotherone (andotherdesignproblems)
  24. Outputs will be inputs ● Ensembles turn any model into a feature ○ That’s great! ○ That can be a mess! ● Make sure the output of your model is ready to accept data dependencies ○ E.g. can you easily change the distribution of the value without affecting all other models depending on it? ● Avoid feedback loops ● Can you treat your ML infrastructure as you would your software one?
  25. ML vs Software ● Can you treat your ML infrastructure as you would your software one? ○ Yes and No ● You should apply best Software Engineering practices (e.g. encapsulation, abstraction, cohesion, low coupling…) ● However, Design Patterns for Machine Learning software are not well known/documented
  26. 6.Thepains&gains ofFeatureEngineering
  27. Feature Engineering ● Main properties of a well-behaved ML feature ○ Reusable ○ Transformable ○ Interpretable ○ Reliable ● Reusability: You should be able to reuse features in different models, applications, and teams ● Transformability: Besides directly reusing a feature, it should be easy to use a transformation of it (e.g. log(f), max(f), ∑ft over a time window…)
  28. Feature Engineering ● Main properties of a well-behaved ML feature ○ Reusable ○ Transformable ○ Interpretable ○ Reliable ● Interpretability: In order to do any of the previous, you need to be able to understand the meaning of features and interpret their values. ● Reliability: It should be easy to monitor and detect bugs/issues in features
  29. Feature Engineering Example - Quora Answer Ranking What is a good Quora answer? • truthful • reusable • provides explanation • well formatted • ...
  30. Feature Engineering Example - Quora Answer Ranking How are those dimensions translated into features? • Features that relate to the answer quality itself • Interaction features (upvotes/downvotes, clicks, comments…) • User features (e.g. expertise in topic)
  31. 7.Thetwofacesofyour MLinfrastructure
  32. Machine Learning Infrastructure ● Whenever you develop any ML infrastructure, you need to target two different modes: ○ Mode 1: ML experimentation ■ Flexibility ■ Easy-to-use ■ Reusability ○ Mode 2: ML production ■ All of the above + performance & scalability ● Ideally you want the two modes to be as similar as possible ● How to combine them?
  33. Machine Learning Infrastructure: Experimentation & Production ● Option 1: ○ Favor experimentation and only invest in productionizing once something shows results ○ E.g. Have ML researchers use R and then ask Engineers to implement things in production when they work ● Option 2: ○ Favor production and have “researchers” struggle to figure out how to run experiments ○ E.g. Implement highly optimized C++ code and have ML researchers experiment only through data available in logs/DB
  34. Machine Learning Infrastructure: Experimentation & Production ● Option 1: ○ Favor experimentation and only invest in productionazing once something shows results ○ E.g. Have ML researchers use R and then ask Engineers to implement things in production when they work ● Option 2: ○ Favor production and have “researchers” struggle to figure out how to run experiments ○ E.g. Implement highly optimized C++ code and have ML researchers experiment only through data available in logs/DB
  35. ● Good intermediate options: ○ Have ML “researchers” experiment on iPython Notebooks using Python tools (scikit-learn, Theano…). Use same tools in production whenever possible, implement optimized versions only when needed. ○ Implement abstraction layers on top of optimized implementations so they can be accessed from regular/friendly experimentation tools Machine Learning Infrastructure: Experimentation & Production
  36. 8.Whyyoushouldcareabout answeringquestions(aboutyourmodel)
  37. Model debuggability ● Value of a model = value it brings to the product ● Product owners/stakeholders have expectations on the product ● It is important to answer questions to why did something fail ● Bridge gap between product design and ML algos ● Model debuggability is so important it can determine: ○ Particular model to use ○ Features to rely on ○ Implementation of tools
  38. Model debuggability ● E.g. Why am I seeing or not seeing this on my homepage feed?
  39. 9.Youdon’tneedtodistribute yourMLalgorithm
  40. Distributing ML ● Most of what people do in practice can fit into a multi- core machine ○ Smart data sampling ○ Offline schemes ○ Efficient parallel code ● Dangers of “easy” distributed approaches such as Hadoop/Spark ● Do you care about costs? How about latencies?
  41. Distributing ML ● Example of optimizing computations to fit them into one machine ○ Spark implementation: 6 hours, 15 machines ○ Developer time: 4 days ○ C++ implementation: 10 minutes, 1 machine ● Most practical applications of Big Data can fit into a (multicore) implementation
  42. 10.Theuntoldstoryof DataScienceandvs.MLengineering
  43. Data Scientists and ML Engineers ● We all know the definition of a Data Scientist ● Where do Data Scientists fit in an organization? ○ Many companies struggling with this ● Valuable to have strong DS who can bring value from the data ● Strong DS with solid engineering skills are unicorns and finding them is not scalable ○ DS need engineers to bring things to production ○ Engineers have enough on their plate to be willing to “productionize” cool DS projects
  44. The data-driven ML innovation funnel Data Research ML Exploration - Product Design AB Testing
  45. Data Scientists and ML Engineers ● Solution: ○ (1) Define different parts of the innovation funnel ■ Part 1. Data research & hypothesis building -> Data Science ■ Part 2. ML solution building & implementation -> ML Engineering ■ Part 3. Online experimentation, AB Testing analysis-> Data Science ○ (2) Broaden the definition of ML Engineers to include from coding experts with high-level ML knowledge to ML experts with good software skills Data Research ML Solution AB Testing Data Science Data Science ML Engineering
  46. Conclusions
  47. ● Make sure you teach your model what you want it to learn ● Ensembles and the combination of supervised/unsupervised techniques are key in many ML applications ● Important to focus on feature engineering ● Be thoughtful about ○ your ML infrastructure/tools ○ about organizing your teams

×