10 More Lessons
Learned from building real-life Machine Learning Systems
Xavier Amatriain (@xamat) 10/13/2015
Machine Learning
@Quora
Our Mission
“To share and grow the world’s knowledge”
● Millions of questions & answers
● Millions of users
● Thousands of...
Demand
What we care about
Quality
Relevance
Lots of data relations
ML Applications @ Quora
● Answer ranking
● Feed ranking
● Topic recommendations
● User recommendations
● Email digest
● As...
Models
● Logistic Regression
● Elastic Nets
● Gradient Boosted Decision Trees
● Random Forests
● (Deep) Neural Networks
● ...
10 More Lessons
Learned from implementing real-life ML systems
1. Implicit signals beat explicit ones (almost always)
Implicit vs. Explicit
● Many have acknowledged
that implicit feedback is more useful
● Is implicit feedback really always
...
● Implicit data is (usually):
○ More dense, and available for all users
○ More representative of user behavior vs.
user ...
● However
○ It is not always the case that
direct implicit feedback correlates
well with long-term retention
○ E.g. clickb...
2. Your Model will learn what you teach it to learn
Training a model
● Model will learn according to:
○ Training data (e.g. implicit and explicit)
○ Target function (e.g. pro...
Example 2 - Quora’s feed
● Training data = implicit + explicit
● Target function: Value of showing a story to a
user ~ wei...
3. Supervised vs. plus Unsupervised Learning
Supervised/Unsupervised Learning
● Unsupervised learning as dimensionality reduction
● Unsupervised learning as feature en...
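As a rough illustration of feeding an unsupervised step into a supervised one, the sketch below clusters unlabeled points with k-means and then uses the cluster id as the only feature of a tiny supervised model. The toy data, the function names (`kmeans`, `fit_majority_label`, `predict`) and the choice of k-means are all assumptions made for this example; the slides do not say which techniques were combined in practice.

```python
from collections import Counter, defaultdict


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])),
    )


def kmeans(points, centers, iters=10):
    """Unsupervised step: Lloyd's k-means started from fixed initial centers."""
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(axis) / len(groups[i]) for axis in zip(*groups[i]))
            if groups[i] else centers[i]
            for i in range(len(centers))
        ]
    return centers


def fit_majority_label(cluster_ids, labels):
    """Supervised step: learn the most common label for each cluster id."""
    votes = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, labels):
        votes[cluster_id][label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels = [0, 0, 0, 1, 1, 1]

centers = kmeans(points, centers=[points[0], points[3]])
cluster_feature = [nearest(p, centers) for p in points]  # the engineered feature
model = fit_majority_label(cluster_feature, labels)


def predict(point):
    """Map a new point to its cluster, then to that cluster's learned label."""
    return model[nearest(point, centers)]


print(predict((4.9, 5.2)), predict((0.3, 0.2)))
```

The same pattern applies when the unsupervised step is a dimensionality reduction or a topic model: its output becomes an input column for whatever supervised learner comes next.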
Supervised/Unsupervised Learning
● One of the “tricks” in Deep Learning is how it
combines unsupervised/supervised learnin...
4. Everything is an ensemble
Ensembles
● Netflix Prize was won by an ensemble
○ Initially Bellkor was using GBDTs
○ BigChaos introduced ANN-based ensem...
Ensembles & Feature Engineering
● Ensembles are the way to turn any model into a feature!
● E.g. Don’t know if the way to ...
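The idea of turning a model into a feature can be sketched as a small stacking setup: the outputs of two base models become the input columns of a second-level logistic regression. Everything here is hypothetical and chosen for brevity: `model_a` and `model_b` are hand-written stand-ins for trained models, the data is a toy set, and the meta-model is a plain-Python logistic regression rather than any particular library's implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(rows, labels, lr=0.5, epochs=500):
    """Logistic regression fitted with per-example gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            grad = p - y  # derivative of the log loss w.r.t. the logit
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


# Hypothetical base models; in practice these would be trained models
# (for example a GBDT and a neural network) scored on held-out data.
def model_a(raw):
    return 1.0 if raw[0] > 0.5 else 0.0


def model_b(raw):
    return raw[1]


raw_inputs = [(0.9, 0.8), (0.8, 0.2), (0.2, 0.9), (0.1, 0.1)]
labels = [1, 1, 0, 0]

# Each base model's output is just another feature for the meta-model.
stacked = [(model_a(r), model_b(r)) for r in raw_inputs]
weights, bias = train_logreg(stacked, labels)


def predict(raw):
    """Probability from the meta-model, given the raw input."""
    features = (model_a(raw), model_b(raw))
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


predictions = [predict(r) for r in raw_inputs]
print([round(p, 2) for p in predictions])
```

In a real system the base models should be scored on data they were not trained on before their outputs are fed to the meta-model, otherwise the second level learns from leaked labels.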
The Master Algorithm?
It definitely is an ensemble!
5. The output of your model will be the input of another one (and other design problems)
Outputs will be inputs
● Ensembles turn any model into a feature
○ That’s great!
○ That can be a mess!
● Make sure the out...
ML vs Software
● Can you treat your ML infrastructure as you would
your software one?
○ Yes and No
● You should apply best...
6. The pains & gains of Feature Engineering
Feature Engineering
● Main properties of a well-behaved ML feature
○ Reusable
○ Transformable
○ Interpretable
○ Reliable
●...
Feature Engineering Example - Quora Answer Ranking
What is a good Quora answer?
• truthful
• reusable
• provides explanati...
Feature Engineering Example - Quora Answer Ranking
How are those dimensions translated
into features?
• Features that rela...
7. The two faces of your ML infrastructure
Machine Learning Infrastructure
● Whenever you develop any ML infrastructure, you need to
target two different modes:
○ Mo...
Machine Learning Infrastructure: Experimentation & Production
● Option 1:
○ Favor experimentation and only invest in produ...
● Good intermediate options:
○ Have ML “researchers” experiment on IPython Notebooks using
Python tools (scikit-learn, The...
8. Why you should care about answering questions (about your model)
Model debuggability
● Value of a model = value it brings to the product
● Product owners/stakeholders have expectations on...
Model debuggability
● E.g. Why am I seeing or not seeing
this on my homepage feed?
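One simple way to answer a question like the one above, at least for a linear scoring model, is to break a score into per-feature contributions and rank them. The sketch below assumes such a linear model; the feature names, weights and values are invented for illustration and say nothing about how any production feed is actually scored or debugged.

```python
def explain(weights, features, bias=0.0):
    """Return the score and the per-feature contributions, largest first."""
    contributions = {
        name: weights.get(name, 0.0) * value for name, value in features.items()
    }
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


# Hypothetical weights of a linear feed-ranking model.
weights = {"follows_author": 2.0, "topic_match": 1.5, "story_age_days": -0.5}

# Hypothetical feature values for one (user, story) pair.
story = {"follows_author": 1.0, "topic_match": 0.2, "story_age_days": 3.0}

score, ranked = explain(weights, story)
print(f"score = {score:.2f}")
for name, contribution in ranked:
    print(f"{name:>16}: {contribution:+.2f}")
```

Non-linear models need other attribution techniques, but the product-facing goal is the same: a ranked list of reasons that a stakeholder can read.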
9. You don’t need to distribute your ML algorithm
Distributing ML
● Most of what people do in practice can fit into a multi-
core machine
○ Smart data sampling
○ Offline sc...
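The slides do not say which sampling scheme is meant by "smart data sampling". As one possible illustration, reservoir sampling keeps a fixed-size uniform sample from a stream of unknown length, so the training set fits in the memory of a single machine. The function name, the seed and the sizes below are assumptions for this sketch.

```python
import random


def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from an iterable of any length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


# Hypothetical usage: downsample 100,000 events to 1,000 training rows.
events = range(100_000)
training_rows = reservoir_sample(events, k=1_000)
print(len(training_rows))
```

Uniform sampling is only the simplest case; stratified or importance-weighted sampling may preserve rare classes better, at the cost of reweighting during training.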
Distributing ML
● Example of optimizing computations to fit them into
one machine
○ Spark implementation: 6 hours, 15 mach...
10. The untold story of Data Science and vs. ML engineering
Data Scientists and ML Engineers
● We all know the definition of a Data Scientist
● Where do Data Scientists fit in an org...
The data-driven ML innovation funnel
Data Research
ML Exploration -
Product Design
AB Testing
Data Scientists and ML Engineers
● Solution:
○ (1) Define different parts of the innovation funnel
■ Part 1. Data research...
Conclusions
● Make sure you teach your model what you
want it to learn
● Ensembles and the combination of
supervised/unsupervised tech...
Models
● Logistic Regression
● Elastic Nets
● Gradient Boosted Decision Trees
● Random Forests
● (Deep) Neural Networks
● LambdaMART
● Matrix Factorization
● LDA
● ...

Published in: Engineering, Technology