
Scaling Recommendations at Quora (RecSys talk 9/16/2016)

Talk about scaling Quora's recommendations and ML systems, given at the Large Scale Recommendation Systems (LSRS) workshop at the ACM RecSys conference in Boston.


  1. Scaling Recommendations at Quora. Nikhil Dandekar (@nikhilbd), 9/16/2016
  2. Quora’s Mission: “To share and grow the world’s knowledge” ● Millions of questions & answers ● Millions of users ● Over a million topics ● Growing exponentially...
  3. Lots of high-quality textual information
  4. Lots of data relations
  5. Agenda ● Scaling the home page feed ● Scaling the Machine Learning environment ● Pragmatism: aka don’t chase every new, shiny object
  6. Scaling the Home Page Feed
  7. Recommendations at Quora ● Home feed ● Digest emails ● Topics to follow ● Users to follow ● Related Questions ● Related Topics (topic → topic) ● Trending topics ● …
  8. Home feed ● Goal: personalized, engaging experience for reading/writing ● Show a ranked list of stories (questions/answers) ● ML model predicts an interestingness score for each story ● Training data: ○ impression logs from the past ○ x: features about user/story/interactions ○ y: score based on actions (answer/follow, upvote/click)
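
A minimal sketch of what this kind of training setup could look like, assuming scikit-learn's gradient-boosted trees; the feature names, labels, and model choice are illustrative, not Quora's actual pipeline:

```python
# Illustrative only: feature names, labels, and model choice are assumptions,
# not Quora's actual pipeline.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# x: features about user/story/interactions, extracted from impression logs
# (columns: user-topic affinity, follows author?, story upvotes, story age in hours)
X = np.array([
    [0.9, 1, 120, 3],
    [0.1, 0, 4, 48],
    [0.5, 1, 30, 12],
])

# y: interestingness score derived from the logged actions
# (e.g. a weighted combination of answer/follow/upvote/click events)
y = np.array([1.0, 0.0, 0.6])

model = GradientBoostingRegressor()
model.fit(X, y)

# At serving time: predict a score per candidate story and rank descending.
ranking = np.argsort(-model.predict(X))
```
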
  9. What is interestingness? [Diagram: user actions weighted into an interestingness score: click, upvote, downvote, expand, share, click to answer, pass, follow]
  10. Performance and Cost [Funnel: millions of questions and answers → personalized ranking × millions of users → the best 20 questions and answers] Scaling challenge: ● Content growing exponentially ○ Time spent per ranking request growing exponentially ● Users growing exponentially ○ Number of ranking requests growing exponentially ● Computational resources spent on ranking growing quadratically with user growth (total work ≈ number of requests × time per request, i.e. users × content)
  11. Performance and Cost ● Solution: Multi-phase ranking! ● Use an unpersonalized model to reduce the number of candidates for the personalized model ● Cache the computed score in storage [Funnel: millions of questions and answers → unpersonalized (1p) ranking → thousands of questions and answers → personalized (2p) ranking × millions of users → the best 20 questions and answers]
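
A minimal sketch of the two-phase funnel described above; the model objects, cache, and thresholds are hypothetical stand-ins:

```python
# Illustrative sketch of multi-phase ranking; the models and cache are
# hypothetical stand-ins, not Quora's implementation.

def rank_feed(user, all_stories, cheap_model, personalized_model, cache,
              n_candidates=1000, n_results=20):
    # Phase 1: unpersonalized (1p) scores depend only on the story, so they
    # can be computed once, cached in storage, and shared across all users.
    def cheap_score(story):
        if story.id not in cache:
            cache[story.id] = cheap_model.score(story)
        return cache[story.id]

    candidates = sorted(all_stories, key=cheap_score, reverse=True)[:n_candidates]

    # Phase 2: run the expensive personalized (2p) model only on the
    # surviving candidates.
    scored = [(personalized_model.score(user, s), s) for s in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:n_results]]
```

Because the 1p score is user-independent, the cache amortizes it across all users, so only the few thousand surviving candidates ever pay the per-user 2p cost.
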
  12. Feed backend system [Diagram: requests from Web (Python) hit an Aggregator tier (Aggregators 1-3), which fans out to a Leaf tier (Leaves 1-3); requests carry a user_id, leaves are sharded by object_id]
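
A rough sketch of that leaf/aggregator fan-out, assuming leaves hold disjoint shards of the story corpus; the Leaf class, score_story placeholder, and threading choice are illustrative, not Quora's backend:

```python
# Illustrative leaf/aggregator fan-out; Leaf and score_story are hypothetical.
import heapq
from concurrent.futures import ThreadPoolExecutor

def score_story(user_id, story):
    # Placeholder for the real ranking model's score.
    return (hash((user_id, story)) % 1000) / 1000.0

class Leaf:
    def __init__(self, stories):
        self.stories = stories  # this leaf's shard, partitioned by object_id

    def top_k(self, user_id, k):
        # Score only this shard's stories and return the local top k.
        scored = [(score_story(user_id, s), s) for s in self.stories]
        return heapq.nlargest(k, scored, key=lambda t: t[0])

def aggregate(user_id, leaves, k=20):
    # Fan the request out to every leaf in parallel, then merge the
    # per-shard top-k lists into one global top-k.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda leaf: leaf.top_k(user_id, k), leaves)
    merged = (item for part in partials for item in part)
    return heapq.nlargest(k, merged, key=lambda t: t[0])

# Example: three leaves, each holding a disjoint shard of the corpus.
leaves = [Leaf([f"story{i}" for i in range(shard, 90, 3)]) for shard in range(3)]
top_stories = aggregate("user42", leaves)
```

Sharding by object_id keeps each leaf's working set small, and the aggregator's merge cost stays proportional to k × number of leaves rather than to the full corpus.
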
  13. Scaling the Machine Learning Environment
  14. Machine Learning environment. ML applications: ● Feed / digest ● Search ● Answer ranking / answer collapsing ● User-user and user-topic recommendations ● Related questions ● Duplicate questions ● Question-topics ● Question quality ● Spam users / content ● …and a lot more. ML models: ● Logistic Regression ● Gradient Boosted Decision Trees ● LambdaMART ● Random Forest ● Matrix Factorization ● Deep Neural Networks ● LDA ● k-means ● k-NN ● …and others
  15. Machine Learning environment ● Productionizing ML training ○ Continuous retraining of models to adapt to new data ○ Use Luigi to keep track of task dependencies
  16. Machine Learning environment ● Productionizing ML training: ○ Continuous retraining of models to adapt to new data ○ Use Luigi to keep track of task dependencies ● Use Amazon EC2 spot instances for training tasks ○ Usually much cheaper than the on-demand price ○ Can spawn multiple boxes at once and shut them down after training is complete
  17. Machine Learning environment ● Productionizing ML training: ○ Continuous retraining of models to adapt to new data ○ Use Luigi to keep track of task dependencies ● Use Amazon EC2 spot instances for training tasks ● Extremely important to have automatic monitoring of each task’s input/output ○ Data can change in unexpected ways ○ Don’t want bugs in upstream models to affect downstream models [Task DAG: a data populator feeding training models 1-3]
  18. Machine Learning environment ● Productionizing ML training: ○ Continuous retraining of models to adapt to new data ○ Use Luigi to keep track of task dependencies ● Use Amazon EC2 spot instances for training tasks ● Extremely important to have automatic monitoring of each task’s input/output ○ Data can change in unexpected ways ○ Don’t want bugs in upstream models to affect downstream models [Task DAG: a data populator feeding training models 1-3, with a “verify data” step (counts, class proportions, …) after the populator and a “verify metrics” step (MSE, R², AUC, …) after each model]
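
Since the slides name Luigi specifically, here is a minimal sketch of this retrain-and-verify pattern using Luigi's Task/requires/output API; the task names, file paths, and verification check are illustrative assumptions:

```python
# Sketch of continuous retraining as a Luigi pipeline with an explicit
# verification step between tasks. Task names, paths, and the threshold
# are illustrative, not Quora's actual jobs.
import luigi

class PopulateData(luigi.Task):
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"data/impressions_{self.date}.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("user_id,story_id,label\n")  # stand-in for the real export

class VerifyData(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        return PopulateData(self.date)

    def output(self):
        return luigi.LocalTarget(f"data/verified_{self.date}.flag")

    def run(self):
        # Check counts / class proportions so upstream bugs don't silently
        # poison downstream models.
        with self.input().open() as f:
            n_rows = sum(1 for _ in f) - 1
        if n_rows < 0:  # replace with a real row-count / proportion check
            raise ValueError("data verification failed")
        with self.output().open("w") as f:
            f.write("ok\n")

class TrainModel(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        return VerifyData(self.date)  # training only runs on verified data

    def output(self):
        return luigi.LocalTarget(f"models/model_{self.date}.bin")

    def run(self):
        with self.output().open("w") as f:
            f.write("trained-model-bytes\n")  # stand-in for real training
```

Expressing the verification as its own task means Luigi's dependency tracking halts the downstream training automatically when a check fails.
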
  19. Machine Learning platform goals ● Need an ML platform that is: ○ Easy to ramp up on ○ Easy to iterate on ○ Fast ○ Reliable ○ Reusable ○ Production-ready
  20. Machine Learning platform ● Have a centralized ML platform that is shared across teams ○ Write training scripts in C++/Python and run them on remote boxes ○ Provide Python wrappers with iPython integration ○ Store data on Redshift/S3 and have training boxes communicate with them directly [Diagram: dev laptop ↔ training boxes (CPU/GPU) ↔ storage services (Redshift, S3, …)]
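
A small sketch of a training box pulling its dataset directly from S3, assuming boto3 as the client library; the bucket and key names are hypothetical:

```python
# Hypothetical example: the training box fetches its data straight from S3,
# with no intermediate copy through the dev laptop. Bucket/key are invented.
import boto3

s3 = boto3.client("s3")
s3.download_file(
    "quora-ml-training-data",            # hypothetical bucket
    "feed/impressions/2016-09-16.csv",   # hypothetical key
    "/tmp/impressions.csv",
)

# ...load /tmp/impressions.csv and kick off training on this box...
```
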
  21. Lego ML platform ● In an IPython notebook
  22. Lego ML platform
  23. Alchemy Feature Engineering Framework ● Single way to define and add ML features ● Features are reusable ○ Different ML applications do not define / calculate them separately ● Available both offline (training time) and online (prediction time) ● Single point for logging, monitoring, documentation etc.
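
Alchemy's API isn't shown in the slides, so this is only a guess at the shape of such a shared feature registry: one definition per feature, reusable across applications, and one code path for both training and serving. The decorator and registry here are invented for illustration:

```python
# Invented for illustration; not Alchemy's actual API.
FEATURE_REGISTRY = {}

def feature(name):
    """Register a feature so every ML application computes it the same way,
    both offline (training time) and online (prediction time)."""
    def decorator(fn):
        FEATURE_REGISTRY[name] = fn
        return fn
    return decorator

@feature("story_upvote_count")
def story_upvote_count(user, story):
    return story["upvotes"]

@feature("user_topic_affinity")
def user_topic_affinity(user, story):
    return user["topic_affinity"].get(story["topic"], 0.0)

def compute_features(user, story):
    # Single code path for training and serving, and a single point to
    # hang logging, monitoring, and documentation on.
    return {name: fn(user, story) for name, fn in FEATURE_REGISTRY.items()}
```
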
  24. Pragmatism
  25. What matters for your ML algorithm: ● Relevance ● Speed: fast prediction, (relatively) fast training ● Fast development and iteration time ● Reliability / robustness ● Cost ● Debuggability ● Low technical debt
  26. Occam’s razor for Machine Learning ● Given two models that perform more or less equally, you should always prefer the less complex one ● E.g. a deep learning model with: ○ +1% in accuracy ○ 10x training time ○ 1.5x prediction time ○ Costly storage and maintenance ● Look at all the factors, not just relevance
  27. Distributing ML training ● Distributed ML training helps you scale with data ● But most of what people do in practice fits on a single multi-core machine ● Trade-offs: ○ Relevance gains ○ Training speed ○ Development and iteration time ○ Costs ● Use what works best given these factors, with an eye out for the future
  28. In summary ● Figure out how to scale up your data and your models ● But scaling is not just about data and models ○ Think about your ML environment too ● Be pragmatic ○ Don’t chase every new, shiny object
  29. We are hiring! ● https://www.quora.com/careers ● Technical Lead - Machine Learning ● Software Engineer - Machine Learning ● Software Engineer - NLP ● Engineering Manager - Machine Learning
  30. Thanks!
