Xavier Amatriain, VP of Engineering, Quora at MLconf SEA - 5/01/15


Machine learning applications for growing the world’s knowledge at Quora: At Quora our mission is to “share and grow the world’s knowledge”. We do this by getting the right questions to the right people to answer them, and by getting existing answers to the people who are interested in them. To accomplish this we need to build a complex ecosystem that accounts for content quality, engagement, demand, interests, and reputation. A system like this is only possible if most of its processes are highly automated and scalable. Fortunately, we have a lot of high-quality data on which to build machine learning solutions that help address all of these requirements.

In this talk I will describe some interesting uses of machine learning at Quora, ranging from recommendation approaches such as personalized ranking to classifiers built to detect duplicate questions or spam. I will describe some of the modeling and feature engineering approaches that go into building these systems, and share some of the challenges faced when building such a large-scale base of human-generated knowledge.
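
To make the duplicate-question idea concrete: the talk describes a binary classifier trained on labelled data with textual features. The following is only a minimal sketch of that general idea, not Quora's implementation. The two features (bag-of-words cosine and Jaccard similarity), the hand-rolled logistic regression, the function names, and the toy question pairs are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a question and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pair_features(q1, q2):
    """Two textual features for a question pair: cosine and Jaccard similarity."""
    a, b = Counter(tokenize(q1)), Counter(tokenize(q2))
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    cosine = dot / norm if norm else 0.0
    union = set(a) | set(b)
    jaccard = len(set(a) & set(b)) / len(union) if union else 0.0
    return [cosine, jaccard]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights (bias first) with batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_duplicate(w, q1, q2):
    """Return the estimated probability that q1 and q2 are duplicates."""
    x = pair_features(q1, q2)
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Tiny hand-labelled toy set, invented for illustration (1 = duplicate).
PAIRS = [
    ("How do I learn Python?", "How can I learn Python?", 1),
    ("What is machine learning?", "What is machine learning exactly?", 1),
    ("How do I bake bread?", "How do I bake bread at home?", 1),
    ("How do I learn Python?", "What is the capital of France?", 0),
    ("What is machine learning?", "What is the best pizza topping?", 0),
    ("Who wrote Hamlet?", "How can I learn to swim?", 0),
]
X = [pair_features(a, b) for a, b, _ in PAIRS]
y = [label for _, _, label in PAIRS]
weights = train_logistic(X, y)

# Score two unseen pairs: a near-identical rephrasing and an unrelated pair.
p_same = predict_duplicate(weights, "How do I learn Java?", "How can I learn Java?")
p_diff = predict_duplicate(weights, "Who wrote Hamlet?", "What is the capital of France?")
```

Here `p_same` should land above 0.5 and `p_diff` below it. A production system would add the usage-based features the talk mentions and train on far more labelled pairs; word-overlap features alone miss paraphrases that share few words.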



  1. ML @ Quora: ML Algorithms for Growing the World’s Knowledge. Seattle, 05/01/2015. Xavier Amatriain (@xamat)
  2. About Quora
  3. Our Mission: “To share and grow the world’s knowledge”
     • Millions of questions & answers
     • Millions of users
     • Thousands of topics
     • ...
  4. Lots of data relations
  5. Complex network propagation effects
  6. Importance of topics & semantics
  7. What we care about: Demand, Quality, Relevance
  8. Machine Learning @Quora
  9. Ranking - Answer ranking: What is a good Quora answer?
     • truthful
     • reusable
     • provides explanation
     • well formatted
     • ...
  10. Ranking - Answer ranking: How are those dimensions translated into features?
      • Features that relate to the text quality itself
      • Interaction features (upvotes/downvotes, clicks, comments…)
      • User features (e.g. expertise in topic)
  11. Ranking - Feed
      • Personalized learning-to-rank approach
      • Goal: Present most interesting stories for a user at a given time
      • Interesting = topical relevance + social relevance + timeliness
      • Stories = questions + answers
  12. Ranking - Feed
      • Features
         • Quality of question/answer
         • Topics the user is interested in/knows about
         • Users the user is following
         • What is trending/popular
         • …
      • Different temporal windows
      • Multi-stage solution with different “streams”
  13. Recommendations - Topics. Goal: Recommend new topics for the user to follow
      • Based on:
         • Other topics followed
         • Users followed
         • User interactions
         • Topic-related features
         • ...
  14. Recommendations - Users. Goal: Recommend new users to follow
      • Based on:
         • Other users followed
         • Topics followed
         • User interactions
         • User-related features
         • ...
  15. Related Questions
      • Given interest in question A (source), what other questions will be interesting?
      • Not only about similarity, but also “interestingness”
      • Features such as:
         • Textual
         • Co-visit
         • Topics
         • …
      • Important for logged-out use case
  16. Duplicate Questions
      • Important issue for Quora
      • We want to make sure knowledge about the same question is not dispersed
      • Solution: binary classifier trained with labelled data
      • Features
         • Textual vector space models
         • Usage-based features
         • ...
  17. User Trust/Expertise Inference. Goal: Infer user’s trustworthiness in relation to a given topic
      • We take into account:
         • Answers written on topic
         • Upvotes/downvotes received
         • Endorsements
         • ...
      • Trust/expertise propagates through the network
      • Must be taken into account by other algorithms
  18. Trending Topics. Goal: Highlight current events that are interesting for the user
      • We take into account:
         • Global “Trendiness”
         • Social “Trendiness”
         • User’s interest
         • ...
      • Trending topics are a great discovery mechanism
  19. Spam Detection/Moderation
      • Very important for Quora to keep quality of content
      • Pure manual approaches do not scale
      • Hard to get algorithms 100% right
      • ML algorithms detect content/user issues
      • Output of the algorithms feeds manually curated moderation queues
  20. Content Creation Prediction
      • Quora’s algorithms not only optimize for probability of reading
      • Important to predict probability of a user answering a question
      • Parts of our system completely rely on that prediction
      • E.g. A2A (ask to answer) suggestions
  21. Models
      ● Logistic Regression
      ● Elastic Nets
      ● Gradient Boosted Decision Trees
      ● Random Forests
      ● Neural Networks
      ● LambdaMART
      ● Matrix Factorization
      ● LDA
      ● ...
  22. Conclusions
  23. Conclusions
      • At Quora we have not only Big, but also “rich” data
      • Our algorithms need to understand and optimize complex aspects such as quality, interestingness, or user expertise
      • We believe ML will be one of the keys to our success
      • We have many interesting problems, and many unsolved challenges
  24. We’re Hiring! http://www.quora.com/careers/
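
As an illustration of the feed-ranking slide (slide 11), which defines interesting as topical relevance + social relevance + timeliness, here is a minimal sketch of a hand-weighted score built from those three signals. It is not Quora's system, which the slides describe as a personalized learning-to-rank approach with learned rather than fixed weights. The `Story` fields, the way each signal is computed, the 24-hour half-life, and the equal weights are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """Hypothetical feed item: a question or answer with minimal metadata."""
    story_id: str
    topics: frozenset
    author: str
    age_hours: float


def interest_score(story, followed_topics, followed_users,
                   half_life_hours=24.0, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of topical relevance, social relevance, and timeliness."""
    # Topical relevance: share of the story's topics that the user follows.
    topical = (len(story.topics & followed_topics) / len(story.topics)
               if story.topics else 0.0)
    # Social relevance: 1 if the user follows the author, else 0.
    social = 1.0 if story.author in followed_users else 0.0
    # Timeliness: exponential decay, halving every half_life_hours.
    timeliness = 0.5 ** (story.age_hours / half_life_hours)
    w_topical, w_social, w_time = weights
    return w_topical * topical + w_social * social + w_time * timeliness


def rank_feed(stories, followed_topics, followed_users):
    """Return stories sorted from most to least interesting for this user."""
    return sorted(
        stories,
        key=lambda s: interest_score(s, followed_topics, followed_users),
        reverse=True,
    )


stories = [
    Story("a", frozenset({"ml", "python"}), "alice", age_hours=2.0),
    Story("b", frozenset({"cooking"}), "bob", age_hours=1.0),
    Story("c", frozenset({"ml"}), "carol", age_hours=48.0),
]
ranked = rank_feed(stories, followed_topics={"ml"}, followed_users={"carol"})
print([s.story_id for s in ranked])  # prints ['c', 'a', 'b']
```

Story `c` ranks first despite being two days old because it matches both a followed topic and a followed user. A learned ranker would replace the fixed weights with a model trained on engagement data, using features like those listed on slide 12.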