Crowdsourcing Series: LinkedIn. By Vitaly Gordon & Patrick Philips.

Better data beats better algorithms, but better data can be hard to come by. In this talk, Vitaly Gordon, Senior Data Scientist at LinkedIn, and Patrick Philips, Crowdsourcing Expert at LinkedIn, will show how the LinkedIn data science team hacks data science using sophisticated data mining and crowdsourcing techniques to leverage the data they already have and create the data that's missing.

  1. Hacking Data Science
  2. Overview of ML pipeline: gather data, feature engineering, model fitting, evaluation. ©2013 LinkedIn Corporation. All Rights Reserved.
  3. Understanding Seniority
  4. Companies are not standard
  5. Titles are not enough
  6. Things change
  7. Learning to target better
  8. Classifying names to genders
  9. Let’s look at Monica again
  10. Not so fast …
  11. Not so fast …
  12. Even slower …
  13. Sometimes the answer is just under your nose
  14. Comment Spam on Influencer content
  15. Challenge 1: Binary tasks are too guessable
  16. Challenge 2: Context matters
  17. Spam Comment Annotation Task
  18. Quality: Gold distributions and skewed datasets
  19. Using results to evaluate new features

      | Model        | ΔP | ΔR   | ΔPRC |
      |--------------|----|------|------|
      | Baseline     | -  | -    | -    |
      | Variation 1  | +  | -    | +    |
      | Variation 2  | -  | +    | +    |
      | Variation 3  | -  | ++   | --   |
      | Variation 4  | -  | +++  | ++   |
      | Variation 5  | -  | +++  | ++   |
      | Variation 6  | -  | +++  | ++   |
      | Variation 7  | -  | ++++ | +++  |
      | Variation 8  | -  | ++++ | +++  |
      | Variation 9  | -  | ++++ | +++  |
      | Variation 10 | -  | ++++ | +++  |
  20. “As simple as possible, but not simpler”
  21. LinkedIn Channels
  22. Labels aren’t free
  23. Suggest likely candidates for topics then expand
  24. Evaluate suggested article-topic pairs
      • Using results to evaluate new implementations of spam classifier
        – Improve precision without a drop in recall
      • 18k comments labeled in 54 hrs for $180
  25. Quality: Not by Gold alone
  26. Using results to evaluate existing classification framework
  27. “Help your helpers”
  28. Search is a major portal to information
  29. LI Search is personalized
  30. Evaluation is still possible
  31. Search Evaluation – WTF@1
  32. Quality: Behavioral metrics are good too!
  33. “Pick a solvable problem”
  34. Standardizing titles
  35.
  36. Which question is easier?
      1. Find a better name for the title “account executive”?
      2. How similar are “account executive” and “sales executive”?
  37.
  38. Notable Experts
  39. First attempt
  40. Second attempt
  41. Third attempt
  42. What makes the best data mining expert?
      • Education?
      • Industry experience?
      • Number of publications?
      • Communication skills?
      • Hacking skills?
      • Knowledge of statistics?
      • Number of endorsements?
  43. “More bad data != better data”
  44. Summary
      1. Use the data you already have
      2. Keep it simple, but not too simple
      3. Pick a solvable problem
      4. Help your helpers
      5. Sample intelligently
      6. More (bad) data != better data
  45. Questions?
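
The transcript reports "18k comments labeled in 54 hrs for $180". As a back-of-the-envelope check, the sketch below works out the implied unit cost and throughput. The function name `labeling_stats` is hypothetical, and reading "18k" as exactly 18,000 is an assumption made for illustration; the slides do not say how many judgments were collected per comment.

```python
def labeling_stats(num_items: int, hours: float, total_cost_usd: float) -> dict:
    """Return unit cost and throughput for a crowdsourced labeling job."""
    return {
        "cost_per_item_usd": total_cost_usd / num_items,
        "items_per_hour": num_items / hours,
        "cost_per_hour_usd": total_cost_usd / hours,
    }


# Figures from the slide; 18k is assumed to mean exactly 18,000 comments.
stats = labeling_stats(num_items=18_000, hours=54, total_cost_usd=180)

print(f"{stats['cost_per_item_usd']:.3f} USD per comment")
print(f"{stats['items_per_hour']:.0f} comments per hour")
print(f"{stats['cost_per_hour_usd']:.2f} USD per hour")
```

Under these assumptions the job works out to about one cent per labeled comment and roughly 333 comments per hour. If each comment was judged by several workers, the cost per individual judgment would be lower still.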
