Déjà Vu: The Importance of Time and Causality in Recommender Systems

Talk at RecSys 2017 in Como, Italy on 2017-08-29.

Abstract:
Time plays a key role in recommendation. Handling it properly is especially critical when using recommender systems in real-world applications, which may not be as clear when doing research with historical data. In this talk, we will discuss some of the important challenges of handling time in recommendation algorithms at Netflix. We will focus on challenges related to how our users, items, and systems all change over time. We will then discuss some strategies for tackling these challenges, which revolve around proper treatment of causality in our systems.

Published in: Technology

  1. Déjà Vu: The Importance of Time and Causality in Recommender Systems. Justin Basilico & Yves Raimond, August 29, 2017. @JustinBasilico @moustaki
  2. But first… Goodbye & Cinematch
  3. But first… Goodbye & Cinematch % Match Hello &
  4. But first… Goodbye Why? +200% ratings volume Clear link to personalization & Cinematch % Match Hello &
  5. Image from Domiriel (cc by-nc)
  6. Recommendations are actions at a moment in time
     ● This moment can be controlled by the user
       ○ Visit time
       ○ Session length
     ● … or influenced by the system
       ○ Notifications, emails
     ● And must choose an action
       ○ … that has consequences
     ● Time and causality are critical aspects in any recommender system
       ○ Data collection
       ○ Experiment design (offline & online)
       ○ Algorithm & objective design
       ○ System design
  7. Time-aware data collection
  8. Data collection [diagram labels: Observed labels; Training time; Serving time; Serving input data collected]
  9. Violation of the space-time continuum! [diagram labels: Observed labels; Training input data collected; Training time; Serving time; Serving input data collected]
  10. Data collection [diagram labels: Observed labels; Training input data collected; Training time; Serving time; Serving input data collected]
  11. Time machines [diagram labels: Observed labels; Training input data collected; Training time; Serving time; Serving input data collected] Distributed Time Travel for Feature Generation
  12. Experiment design [diagram labels: Train; Time; Test]
     ● Be careful when splitting the dataset
       ○ Don’t overfit the past
       ○ Predict the future
     ● Rule of thumb: Split across what you need to generalize
       ○ Time!
       ○ Users or Items?
     ● May need to train/test at multiple distinct time points to see generalization across time (e.g. [Lathia et al., 2009])
     ● Simulate system behaviors (e.g. training and publishing delays) in evaluation pipeline
       ○ Helps capture trade-off between accuracy and responsiveness
  13. Time-aware recommendation algorithms
  14. R ≈ UM
  15. Users changing over time: nonstationarity
  16. Items changing over time [plot axes: popularity; time. Labels: Learned item bias; Actual item popularity; Item launch; Item becomes available]
  17. Some modeling approaches [figure label: time]
     ● Aggregation
       ○ Decay functions (e.g. [Ding, Li, 2005])
       ○ Buckets (e.g. [Zimdars, Chickering, Meek, 2001])
     ● Extrapolation (e.g. [Koren, 2009])
     ● Sequences
       ○ Markov (e.g. [Rendle et al., 2010])
       ○ Last N (e.g. [Shani, Heckerman, Brafman, 2005])
       ○ RNNs (e.g. [Hidasi et al., 2015])
     ● Features
       ○ Discretized (e.g. [Baltrunas, Amatriain, 2009])
       ○ Continuous (e.g. example age in [Covington et al., 2016])
  18. Time as context [figure caption: Experiment on a Netflix internal dataset]
     ● Generalizing to future behaviors through temporal extrapolation
     ● Time exhibits many periodicities
       ○ Daily
       ○ Weekly
       ○ Seasonally
       ○ … and even longer: Olympics, elections, etc.
     ● Additional periodic time context features can be added or extracted
  19. Minimizing interaction time
     ● Recommendation systems are a means to an end
       ○ Reward = enjoyment - interaction cost
       ○ Enjoyment integrated over time (e.g. goodness * length of view)
       ○ Interaction cost integrated over time
       ○ Don’t waste your users’ time
       ○ Magnitudes of enjoyment and cost may be user-specific
     ● Maximize enjoyment of the selected item while minimizing the time it takes to find the item
  20. Hangul alphabet: 3 syllables, but requires 7 (2 + 3 + 2) interactions [label: Click]
  21. With a model optimized to minimize interaction time: one interaction [label: Click]
  22. Time-aware recommender systems
  23. Algorithms changing [diagram labels: Idea; Offline experimentation; Online experimentation (A/B); Rollout]
  24. Algorithms changing [diagram labels: Algorithm C; Algorithm B; Algorithm A; Idea; Offline experimentation; Online experimentation (A/B); Rollout]
  25. Algorithms changing [diagram labels: Algorithm C; Algorithm B; Algorithm A; Idea; Offline experimentation; Online experimentation (A/B); Rollout] Assumes stationarity! A change in other parts of the system might invalidate previous (offline or online) results. Holdback A/B tests as part of rollout can help.
  26. UX changing over time % Match &
  27. Feedback loops [diagram labels: Impression bias inflates plays; Leads to inflated item popularity; More plays; More impressions] [figure caption: Oscillations in distribution of genre recommendations] Feedback loops can cause biases to be reinforced by the recommendation system!
  28. Closed loop [diagram labels: Training Data; Watches; Model; Recs]
  29. Closed loop [diagram labels: Training Data; Watches; Model; Recs; Danger Zone]
  30. Closed loop [diagram labels: Training Data; Watches; Model; Recs; Danger Zone]. Open loop [diagram labels: Search; Training Data; Watches; Model; Recs]
  31. Closed loop [diagram labels: Training Data; Watches; Model; Recs; Danger Zone]. Open loop [diagram labels: Search; Training Data; Watches; Model; Recs]
  32. Open vs. Closed Loops [Based on Steck, 2013 with system as selector] [equation annotations: Watch when rec; Probability of rec; Watch when not rec; Probability of not rec]
  33. Open vs. Closed Loops [Based on Steck, 2013 with system as selector] [equation annotations: Watch when rec; Probability of rec; Watch when not rec; Probability of not rec; Closed loop: 0; Open loop: > 0]
  34. Open vs. Closed Loops [Based on Steck, 2013 with system as selector] [equation annotations: Watch when rec; Probability of rec; Watch when not rec; Probability of not rec; Closed loop: 0; Open loop: > 0; We have control over this]
  35. Controlled stochasticity [diagram labels: Explore; Explore]
     ● Maintain some controlled exploration to break feedback loop and handle non-stationarities
     ● Explore with ε-greedy, Thompson Sampling, etc.
     ● Control to avoid significantly degrading user experience
     ● Log as much as possible
       ○ Include counterfactuals: the maximal action the system wanted to take (e.g. [Bottou et al., 2013])
  36. Replay metrics [diagram labels: Observed reward; Existing recommendation algorithm (with stochasticity); Observed reward; New recommendation algorithm] [Li et al., 2011; Dudik, Langford, Li, 2014] Simulate online metrics, offline!
  37. Causality [Schnabel et al., 2016; Liang, Charlin, Blei, 2016; Smola, 2011; Sugiyama, Kawanabe, 2012]
     ● Stochasticity opens the door to using causal inference
     ● Inverse Propensity Weighting
       ○ Reduce production bias by reweighting train and test data
       ○ Know probability of user receiving an impression
       ○ Doesn’t handle simultaneity and other endogeneity
     ● Covariate shift
       ○ Use explore data to estimate bias in other data
       ○ Use all data to train
     ● Instrumental variables for more general settings
  38. Incrementality [diagram labels: p(v_i | ∅); p(v_i | a); Incremental effect]
     ● Most recommendation (and ML) models are correlational
       ○ These items are correlated with these types of users
     ● But we seek causal actions
       ○ Showing this item is rewarding for this user
     ● Our recommendation action should have an incremental effect in reward: E[r(a)] - E[r(∅)]
       ○ Application-dependent choice of ∅
       ○ Sometimes it may be better not to provide a recommendation that simply maximizes p(v_i | u)
       ○ May provide less obvious recommendations
  39. A/B Testing [diagram labels: Time; A (Control); B (Treatment); Significant?; Metrics]
     ● Gold standard of causality
       ○ Random assignment
       ○ Measured across time
       ○ Incremental benefit of treatment
     ● Causality safety net?
       ○ Hard to test with full feedback loop effects
       ○ An algorithm may behave differently when training off its own data
       ○ Holdback tests
  40. Conclusions.
  41. Takeaways
     ● After users and items, time is usually the next most important factor in recommendation systems
       ○ Model it as such
       ○ Evaluate it as such
       ○ Make it central to your system and infrastructure
     ● Recommender systems act in a causal loop
       ○ Influenced by themselves and others
       ○ Be thoughtful about feedback effects
  42. Thank you. Justin Basilico & Yves Raimond, @JustinBasilico @moustaki. Yes, we’re hiring...
  43. Déjà Vu: The Importance of Time and Causality in Recommender Systems. Justin Basilico & Yves Raimond, August 29, 2017. @JustinBasilico @moustaki
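
The experiment-design advice on slide 12 (split across time, evaluate at several time points, simulate publishing delays) can be illustrated with a small sketch. This is not the speakers' pipeline: the event layout `(user, item, timestamp)`, the function names `temporal_split` and `rolling_splits`, and the `delay` parameter are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def temporal_split(events, cutoff, delay=timedelta(0)):
    """Train on events strictly before `cutoff`; test on events at or after
    `cutoff + delay`. The optional gap stands in for a training/publishing
    delay (hypothetical parameter, not from the talk)."""
    train = [e for e in events if e[2] < cutoff]
    test = [e for e in events if e[2] >= cutoff + delay]
    return train, test

def rolling_splits(events, cutoffs, delay=timedelta(0)):
    """Yield one (cutoff, train, test) triple per cutoff, so a model can be
    evaluated at several distinct points in time."""
    for cutoff in cutoffs:
        train, test = temporal_split(events, cutoff, delay)
        yield cutoff, train, test

# Toy events: (user, item, timestamp).
events = [
    ("u1", "i1", datetime(2017, 8, 1)),
    ("u2", "i1", datetime(2017, 8, 10)),
    ("u1", "i2", datetime(2017, 8, 20)),
    ("u3", "i3", datetime(2017, 8, 28)),
]

train, test = temporal_split(events, datetime(2017, 8, 15))
```

Because every training event precedes every test event, the model is scored on predicting the future rather than on reconstructing the past. A random split over the same events would leak later behavior into training.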

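Slides 35 to 37 combine controlled exploration, logging, and inverse propensity weighting. The sketch below shows one way these pieces can fit together: an ε-greedy chooser that logs the probability of the action it took, and an inverse-propensity-weighted estimate of the reward a different, deterministic policy would have earned. The names `epsilon_greedy` and `ips_estimate` and the log layout `(context, action, propensity, reward)` are assumptions for illustration, not an API from the talk or the cited papers.

```python
import random

def epsilon_greedy(scores, epsilon, rng):
    """Choose an item from `scores` (item -> model score).
    Returns (item, propensity), where propensity is the probability that
    this policy picks that item, so it can be logged with the impression."""
    items = list(scores)
    best = max(items, key=scores.get)
    # With probability epsilon explore uniformly, otherwise exploit.
    chosen = rng.choice(items) if rng.random() < epsilon else best
    propensity = epsilon / len(items)
    if chosen == best:
        propensity += 1.0 - epsilon
    return chosen, propensity

def ips_estimate(logs, new_policy):
    """Inverse-propensity-weighted average reward of `new_policy`.
    `logs` holds (context, action, propensity, reward) tuples collected
    under a stochastic logging policy with nonzero propensities."""
    total = 0.0
    for context, action, propensity, reward in logs:
        # Only logged actions that the new policy would also take count,
        # reweighted by how likely the logging policy was to take them.
        if new_policy(context) == action:
            total += reward / propensity
    return total / len(logs)

# Hand-built log: two actions shown with equal probability 0.5.
logs = [
    ("ctx", "a", 0.5, 1.0),
    ("ctx", "b", 0.5, 0.0),
    ("ctx", "a", 0.5, 1.0),
    ("ctx", "b", 0.5, 0.0),
]
estimate = ips_estimate(logs, lambda context: "a")
```

In this toy log, action `a` always earns reward 1, and the estimate for the always-`a` policy recovers that value even though `a` was shown only half the time. The estimate is only defined when the logging policy gave every relevant action a nonzero probability, which is what controlled stochasticity provides.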
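
Slide 17 lists decay functions as one way to aggregate behavior over time. As a minimal sketch, assuming exponential decay with a half-life (one common choice, not necessarily the form used in the cited work), a time-decayed interaction count could look like this. The name `decayed_count` and its parameters are hypothetical.

```python
def decayed_count(event_times, now, half_life):
    """Sum of exponentially decayed weights for past events.
    An event `half_life` time units old contributes 0.5; one happening
    right at `now` contributes 1.0. Events after `now` are ignored so the
    feature never uses information from the future."""
    return sum(
        0.5 ** ((now - t) / half_life)
        for t in event_times
        if t <= now
    )

# Two plays, at t=0 and t=10, observed at t=10 with a half-life of 10:
# the older play counts for half as much as the recent one.
recent_interest = decayed_count([0, 10, 25], now=10, half_life=10)
```

The `t <= now` filter mirrors the data-collection point from slides 8 to 11: a feature computed for time `now` should only see events that had already happened at that time.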