
Matt Gershoff

#EMSNYCDAY2


  1. 1. AB TESTING TO AI (REINFORCEMENT LEARNING)
  2. 2. WHO IS THIS GUY? • Matt Gershoff • CEO: Conductrics • Twitter: @mgershoff • Email: matt@conductrics.com
  3. 3. AI is …?
  4. 4. WHAT WE WILL TALK ABOUT • Definition of Reinforcement Learning – Trial and Error Learning • AB Testing (Bayesian) • Multi-Armed Bandit – Automation • Bandit with Targeting – Multi-Touch Point Optimization • Attribution = Dynamics • Q-Learning
  5. 5. What is Reinforcement Learning?
  6. 6. Reinforcement Learning is a Problem not a Solution
  7. 7. Reinforcement Learning Problem: Learn to make a Sequence of Decisions by Trial & Error in order to Achieve (delayed) Goal(s)
  8. 8. EXAMPLE
  9. 9. MARKETING PROBLEMS • Online Applications – websites, mobile, things communicating via HTTP • Low Risk Decisions* – e.g. 'Which Banner' • High Volume* – not for one-off or infrequently made decisions (*High Volume/Low Risk from http://jtonedm.com/)
  10. 10. TRIAL AND ERROR LEARNING – AB Testing/Bandit, Sequential Decisions, Targeting
  11. 11. TRIAL AND ERROR: AB TESTING – Location: Page A | Decision: A or B | Objective/Payoff: Convert / Don't Convert
  12. 12. TRIAL AND ERROR: AB TESTING – How to solve?
  13. 13. TRIAL AND ERROR: AB TESTING – How to solve: 1. AB Testing
  14. 14. AB Testing: Bayesian – Red Button vs. Green Button
  15. 15. AB Testing: Bayesian – The Bayesian AB Test asks:
  16. 16. AB Testing: Bayesian – The Bayesian AB Test asks: What is P(Green > Red | Data)?
  17. 17. BAYESIAN AB TESTING REVIEW – P(Green > Red | Data) = 50%, Sample Size = 0
  18. 18. BAYESIAN AB TESTING REVIEW – P(Green > Red | Data) = 68%, Sample Size = 100
  19. 19. BAYESIAN AB TESTING REVIEW – P(Green > Red | Data) = 94%, Sample Size = 1,000
  20. 20. BAYESIAN AB TESTING REVIEW – P(Green > Red | Data) = 99.99…%, Sample Size = 10,000
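The comparison the review slides walk through can be reproduced with a short Monte Carlo sketch: put a Beta posterior on each button's conversion rate and estimate P(Green > Red | Data), the posterior probability that Green's rate beats Red's. A minimal sketch in Python with made-up conversion counts (so the percentages will only roughly echo the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_green_beats_red(conv_red, n_red, conv_green, n_green, draws=100_000):
    """Estimate P(rate_Green > rate_Red | data) with Beta(1, 1) priors via Monte Carlo."""
    red = rng.beta(1 + conv_red, 1 + n_red - conv_red, draws)          # posterior draws for Red
    green = rng.beta(1 + conv_green, 1 + n_green - conv_green, draws)  # posterior draws for Green
    return float((green > red).mean())

# Hypothetical data: Green converts slightly better; more samples -> more certainty.
for n in (0, 100, 1_000, 10_000):
    print(n, prob_green_beats_red(int(0.10 * n), n, int(0.11 * n), n))
```

With zero data the probability sits at 50%, and it drifts toward 100% as the sample grows, which is the pattern the review slides show.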
  21. 21. AB TESTING -> LEARN FIRST – Over time: Explore/Learn (Data Collection/Sample), then Exploit/Earn (Apply Learning)
  22. 22. SINGLE LOCATION DECISIONS/AB TEST – How to solve: 1. AB Testing 2. Multi-Armed Bandit (Location: Page A | Decision: A or B | Objective/Payoff: Convert / Don't Convert)
  23. 23. BANDIT: THOMPSON SAMPLING – Like Bayesian AB Testing: • Calculate P(A|Data) & P(B|Data) Unlike AB Testing: • Don't make fair selections (50/50) • Select based on P(A|Data) & P(B|Data)
  24. 24. ADAPTIVE: THOMPSON SAMPLING – Construct a probability distribution for each of A, B, C • Use the mean as the center • Use the standard deviation for the spread
  25. 25. ADAPTIVE: THOMPSON SAMPLING – For each user: 1) Take a random sample from each distribution – A=0.49
  26. 26. ADAPTIVE: THOMPSON SAMPLING – For each user: 1) Take a random sample from each distribution – A=0.49, B=0.51
  27. 27. ADAPTIVE: THOMPSON SAMPLING – For each user: 1) Take a random sample from each distribution – A=0.49, B=0.51, C=0.46
  28. 28. ADAPTIVE: THOMPSON SAMPLING – For each user: 2) Pick the option with the highest score (Option B) – A=0.49, B=0.51, C=0.46
  29. 29. ADAPTIVE: THOMPSON SAMPLING – Repeat: 1) Take a random sample from each distribution
  30. 30. ADAPTIVE: THOMPSON SAMPLING – Repeat: 1) Take a random sample from each distribution – A=0.52
  31. 31. ADAPTIVE: THOMPSON SAMPLING – Repeat: 1) Take a random sample from each distribution – A=0.52, B=0.43
  32. 32. ADAPTIVE: THOMPSON SAMPLING – Repeat: 1) Take a random sample from each distribution – A=0.52, B=0.43, C=0.49
  33. 33. ADAPTIVE: THOMPSON SAMPLING – Repeat: 1) Take a random sample from each distribution – A=0.52, B=0.43, C=0.49
  34. 34. ADAPTIVE: THOMPSON SAMPLING – Selection chance based on: 1. The relative estimated mean value of the option 2. The amount of overlap of the distributions (bar chart: selection chances of 67%, 8%, and 25% across Options A, B, C)
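The sampling loop on slides 24 through 34 fits in a few lines: keep a Beta posterior per option, draw one random score from each per user, and serve the option with the highest draw. A minimal sketch with hypothetical conversion rates (this is not Conductrics code, just an illustration of the technique):

```python
import numpy as np

rng = np.random.default_rng(1)

options = ["A", "B", "C"]
true_rates = {"A": 0.10, "B": 0.12, "C": 0.09}   # hypothetical, unknown to the learner
wins = {o: 0 for o in options}                    # conversions observed per option
losses = {o: 0 for o in options}                  # non-conversions observed per option

for user in range(10_000):
    # 1) Take a random sample from each option's Beta posterior.
    samples = {o: rng.beta(1 + wins[o], 1 + losses[o]) for o in options}
    # 2) Pick the option with the highest sampled score.
    choice = max(samples, key=samples.get)
    # 3) Observe the outcome and update that option's posterior counts.
    if rng.random() < true_rates[choice]:
        wins[choice] += 1
    else:
        losses[choice] += 1

shown = {o: wins[o] + losses[o] for o in options}
print({o: round(shown[o] / 10_000, 2) for o in options})  # traffic drifts toward the best option
```

Because the draws come from the posteriors, selection chance depends on both the estimated means and how much the distributions overlap, which is exactly the point of slide 34.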
  35. 35. TARGETING – Trial and Error Learning, Sequential Decisions, Predictive Targeting
  36. 36. PREDICTIVE TARGETING – A mapping from behavioral data to options/actions
  37. 37. Thompson Sampling with Targeting
  38. 38. Thompson Sampling with Targeting
  39. 39. LEARNING THE MAPPINGS • Regression (Linear, Logistic, etc.) • Deep Nets • Decision Trees (Source: Larochelle – Neural Networks 1 – DLSS 2017)
  40. 40. REGRESSION – $f(x) = w_0 + \sum_d w_d \, x_d$
  41. 41. DEEP LEARNING – 1) Input Data 2) Hidden Layer 3) Hidden Layer 4) Output Layer (Source: Larochelle – Neural Networks 1 – DLSS 2017)
  42. 42. What Simple Model? Model as a Decision Tree
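Slides 36 through 42 describe targeting as learning a mapping from behavioral data to the value of each option, and any of the listed learners (regression, deep nets, trees) can play that role. A hedged sketch of the regression flavor, fitting one logistic model per option; the features, the simulated data, and the use of scikit-learn are all illustrative assumptions, not the deck's implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical behavioral features: [is_new_user, is_rural]
X = rng.integers(0, 2, size=(5_000, 2)).astype(float)

# Hypothetical logged data: which option was shown and whether the user converted.
shown = rng.choice(["A", "B"], size=5_000)
converted = (rng.random(5_000) < np.where(shown == "B", 0.05 + 0.05 * X[:, 1], 0.08)).astype(int)

# One model per option: predicts P(convert | features) for that option.
models = {}
for option in ("A", "B"):
    mask = shown == option
    models[option] = LogisticRegression().fit(X[mask], converted[mask])

user = np.array([[1.0, 1.0]])  # a new, rural user
scores = {o: m.predict_proba(user)[0, 1] for o, m in models.items()}
print(scores, "-> serve", max(scores, key=scores.get))
```

The mapping here is deliberately simple; swapping the logistic models for a deep net or a decision tree changes the learner, not the role it plays in the bandit.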
  43. 43. REINFORCEMENT LEARNING
  44. 44. REINFORCEMENT LEARNING 1. Sequential Decisions 2. Delayed Rewards
  45. 45. EXAMPLE
  46. 46. MULTI-TOUCH = DYNAMICS – Enter Site, Page 1, Page 2
  47. 47. MULTI-TOUCH = DYNAMICS – Enter Site, Page 1 (A, B), Page 2 (C, D)
  48. 48. MULTI-TOUCH = DYNAMICS – Enter Site, Page 1 (A, B), Page 2 (C, D), Exit Site
  49. 49. MULTI-TOUCH = DYNAMICS – Enter Site, Page 1 (A, B), Page 2 (C, D), Goal, Exit Site
  50. 50. MULTI-TOUCH = DYNAMICS – Enter Site, Page 1 (A, B), Page 2 (C, D), Goal, Exit Site
  51. 51. MULTI-TOUCH = DYNAMICS – Enter Site, Page 1 (A, B), Page 2 (C, D), Goal, Exit Site
  52. 52. MULTI-TOUCH = DYNAMICS – 1. Conversion Rates: Page1:A 3%, Page1:B 4%, Page2:C 10%, Page2:D 12%
  53. 53. MULTI-TOUCH = DYNAMICS – 2. Transition Frequencies: Page1:A → Page 2 30%, Page1:B → Page 2 20%, Page2:C → Page 1 2%, Page2:D → Page 1 1%
  54. 54. This is Complicated! MULTI-TOUCH = DYNAMICS
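One way to see why the dynamics make this complicated: each Page 1 option has both a direct conversion rate and a chance of forwarding the visitor to Page 2, where the best downstream option adds more value. A rough sketch using the numbers from slides 52 and 53 (valuing a conversion at 1 unit and crediting the best Page 2 option are simplifying assumptions):

```python
# Conversion rates and transition frequencies from slides 52-53.
conversion = {"Page1:A": 0.03, "Page1:B": 0.04, "Page2:C": 0.10, "Page2:D": 0.12}
to_page2 = {"Page1:A": 0.30, "Page1:B": 0.20}

# Value of the downstream decision = best option available on Page 2.
page2_value = max(conversion["Page2:C"], conversion["Page2:D"])  # 0.12

# Combined value of each Page 1 option: direct conversions plus the value
# of the traffic it forwards to Page 2 (assuming a conversion is worth 1 unit).
for option, p_forward in to_page2.items():
    total = conversion[option] + p_forward * page2_value
    print(option, round(total, 4))
# Page1:A -> 0.03 + 0.30 * 0.12 = 0.066
# Page1:B -> 0.04 + 0.20 * 0.12 = 0.064
```

Note how the ranking flips: Page1:B has the higher direct rate, but Page1:A forwards more traffic to Page 2 and comes out ahead once the dynamics are counted, which is exactly the accounting Q-learning formalizes next.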
  55. 55. Q Learning – $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$
  56. 56. Q-LEARNING – $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$
  57. 57. Q-LEARNING – Analytics interpretation of Q-Learning: 1) Treat landing on the next page like a regular conversion!
  58. 58. Q-LEARNING – Analytics interpretation of Q-Learning: 1) Treat landing on the next page like a regular conversion! 2) Use the estimates at the next step as the conversion value!
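Those two steps map directly onto the tabular update from slide 56. A minimal sketch; the state/action keys and the learning-rate value are assumptions made for illustration:

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount rate (the "7 day half life" slide)

Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, next_actions):
    """One Q-learning step: treat landing on the next page like a conversion,
    valued at the current estimate of the best option available there."""
    future = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * future - Q[(state, action)])

# Example step: user saw option A on Page 1, converted for $10, then reached Page 2.
q_update("Page1", "A", reward=10.0, next_state="Page2", next_actions=["C", "D"])
print(Q[("Page1", "A")])
```

When there is no next page, the `future` term is zero and the update collapses to ordinary AB-test accounting, which is the point of the "exactly the same as AB testing" slide below.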
  59. 59. Page 1 A B 1) Take an action Q-LEARNING
  60. 60. Page 1 A 1) Take an action – Pick A Q-LEARNING
  61. 61. Page 1 A 2) Measure what user does after Q-LEARNING
  62. 62. 2) Do they Convert? $10 Page 1 A Q-LEARNING
  63. 63. 2) Yes! $10 Page 1 A Q-LEARNING
  64. 64. Q-LEARNING – 2) Set r = $10 in $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]$ (Page 1, Option A, $10)
  65. 65. EXACTLY the SAME as AB TESTING $10 Page 1 A Q-LEARNING
  66. 66. 3) Do they next go to Page 2? Goal Page 1 A Page 2 Q-LEARNING
  67. 67. 3) Yes! Goal Page 1 Page 2 A Q-LEARNING
  68. 68. 3) Yes! Now in Dynamic part of Path Goal Page 1 Page 2 A Q-LEARNING
  70. 70. Page 2 C D 4) Check Current Estimated Values ‘C’ & ‘D’ Q-LEARNING
  71. 71. 4) Check Current Estimated Values ‘C’ & ‘D’ Of course initially C=$0; D=$0 Page 2 C D $0 $0 Q-LEARNING
  72. 72. 4) Check Current Estimated Values ‘C’ & ‘D’ But assume mean of C=$1; D=$5 Page 2 C D $1 $5 Q-LEARNING
  73. 73. Q-LEARNING – 4) Set $\max_a Q(s_{t+1}, a) = \$5$ (the value of D) in $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]$ (Page 2: C = $1, D = $5)
  74. 74. Q-LEARNING – $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]$ 1. $\gamma$ is the discount rate 2. Related to Google's half life 3. A 7-day half life → 0.9
  75. 75. Q-LEARNING – $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]$ 5) Page1:A = $10 + 0.9 × $5
  76. 76. Direct Credit: $10.0 Attribution Credit: $4.5 Q-LEARNING
  77. 77. Direct Credit: $10.0 Attribution Credit: $4.5 Total Page1|A: $14.5 Q-LEARNING
  78. 78. Q-LEARNING – 5) Credit Page1:A = $14.5 ($10 direct, Page 1 → Page 2, Option A)
  79. 79. Q-LEARNING – Attribution in just two simple steps: 1) Treat landing on the next page like a regular conversion! 2) Use predictions of future values at the next step as the conversion value!
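Checking the arithmetic from slides 64 through 78: with r = $10, γ = 0.9, and current Page 2 estimates of C = $1 and D = $5, the credit for Page1:A is the direct reward plus the discounted best next-step estimate:

```python
GAMMA = 0.9                               # discount rate from slide 74
reward = 10.0                             # direct conversion observed after showing Page1:A
page2_estimates = {"C": 1.0, "D": 5.0}    # current estimated values on Page 2 (slide 72)

direct_credit = reward
attribution_credit = GAMMA * max(page2_estimates.values())   # 0.9 * 5 = 4.5
print(direct_credit, attribution_credit, direct_credit + attribution_credit)  # 10.0 4.5 14.5
```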
  80. 80. Q Learning + Targeting User: Is a New User and from Rural area Page 1 Page 2 A
  81. 81. User: Is a New User and from Rural area Page 1 Page 2 A Q Learning + Targeting
  82. 82. Attribution calculation depends on [Rural;New] Page 1 Page 2 A Q Learning + Targeting
  83. 83. Q-VALUE: NEW & RURAL USER (Source: Conductrics Predictive Audience Discovery)
  84. 84. Q-VALUE: NEW & RURAL USER (Source: Conductrics Predictive Audience Discovery)
  85. 85. Q-VALUE: NEW & RURAL USER (Source: Conductrics Predictive Audience Discovery)
  86. 86. Q-VALUE: NEW & RURAL USER – 1. For New & Rural users, Option B has the highest value 2. Use the predicted value of Option B in the Q-value calculation (Source: Conductrics Predictive Audience Discovery)
  87. 87. Q Learning + Targeting – $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]$ Page1:A = 0 + 0.9 × 0.41 (Page 1 → Page 2, Option A)
  88. 88. Q Learning + Targeting – $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) ]$ Page1:A = 0.369 (Page 1 → Page 2, Option A)
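With targeting, the same calculation runs, except the "value of the next step" is the prediction for this user's segment: the slides quote 0.41 as the predicted value of the best Page 2 option (Option B in the screenshots) for a new, rural user, so Page1:A gets 0 + 0.9 × 0.41 = 0.369. A small sketch; only the 0.41 comes from the deck, the other option's value is a hypothetical placeholder:

```python
GAMMA = 0.9

# Predicted per-option values for this user's segment (new & rural).
# 0.41 is the figure quoted on the slides; 0.20 is a made-up value for the other option.
predicted_next_values = {"A": 0.20, "B": 0.41}

reward = 0.0  # no direct conversion on Page 1 in this example
q_target = reward + GAMMA * max(predicted_next_values.values())
print(round(q_target, 3))  # 0.369
```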
  89. 89. WHAT DID WE LEARN – 1) Bandits help solve Automation 2) Attribution can be solved by hacking 'AB Testing' (Q-Learning) 3) Extended Attribution to include decisions/experiments 4) Looked into the eye of AI and Lived
  90. 90. WAKE UP. WE ARE DONE! Twitter: @mgershoff Email: matt.gershoff@conductrics.com
