Multi-Armed Bandits: Intro, examples and tricks

In this talk Ilias will discuss some variations of Multi-Armed Bandits (MABs), a less well-known but important area of Machine Learning. MABs enable us to build adaptive systems that find solutions to tasks through interaction with their environment. They solve a task by acquiring useful knowledge at every step of an iterative process while balancing the exploration-exploitation dilemma. MABs are used to tackle practical problems such as selecting appropriate online ads and personalised content for users, assigning people to cohorts in controlled trials, and supporting decision making. In these kinds of problems, good solutions need to be identified as quickly as possible, since errors can be costly. Ilias will discuss examples from industry and academia as well as some of the related work at Atlassian.

  1. Multi-Armed Bandits: Intro, examples and tricks. Dr Ilias Flaounas, Senior Data Scientist at Atlassian. Data Science Sydney meetup, 22 March 2016.
  2. Motivation: Increase awareness of some very useful but less known techniques. Demo some current work at Atlassian. Connect it with some research from my past. Hopefully, there will be something useful for everybody (apologies for the few equations and loose notation).
  3. http://www.nancydixonblog.com/2012/05/-why-knowledge-management-didnt-save-general-motors-addressing-complex-issues-by-convening-conversat.html
  4. Empirical mean reward per arm (arm A pulled at t = 1, 4, 5, 7; arm B at t = 3; arm C at t = 2, 6, 8): μ_A = (r_{A,1} + r_{A,4} + r_{A,5} + r_{A,7}) / n_A, μ_B = r_{B,3} / n_B, μ_C = (r_{C,2} + r_{C,6} + r_{C,8}) / n_C.
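A minimal sketch of how these per-arm running estimates could be maintained in code; the arm names and the 0/1 rewards in the example trace are made up for illustration:

    # Running estimate of the mean reward per arm, as on slide 4:
    # mu_arm = (sum of rewards observed for that arm) / (number of pulls of that arm)
    from collections import defaultdict

    pulls = defaultdict(int)     # n_A, n_B, n_C, ...
    totals = defaultdict(float)  # running sum of rewards per arm

    def update(arm, reward):
        pulls[arm] += 1
        totals[arm] += reward

    def mean(arm):
        return totals[arm] / pulls[arm] if pulls[arm] else 0.0

    # Illustrative trace matching the slide: A pulled at t=1,4,5,7, C at t=2,6,8, B at t=3
    for arm, reward in [("A", 1), ("C", 0), ("B", 0), ("A", 1),
                        ("A", 0), ("C", 1), ("A", 1), ("C", 0)]:
        update(arm, reward)
    print({a: mean(a) for a in ("A", "B", "C")})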
  5. Many solutions… 1. ε-greedy: the best arm is selected for a proportion 1-ε of the trials and a random arm in ε of the trials. 2. ε-greedy with variable ε. 3. Pure exploration first, then pure exploitation. 4. … 5. Thompson sampling (draw from each arm's estimated beta distribution and play the arm with the largest draw). 6. Upper Confidence Bound (UCB).
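A minimal sketch of two of the strategies listed on slide 5, ε-greedy and Beta-Bernoulli Thompson sampling; the choose_arm/update interface is my own illustration, not from the slides, and rewards are assumed to be 0 or 1:

    import random

    class EpsilonGreedy:
        # Play the empirically best arm with probability 1 - eps, a random arm otherwise.
        def __init__(self, n_arms, eps=0.1):
            self.eps = eps
            self.counts = [0] * n_arms
            self.values = [0.0] * n_arms

        def choose_arm(self):
            if random.random() < self.eps:
                return random.randrange(len(self.counts))
            return max(range(len(self.values)), key=lambda a: self.values[a])

        def update(self, arm, reward):
            self.counts[arm] += 1
            self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    class ThompsonSampling:
        # Beta-Bernoulli posterior per arm: sample from each posterior, play the best sample.
        def __init__(self, n_arms):
            self.successes = [1] * n_arms  # Beta(1, 1) uniform prior
            self.failures = [1] * n_arms

        def choose_arm(self):
            draws = [random.betavariate(s, f) for s, f in zip(self.successes, self.failures)]
            return max(range(len(draws)), key=lambda a: draws[a])

        def update(self, arm, reward):  # reward is 0 or 1
            self.successes[arm] += reward
            self.failures[arm] += 1 - reward

At each step the agent calls choose_arm(), observes the reward of the chosen arm, and calls update() with it.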
  6. Advantages: Reaching significance for the winning arm is faster. The best arm can change over time. There are no false positives in the long term. Disadvantages: Reaching significance for non-winning arms takes longer. Unclear stopping criteria. Hard to order non-winning arms and assess their impact reliably.

  7. Optimizely recently introduced MABs, rebranded as “Traffic auto-allocation”.
  8. Let’s add some context: what happens if we want to assess 100 variations? How about 1,000 or 10,000 variations?
  9. Contextual Multi-Armed Bandits: each arm is described by a vector of experiment parameters, e.g., price, #users, product, bundles, colour of UI elements: A -> {x_{A,1}, x_{A,2}, x_{A,3}, …}, B -> {x_{B,1}, x_{B,2}, x_{B,3}, …}, C -> {x_{C,1}, x_{C,2}, x_{C,3}, …}. The reward of each arm is a function of these features: r_{A,t} = f(x_{A,1}, x_{A,2}, x_{A,3}, …), r_{B,t} = f(x_{B,1}, x_{B,2}, x_{B,3}, …), r_{C,t} = f(x_{C,1}, x_{C,2}, x_{C,3}, …).
  10. Contextual Multi-Armed Bandits: we introduce a notion of proximity or similarity between arms, e.g., between A -> {x_{A,1}, x_{A,2}, x_{A,3}, …} and B -> {x_{B,1}, x_{B,2}, x_{B,3}, …}.
  11. LinUCB (L. Li, W. Chu, J. Langford, R. E. Schapire, “A Contextual-Bandit Approach to Personalized News Article Recommendation”, WWW, 2010). The UCB is some expectation plus some confidence level: μ̂_a(t) + σ̂_a(t). We assume there is some unknown vector θ*, the same for each arm, for which E[r_{a,t} | x_{a,t}] = x_{a,t}^T θ*.
  12. Using least squares: X_t := [x_{a(1),1}, x_{a(2),2}, …, x_{a(t),t}]^T, y_t := [r_{a(1),1}, r_{a(2),2}, …, r_{a(t),t}]^T, C_t := X_t^T X_t, θ̂_t := C_t^{-1} X_t^T y_t. Since E[r_{a,t} | x_{a,t}] = x_{a,t}^T θ*, the expectation term of μ̂_a(t) + σ̂_a(t) is estimated as μ̂_a(t) := x_{a,t}^T θ̂_t = x_{a,t}^T C_t^{-1} X_t^T y_t.
  13. The upper confidence bound is some expectation plus some confidence level: μ̂_a(t) + σ̂_a(t), where μ̂_a(t) := x_{a,t}^T C_t^{-1} X_t^T y_t and σ̂_a(t) := sqrt(x_{a,t}^T C_t^{-1} x_{a,t}).
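A minimal NumPy sketch of the LinUCB-style scoring from slides 11-13; the ridge term added to C_t (so the inverse exists before much data has been seen) and the exploration weight alpha are my own additions for illustration:

    import numpy as np

    class LinUCB:
        # Score an arm's context x by mu_hat(x) + alpha * sigma_hat(x), as on slide 13.
        def __init__(self, dim, alpha=1.0, ridge=1.0):
            self.alpha = alpha
            self.C = ridge * np.eye(dim)   # C_t = X_t^T X_t (+ ridge * I to keep it invertible)
            self.Xty = np.zeros(dim)       # X_t^T y_t

        def score(self, x):
            C_inv = np.linalg.inv(self.C)
            theta_hat = C_inv @ self.Xty            # theta_hat_t = C_t^{-1} X_t^T y_t
            mu_hat = x @ theta_hat                  # mu_hat_a(t) = x_{a,t}^T theta_hat_t
            sigma_hat = np.sqrt(x @ C_inv @ x)      # sigma_hat_a(t) = sqrt(x^T C_t^{-1} x)
            return mu_hat + self.alpha * sigma_hat

        def update(self, x, reward):
            self.C += np.outer(x, x)
            self.Xty += reward * x

At each step the arm whose feature vector maximises score() is played, and update() is called with the observed reward.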
  14. L. Li, W. Chu, J. Langford, R. E. Schapire, “A Contextual-Bandit Approach to Personalized News Article Recommendation”, WWW, 2010.
  15. Product onboarding… Which arm would you pull?
  16. • How can we locate the city of Bristol from tweets? • 10K candidate locations organised in a 100x100 grid • At every step we get tweets from one location and count mentions of “Bristol” • Challenge: find the target in sub-linear time complexity!
  17. Linear methods fail on this problem. How can we go non-linear?
  18. The kernel trick! (No, it’s not just for SVMs.) John Shawe-Taylor & Nello Cristianini, “Kernel Methods for Pattern Analysis”, Cambridge University Press, 2004.
  19. LinUCB: C_t := X_t^T X_t, μ̂_a(t) := x_{a,t}^T θ̂_t, σ̂_a(t) := sqrt(x_{a,t}^T C_t^{-1} x_{a,t}). KernelUCB: K_t := X_t X_t^T, μ̂_a(t) := k_{x,t}^T K_t^{-1} y_t, σ̂_a(t) := sqrt(k_{x,t}^T K_t^{-2} k_{x,t}). M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, “Finite-Time Analysis of Kernelised Contextual Bandits”, UAI, 2013.
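A minimal sketch of the dual (kernelised) scoring from slide 19, using the RBF kernel mentioned on slide 20; the small regularisation term added to K_t and the exploration weight alpha are my own additions for numerical stability and illustration, not part of the slide:

    import numpy as np

    def rbf(a, b, gamma=1.0):
        # Radial basis function kernel: k(a, b) = exp(-gamma * ||a - b||^2)
        return np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2))

    def kernel_ucb_score(x, X_hist, y_hist, alpha=1.0, reg=1e-6, gamma=1.0):
        # Dual-form score: k^T K^{-1} y + alpha * sqrt(k^T K^{-2} k), as on slide 19.
        K = np.array([[rbf(a, b, gamma) for b in X_hist] for a in X_hist])
        K += reg * np.eye(len(X_hist))                      # keep K_t invertible
        k = np.array([rbf(x, z, gamma) for z in X_hist])    # k_{x,t}
        K_inv = np.linalg.inv(K)
        mu_hat = k @ K_inv @ np.asarray(y_hist, dtype=float)
        sigma_hat = np.sqrt(k @ K_inv @ K_inv @ k)
        return mu_hat + alpha * sigma_hat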
  20. • The last few steps of the algorithm before it locates Bristol. • KernelUCB with an RBF kernel converges after ~300 iterations (instead of >>10K).
  21. Target is the red dot. We locate it using KernelUCB with an RBF kernel. KernelUCB code: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB
  22. What if we have a high-dimensional space? The hashing trick. Implementation in Vowpal Wabbit, by J. Langford, et al.
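A minimal illustration of the hashing trick: each feature name is hashed into one of a fixed number of buckets, so arbitrarily high-dimensional contexts map to a fixed-length vector (Vowpal Wabbit applies the same idea internally). The bucket count and feature names below are made up:

    import numpy as np
    from hashlib import md5

    def hash_features(features, n_buckets=2 ** 10):
        # Map {feature_name: value} into a fixed-length vector via hashing.
        vec = np.zeros(n_buckets)
        for name, value in features.items():
            idx = int(md5(name.encode()).hexdigest(), 16) % n_buckets
            vec[idx] += value
        return vec

    x = hash_features({"price=49": 1.0, "product=jira": 1.0, "ui_colour=blue": 1.0})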
  23. References: M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, “Finite-Time Analysis of Kernelised Contextual Bandits”, UAI, 2013. L. Li, W. Chu, J. Langford, R. E. Schapire, “A Contextual-Bandit Approach to Personalized News Article Recommendation”, WWW, 2010. John Shawe-Taylor & Nello Cristianini, “Kernel Methods for Pattern Analysis”, Cambridge University Press, 2004. Implementation of KernelUCB in the Complacs toolkit: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB https://en.wikipedia.org/wiki/Multi-armed_bandit https://github.com/JohnLangford/vowpal_wabbit/wiki/Contextual-Bandit-Example
  24. Thank you. We are hiring! Dr Ilias Flaounas, Senior Data Scientist, <first>.<last>@atlassian.com
