Lab Seminar: Contextual Bandit Survey
Sangwoo Mo
KAIST
swmo@kaist.ac.kr
August 4, 2016
Overview
1 Problem Setting
2 Naïve Approach: Reduce to MAB
3 Stochastic Contextual Bandit
UCB & Thompson Sampling
Arbitrary Set of Policies
4 Adversarial Contextual Bandit
5 Supervised Learning to Contextual Bandit
Problem Setting
Multi-Armed Bandit
At each time t, the agent selects an arm $a_t \in \{1, \dots, K\}$
Then, the agent receives a reward $r_t (= r_{a_t,t})$ from the environment
If $r_{i,t}$ is drawn i.i.d. from some distribution, we call it a stochastic bandit; if
$r_{i,t}$ is selected by the environment, we call it an adversarial bandit
The goal of MAB is to find the policy $\pi \in \Pi$ s.t.
$\pi(a_1, r_1, \dots, a_{t-1}, r_{t-1}) = a_t$
which minimizes the regret [1] (see the simulation sketch below)
$$R_T := \max_{i=1,\dots,K} \mathbb{E}\left[\sum_{t=1}^{T} r_{i,t} - \sum_{t=1}^{T} r_{a_t,t}\right]$$
[1] Properly speaking, the cumulative pseudo-regret.
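To make the interaction protocol and the pseudo-regret concrete, here is a minimal simulation sketch in Python; the Bernoulli arms and the uniformly random placeholder policy are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 10_000
mu = rng.uniform(size=K)              # unknown true means (stochastic setting)

def policy(history):                  # placeholder: a uniformly random policy
    return int(rng.integers(K))

history, total_reward = [], 0.0
for t in range(T):
    a_t = policy(history)                 # agent selects an arm
    r_t = float(rng.random() < mu[a_t])   # environment returns r_{a_t, t}
    history.append((a_t, r_t))
    total_reward += r_t

# cumulative pseudo-regret: best fixed arm in expectation vs. realized reward
print(T * mu.max() - total_reward)
```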
Contextual Bandit
In the contextual bandit, the agent receives additional information
(= context) $c_t \in \mathcal{C}$ [1] at the beginning of time t
In the stochastic contextual bandit, the reward $r_{i,t}$ can be represented as
a function of the context $c_{i,t}$ and noise $\epsilon_{i,t}$:
$$r_{i,t} = f(c_{i,t}) + \epsilon_{i,t}$$
or simply $r_{i,t} = f_i(c_t) + \epsilon_{i,t}$ if $c_t$ is independent of i
In the adversarial contextual bandit, the reward $r_{i,t}$ is selected by the
environment, as in the non-contextual MAB
[1] The literature often writes $c_{i,t}$ to emphasize that each arm i has a corresponding context $c_{i,t}$. However, both notations are equivalent, since we can construct a single vector $c_t$ by concatenating the $c_{i,t}$'s.
Optimal Regret Bound
Stochastic Bandit: $\Omega(\log T)$ [1]
Adversarial Bandit: $\Omega(\sqrt{KT})$ [2]
Contextual Bandit: $\Omega(d\sqrt{T})$ [3]
[1] Lai & Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 1985.
[2] Auer et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem. FOCS, 1995. By a minimax strategy; note that the adversarial bandit can be viewed as a 2-player game between the agent and the environment.
[3] Dani et al. Stochastic Linear Optimization under Bandit Feedback. COLT, 2008. Remark that the lower bound is $\Omega(\sqrt{T})$ even for the stochastic contextual bandit, since contexts may arrive adversarially.
Naïve Approach: Reduce to MAB
Naïve Approach: Reduce to MAB
Approach 1: assume the context set is finite ($|\mathcal{C}| = N$)
Run a MAB algorithm (e.g. EXP3) for each context independently (see the sketch below)
The regret bound is $O(\sqrt{TNK \log K})$ [1] (w/ EXP3)
Approach 2: assume the policy space is finite ($|H| = M$)
Run a MAB algorithm (e.g. EXP3) on policies, instead of arms
The regret bound is $O(\sqrt{TM \log M})$ (w/ EXP3)
[1] $\sum_{c=1}^{N} O(\sqrt{n_c K \log K}) \le O(\sqrt{TNK \log K})$, where $n_c$ is the number of times context c is observed (by the Cauchy-Schwarz inequality)
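Approach 1 is mechanical to implement: keep one independent MAB learner per observed context. A minimal sketch, with a toy epsilon-greedy learner standing in for the slide's EXP3 (which is reviewed later in the deck); the class and the environment here are assumptions for illustration.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

class EpsGreedy:                       # stand-in for EXP3 in this sketch
    def __init__(self, K, eps=0.1):
        self.eps = eps
        self.n = np.zeros(K)           # pull counts
        self.s = np.zeros(K)           # reward sums
    def act(self):
        if self.n.min() == 0:
            return int(np.argmin(self.n))          # try every arm once
        if rng.random() < self.eps:
            return int(rng.integers(len(self.n)))  # explore
        return int(np.argmax(self.s / self.n))     # exploit
    def update(self, a, r):
        self.n[a] += 1
        self.s[a] += r

K, N, T = 3, 4, 5_000
mu = rng.uniform(size=(N, K))                  # per-context arm means
learners = defaultdict(lambda: EpsGreedy(K))   # one learner per context

for t in range(T):
    c = int(rng.integers(N))                   # a finite context arrives
    a = learners[c].act()
    r = float(rng.random() < mu[c, a])
    learners[c].update(a, r)
```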
Stochastic Contextual Bandit
UCB & Thompson Sampling
Review: Index Policy and Greedy Algorithm
Since the Gittins index [1], index policies have become one of the most popular
strategies for MAB problems
Idea: at each time t, define a score $s_{i,t}$ (= index) for each arm i.
Select the arm with the highest score
Question: how do we define a proper $s_{i,t}$?
Naïve approach: use the empirical mean [2]! (greedy algorithm)
However, the naïve greedy algorithm may incur O(T) regret
[1] Gittins. Bandit Processes and Dynamic Allocation Indices. Journal of the Royal Statistical Society, 1979.
[2] Note that MAB becomes trivial if we know the true means. The general goal of MAB algorithms is to estimate the means correctly and rapidly (the explore-exploit dilemma).
Review: UCB1
Assume $r_{i,t} \sim P_i$ with support [0, 1] and mean $\mu_i$
Idea: favor seldom-selected arms over often-selected arms.
In other words, give a confidence bonus [1]!
UCB1 [2]: define the score as (see the sketch below)
$$s_{i,t} = \hat{\mu}_{i,t} + \sqrt{\frac{2 \log t}{n_{i,t}}}$$
where $\hat{\mu}_{i,t}$ is the empirical mean and $n_{i,t}$ is the number of times arm i has been selected
The UCB1 policy guarantees the optimal regret O(log T)
Also, there are other choices for the UCB (e.g. KL-UCB [3], Bayes-UCB [4])
[1] We call this bonus the UCB (upper confidence bound). Thus, score = estimated mean + UCB.
[2] Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002.
[3] Garivier & Cappé. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. COLT, 2011.
[4] Kaufmann et al. On Bayesian Upper Confidence Bounds for Bandit Problems. AISTATS, 2012.
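A minimal runnable sketch of UCB1 as defined above; the Bernoulli test arms are an assumption for illustration.

```python
import numpy as np

def ucb1(reward_fn, K, T):
    n = np.zeros(K)                    # n_{i,t}: number of pulls of arm i
    mu_hat = np.zeros(K)               # empirical means
    for t in range(1, T + 1):
        if t <= K:                     # initialize: pull each arm once
            a = t - 1
        else:                          # score = empirical mean + confidence bonus
            a = int(np.argmax(mu_hat + np.sqrt(2 * np.log(t) / n)))
        r = reward_fn(a)
        n[a] += 1
        mu_hat[a] += (r - mu_hat[a]) / n[a]    # incremental mean update
    return mu_hat, n

rng = np.random.default_rng(0)
mu = np.array([0.2, 0.5, 0.7])
mu_hat, n = ucb1(lambda a: float(rng.random() < mu[a]), K=3, T=20_000)
print(n)    # pulls concentrate on the best arm, as the O(log T) regret suggests
```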
LinUCB
Assume $r_{i,t} \sim P(r_{i,t} \mid c_{i,t}, \theta^*)$ where $\mathbb{E}[r_{i,t}] = c_{i,t}^T \theta^*$ ($c_{i,t}, \theta^* \in \mathbb{R}^d$)
Like UCB1, we want to define the score as
$$s_{i,t} = c_{i,t}^T \hat{\theta}_t + \mathrm{UCB}_{i,t}$$
Question: how do we choose a proper $\mathrm{UCB}_{i,t}$?
LinUCB
Idea: let $\hat{\theta}_t$ be an estimator of $\theta^*$ by ridge regression
$$\hat{\theta}_t = (C_t^T C_t + \lambda I_d)^{-1} C_t^T R_t$$
where $C_t = \{c_1, \dots, c_{t-1}\}$ and $R_t = \{r_1, \dots, r_{t-1}\}$
Then, the inequality below holds with probability $1 - \delta/T$
$$\left| c_{i,t}^T \hat{\theta}_t - c_{i,t}^T \theta^* \right| \le (\epsilon + 1)\sqrt{c_{i,t}^T A_t^{-1} c_{i,t}}$$
where $A_t = C_t^T C_t + I_d$ and $\epsilon = \sqrt{\tfrac{1}{2} \log \tfrac{2TK}{\delta}}$
LinUCB
LinUCB [1]: define the score as (see the sketch below)
$$s_{i,t} = c_{i,t}^T \hat{\theta}_t + \alpha \sqrt{c_{i,t}^T A_t^{-1} c_{i,t}}$$
The regret bound (with probability $1 - \delta$) is
$$O\!\left(d \sqrt{T \log \frac{1+T}{\delta}}\right)$$
The LinUCB policy guarantees the optimal regret $\tilde{O}(d\sqrt{T})$
Also, there are other choices for the UCB (e.g. LinREL [2], CoFineUCB [3])
[1] Li et al. A contextual-bandit approach to personalized news article recommendation. WWW, 2010.
[2] Auer. Using Confidence Bounds for Exploitation-Exploration Trade-offs. JMLR, 2002.
[3] Yue et al. Hierarchical Exploration for Accelerating Contextual Bandits. ICML, 2012.
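A minimal sketch of the LinUCB score and update, with the per-arm contexts $c_{i,t}$ stacked as rows of a matrix; the rank-one update and the einsum are implementation choices, not from the slides.

```python
import numpy as np

class LinUCB:
    def __init__(self, d, alpha=1.0, lam=1.0):
        self.alpha = alpha
        self.A = lam * np.eye(d)       # A_t = C_t^T C_t + lambda I_d
        self.b = np.zeros(d)           # C_t^T R_t
    def act(self, contexts):           # contexts: (K, d), row i = c_{i,t}
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b     # ridge-regression estimate of theta*
        bonus = np.sqrt(np.einsum('id,de,ie->i', contexts, A_inv, contexts))
        return int(np.argmax(contexts @ theta_hat + self.alpha * bonus))
    def update(self, c, r):            # fold in the chosen context and reward
        self.A += np.outer(c, c)
        self.b += r * c
```

In practice one would keep $A_t^{-1}$ up to date with the Sherman-Morrison formula instead of re-inverting every round.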
Review: Thompson Sampling
Another popular strategy for MAB is Thompson Sampling [1]
It can be applied to both contextual and non-contextual bandits
Assume $r_{i,t} \sim P(r_{i,t} \mid c_{i,t}, \theta^*)$ with prior $\theta^* \sim P(\theta)$
Idea: sample the estimator $\hat{\theta}_t$ from the posterior distribution (see the sketch below)
step 1. draw $\theta_t$ from the posterior $P(\theta \mid D = \{c_t, a_t, r_t\})$
step 2. select the arm $a_t = \arg\max_i \mathbb{E}[r_{i,t} \mid c_{i,t}, \theta_t]$
The idea is simple, but it works well both in theory [2] and in practice [3]
[1] Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika, 1933.
[2] Agrawal et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem. COLT, 2012.
[3] Scott. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 2010.
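A minimal sketch of Thompson Sampling in its classic non-contextual Beta-Bernoulli form; the conjugate prior choice is standard but an assumption here, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 10_000
mu = np.array([0.2, 0.5, 0.7])           # unknown Bernoulli means
a_post, b_post = np.ones(K), np.ones(K)  # Beta(1, 1) prior per arm

for t in range(T):
    theta = rng.beta(a_post, b_post)     # step 1: sample from the posterior
    a = int(np.argmax(theta))            # step 2: act greedily on the sample
    r = float(rng.random() < mu[a])
    a_post[a] += r                       # conjugate Beta-Bernoulli update
    b_post[a] += 1 - r

print(a_post / (a_post + b_post))        # posterior means sharpen on good arms
```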
LinTS
Assume $r_{i,t} \sim N(c_{i,t}^T \theta^*, v^2)$ and $\theta^* \sim N(\hat{\theta}_t, v^2 B_t^{-1})$ where
$$B_t = \sum_{\tau=1}^{t-1} c_{i,\tau} c_{i,\tau}^T + I_d, \qquad \hat{\theta}_t = B_t^{-1} \sum_{\tau=1}^{t-1} c_{i,\tau} r_{i,\tau}$$
$$r_{i,t} \in [\bar{r}_{i,t} - R, \bar{r}_{i,t} + R], \qquad v = R\sqrt{24\, d \log \frac{t}{\delta}}$$
Then, the posterior of $\theta^*$ is $N(\hat{\theta}_{t+1}, v^2 B_{t+1}^{-1})$
LinTS [1]: run Thompson Sampling under this assumption (see the sketch below)
The regret bound (with probability $1 - \delta$) is
$$O\!\left(d^2 \sqrt{T^{1+\epsilon}} \log(Td) \log \frac{1}{\delta}\right)$$
[1] Agrawal et al. Thompson Sampling for Contextual Bandits with Linear Payoffs. ICML, 2013.
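A minimal sketch of LinTS with the Gaussian posterior above; a fixed v is an assumption for brevity (the slide's v grows with t).

```python
import numpy as np

class LinTS:
    def __init__(self, d, v=0.5):
        self.v = v
        self.B = np.eye(d)             # B_t, starts at I_d
        self.f = np.zeros(d)           # running sum of c_tau * r_tau
    def act(self, contexts, rng):      # contexts: (K, d)
        B_inv = np.linalg.inv(self.B)
        theta_hat = B_inv @ self.f
        theta = rng.multivariate_normal(theta_hat, self.v ** 2 * B_inv)
        return int(np.argmax(contexts @ theta))   # greedy on the sample
    def update(self, c, r):
        self.B += np.outer(c, c)
        self.f += r * c
```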
UCB & TS: Nonlinear Case
Assume $\mathbb{E}[r_{i,t}] = f(c_{i,t})$ for a general nonlinear function f
If we assume f is a member of the exponential family, we can use GLM-UCB [1]
If we assume f is sampled from a Gaussian Process, we can use
GP-UCB [2] / CGP-UCB [3] (see the sketch below)
If we assume f is an element of a Reproducing Kernel Hilbert Space,
we can use KernelUCB [4]
Also, we can use Thompson Sampling if we know the form of the
probability distribution
[1] Filippi et al. Parametric Bandits: The Generalized Linear Case. NIPS, 2010.
[2] Srinivas et al. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. ICML, 2010.
[3] Krause & Ong. Contextual Gaussian Process Bandit Optimization. NIPS, 2011.
[4] Valko et al. Finite-Time Analysis of Kernelised Contextual Bandits. UAI, 2013.
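For intuition, a sketch of a GP-UCB-style score, posterior mean plus scaled posterior standard deviation; the RBF kernel, length scale, and constant beta are illustrative assumptions (the cited papers choose beta with theory).

```python
import numpy as np

def rbf(X, Y, ell=0.5):
    """RBF kernel between row-vector sets X (n, d) and Y (m, d)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

def gp_ucb_scores(C_hist, r_hist, C_cand, beta=2.0, noise=0.1):
    """Posterior mean + sqrt(beta) * posterior std at candidate contexts."""
    K = rbf(C_hist, C_hist) + noise ** 2 * np.eye(len(C_hist))
    K_inv = np.linalg.inv(K)
    k_star = rbf(C_cand, C_hist)                    # (n_cand, n_hist)
    mean = k_star @ K_inv @ r_hist
    var = 1.0 - np.einsum('ij,jk,ik->i', k_star, K_inv, k_star)
    return mean + np.sqrt(beta) * np.sqrt(np.clip(var, 0.0, None))
```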
Stochastic Contextual Bandit
Arbitrary Set of Policies
Epoch-Greedy
Assume the policy space H is finite [1]
Idea: explore for $T'$ steps and exploit for the remaining $T - T'$ steps (epsilon-first)
issue 1. how can we get an unbiased estimator of the best policy?
issue 2. how do we balance exploration and exploitation if we don't know T?
trick 1: use the data $D = \{c_t, a_t, r_t\}$ observed in the explore steps (see the sketch below)
$$\hat{\pi} = \arg\max_{\pi \in H} \sum_{(c_t, a_t, r_t) \in D} \frac{r_{a_t}\, \mathbb{I}(\pi(c_t) = a_t)}{1/K}$$
trick 2: run epsilon-first in mini-batches (a partition of T)
[1] The infinite case with finite VC-dimension can be handled in a similar way.
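Trick 1 in code: a minimal sketch of selecting the best policy from uniformly-explored data via the importance-weighted estimator above; representing policies as plain functions is an illustrative assumption.

```python
def select_policy(policies, explore_data, K):
    # explore_data: list of (c_t, a_t, r_t) with a_t drawn uniformly,
    # so each matching reward is divided by the propensity 1/K
    def ips_value(pi):
        return sum(r * (pi(c) == a) / (1.0 / K) for c, a, r in explore_data)
    return max(policies, key=ips_value)
```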
Epoch-Greedy
Epoch-Greedy [1]: combine trick 1 & trick 2
The regret bound is $\tilde{O}(T^{2/3})$ (not optimal!)
[1] Langford & Zhang. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information. NIPS, 2007.
RandomizedUCB
Idea: estimate a distribution $P_t$ over the policy space H
RandomizedUCB [1]:
The regret bound is $\tilde{O}(\sqrt{T})$, but the time complexity is $O(T^6)$
[1] Dudik et al. Efficient Optimal Learning for Contextual Bandits. UAI, 2011.
ILOVETOCONBANDITS
Idea: similar to RandomizedUCB, but with improved time complexity
ILOVETOCONBANDITS [1] (Importance-weighted LOw-Variance
Epoch-Timed Oracleized CONtextual BANDITS):
The regret bound is $\tilde{O}(\sqrt{T})$, and the time complexity is $O(T^{1.5})$
[1] Agarwal et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. ICML, 2014.
Adversarial Contextual Bandit
Review: EXP3
Assume $r_{i,t} \in [0, 1]$ is selected by the environment
In the adversarial setting, the agent must select arms randomly
Idea: assign more probability to arms with higher observed rewards (see the sketch below)
EXP3 [1] (EXPonential-weight algorithm for EXPloration and
EXPloitation):
The regret bound is $O(\sqrt{TK \log K})$
[1] Auer et al. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 2002.
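A minimal sketch of EXP3 with the usual exponential-weight update; the mixing parameter gamma and the toy alternating environment are illustrative assumptions.

```python
import numpy as np

def exp3(reward_fn, K, T, gamma=0.05):
    rng = np.random.default_rng(0)
    w = np.ones(K)                                   # exponential weights
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / K    # mix in uniform exploration
        a = int(rng.choice(K, p=p))
        r = reward_fn(a, t)                          # r in [0, 1], may be adversarial
        w[a] *= np.exp(gamma * (r / p[a]) / K)       # importance-weighted boost
    return w

# toy run: the rewarding arm alternates between rounds
w = exp3(lambda a, t: float(a == t % 2), K=2, T=5_000)
```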
EXP4
Idea: run EXP3 on policies, instead of arms (see the sketch below)
EXP4 [1] (EXPonential-weight algorithm for EXPloration and
EXPloitation using EXPert advice):
The regret bound is $O(\sqrt{TK \log N})$, but the variance is high
[1] Auer et al. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 2002.
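EXP4 in the same style: the weights live on the N experts (policies), whose advice is mixed into an arm distribution, and only the chosen arm's importance-weighted reward is credited back to the experts. The advice function is an illustrative assumption.

```python
import numpy as np

def exp4(advice_fn, reward_fn, N, K, T, gamma=0.05):
    rng = np.random.default_rng(0)
    w = np.ones(N)                                   # weights over experts
    for t in range(T):
        xi = advice_fn(t)                            # (N, K): rows are arm distributions
        p = (1 - gamma) * (w @ xi) / w.sum() + gamma / K
        a = int(rng.choice(K, p=p))
        r = reward_fn(a, t)
        x_hat = np.zeros(K)
        x_hat[a] = r / p[a]                          # unbiased reward estimate
        w *= np.exp(gamma * (xi @ x_hat) / K)        # credit experts by their advice
    return w
```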
EXP4.P
Idea: run EXP4 with better weights, to make the algorithm stable
EXP4.P [1] (EXP4 with Probability):
The regret bound is $O(\sqrt{TK \log N})$, with high probability
[1] Beygelzimer et al. Contextual Bandit Algorithms with Supervised Learning Guarantees. AISTATS, 2011.
Supervised Learning to Contextual Bandit
Supervised Learning to Contextual Bandit
Idea: note that the contextual bandit can be thought of as a supervised
learning problem with a partially-observed-reward restriction
Trick: use a randomized algorithm (e.g. epsilon-greedy) and the unbiased
(true) reward estimator $\hat{r}_{a_t,t} = \frac{r_{a_t,t}}{p_{a_t}}$ instead of the observed reward $r_{a_t,t}$.
Then,
$$\mathbb{E}[\hat{r}_{i,t}] = p_i \cdot \frac{r_{i,t}}{p_i} + (1 - p_i) \cdot 0 = r_{i,t}$$
Using this trick, any supervised learning algorithm can be converted
to a contextual bandit algorithm (a numerical check follows below)
Banditron and NeuralBandit are examples using neural networks
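The unbiasedness of the estimator is easy to check numerically; a quick sketch (the fixed rewards and the uniform randomized policy are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 4, 200_000
r = rng.uniform(size=K)            # fixed per-arm rewards
p = np.full(K, 1.0 / K)            # probabilities of the randomized policy

est = np.zeros(K)
for _ in range(n):
    a = int(rng.choice(K, p=p))
    est[a] += r[a] / p[a]          # hat r: observed reward over propensity;
                                   # unchosen arms contribute 0 this round
est /= n

print(np.abs(est - r).max())       # ~0: E[hat r_i] = p_i * r_i / p_i = r_i
```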
Banditron and NeuralBandit
Both Banditron [1] and NeuralBandit [2] use a multi-layer perceptron and
an epsilon-greedy algorithm w/ the unbiased reward estimator (see the sketch below)
However, Banditron uses the 0-1 loss (classification) while NeuralBandit
uses the L2 loss (regression)
The regret bound of the original Banditron is $O(T^{2/3})$, and a 2nd-order
variant [3] reduced it to $\tilde{O}(\sqrt{T})$
No theoretical guarantee has been proved for NeuralBandit yet
[1] Kakade et al. Efficient Bandit Algorithms for Online Multiclass Prediction. ICML, 2008.
[2] Allesiardo et al. A Neural Networks Committee for the Contextual Bandit Problem. ICONIP, 2014.
[3] Crammer & Gentile. Multiclass Classification with Bandit Feedback using Adaptive Regularization. ICML, 2013.
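For flavor, a sketch of one Banditron-style round: a multiclass perceptron updated only through the epsilon-greedy label and the unbiased estimator above. This follows the update of Kakade et al. in spirit; treat details such as the exploration schedule as assumptions.

```python
import numpy as np

def banditron_round(W, x, y, gamma, rng):
    """One round of a Banditron-style update. W: (K, d) weight matrix."""
    K = W.shape[0]
    y_hat = int(np.argmax(W @ x))                  # greedy prediction
    p = np.full(K, gamma / K)
    p[y_hat] += 1 - gamma                          # epsilon-greedy over labels
    a = int(rng.choice(K, p=p))                    # label we commit to
    correct = float(a == y)                        # bandit feedback: right or not
    U = np.zeros_like(W)
    U[a] += (correct / p[a]) * x                   # unbiased positive part
    U[y_hat] -= x                                  # perceptron negative part
    return W + U
```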
Summary & Reference
Summary
Reference
[Zhou 2015] A Survey on Contextual Multi-armed Bandits. arXiv, 2015.
[Burtini 2015] A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit. arXiv, 2015.
[Bubeck 2012] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. arXiv, 2012.