Sponsored content in contextual bandits.
Deconfounding Targeting Not At Random
MIUE 2023
Hubert Drążkowski
GRAPE|FAME, Warsaw University of Technology
September 22, 2023
Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology
Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 1 / 26
Motivational examples
Recommender systems
• Suggest the best ads/movies $a \in \{a_1, a_2, \dots, a_K\}$
• Users $X_1, X_2, \dots, X_T$
• Design of the study $\{n_{a_1}, n_{a_2}, \dots, n_{a_K}\}$, with $\sum_i n_{a_i} = T$
• Measured satisfaction $\{R_t(a_1), \dots, R_t(a_K)\}_{t=1}^{T}$
The framework
Exploration vs exploitation
• Allocating limited resources under uncertainty
• Sequential manner
• Partial feedback (bandit feedback)
• Adaptive (non-iid) data
• Maximizing cumulative gain
• Current actions do not change future environment
The elements of the bandit model
(see Lattimore and Szepesvári (2020))
• Context $X_t \in \mathcal{X}$
  • $X_t \sim D_X$
• Actions $A_t \in \mathcal{A} = \{a_1, \dots, a_K\}$
  • $A_t \sim \pi_t(a|x)$
• Policy $\pi \in \Pi$
  • $\pi = \{\pi_t\}_{t=1}^{T}$
  • $\pi_t : \mathcal{X} \mapsto \mathcal{P}(\mathcal{A})$, where $\mathcal{P}(\mathcal{A}) := \{q \in [0,1]^K : \sum_{a \in \mathcal{A}} q_a = 1\}$
• Rewards $R_t \in \mathbb{R}_+$
  • $(R(a_1), R(a_2), \dots, R(a_K))$ and $R_t = \sum_{k=1}^{K} \mathbb{1}(A_t = a_k) R(a_k)$
  • $R_t \sim D_{R|A,X}$
• History $H_t \in \mathcal{H}_t$
  • $H_t = \sigma\big(\{(X_s, A_s, R_s)\}_{s=1}^{t}\big)$

Details
• In short, $(X_t, A_t, R_t) \sim D(\pi_t)$
• We know $\pi_t(a|x)$ (the propensity score)
• We do not know $D_{X, \vec{R}}$
• We have $\mathbb{1}(A_t = a) \perp\!\!\!\perp R(a) \mid X_t$
• We want to choose $\pi$ to maximize
$$\mathbb{E}_{D(\pi)}\left[ \sum_{t=1}^{T} R_t(A_t) \right]$$
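The interaction protocol above can be sketched as a small simulation loop, purely for illustration: a hypothetical linear-Gaussian reward model and an epsilon-greedy stand-in for $\pi_t$ (the slides use IGW-based policies; the names `theta` and `mu_hat` are invented here):

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, d = 3, 5000, 2
theta = rng.normal(size=(K, d))      # hypothetical per-arm reward weights

def policy(x, mu_hat, eps=0.1):
    """Epsilon-greedy pi_t(a|x): explore with prob eps, else exploit."""
    probs = np.full(K, eps / K)
    probs[np.argmax(mu_hat @ x)] += 1 - eps
    return probs

mu_hat = np.zeros((K, d))            # running reward-model estimates
counts = np.zeros(K)
total_reward = 0.0
for t in range(T):
    x = rng.normal(size=d)           # context X_t ~ D_X
    p = policy(x, mu_hat)
    a = rng.choice(K, p=p)           # A_t ~ pi_t(.|x); p is the known propensity
    r = theta[a] @ x + rng.normal()  # bandit feedback: only R_t(A_t) is observed
    counts[a] += 1
    mu_hat[a] += (x * (r - mu_hat[a] @ x)) / counts[a]  # crude online update
    total_reward += r
```

Note the partial feedback: the counterfactual rewards $R_t(a)$ for $a \neq A_t$ are never revealed, which is exactly why $\pi_t(a|x)$ must be logged.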
The flow of information
Inverse Gap Weighting
1 Authoritarian Sponsor
2 Deconfounding
3 Experiment
4 Conclusions
Authoritarian Sponsor model
• The act of sponsoring
• Recommender system - marketing campaigns, testing products
• Healthcare - funding experiments, lobbying doctors
• The sponsor $(\lambda, \tilde{\pi})$ intervenes in an authoritarian manner:
$$A_t = S_t \tilde{A}_t + (1 - S_t) \bar{A}_t,$$
$$S_t \in \{0, 1\}, \quad S_t \sim \lambda_t(\cdot|X),$$
$$\bar{A}_t \sim \pi_t(\cdot|X), \quad \tilde{A}_t \sim \tilde{\pi}_t(\cdot|X),$$
$$\bar{\pi}_t(a|x) = \lambda_t(1|x)\, \tilde{\pi}_t(a|x) + \lambda_t(0|x)\, \pi_t(a|x).$$
• The sponsor's policy $(\lambda, \tilde{\pi})$ is unknown to the learner
  • Not sharing technology or strategy
  • Lost in human-to-algorithm translation
  • Hard-to-model processes such as auctions
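The mixing mechanism $A_t = S_t \tilde{A}_t + (1 - S_t)\bar{A}_t$ can be sketched as follows; the gate probability, the sponsor policy, and its dependence on an unobserved confounder $z$ (the TNAR case) are all invented placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3

def learner_policy(x):
    """pi_t(.|x): known to the learner (uniform here for simplicity)."""
    return np.ones(K) / K

def sponsor_gate(x):
    """lambda_t(1|x): probability that the sponsor intervenes (hypothetical)."""
    return 0.3

def sponsor_policy(x, z):
    """pi~_t(.|x): unknown to the learner; depends on confounder z (TNAR)."""
    logits = np.arange(K) * z
    p = np.exp(logits - logits.max())
    return p / p.sum()

def draw_action(x, z):
    """One round of the authoritarian-sponsor mechanism."""
    s = rng.random() < sponsor_gate(x)              # S_t ~ lambda_t(.|x)
    if s:
        a = rng.choice(K, p=sponsor_policy(x, z))   # sponsored action A~_t
    else:
        a = rng.choice(K, p=learner_policy(x))      # learner's action A-_t
    return a, int(s)
```

Because `sponsor_policy` consults `z`, conditioning on `x` alone no longer makes the action independent of the potential rewards on sponsored rounds.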
Targeting mechanisms
Introducing an unobserved confounder Z
1 Targeting Completely At Random (TCAR):
  • $\lambda(1|X) = \lambda(1)$, $\tilde{\pi}(a|X, R, Z) = \tilde{\pi}(a)$
  • kind of like MCAR
2 Targeting At Random (TAR):
  • $\lambda(1|X)$ may depend on $X$, $\tilde{\pi}(a|X, R, Z) = \tilde{\pi}(a|X)$
  • kind of like MAR
3 Targeting Not At Random (TNAR):
  • $\tilde{\pi}(a|X, R, Z)$ depends on $(R, Z)$, so that $R(a) \not\perp\!\!\!\perp A \mid X, S = 1$
  • kind of like MNAR
Causal interpretation
Figure 1: TCAR
Figure 2: TAR
Figure 3: TNAR
1 Authoritarian Sponsor
2 Deconfounding
3 Experiment
4 Conclusions
Data fusion
(see Colnet et al. (2020))
                    RCT    OS
Internal validity    ✓     ✗
External validity    ✗     ✓
Propensity score   known    ?
Table 1: Differences and similarities between data sources
Data fusion
(see Colnet et al. (2020))
                    RCT    OS   Learner  Sponsor
Internal validity    ✓     ✗      ✓        ✗
External validity    ✗     ✓      ∼        ∼
Propensity score   known    ?   known      ?
Table 2: Differences and similarities between data sources
• Unsolved challenge: sampling in interaction!
CATE
• CATE:
$$\tau_{a_1,a_2}(x) = \mathbb{E}_{D_{R|A,X=x}}[R(a_1) - R(a_2)] \quad \text{and} \quad \hat{\tau}_{a_1,a_2}(x) = \hat{\mu}_{a_1}(x) - \hat{\mu}_{a_2}(x)$$
• Assumptions
  • SUTVA: $R_t = \sum_{a \in \mathcal{A}} \mathbb{1}(A_t = a) R_t(a)$
  • Ignorability: $\mathbb{1}(A_t = a) \perp\!\!\!\perp R(a) \mid X_t, S_t = 0$
  • Ignorability of the study participation: $R_t(a) \perp\!\!\!\perp S_t \mid X_t$
  • TNAR: $R(a) \not\perp\!\!\!\perp A \mid X, S = 1$
• Biased CATE on the sponsor sample:
$$\rho_{a_1,a_2}(x) = \mathbb{E}[R \mid A = a_1, X = x, S = 1] - \mathbb{E}[R \mid A = a_2, X = x, S = 1].$$
• Bias measurement:
$$\eta_{a_1,a_2}(x) = \tau_{a_1,a_2}(x) - \rho_{a_1,a_2}(x)$$
Two step deconfounding
(see Kallus et al. (2018)), $\mathcal{A} = \{a_0, a_1\}$
1 On the observational sample, use a metalearner to obtain $\hat{\rho}_{a_1,a_0}(X)$.
2 Postulate a function $q_t(X, a_0)$ such that $\mathbb{E}[q_t(X, a_0) R \mid X = x, S = 0] = \tau_{a_1,a_0}(x)$, where
$$q_t(X, a_0) = \frac{\mathbb{1}(A = a_1)}{\pi_t(a_1|X)} - \frac{\mathbb{1}(A = a_0)}{\pi_t(a_0|X)}.$$
3 Using $q_t(X, a_0)$, apply the definition of $\eta_{a_1,a_0}(x)$ to adjust the $\hat{\rho}$ term by solving an optimization problem on the unconfounded sample:
$$\hat{\eta}_{a_1,a_0}(X) = \arg\min_{\eta} \sum_{t:\, S_t = 0} \big( q_t(x_t, a_0)\, r_t - \hat{\rho}_{a_1,a_0}(x_t) - \eta(x_t) \big)^2.$$
4 Finally, $\hat{\tau}_{a_1,a_0}(x) = \hat{\rho}_{a_1,a_0}(x) + \hat{\eta}_{a_1,a_0}(x)$.
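A minimal numerical sketch of the four steps, under simplifying assumptions: binary actions $a_0 = 0$, $a_1 = 1$, and a quadratic least-squares fit standing in for the metalearner (the function names are invented here):

```python
import numpy as np

def fit_quad(x, y):
    """Least-squares quadratic fit; a stand-in for the metalearner."""
    c = np.polyfit(x, y, 2)
    return lambda z: np.polyval(c, z)

def deconfound(x_s1, a_s1, r_s1, x_s0, a_s0, r_s0, pi1):
    """Two-step correction for binary actions.
    (x_s1, a_s1, r_s1): confounded sponsor sample (S=1);
    (x_s0, a_s0, r_s0): unconfounded learner sample (S=0);
    pi1(x): the learner's known propensity pi_t(a1|x)."""
    # Step 1: metalearner (a T-learner here) on the confounded sample -> rho_hat
    mu1 = fit_quad(x_s1[a_s1 == 1], r_s1[a_s1 == 1])
    mu0 = fit_quad(x_s1[a_s1 == 0], r_s1[a_s1 == 0])
    rho_hat = lambda z: mu1(z) - mu0(z)
    # Step 2: IPW pseudo-outcome q_t * r_t, unbiased for tau on the S=0 sample
    p1 = pi1(x_s0)
    q = np.where(a_s0 == 1, 1.0 / p1, -1.0 / (1.0 - p1))
    # Step 3: regress the residual q*r - rho_hat(x) on x to estimate eta_hat
    eta_hat = fit_quad(x_s0, q * r_s0 - rho_hat(x_s0))
    # Step 4: corrected CATE
    return lambda z: rho_hat(z) + eta_hat(z)
```

The key design point is that the unconfounded sample is used only to estimate the (hopefully simpler) correction $\eta$, not the whole CATE.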
Deconfounded CATE IGW (D-CATE-IGW)
• Let $b = \arg\max_a \hat{\mu}_a(x_t)$. Then
$$\pi(a|x) = \begin{cases} \dfrac{1}{K + \gamma_m\big(\hat{\mu}^m_b(x) - \hat{\mu}^m_a(x)\big)} & \text{for } a \neq b \\[2ex] 1 - \sum_{c \neq b} \pi(c|x) & \text{for } a = b \end{cases} \;=\; \begin{cases} \dfrac{1}{K + \gamma_m\, \hat{\tau}_{b,a}(x)} & \text{for } a \neq b \\[2ex] 1 - \sum_{c \neq b} \pi(c|x) & \text{for } a = b, \end{cases}$$
• Each round/epoch, deconfound the CATE
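The allocation rule above can be sketched directly from the estimated gaps $\hat{\tau}_{b,a}(x) = \hat{\mu}_b(x) - \hat{\mu}_a(x)$ (the function name is invented here):

```python
import numpy as np

def igw_probs(gaps, gamma, K):
    """Inverse Gap Weighting over K arms.
    gaps[a] = mu_hat(best) - mu_hat(a) >= 0, so the best arm has gap 0."""
    b = int(np.argmin(gaps))             # the greedy arm
    p = 1.0 / (K + gamma * gaps)         # 1 / (K + gamma * tau_hat_{b,a}(x))
    p[b] = 0.0
    p[b] = 1.0 - p.sum()                 # remaining mass goes to the greedy arm
    return p
```

Larger `gamma` concentrates probability on the greedy arm; `gamma = 0` gives the uniform distribution, which makes the exploration/exploitation trade-off explicit.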
1 Authoritarian Sponsor
2 Deconfounding
3 Experiment
4 Conclusions
Setup I
• $S_t \sim \text{Bern}(\rho)$
• No overlap scenario:
$$X_t \mid S_t = 0 \sim \text{Unif}([-1, 1]), \quad U_t \mid S_t = 0 \sim N(0, 1).$$
• Full overlap scenario:
$$X_t \mid S_t = 0 \sim N(0, 1), \quad U_t \mid S_t = 0 \sim N(0, 1).$$
$$\begin{pmatrix} X_t \\ U_t \end{pmatrix} \,\Big|\, \{A_t, S_t = 1\} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & (2A_t - 1)\sigma_A \\ (2A_t - 1)\sigma_A & 1 \end{pmatrix} \right),$$
• $\sigma_A \in \{0.6, 0.9\}$
• $\rho \in \{0.3, 0.6\}$
$$R_t(A_t) = 1 + A_t + X_t + 2 A_t X_t + \tfrac{1}{2} X_t^2 + \tfrac{3}{4} A_t X_t^2 + 2 U_t + \tfrac{1}{2} \epsilon_t,$$
where $\epsilon_t \sim N(0, 1)$ and $\tau(X_t) = \tfrac{3}{4} X_t^2 + 2 X_t + 1$.
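The generative model of Setup I can be written out as a short sampler (on learner rounds the action is a placeholder here; in the experiment it would come from $\pi_t$):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_round(rho=0.3, sigma_a=0.6, full_overlap=True):
    """One draw from the Setup I model with binary action A in {0, 1}."""
    s = rng.random() < rho
    if s:  # sponsor round: (X, U) correlated with the action -> confounding
        a = rng.integers(2)
        c = (2 * a - 1) * sigma_a
        cov = np.array([[1.0, c], [c, 1.0]])
        x, u = rng.multivariate_normal([0.0, 0.0], cov)
    else:  # learner round: context and confounder independent of the action
        x = rng.normal() if full_overlap else rng.uniform(-1, 1)
        u = rng.normal()
        a = rng.integers(2)       # placeholder; the learner's pi_t would choose A
    eps = rng.normal()
    r = 1 + a + x + 2 * a * x + 0.5 * x**2 + 0.75 * a * x**2 + 2 * u + 0.5 * eps
    return x, a, r, int(s)

def true_cate(x):
    """tau(x) = R(1) - R(0) in expectation: 3/4 x^2 + 2x + 1."""
    return 0.75 * x**2 + 2 * x + 1
```

Note how $U_t$ enters the reward but is observed by nobody; on sponsor rounds it is correlated with $A_t$, which is exactly the TNAR confounding the deconfounding step has to remove.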
Result I
Figure 4: Normed cumulative regret for different scenarios
Result II
Figure 5: True and estimated CATE values for different scenarios
1 Authoritarian Sponsor
2 Deconfounding
3 Experiment
4 Conclusions
Contribution
1 A pioneering model of sponsored content in the contextual-bandit framework
2 Bandits treated not as experimental studies but as observational studies
3 A confounding scenario and a deconfounding application
4 D-CATE-IGW works
Future research
• Theoretical
  • Mathematically model the complicated sampling, especially the flow of information
  • A consistency proof for the CATE estimator in this scenario
  • High-probability regret bounds for D-CATE-IGW: $P(\text{REWARD}(\pi) \geq \text{BOUND}(\delta)) \geq 1 - \delta$
• Empirical
  • More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
  • Other deconfounding methods (see Wu and Yang (2022))
  • A more comprehensive empirical study
Expansion
• Policy evaluation:
$$V(\pi) = \mathbb{E}_X\, \mathbb{E}_{A \sim \pi(\cdot|X)}\, \mathbb{E}_{R|A,X}[R], \qquad \hat{V}_t(\pi) \text{ on } \{(X_s, A_s, R_s)\}_{s=1}^{t} \sim D(\bar{\pi})$$
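The policy-value target $V(\pi)$ is commonly estimated from logged data with inverse propensity scoring; a sketch (names invented; `logging_probs` are the realized $\pi_t(A_t|X_t)$, which the learner knows):

```python
import numpy as np

def ips_value(rewards, logging_probs, target_probs):
    """IPS estimate of V(pi) from logged bandit data.
    target_probs[t] = pi(A_t | X_t) under the policy being evaluated;
    logging_probs[t] = probability the logged action was actually taken."""
    w = target_probs / logging_probs   # importance weights
    return float(np.mean(w * rewards))
```

Under the authoritarian sponsor, the logged actions come from the effective policy $\bar{\pi}$ rather than $\pi_t$, so the learner's recorded propensities are wrong on sponsored rounds; this is one reason off-policy evaluation in this setting remains open.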
Authoritarian Sponsor Deconfounding Experiment Conclusions References
Future research
• Theoretical
• Mathematically model the complicated sampling. Especially the flow of information
• Consistency proof of CATE estimator in this scenario
• High probability regret bounds on D-CATE-IGW P(REWARD(π)  BOUND(δ))  1 − δ
• Empirical
• More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
• Other deconfounding methods (see Wu and Yang (2022))
• A more comprehensive empirical study
Expansion
• Policy evaluation
V (π) = EX EA∼π(·|X)ER|A,X [R]
b
Vt(π) on {(Xs, As, Rs)}t
s=1 ∼ D(H
 )
Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology
Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 24 / 26
Authoritarian Sponsor Deconfounding Experiment Conclusions References
The beginning ...
Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology
Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 25 / 26
Authoritarian Sponsor Deconfounding Experiment Conclusions References
Colnet, B., I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.-P. Vert, J. Josse, and S. Yang
(2020). Causal inference methods for combining randomized trials and observational studies: a
review. arXiv preprint arXiv:2011.08047.
Kallus, N., A. M. Puli, and U. Shalit (2018). Removing hidden confounding by experimental
grounding. Advances in Neural Information Processing Systems 31.
Künzel, S. R., J. S. Sekhon, P. J. Bickel, and B. Yu (2019). Metalearners for estimating
heterogeneous treatment effects using machine learning. Proceedings of the National Academy of
Sciences 116(10), 4156–4165.
Lattimore, T. and C. Szepesvári (2020). Bandit Algorithms. Cambridge University Press.
Wu, L. and S. Yang (2022). Integrative R-learner of heterogeneous treatment effects combining
experimental and observational studies. In Conference on Causal Learning and Reasoning, pp.
904–926. PMLR.
Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology
Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 26 / 26
Sponsored content in contextual bandits. Deconfounding targeting not at random

  • 1. Authoritarian Sponsor Deconfounding Experiment Conclusions References Sponsored content in contextual bandits. Deconfounding Targeting Not At Random MIUE 2023 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology September 22, 2023 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 1 / 26
  • 2. Authoritarian Sponsor Deconfounding Experiment Conclusions References Motivational examples Recommender systems • Suggest best ads/movies a ∈ {a1, a2, ...aK } • Users X1, X2, ...., XT • Design of the study {na1 , na2 , ..., naK }, P i nai = T • Measured satisfaction {Rt(a1), ...Rt(aK )}T t=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 2 / 26
  • 3. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 4. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 5. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 6. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 7. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 8. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 9. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 10. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 11. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 12. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 13. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 14. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 15. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 16. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 17. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 18. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 19. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 20. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 21. Authoritarian Sponsor Deconfounding Experiment Conclusions References Details • In short (Xt, At, Rt) ∼ D(πt) • We know πt(a|x) (propensity score) • We don’t know DX,⃗ R • We have 1(At = a) ⊥ ⊥ R(a)|Xt • We want to maximize with π ED(π) T X t=1 Rt(At) # Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 5 / 26
  • 22. Authoritarian Sponsor Deconfounding Experiment Conclusions References Details • In short (Xt, At, Rt) ∼ D(πt) • We know πt(a|x) (propensity score) • We don’t know DX,⃗ R • We have 1(At = a) ⊥ ⊥ R(a)|Xt • We want to maximize with π ED(π) T X t=1 Rt(At) # Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 5 / 26
  • 23. Authoritarian Sponsor Deconfounding Experiment Conclusions References Details • In short (Xt, At, Rt) ∼ D(πt) • We know πt(a|x) (propensity score) • We don’t know DX,⃗ R • We have 1(At = a) ⊥ ⊥ R(a)|Xt • We want to maximize with π ED(π) T X t=1 Rt(At) # Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 5 / 26
  • 24. Authoritarian Sponsor Deconfounding Experiment Conclusions References Details • In short (Xt, At, Rt) ∼ D(πt) • We know πt(a|x) (propensity score) • We don’t know DX,⃗ R • We have 1(At = a) ⊥ ⊥ R(a)|Xt • We want to maximize with π ED(π) T X t=1 Rt(At) # Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 5 / 26
  • 25. Authoritarian Sponsor Deconfounding Experiment Conclusions References Details • In short (Xt, At, Rt) ∼ D(πt) • We know πt(a|x) (propensity score) • We don’t know DX,⃗ R • We have 1(At = a) ⊥ ⊥ R(a)|Xt • We want to maximize with π ED(π) T X t=1 Rt(At) # Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 5 / 26
  • 26. Authoritarian Sponsor Deconfounding Experiment Conclusions References The flow of information Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 6 / 26
  • 27. Authoritarian Sponsor Deconfounding Experiment Conclusions References Inverse Gap Weighting Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 7 / 26
  • 28. Authoritarian Sponsor Deconfounding Experiment Conclusions References 1 Authoritarian Sponsor 2 Deconfounding 3 Experiment 4 Conclusions Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 8 / 26
  • 29. Authoritarian Sponsor Deconfounding Experiment Conclusions References Authoritarian Sponsor model • The act of sponsoring • Recommender system - marketing campaigns, testing products • Healthcare - funding experiments, lobbying doctors • The sponsor (€, H) intervenes in an authoritarian manner At = StÃt + (1 − St)Āt, St ∈ {0, 1}, St ∼ €(·|X) Āt ∼ πt(·|X), Ãt ∼ H t (·|X) H t (a|x) = € t (1|x) H t (a|x) + € t (0|x)πt(a|x). • The lack of knowledge about sponsor’s policy (€, H) • Not sharing technology or strategy • Lost in human to algorithm translation • Hard to model process like auctions Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 9 / 26
  • 30. Authoritarian Sponsor Deconfounding Experiment Conclusions References Authoritarian Sponsor model • The act of sponsoring • Recommender system - marketing campaigns, testing products • Healthcare - funding experiments, lobbying doctors • The sponsor (€, H) intervenes in an authoritarian manner At = StÃt + (1 − St)Āt, St ∈ {0, 1}, St ∼ €(·|X) Āt ∼ πt(·|X), Ãt ∼ H t (·|X) H t (a|x) = € t (1|x) H t (a|x) + € t (0|x)πt(a|x). • The lack of knowledge about sponsor’s policy (€, H) • Not sharing technology or strategy • Lost in human to algorithm translation • Hard to model process like auctions Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 9 / 26
  • 31. Authoritarian Sponsor Deconfounding Experiment Conclusions References Authoritarian Sponsor model • The act of sponsoring • Recommender system - marketing campaigns, testing products • Healthcare - funding experiments, lobbying doctors • The sponsor (€, H) intervenes in an authoritarian manner At = StÃt + (1 − St)Āt, St ∈ {0, 1}, St ∼ €(·|X) Āt ∼ πt(·|X), Ãt ∼ H t (·|X) H t (a|x) = € t (1|x) H t (a|x) + € t (0|x)πt(a|x). • The lack of knowledge about sponsor’s policy (€, H) • Not sharing technology or strategy • Lost in human to algorithm translation • Hard to model process like auctions Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 9 / 26
  • 32. Authoritarian Sponsor Deconfounding Experiment Conclusions References Targeting mechanisms Introducing an unobserved confounder Z 1 Targeting Completely At Random (TCAR): • S(X) = S, H(a|X, R, Z) = H(a) • kind of like MCAR 2 Targeting At Random (TAR) • S(X) = S(X), H(a|X, R, Z) = H(a|X) • kind of like MAR 3 Targeting Not At Random (TNAR) • H(a|X, R, Z) ⇒ R(a) ̸⊥ ⊥ A|X, S = 1. • kind of like MNAR Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 10 / 26
  • 33. Authoritarian Sponsor Deconfounding Experiment Conclusions References Targeting mechanisms Introducing an unobserved confounder Z 1 Targeting Completely At Random (TCAR): • S(X) = S, H(a|X, R, Z) = H(a) • kind of like MCAR 2 Targeting At Random (TAR) • S(X) = S(X), H(a|X, R, Z) = H(a|X) • kind of like MAR 3 Targeting Not At Random (TNAR) • H(a|X, R, Z) ⇒ R(a) ̸⊥ ⊥ A|X, S = 1. • kind of like MNAR Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 10 / 26
  • 34. Authoritarian Sponsor Deconfounding Experiment Conclusions References Targeting mechanisms Introducing an unobserved confounder Z 1 Targeting Completely At Random (TCAR): • S(X) = S, H(a|X, R, Z) = H(a) • kind of like MCAR 2 Targeting At Random (TAR) • S(X) = S(X), H(a|X, R, Z) = H(a|X) • kind of like MAR 3 Targeting Not At Random (TNAR) • H(a|X, R, Z) ⇒ R(a) ̸⊥ ⊥ A|X, S = 1. • kind of like MNAR Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 10 / 26
  • 35. Authoritarian Sponsor Deconfounding Experiment Conclusions References Causal interpretation Figure 1: TCAR Figure 2: TAR Figure 3: TNAR Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 11 / 26
  • 36. Authoritarian Sponsor Deconfounding Experiment Conclusions References 1 Authoritarian Sponsor 2 Deconfounding 3 Experiment 4 Conclusions Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 12 / 26
  • 37. Authoritarian Sponsor Deconfounding Experiment Conclusions References Data fusion (see Colnet et al. (2020)) RCT OS Internal validity External validity Propensity score ? Table 1: Differences and similarities between data sources Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 13 / 26
  • 38. Authoritarian Sponsor Deconfounding Experiment Conclusions References Data fusion (see Colnet et al. (2020)) RCT OS Learner Sponsor Internal validity External validity ∼ ∼ Propensity score ? ? Table 2: Differences and similarities between data sources • Unsolved challenge: sampling in interaction! Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 14 / 26
  • 39. Authoritarian Sponsor Deconfounding Experiment Conclusions References Data fusion (see Colnet et al. (2020)) RCT OS Learner Sponsor Internal validity External validity ∼ ∼ Propensity score ? ? Table 2: Differences and similarities between data sources • Unsolved challenge: sampling in interaction! Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 14 / 26
  • 40. Authoritarian Sponsor Deconfounding Experiment Conclusions References CATE • CATE τa1,a2 (x) = EDR|A,X=x [R(a1) − R(a2)] and b τa1,a2 (x) = b µa1 (x) − b µa2 (x) • Assumptions • SUTVA: Rt = P a∈A 1(At = a)Rt (a), • Ignorability: 1(At = a) ⊥ ⊥ R(a)|Xt , St = 0 • Ignorability of the study participation: Rt (a) ⊥ ⊥ St |Xt • TNAR: R(a) ̸⊥ ⊥ A|X, S = 1. • Biased CATE on sponsor sample ρa1,a2 (x) = E[R|A = a1, X = x, S = 1] − E[R|A = a2, X = x, S = 1]. • Bias measurement ηa1,a2 (x) = τa1,a2 (x) − ρa1,a2 (x) Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 15 / 26
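The plug-in estimate $\hat\tau_{a_1,a_2}(x) = \hat\mu_{a_1}(x) - \hat\mu_{a_2}(x)$ above can be sketched as a simple T-learner: fit one outcome regression per arm and difference them. This is a minimal illustration, not the paper's implementation; the quadratic-in-$x$ model and the function names are assumptions chosen to match the simulation's reward model.

```python
import numpy as np

def t_learner_cate(x, a, r, a1=1, a0=0):
    """Plug-in CATE tau_hat(x) = mu_hat_{a1}(x) - mu_hat_{a0}(x):
    fit one outcome regression per arm (quadratic in x, a modeling choice)."""
    def fit_mu(mask):
        X = np.column_stack([np.ones(mask.sum()), x[mask], x[mask] ** 2])
        beta, *_ = np.linalg.lstsq(X, r[mask], rcond=None)
        return lambda z: beta[0] + beta[1] * z + beta[2] * z ** 2
    mu1, mu0 = fit_mu(a == a1), fit_mu(a == a0)
    return lambda z: mu1(z) - mu0(z)

# Demo on an unconfounded (S = 0 style) sample from the paper's Setup I reward model.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 5000)
a = rng.integers(0, 2, 5000)
r = 1 + a + x + 2*a*x + 0.5*x**2 + 0.75*a*x**2 + 0.5*rng.normal(size=5000)
tau_hat = t_learner_cate(x, a, r)   # true tau(x) = 1 + 2x + 0.75x^2
```

On confounded ($S = 1$) data the same fit recovers the biased $\rho_{a_1,a_0}(x)$ rather than $\tau_{a_1,a_0}(x)$, which is what the two-step correction addresses.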
Two-step deconfounding (see Kallus et al. (2018)), $\mathcal{A} = \{a_0, a_1\}$
1 On the observational sample, use a metalearner to obtain $\hat\rho_{a_1,a_0}(X)$.
2 Postulate a function $q_t(X, a_0)$ such that $\mathbb{E}[q_t(X, a_0) R \mid X = x, S = 0] = \tau_{a_1,a_0}(x)$, where
  $q_t(X, a_0) = \dfrac{\mathbb{1}(A = a_1)}{\pi_t(a_1 \mid X)} - \dfrac{\mathbb{1}(A = a_0)}{\pi_t(a_0 \mid X)}$.
3 Using $q_t(X, a_0)$, apply the definition of $\eta_{a_1,a_0}(x)$ to adjust the $\hat\rho$ term by solving an optimization problem on the unconfounded sample:
  $\hat\eta_{a_1,a_0} = \arg\min_{\eta} \sum_{t : S_t = 0} \big( q_t(x_t, a_0)\, r_t - \hat\rho_{a_1,a_0}(x_t) - \eta(x_t) \big)^2$.
4 Finally, $\hat\tau_{a_1,a_0}(x) = \hat\rho_{a_1,a_0}(x) + \hat\eta_{a_1,a_0}(x)$.
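Steps 2-4 can be sketched as follows. This is a minimal illustration under the stated assumptions (binary arms, known logging propensities, a quadratic model for $\eta$); the function names and the demo's constant bias of +2 are illustrative choices, not the paper's.

```python
import numpy as np

def deconfound(x, a, r, pi, rho_hat):
    """Steps 2-4 on the unconfounded (S = 0) sample, arms {0, 1}.
    pi(arm, x): known logging propensities; rho_hat: biased CATE from step 1."""
    # Step 2: q_t * r_t is an unbiased signal for tau(x) under ignorability.
    q = a / pi(1, x) - (1 - a) / pi(0, x)
    # Step 3: least-squares fit of eta (quadratic in x, a modeling choice).
    target = q * r - rho_hat(x)
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    eta_hat = lambda z: beta[0] + beta[1] * z + beta[2] * z ** 2
    # Step 4: corrected CATE.
    return lambda z: rho_hat(z) + eta_hat(z)

# Demo: rho_hat carries a constant confounding bias of +2; the correction removes it.
rng = np.random.default_rng(1)
n = 20000
x = rng.uniform(-1, 1, n)
a = rng.integers(0, 2, n)                        # uniform logging policy
u, eps = rng.normal(size=n), rng.normal(size=n)  # U independent of A when S = 0
r = 1 + a + x + 2*a*x + 0.5*x**2 + 0.75*a*x**2 + 2*u + 0.5*eps
rho_hat = lambda z: (1 + 2*z + 0.75*z**2) + 2.0  # true tau plus a bias of +2
tau_hat = deconfound(x, a, r, lambda arm, z: 0.5, rho_hat)
```

The key point is that $\hat\rho$ absorbs the flexible part of the fit from the large confounded sample, while the small unconfounded sample only needs to estimate the (hopefully simpler) bias function $\eta$.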
Deconfounded CATE IGW (D-CATE-IGW)
• Let $b = \arg\max_a \hat\mu_a(x_t)$. Then
  $\pi(a \mid x) = \dfrac{1}{K + \gamma_m(\hat\mu^m_b(x) - \hat\mu^m_a(x))} = \dfrac{1}{K + \gamma_m \hat\tau_{b,a}(x)}$ for $a \neq b$, and $\pi(b \mid x) = 1 - \sum_{c \neq b} \pi(c \mid x)$.
• Each round/epoch, deconfound the CATE.
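The inverse-gap-weighting rule above can be sketched directly; this is a generic IGW sampler (see Lattimore and Szepesvári (2020) for background), with the deconfounded $\hat\tau_{b,a}(x)$ entering as the gap. The function name is illustrative.

```python
import numpy as np

def igw_policy(mu, gamma, rng):
    """Inverse-gap weighting over K arms given estimates mu (length-K array):
    p(a) = 1 / (K + gamma * (mu[b] - mu[a])) for a != b = argmax, and the
    leftover mass on b. In D-CATE-IGW the gap mu[b] - mu[a] is replaced by
    the deconfounded tau_hat_{b,a}(x)."""
    mu = np.asarray(mu, dtype=float)
    K, b = len(mu), int(np.argmax(mu))
    p = 1.0 / (K + gamma * (mu[b] - mu))   # gaps are >= 0, so p(a) <= 1/K
    p[b] = 0.0
    p[b] = 1.0 - p.sum()                   # remaining mass on the greedy arm
    return rng.choice(K, p=p)

rng = np.random.default_rng(2)
draws = [igw_policy([0.0, 1.0, 0.2], gamma=10.0, rng=rng) for _ in range(3000)]
```

Larger $\gamma_m$ shifts mass toward the greedy arm, which is how the schedule trades exploration for exploitation across epochs.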
Outline
1 Authoritarian Sponsor
2 Deconfounding
3 Experiment
4 Conclusions
Setup I
• $S_t \sim \mathrm{Bern}(\rho)$
• No-overlap scenario: $X_t \mid S_t = 0 \sim \mathrm{Unif}([-1, 1])$, $U_t \mid S_t = 0 \sim N(0, 1)$
• Full-overlap scenario: $X_t \mid S_t = 0 \sim N(0, 1)$, $U_t \mid S_t = 0 \sim N(0, 1)$
• On the sponsor sample,
  $(X_t, U_t) \mid \{A_t, S_t = 1\} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & (2A_t - 1)\sigma_A \\ (2A_t - 1)\sigma_A & 1 \end{pmatrix} \right)$
• $\sigma_A \in \{0.6, 0.9\}$, $\rho \in \{0.3, 0.6\}$
• $R_t(A_t) = 1 + A_t + X_t + 2 A_t X_t + \tfrac{1}{2} X_t^2 + \tfrac{3}{4} A_t X_t^2 + 2 U_t + \tfrac{1}{2}\epsilon_t$, where $\epsilon_t \sim N(0, 1)$, so $\tau(X_t) = \tfrac{3}{4} X_t^2 + 2 X_t + 1$.
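The data-generating process of Setup I can be sketched as a per-round sampler. One caveat: on learner ($S=0$) rounds the arm is drawn uniformly here as a stand-in for the learner's own policy, which is an illustrative simplification; the sponsor's arm choice is likewise reduced to a coin flip, with the confounding carried entirely by the $X$-$U$ correlation.

```python
import numpy as np

def draw_round(rng, rho=0.3, sigma_a=0.6, overlap=True):
    """One round of Setup I. On sponsor rounds (S=1) the context X and the
    unobserved U are correlated, with sign depending on the chosen arm A
    (confounding); on learner rounds (S=0) X and U are independent of A."""
    s = rng.random() < rho
    if s:
        a = int(rng.integers(0, 2))
        c = (2 * a - 1) * sigma_a
        x, u = rng.multivariate_normal([0.0, 0.0], [[1.0, c], [c, 1.0]])
    else:
        x = rng.normal() if overlap else rng.uniform(-1.0, 1.0)
        u = rng.normal()
        a = int(rng.integers(0, 2))
    r = 1 + a + x + 2*a*x + 0.5*x**2 + 0.75*a*x**2 + 2*u + 0.5*rng.normal()
    return s, x, u, a, r
```

Because $U$ enters the reward with coefficient 2 and is correlated with $A$ only when $S = 1$, a CATE fit on sponsor rounds alone is biased, which is exactly the $\rho \neq \tau$ situation the experiment probes.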
Result I
Figure 4: Normed cumulative regret for different scenarios
Result II
Figure 5: True and estimated CATE values for different scenarios
Outline
1 Authoritarian Sponsor
2 Deconfounding
3 Experiment
4 Conclusions
Contribution
1 A pioneering model of sponsored content in the contextual-bandit framework
2 Bandits viewed not as experimental studies but as observational studies
3 A confounding scenario and an application of deconfounding
4 D-CATE-IGW works
Future research
• Theoretical
  • Mathematically model the complicated sampling, especially the flow of information
  • Consistency proof for the CATE estimator in this scenario
  • High-probability regret bounds for D-CATE-IGW: $P(\mathrm{REWARD}(\pi) \geq \mathrm{BOUND}(\delta)) \geq 1 - \delta$
• Empirical
  • More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
  • Other deconfounding methods (see Wu and Yang (2022))
  • A more comprehensive empirical study
• Expansion
  • Policy evaluation: $V(\pi) = \mathbb{E}_X \mathbb{E}_{A \sim \pi(\cdot \mid X)} \mathbb{E}_{R \mid A, X}[R]$, estimated by $\hat V_t(\pi)$ on $\{(X_s, A_s, R_s)\}_{s=1}^{t} \sim \mathcal{D}(H_t)$
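For the policy-evaluation direction, one standard estimator of $V(\pi)$ from logged bandit data is inverse propensity weighting; this sketch is a generic baseline, not necessarily the estimator the authors have in mind for $\hat V_t(\pi)$, and the function names are illustrative.

```python
import numpy as np

def ipw_value(x, a, r, logging, target):
    """IPW estimate of V(target) = E_X E_{A ~ target(.|X)} E[R | A, X] from
    data logged under `logging`. Both arguments map (arm, context) to a
    probability; the logging probabilities must be positive on observed arms."""
    w = np.array([target(ai, xi) / logging(ai, xi) for ai, xi in zip(a, x)])
    return float(np.mean(w * r))

# Demo: uniform logging over two arms, reward equals the arm pulled,
# target policy always pulls arm 1, so V(target) = 1.
rng = np.random.default_rng(3)
n = 10000
x = rng.normal(size=n)
a = rng.integers(0, 2, size=n)
r = a.astype(float)
v = ipw_value(x, a, r, lambda ai, xi: 0.5, lambda ai, xi: float(ai == 1))
```

With adaptively collected bandit data the logging propensities $\pi_t$ vary over rounds, which is part of why evaluation on $\mathcal{D}(H_t)$ is listed as an open expansion rather than a solved step.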
The beginning ...
References
• Colnet, B., I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.-P. Vert, J. Josse, and S. Yang (2020). Causal inference methods for combining randomized trials and observational studies: a review. arXiv preprint arXiv:2011.08047.
• Kallus, N., A. M. Puli, and U. Shalit (2018). Removing hidden confounding by experimental grounding. Advances in Neural Information Processing Systems 31.
• Künzel, S. R., J. S. Sekhon, P. J. Bickel, and B. Yu (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences 116(10), 4156–4165.
• Lattimore, T. and C. Szepesvári (2020). Bandit Algorithms. Cambridge University Press.
• Wu, L. and S. Yang (2022). Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. In Conference on Causal Learning and Reasoning, pp. 904–926. PMLR.