Recommender systems now have to make money
35.
• Metric: CTR (%)
• MAB (Multi-Armed Bandit)
• User Clustering
36. MAB (Multi-Armed Bandit)
• MAB = the exploration-exploitation trade-off
• 10% of traffic (exploration) collects feedback (impressions, clicks)
* Illustrated with an ε-greedy MAB.
37. • Compute each item's CTR (%) from the collected feedback.
• CTR (%) = # of clicks / # of impressions
Exploration: observed CTR per item
0.4% 4.0% 2.9% 7.3% 2.7% 8.7% 6.7% 1.0% 1.9% 8.1% 6.3%
3.6% 6.7% 8.0% 3.1% 3.6% 2.0% 4.4% 3.1% 7.3% 8.2% 2.7%
4.4% 8.1% 0.6% 5.9% 9.2% 7.3% 8.3% 8.6% 4.2% 9.9% 6.9%
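As a minimal sketch of the CTR formula above (not the deck's code), the snippet below computes CTR (%) per item from hypothetical (clicks, impressions) counts and ranks the items the way exploitation would use them. The item names and counts are illustrative; they were picked so the CTRs equal three cells of the grid (0.4%, 7.3%, 9.9%).

```python
def ctr_percent(clicks: int, impressions: int) -> float:
    """CTR (%) = # of clicks / # of impressions, expressed as a percentage."""
    if impressions == 0:
        return 0.0  # no impressions yet; returning 0% is an assumption
    return 100.0 * clicks / impressions


# Hypothetical exploration feedback: item -> (clicks, impressions).
feedback = {"item_a": (4, 1_000), "item_b": (73, 1_000), "item_c": (99, 1_000)}

ctr_by_item = {item: ctr_percent(c, n) for item, (c, n) in feedback.items()}
# Highest observed CTR first: the order exploitation would serve items in.
ranked = sorted(ctr_by_item, key=ctr_by_item.get, reverse=True)
print(ranked)
```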
38. • Show the highest-CTR items to the other 90% of traffic (exploitation).
Exploitation: 8.0% 8.2%
39. Exploration (10%) & Exploitation (90%)
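The 10% / 90% split can be sketched as one ε-greedy decision per impression. This is a minimal illustration under assumptions: `epsilon_greedy_pick` is a hypothetical helper, the CTRs are already known, and exploration picks uniformly among all items.

```python
from __future__ import annotations

import random


def epsilon_greedy_pick(ctr_by_item: dict[str, float], epsilon: float,
                        rng: random.Random) -> str:
    """ε-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Exploration (10% when epsilon = 0.10): any item, uniformly at random.
        return rng.choice(list(ctr_by_item))
    # Exploitation (the remaining 90%): the item with the best observed CTR.
    return max(ctr_by_item, key=ctr_by_item.get)


rng = random.Random(0)
ctrs = {"item_a": 0.4, "item_b": 7.3, "item_c": 9.9}
picks = [epsilon_greedy_pick(ctrs, 0.10, rng) for _ in range(10_000)]
# Expected share of the best item: 0.9 + 0.1 / 3, about 0.933.
print(picks.count("item_c") / len(picks))
```

The best item gets slightly more than 90% of impressions because random exploration also lands on it part of the time.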
40. Exploration (10%) / Exploitation (90%)
• Which MAB algorithm?
• TS-MAB
- ε-greedy
- UCB (Upper Confidence Bound)
- Lin-UCB
- Thompson Sampling
- NeuralBandit
- LinRel (Linear Associative Reinforcement Learning)
42. ε-Greedy MAB, ε = 0.10
• Traffic: 10M impressions
• 10% (ε) exploration: 1M impressions, i.e. 10k impressions for each of 100 items
• Observed CTRs: 1.1% 2.0% 8.2% 0.01% 4.6% 1.2% 5.2% 0.1% 0.2% 1.0% … 2.2%
43. ε-Greedy MAB, ε = 0.10 (continued)
• Average CTR of the exploration traffic = 1.5%
• Best arms (item #: CTR): #3 8.2%, #7 5.2%, #4 4.6%, #50 3.0%
44. ε-Greedy MAB, ε = 0.10 (continued)
• 90% (1 - ε) exploitation: 9M impressions on the best arms, CTR = 5.1%
• Overall CTR = 0.1 × 1.5% + 0.9 × 5.1% = 4.74%
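The slide arithmetic can be checked directly. A small sketch, assuming the overall CTR is simply the traffic-weighted average of the exploration and exploitation CTRs (`blended_ctr` is a hypothetical helper):

```python
def blended_ctr(epsilon: float, explore_ctr: float, exploit_ctr: float) -> float:
    """Overall CTR when a share epsilon of traffic explores and 1 - epsilon exploits."""
    return epsilon * explore_ctr + (1 - epsilon) * exploit_ctr


total_impressions = 10_000_000
epsilon = 0.10
n_items = 100

explore_impressions = round(total_impressions * epsilon)       # 1M random impressions
per_item = explore_impressions // n_items                       # 10k impressions per item
exploit_impressions = total_impressions - explore_impressions   # 9M on the best arms

overall = blended_ctr(epsilon, explore_ctr=1.5, exploit_ctr=5.1)  # CTRs in %
print(f"{overall:.2f}%")
```

The result, 0.1 × 1.5% + 0.9 × 5.1% = 4.74%, matches the slide.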
45. ε-Greedy MAB, ε = 0.10 (continued)
• Observed CTRs: 1.1% 2.0% 8.2% 0.01% 4.6% 1.2% 5.2% 1.5% 0.2% 1.0% … 2.2%
• How reliable is a CTR measured on 10k impressions?
• The CTR confidence interval (3σ) depends on the number of impressions.
46. ε-Greedy MAB, ε = 0.10 (continued)
• Observed CTRs: 1.1% 2.0% 8.2% 0.01% 4.6% 1.2% 5.2% 1.5% 0.2%
• With only 10 impressions, the CTR estimate is much less reliable: the CTR interval depends on the number of impressions.
47. ε-Greedy MAB, ε = 0.10 (continued)
• CTR interval as a function of impressions: 99.7% (3σ)
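To make the 99.7% (3σ) point concrete, here is a sketch using the normal approximation to the binomial, p ± 3·sqrt(p(1 - p)/n). That approximation is an assumption on our part and becomes crude for small n or CTRs near 0; the click counts are hypothetical and chosen to give a 3.0% CTR.

```python
from __future__ import annotations

import math


def ctr_interval_3sigma(clicks: int, impressions: int) -> tuple[float, float]:
    """Approximate 99.7% (3-sigma) interval for a CTR via the normal approximation."""
    p = clicks / impressions
    half_width = 3 * math.sqrt(p * (1 - p) / impressions)
    # Clip to [0, 1]: a CTR cannot be negative or exceed 100%.
    return max(0.0, p - half_width), min(1.0, p + half_width)


wide = ctr_interval_3sigma(3, 100)          # 3.0% CTR from 100 impressions
narrow = ctr_interval_3sigma(300, 10_000)   # 3.0% CTR from 10k impressions
print(f"100 impressions:    {wide[0]:.2%} to {wide[1]:.2%}")
print(f"10,000 impressions: {narrow[0]:.2%} to {narrow[1]:.2%}")
```

With 10,000 impressions the interval is roughly 2.5% to 3.5%; with 100 impressions it spans 0% to about 8%, so the same 3.0% says much less.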
48. ε-Greedy MAB, ε = 0.10 (continued)
• CTR interval as a function of impressions: 99.7% (3σ)
• Arms at CTR 3.0%: 3.0% 3.0% 3.0% 3.0% 3.0% 3.0% 3.0%
49. ε-Greedy MAB, ε = 0.10 (continued)
• Impressions spent on anything other than the optimal arm are lost (regret).
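Regret can be counted in clicks: each impression given to an arm other than the optimal one loses (best CTR - that arm's CTR) clicks in expectation. A sketch, using the best-arm CTRs listed for items #3, #7, #4 and #50 and a hypothetical split of the 9M exploitation impressions (`expected_regret` is a hypothetical helper):

```python
def expected_regret(impressions_by_arm, ctr_by_arm):
    """Expected clicks lost versus always serving the optimal arm:
    sum over arms of impressions * (best CTR - arm CTR)."""
    best = max(ctr_by_arm.values())
    return sum(n * (best - ctr_by_arm[arm]) for arm, n in impressions_by_arm.items())


# CTRs of the best arms from the slides; the impression split is hypothetical.
ctr_by_arm = {"#3": 0.082, "#7": 0.052, "#4": 0.046, "#50": 0.030}
shown = {"#3": 3_000_000, "#7": 2_000_000, "#4": 2_000_000, "#50": 2_000_000}  # 9M total
print(round(expected_regret(shown, ctr_by_arm)))
```

Under that split about 236,000 expected clicks are lost, none of them on arm #3; moving impressions onto the optimal arm is what drives regret toward zero.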
51. Thompson Sampling MAB
• Model each arm's CTR as Beta(a, b) (a = click, b = unclick).
[Figure: Beta posteriors of arm 1 (CTR 10%) and arm 2 (CTR 25%) after 10, 50, 100, 200, 1k, and 10k impressions]
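A minimal Thompson Sampling loop with the two arms from the slide (CTR 10% and 25%). Assumptions: a Beta(1, 1) prior, so the posterior is Beta(clicks + 1, unclicks + 1), and simulated clicks drawn from each arm's fixed true CTR; `BetaArm` and `thompson_pick` are hypothetical names.

```python
from __future__ import annotations

import random


class BetaArm:
    """CTR ~ Beta(a, b) with a = clicks + 1 and b = unclicks + 1 (Beta(1, 1) prior assumed)."""

    def __init__(self) -> None:
        self.a = 1
        self.b = 1

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.a, self.b)

    def update(self, clicked: bool) -> None:
        if clicked:
            self.a += 1
        else:
            self.b += 1


def thompson_pick(arms: dict[str, BetaArm], rng: random.Random) -> str:
    # Draw one plausible CTR per arm from its posterior and play the highest draw.
    return max(arms, key=lambda name: arms[name].sample(rng))


true_ctr = {"arm1": 0.10, "arm2": 0.25}  # the two arms shown on the slide
arms = {name: BetaArm() for name in true_ctr}
rng = random.Random(42)
plays = {name: 0 for name in true_ctr}
for _ in range(10_000):
    name = thompson_pick(arms, rng)
    plays[name] += 1
    arms[name].update(rng.random() < true_ctr[name])  # simulated click
print(plays)
```

Almost all of the 10,000 impressions end up on arm 2; arm 1 only gets occasional trials while its posterior still overlaps arm 2's.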
52. Thompson Sampling MAB
• Arm 1: CTR 10%; comparison CTR: 15%
• Arm 1 is below the comparison (10% < 15%), yet it is still tried for about 100 impressions.
[Figure: posterior after 10, 50, 100, 200, 1k, and 10k impressions]
53. Thompson Sampling MAB
• Arm 2: CTR 25%
• Arm 2 beats the comparison CTR (25% > 15%).
[Figure: posterior after 10, 50, 100, 200, 1k, and 10k impressions]
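The comparison against the 15% reference can be phrased as a posterior probability. A sketch, assuming 100 impressions per arm with 10 and 25 clicks respectively and the same Beta(1, 1) prior; `prob_ctr_above` is a hypothetical helper that estimates the probability by Monte Carlo with `random.betavariate`.

```python
import random


def prob_ctr_above(clicks: int, impressions: int, threshold: float,
                   draws: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(CTR > threshold) when CTR ~ Beta(clicks + 1, unclicks + 1)."""
    rng = random.Random(seed)
    a, b = clicks + 1, impressions - clicks + 1
    hits = sum(rng.betavariate(a, b) > threshold for _ in range(draws))
    return hits / draws


p_arm1 = prob_ctr_above(10, 100, 0.15)  # 10% observed CTR
p_arm2 = prob_ctr_above(25, 100, 0.15)  # 25% observed CTR
print(round(p_arm1, 3), round(p_arm2, 3))
```

Arm 1 still has roughly a 10% chance of beating 15%, which is why Thompson Sampling keeps giving it trial impressions; arm 2 is above 15% with near certainty.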
66. Clustering?
• CB (image, text) features → user features
[0.628, 0.88, 0.376, 0.065, 0.849]
[0.508, 0.268, 0.193, 0.125, 0.425]
[0.431, 0.077, 0.012, 0.07, 0.037]
[0.915, 0.294, 0.713, 0.851, 0.423]
[0.508, 0.268, 0.193, 0.125, 0.425]
[0.607, 0.639, 0.554, 0.092, 0.297]
[0.587, 0.319, 0.094, 0.173, 0.177]
[0.409, 0.458, 0.48, 0.319, 0.783]
[0.479, 0.434, 0.618, 0.297, 0.752]
[0.467, 0.206, 0.905, 0.7, 0.568]
67. Clustering?
• 14 CB (image, text) features → user features
• Users grouped into 8 clusters (#0 to #7)
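Grouping users by their feature vectors can be sketched with plain k-means (Lloyd's algorithm), the hard-clustering method the deck names on slide 114. This sketch runs on the ten 5-dimensional vectors shown on slide 66 with k = 3 only because there are so few points; the deck used 8 clusters. Random initialization and the iteration count are assumptions.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each user joins the nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


users = [
    [0.628, 0.88, 0.376, 0.065, 0.849],
    [0.508, 0.268, 0.193, 0.125, 0.425],
    [0.431, 0.077, 0.012, 0.07, 0.037],
    [0.915, 0.294, 0.713, 0.851, 0.423],
    [0.508, 0.268, 0.193, 0.125, 0.425],
    [0.607, 0.639, 0.554, 0.092, 0.297],
    [0.587, 0.319, 0.094, 0.173, 0.177],
    [0.409, 0.458, 0.48, 0.319, 0.783],
    [0.479, 0.434, 0.618, 0.297, 0.752],
    [0.467, 0.206, 0.905, 0.7, 0.568],
]
labels, centroids = kmeans(users, k=3)
print(labels)
```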
84. Item Features
• Image → (1) Image Feature
• Text → (2) Text Feature
• Feedback → (3) CF Feature
88. [Figure: example items by Image (style), Text, and CF features]
95.
0.4% 4.0% 2.9% 7.3% 2.7% 8.7% 6.7% 1.0% 1.9% 8.1%
0.4% 4.0% 2.9% 7.3% 2.7% 8.7% 6.7% 1.0% 1.9% 8.1%
0.4% 4.4% 2.9% 7.3% 2.3% 8.7% 0.2% 1.0% 1.9% 8.1%
0.4% 6.0% 2.9% 7.3% 2.7% 5.6% 6.7% 1.0% 1.9% 8.1%
MAB (%)
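If each row above is read as one cluster's MAB (our reading of the slide, not stated in it), the structure is one independent Thompson Sampling bandit per user cluster. A hypothetical sketch: `ClusterBandit` keeps Beta counts per (cluster, item), and the simulated CTRs are invented for illustration.

```python
from __future__ import annotations

import random
from collections import defaultdict


class ClusterBandit:
    """One Thompson Sampling MAB per user cluster (hypothetical structure).

    counts[cluster][item] = [clicks + 1, unclicks + 1], i.e. a Beta(1, 1) prior.
    """

    def __init__(self, items: list[str], seed: int = 0) -> None:
        self.items = list(items)
        self.rng = random.Random(seed)
        # A cluster's arms are created lazily the first time that cluster is seen.
        self.counts = defaultdict(lambda: {item: [1, 1] for item in self.items})

    def recommend(self, cluster: int) -> str:
        arms = self.counts[cluster]
        # Sample a CTR for every item from this cluster's posterior; play the largest.
        return max(self.items, key=lambda item: self.rng.betavariate(*arms[item]))

    def feedback(self, cluster: int, item: str, clicked: bool) -> None:
        self.counts[cluster][item][0 if clicked else 1] += 1


# Invented CTRs: cluster 0 prefers item "x", cluster 1 prefers item "y".
true_ctr = {0: {"x": 0.08, "y": 0.01}, 1: {"x": 0.01, "y": 0.08}}
bandit = ClusterBandit(["x", "y"], seed=1)
sim = random.Random(2)
plays = {0: {"x": 0, "y": 0}, 1: {"x": 0, "y": 0}}
for _ in range(5_000):
    for cluster in (0, 1):
        item = bandit.recommend(cluster)
        plays[cluster][item] += 1
        bandit.feedback(cluster, item, sim.random() < true_ctr[cluster][item])
print(plays)
```

Because each cluster keeps its own counts, the two clusters converge on different items even though they share one item catalog.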
100. [Figure: Clicks vs. Impressions for 1, 2, 3, … 89, 90]
101. [Figure: Use Coin vs. Impressions for 1, 2, 3, … 89, 90]
104. 1. MAB
• MAB
- Bandit Algorithm = Thompson Sampling
- Reward = Click (Unclick counts as the failure)
- Play Arms = Cluster Most Popular (most popular items per cluster)
- Non-stationarity = Exponential Decaying
• Two metrics
- CTR = # of clicks / # of impressions
- Coin-use rate = # of use_coins / # of impressions
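One common way to implement exponential decaying for a non-stationary Beta bandit is to discount old counts before adding each new batch of feedback. This is a sketch under assumptions: the decay factor `gamma`, the Beta(1, 1) prior, and batch-wise updates are not specified in the deck, and `DecayingBetaArm` is a hypothetical name.

```python
class DecayingBetaArm:
    """Beta(a, b) arm whose evidence is discounted by gamma at every update."""

    def __init__(self, gamma: float, prior: float = 1.0) -> None:
        self.gamma = gamma
        self.prior = prior
        self.a = prior  # prior + discounted clicks
        self.b = prior  # prior + discounted unclicks

    def update(self, clicks: int, unclicks: int) -> None:
        # Shrink past evidence toward the prior, then add the new batch.
        self.a = self.prior + self.gamma * (self.a - self.prior) + clicks
        self.b = self.prior + self.gamma * (self.b - self.prior) + unclicks

    def mean_ctr(self) -> float:
        return self.a / (self.a + self.b)


decayed, static = DecayingBetaArm(gamma=0.9), DecayingBetaArm(gamma=1.0)
for clicks, unclicks in [(10, 90)] * 100 + [(1, 99)] * 100:  # CTR drops from 10% to 1%
    decayed.update(clicks, unclicks)
    static.update(clicks, unclicks)
print(f"decayed: {decayed.mean_ctr():.2%}, no decay: {static.mean_ctr():.2%}")
```

After the CTR drops from 10% to 1%, the decayed arm reports about 1.1% while the undecayed arm still reports about 5.5%.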
105. 1. MAB (continued)
• Use Coin
• MAB reward alternatives: Use Coin; Click + Use Coin
by @brandon.lim
109. 2. Conditional Bandit
[Figure: impressions across arms 1, 2, 3, … 89]
• Click MAB: Reward = Click; α = click, β = unclick
• Use-Coin MAB: Reward = Use Coin; α = use-coin, β = click
by @troye.kwon
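A sketch of one way to combine the two bandits: sample P(click) from the Click MAB and P(use coin | click) from the Use-Coin MAB, then rank arms by their product. The product rule, the +1 priors, and taking β as clicks without a coin use (so that α + β equals clicks) are assumptions, not details given on the slide; `ConditionalArm` is a hypothetical name.

```python
import random
from dataclasses import dataclass


@dataclass
class ConditionalArm:
    clicks: int = 0
    unclicks: int = 0
    use_coins: int = 0  # clicks that led to a coin use

    def sample_score(self, rng: random.Random) -> float:
        # Click MAB: alpha = click, beta = unclick (+1 each as a uniform prior).
        p_click = rng.betavariate(self.clicks + 1, self.unclicks + 1)
        # Use-Coin MAB: alpha = use-coin; beta taken as clicks without a coin use (assumption).
        p_coin = rng.betavariate(self.use_coins + 1, self.clicks - self.use_coins + 1)
        # Expected coin uses per impression = P(click) * P(use coin | click).
        return p_click * p_coin


rng = random.Random(7)
arm_a = ConditionalArm(clicks=100, unclicks=900, use_coins=5)   # high CTR, few coin uses
arm_b = ConditionalArm(clicks=50, unclicks=950, use_coins=20)   # lower CTR, many coin uses
wins_b = sum(arm_b.sample_score(rng) > arm_a.sample_score(rng) for _ in range(1_000))
print(wins_b)
```

Arm B wins almost every draw despite its lower CTR, because far more of its clicks turn into coin use.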
113. 4. Seen decay
• Negative feedback
• Uses clicks and impressions in the Ranker
• (alpha), (Beta)
114. Ways to improve CTR
• Hard clustering (k-Means) → soft clustering (pLSI)
• Feature matching
• Targeting: genre/tag matching
• MAB non-stationarity: exponential smoothing
• Targeting: unbiased Most Popular
• MAB hyperparameter tuning
128.
• Base: Editor's picks (X), CTR 1.9%
• Alpha (1): CTR 4.8%
• Beta (2): CTR 5.5%
• Gamma (3): CTR 6.5%
• CTR: +242% (Gamma vs. Base), +42%
[Figure: CTR over 1, 2, 3, 4, …]
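The percentage lift is simple arithmetic. A quick check on the CTRs above, assuming lift is measured relative to Base (`lift_pct` is a hypothetical helper):

```python
def lift_pct(base: float, new: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return (new / base - 1) * 100


ctr_by_version = {"Base": 1.9, "Alpha": 4.8, "Beta": 5.5, "Gamma": 6.5}
for version in ("Alpha", "Beta", "Gamma"):
    lift = lift_pct(ctr_by_version["Base"], ctr_by_version[version])
    print(f"{version}: {lift:+.0f}% vs. Base")
```

Gamma vs. Base comes out at about +242%, matching the slide.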