3. Motivation
• Typical Active Learning
[Figure: Mean Absolute Error (MAE) vs. # of iterations (0 to 150; MAE axis 0.7 to 1) for Strategy 1 and Strategy 2]
More info on Active Learning:
Rubens, Neil; Elahi, Mehdi; Sugiyama, Masashi; Kaplan, Dain. Active Learning in Recommender Systems, Recommender Systems Handbook, Springer US (2015)
4. Motivation
• Adaptive Active Learning
[Figure: Mean Absolute Error (MAE) vs. # of iterations (0 to 150; MAE axis 0.7 to 1) for Strategy 1, Strategy 2, and the Adaptive Strategy, with the switching point marked]
More info on Adaptive Active Learning:
Elahi, Mehdi, Francesco Ricci, and Neil Rubens. "A survey of active learning in collaborative filtering recommender systems." Computer Science Review (2016).
5. Motivation
• Adaptive Active Learning
[Figure: Mean Absolute Error (MAE) vs. # of iterations (0 to 150; MAE axis 0.7 to 1) for the Adaptive Strategy]
More info on Adaptive Active Learning:
Elahi, Mehdi, Francesco Ricci, and Neil Rubens. "A survey of active learning in collaborative filtering recommender systems." Computer Science Review (2016).
6. n-Armed Bandit
§ A slot machine with n arms, each of which gives a different reward
§ In every play, we try to choose the best arm so as to maximize the
total reward
Predict the next reward → Choose the best arm → Learn from the reward
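The predict / choose / learn loop can be sketched in a few lines. This is only a minimal illustration, not a definitive implementation: the function name `run_bandit`, the Gaussian reward noise, and the small exploration rate `epsilon` are all assumptions that do not appear on the slide.

```python
import random

def run_bandit(true_means, plays=1000, epsilon=0.1, seed=0):
    """Play an n-armed bandit (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # predicted reward of each arm
    counts = [0] * n       # how often each arm was played
    total = 0.0
    for _ in range(plays):
        # Predict + choose: usually take the arm with the highest predicted
        # reward; with probability epsilon (an assumption) try a random arm.
        if rng.random() < epsilon:
            arm = rng.randrange(n)
        else:
            arm = max(range(n), key=lambda a: estimates[a])
        # The reward is a noisy draw around the arm's true mean (assumed Gaussian).
        reward = rng.gauss(true_means[arm], 1.0)
        # Learn: move the prediction toward the observed reward (running average).
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total
```

For example, `run_bandit([0.1, 0.2, 3.0])` returns the learned predictions, the play counts per arm, and the total reward collected.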
7. Example
§ Example:
1st play 2nd play 3rd play
§ Every play is an Action (a)
§ Then the system makes a transition to the
next State (s)
§ In every play a reward (r) is given
based on the chosen arm
§ How to play is a Policy (π),
which maps states to actions
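To make the four terms concrete, here is a toy sketch in which a policy is a plain lookup table from states to actions. The arm names, the payouts, and the helpers `next_state` and `run_episode` are all made up for illustration.

```python
# Hypothetical toy setup: three plays, two arms with fixed (made-up) payouts.
REWARDS = {"left": 1.0, "right": 2.0}  # reward r for each action a

# A policy π maps each state s to an action a.
policy = {"1st play": "left", "2nd play": "right", "3rd play": "right"}

def next_state(state):
    """Transition to the next state (here simply the next play)."""
    order = ["1st play", "2nd play", "3rd play", "done"]
    return order[order.index(state) + 1]

def run_episode(policy, start="1st play"):
    """Follow the policy and record (state, action, reward) for every play."""
    state, history = start, []
    while state != "done":
        action = policy[state]      # π(s) -> a
        reward = REWARDS[action]    # r depends on the chosen arm
        history.append((state, action, reward))
        state = next_state(state)   # move to the next state
    return history
```

Calling `run_episode(policy)` yields one (s, a, r) triple per play; changing the table changes how the game is played without touching the loop.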
18. Reinforcement Comparison
§ We know that in RL:
Actions with large rewards should be selected with higher probability than actions
with small rewards
§ If the reward is 5, is it large or small?
§ A natural reference reward is the average of previously received rewards
Large rewards > reference reward
Small rewards < reference reward
§ Methods based on this idea are called Reinforcement Comparison
§ This method:
Introduces the probability of choosing an action into the action selection
process
Indicates that high rewards should increase the probability of reselecting the
action that was taken
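A minimal sketch of this idea follows, assuming the usual textbook form of reinforcement comparison: each action has a preference, a softmax over the preferences gives the selection probabilities, and the reference reward is a running average. The function names, the step sizes `alpha` and `beta`, and the Gaussian reward noise are illustrative assumptions, not values from the slide.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def reinforcement_comparison(true_means, plays=2000, alpha=0.1, beta=0.1, seed=0):
    """Hypothetical helper: bandit learning with a reference reward."""
    rng = random.Random(seed)
    n = len(true_means)
    prefs = [0.0] * n  # preference for each action
    ref = 0.0          # reference reward (running average of past rewards)
    for _ in range(plays):
        probs = softmax(prefs)
        arm = rng.choices(range(n), weights=probs)[0]
        reward = rng.gauss(true_means[arm], 1.0)  # assumed Gaussian noise
        # A reward above the reference raises the preference for the chosen
        # action; a reward below the reference lowers it.
        prefs[arm] += beta * (reward - ref)
        # Move the reference reward toward the latest reward.
        ref += alpha * (reward - ref)
    return softmax(prefs), ref
```

With this rule, a reward of 5 counts as large only if it exceeds the current reference reward, which answers the question raised on the slide.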