• PS: This file is for reference only. Do not
depend solely on it for the content. It is meant to
supplement your textbook content. It is
recommended to go through the suggested
readings/textbook to gain detailed
knowledge of the content.
Definition
• In 1959, Arthur Samuel, a pioneer in the field
of machine learning (ML), defined it as the
“field of study that gives computers the
ability to learn without being explicitly
programmed”.
Definition
“A computer program is said to learn from experience
with respect to some class of tasks and performance
measure, if the performance at the tasks, as measured by
the performance measure, improves with experience”.
Features of a well-defined learning problem:
• The learning task
• The measure of performance
• The task experience
• Types of learning tasks
What is the Learning Problem?
• Learning = Improving with experience at some
task
• Improve over task T,
• with respect to performance measure P,
• based on experience E.
What is the Learning Problem?
• E.g., Learn to play checkers
T : Play checkers
P : % of games won in world tournament
E: opportunity to play against self
Learning to Play Checkers
• E.g., Learn to play checkers
T : Play checkers
P : % of games won in world tournament
• What experience should the system learn from?
• What exactly should be learned?
• How shall it be represented?
• What specific algorithm should be used to learn it?
Designing a Learning System
• Consider designing a program to learn to play
checkers, with the goal of entering it in the world
checkers tournament
Designing a Learning System
• Consider designing a program to learn to play
checkers, with the goal of entering it in the world
checkers tournament
• Performance measure: the percentage of games it
wins in this tournament.
• Requires the following design choices:
– Choosing Training Experience
– Choosing the Target Function
– Choosing the Representation of the Target Function
– Choosing the Function Approximation Algorithm
Choosing the Training Experience
1. What training experience should the system have?
– A design choice with great impact on the outcome.
2. What amount of interaction should there be
between the system and the supervisor?
3. Which training examples?
Choosing the Training Experience
1. What training experience should the system have?
– A design choice with great impact on the outcome.
• Will the training experience provide direct or indirect
feedback?
– Direct Feedback: system learns from examples of individual checkers
board states and the correct move for each
Just a bunch of board states together with a correct move.
– Indirect Feedback: a bunch of recorded games, where the correctness
of the moves is inferred from the result of the game.
• Credit assignment problem: the value of early states must be inferred from
the outcome
• Direct feedback is easier to learn from.
Choosing the Training Experience
2. What amount of interaction should there be between the
system and the supervisor?
– Choice #1: No freedom. Supervisor provides all training
examples.
– Choice #2: Semi-free. Supervisor provides training
examples, system constructs its own examples too, and
asks questions to the supervisor in cases of doubt.
– Choice #3: Total-freedom. System learns to play
completely unsupervised
• How “daring” should the system be in exploring new board states?
Choosing the Training Experience
3. Which training examples?
– There is a huge number of possible games.
– No time to try all possible games.
– System should learn with examples that it will
encounter in the future.
– For example, if the goal is to beat humans, it
should be able to do well in situations that
humans encounter when they play (this is hard to
achieve in practice).
Choosing the Training Experience
– If training the checkers program consists only of
experiences played against itself, it may never encounter
crucial board states that are likely to be played by the
human checkers champion
– Most theory of machine learning rests on the assumption
that the distribution of training examples is identical to the
distribution of test examples
Partial Design of Checkers Learning
Program
• A checkers learning problem:
– Task T: playing checkers
– Performance measure P: percent of games won in the
world tournament
– Training experience E: games played against itself
• Remaining choices
– The exact type of knowledge to be learned
– A representation for this target knowledge
– A learning mechanism
Choosing the Target Function
What should be learned exactly?
• The computer program already knows the legal moves. It
should learn how to choose the best move from among the
legal moves.
• The computer should learn a ‘hidden’ function.
– target function: ChooseMove : B → M
– B: the set of legal board states, M: the set of legal moves
• ChooseMove is difficult to learn given only indirect training experience
Choosing the Target Function
• So, our Alternative target function
– An evaluation function that assigns a numerical score to any given
board state
– V : B → ℝ (where ℝ is the set of real numbers)
• V(b) for an arbitrary board state b in B
– if b is a final board state that is won, then V(b) = 100
– if b is a final board state that is lost, then V(b) = -100
– if b is a final board state that is drawn, then V(b) = 0
– if b is not a final state, then V(b) = V(b′), where b′ is the
best final board state that can be achieved starting from b
and playing optimally until the end of the game (a recursive
sketch of this definition is given below)
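Purely as a sketch of the recursion (not an implementation from the slides), the definition can be written with hypothetical game primitives is_final, is_won, is_lost and legal_successors that encode the checkers rules; “playing optimally until the end” becomes alternating max/min over the two players’ moves:

```python
def ideal_V(b, our_turn=True):
    """Nonoperational recursive definition of the ideal target function V.

    is_final, is_won, is_lost and legal_successors are hypothetical game
    primitives; this is exponential in game depth and only spells out the
    definition, it is not meant to be run on real games.
    """
    if is_final(b):
        if is_won(b):
            return 100
        if is_lost(b):
            return -100
        return 0  # drawn final board state
    values = [ideal_V(nb, not our_turn) for nb in legal_successors(b)]
    # We choose the best move; the opponent chooses the worst one for us.
    return max(values) if our_turn else min(values)
```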
Choosing the Target Function
• V(b) gives a recursive definition for board state b
– Not usable because it is not efficient to compute, except in the first
three trivial cases
– a nonoperational definition
• Goal of learning is to discover an operational
description of V
• Learning the target function is often called function
approximation
– The learned approximation of V is referred to as V̂
Choosing a Representation for the Target
Function
• The choice of representation involves trade-offs
– Pick a very expressive representation to allow close approximation to
the ideal target function V
– The more expressive the representation, the more training data is
required to choose among alternative hypotheses
• Use linear combination of the following board features:
– x1: the number of black pieces on the board
– x2: the number of red pieces on the board
– x3: the number of black kings on the board
– x4: the number of red kings on the board
– x5: the number of black pieces threatened by red (i.e. which can be
captured on red's next turn)
– x6: the number of red pieces threatened by black
• Target function representation (a linear combination of the six features):
V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
where w0 through w6 are numerical weights to be learned
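As a minimal sketch (not from the slides), the linear evaluation function is just a weighted sum of the six board features; the weight values used below are arbitrary placeholders:

```python
def v_hat(weights, features):
    """Evaluate V_hat(b) = w0 + w1*x1 + ... + w6*x6.

    weights:  list [w0, w1, ..., w6]
    features: list [x1, ..., x6] describing a board state b
    """
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

# Example: arbitrary weights and a board with 3 black pieces and 1 black king.
weights = [0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]
features = [3, 0, 1, 0, 0, 0]
print(v_hat(weights, features))  # 0.5 + 3*1.0 + 1*2.0 = 5.5
```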
Partial Design of Checkers Learning
Program
• A checkers learning problem:
– Task T: playing checkers
– Performance measure P: percent of games won in the
world tournament
– Training experience E: games played against itself
– Target function: V : Board → ℝ
– Target function representation:
V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
Choosing a Function Approximation
Algorithm
• To learn V̂ we require a set of training
examples, each describing a board state b and its
training value Vtrain(b)
– Ordered pair ⟨b, Vtrain(b)⟩
• For example, a board state where red has no remaining pieces
(x2 = 0), so black has won and Vtrain(b) = +100:
⟨⟨x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0⟩, +100⟩
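As a small sketch (names are illustrative, not from the slides), such an ordered pair can be represented directly as a feature-vector/value tuple:

```python
# One training example: the feature vector (x1, ..., x6) paired with its
# training value Vtrain(b). Red has no pieces here, so black has won: +100.
example = ((3, 0, 1, 0, 0, 0), 100.0)

# A training set is simply a list of such ordered pairs.
training_examples = [example]
```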
Choosing a Function Approximation
Algorithm
• Need a procedure that first derives such training
examples from the indirect training experience, and then
adjusts the weights wi to best fit these training
examples.
Estimating Training Values
• Need to assign specific scores to intermediate
board states
• Approximate the training value of an intermediate board state b
using the learner's current approximation of the value of the
board state following b
– Simple and successful approach
– More accurate for states closer to end states
– Vtrain(b) ← V̂(Successor(b)),
where Successor(b) is the next board state following b
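A minimal sketch of this estimation step; successor(b) and board_features(b) are hypothetical helpers (not from the slides), and v_hat is the linear evaluation function sketched earlier:

```python
def estimate_training_value(weights, b):
    """Vtrain(b) <- V_hat(Successor(b)).

    successor(b) and board_features(b) are assumed, hypothetical helpers:
    the first returns the next board state following b, the second
    extracts the feature vector (x1, ..., x6) from a board state.
    """
    next_b = successor(b)
    return v_hat(weights, board_features(next_b))
```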
Adjusting the Weights
• Choose the weights wi to best fit the set of training examples
• Minimize the squared error E between the training values and
the values predicted by the hypothesis V̂
• Require an algorithm that
– will incrementally refine weights as new training examples become
available
– will be robust to errors in these estimated training values
• Least Mean Squares (LMS) is one such algorithm
E ≡ Σ_{⟨b, Vtrain(b)⟩ ∈ training examples} ( Vtrain(b) − V̂(b) )²
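A minimal sketch of this error measure, reusing the v_hat function and the (features, v_train) training-example representation sketched above:

```python
def squared_error(weights, training_examples):
    """E = sum over <b, Vtrain(b)> of (Vtrain(b) - V_hat(b))^2,
    with each training example given as a (features, v_train) pair."""
    return sum((v_train - v_hat(weights, features)) ** 2
               for features, v_train in training_examples)
```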
LMS Weight Update Rule
• For each training example ⟨b, Vtrain(b)⟩
– Use the current weights to calculate V̂(b)
– For each weight wi, update it as
wi ← wi + η ( Vtrain(b) − V̂(b) ) xi
– where η is a small constant (e.g. 0.1)
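A minimal sketch of this LMS update in Python, reusing the v_hat function and training_examples from the earlier sketches; the value of eta and the treatment of w0 (whose "feature" is the constant 1) are assumptions consistent with the rule, not taken from the slides:

```python
def lms_update(weights, features, v_train, eta=0.1):
    """One LMS step: w_i <- w_i + eta * (Vtrain(b) - V_hat(b)) * x_i.

    weights:  [w0, w1, ..., w6]  (w0 is the constant term, its x is 1)
    features: [x1, ..., x6]
    v_train:  estimated training value Vtrain(b)
    eta:      small learning-rate constant (e.g. 0.1)
    """
    error = v_train - v_hat(weights, features)
    xs = [1.0] + list(features)          # prepend 1 for the constant weight w0
    return [w + eta * error * x for w, x in zip(weights, xs)]

# Example: incrementally refine the weights over the training examples.
weights = [0.0] * 7
for features, v_train in training_examples:
    weights = lms_update(weights, features, v_train)
```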