Machine Learning for Computer Vision – Part 2
1. The bits the whirlwind tour left out ...
BMVA Summer School 2014 – extra background slides
2. Machine Learning
Definition:
– “A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.”
[Mitchell, 1997]
3. Algorithm to construct decision trees …
4. Building Decision Trees – ID3
node = root of tree
Main loop:
  A = “best” decision attribute for next node
  assign A as the decision attribute for node
  for each value of A, create a new descendant of node
  sort the training examples to the descendant (leaf) nodes
  if the training examples are perfectly classified, stop;
  else recurse over the new leaf nodes
But which attribute is best to split on?
5. Entropy in machine learning
Entropy: a measure of impurity
– S is a sample of training examples
– p⊕ is the proportion of positive examples in S
– p⊖ is the proportion of negative examples in S
Entropy measures the impurity of S:
Entropy(S) = −p⊕ log2 p⊕ − p⊖ log2 p⊖
6. Information Gain – reduction in Entropy
Gain(S,A) = expected reduction in entropy due to splitting on attribute A
– i.e. expected reduction in impurity in the data
– (improvement in consistent data sorting)
7. Information Gain – reduction in Entropy
– reduction in entropy in set of examples S if split on attribute A
– S_v = subset of S for which attribute A has value v
– Gain(S,A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)
  (original entropy minus the weighted entropies of the sub-nodes after
  splitting on A)
8. Information Gain – reduction in Entropy
Information Gain:
– “information provided about the target function given the value of some attribute A”
– How well does A sort the data into the required classes?
Generalise to c classes:
– (not just ⊕ or ⊖)
Entropy(S) = − Σ_{i=1..c} p_i log2 p_i
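To make the split criterion concrete, here is a minimal Python sketch of entropy and information gain (the function names and the dict-of-attributes data layout are illustrative assumptions, not from the slides):

```python
# Minimal sketch: entropy and information gain over a list of examples,
# each a dict of attribute values plus a class label. Names assumed.
import math
from collections import Counter

def entropy(examples, label="class"):
    """Entropy(S) = -sum_i p_i log2 p_i over the c class proportions."""
    counts = Counter(ex[label] for ex in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(examples, attribute, label="class"):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, label)
    return entropy(examples, label) - remainder

# ID3 answers "which attribute is best to split on?" by choosing the
# attribute with the highest information gain:
# best = max(candidate_attributes, key=lambda a: information_gain(S, a))
```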
9. Building Decision Trees
Selecting the Next Attribute
– which attribute should we split on next?
12. Backpropagation Algorithm
Assume we have:
– input examples d = {1 … D}
  • each is a pair {x_d, t_d} = {input vector, target vector}
– node index n = {1 … N}
– weight w_ji connects node i → j
– input x_ji is the input on the connection i → j
  • corresponding weight = w_ji
– output error for node n is δ_n
  • similar to (o – t)
[Figure: three-layer network – input layer (input x), hidden layer, output layer (output vector O_k); node index {1 … N}]
13. Backpropagation Algorithm
(1) Input example d: present {x_d, t_d} and compute all node outputs
(2) Output layer error, based on the difference between output and
    target (t − o) and the derivative of the sigmoid function:
    δ_k = o_k (1 − o_k)(t_k − o_k)
(3) Hidden layer error, proportional to the node's contribution to the
    output error:
    δ_h = o_h (1 − o_h) Σ_k w_kh δ_k
(4) Update weights w_ji :
    w_ji ← w_ji + η δ_j x_ji   (η = learning rate)
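As a concrete illustration of steps (1)–(4), here is a minimal NumPy sketch of one update for a single hidden layer of sigmoid units; the variable names, shapes, and function name backprop_step are illustrative assumptions, not from the slides:

```python
# Minimal sketch of one backpropagation update for a 1-hidden-layer
# sigmoid network; all names here are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W_hx, W_kh, eta=0.1):
    """One update on a single example (x, t).
    W_hx: hidden-layer weights (n_hidden x n_inputs)
    W_kh: output-layer weights (n_outputs x n_hidden)."""
    # (1) forward pass
    o_h = sigmoid(W_hx @ x)          # hidden outputs
    o_k = sigmoid(W_kh @ o_h)        # network outputs
    # (2) output layer error: delta_k = o_k (1 - o_k)(t_k - o_k)
    delta_k = o_k * (1 - o_k) * (t - o_k)
    # (3) hidden layer error: delta_h = o_h (1 - o_h) sum_k w_kh delta_k
    delta_h = o_h * (1 - o_h) * (W_kh.T @ delta_k)
    # (4) weight updates: w_ji <- w_ji + eta * delta_j * x_ji
    W_kh += eta * np.outer(delta_k, o_h)
    W_hx += eta * np.outer(delta_h, x)
    return W_hx, W_kh
```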
14. Backpropagation
Termination criteria:
– number of iterations reached
– or error below a suitable bound
Each iteration: compute the output layer error, then the hidden layer
error; all weights are updated using the relevant error.
15. Backpropagation
[Figure: network diagram – input layer (input x), hidden layer (unit h), output layer (unit k), output vector O_k]
16. Backpropagation
δ_h is expressed as a weighted sum of the output layer errors δ_k to
which it contributes (i.e. w_kh > 0)
17. Backpropagation
Error is propagated backwards from the network output ...
to the weights of the output layer ...
to the weights of the hidden layer ...
Hence the name: backpropagation
18. Backpropagation
Repeat these stages for every hidden layer in a multi-layer network
(using error δ_i where x_ji > 0)
19. Backpropagation
Error is propagated backwards from the network output ...
to the weights of the output layer ...
over the weights of all N hidden layers ...
Hence the name: backpropagation
20. Backpropagation
Will perform gradient descent over the weight space of {w_ji} for all
connections i → j in the network.
Stochastic gradient descent:
– as updates are based on training one sample at a time
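A sketch of the resulting training loop, reusing the hypothetical backprop_step from the earlier sketch (weight sizes and data here are toy assumptions):

```python
# Sketch of stochastic gradient descent: weights are updated after every
# single training example, using the (hypothetical) backprop_step above.
import numpy as np

rng = np.random.default_rng(0)
W_hx = rng.normal(scale=0.1, size=(5, 3))      # toy hidden-layer weights
W_kh = rng.normal(scale=0.1, size=(2, 5))      # toy output-layer weights
training_examples = [(rng.normal(size=3), rng.uniform(size=2))
                     for _ in range(20)]        # toy (x_d, t_d) pairs

for epoch in range(1000):                       # termination: iteration count
    for x, t in training_examples:              # one sample at a time
        W_hx, W_kh = backprop_step(x, t, W_hx, W_kh, eta=0.1)
```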
21. Understanding (and believing) the SVM stuff …
22. Remedial Note: equations of 2D lines
Line: w ⋅ x + b = 0
where w and x are 2D vectors: w is the normal to the line and b sets the
offset from the origin.
2D LINES REMINDER
23. Remedial Note: equations of 2D lines
http://www.mathopenref.com/coordpointdisttrig.html
2D LINES REMINDER
24. Remedial Note: equations of 2D lines
2D LINES REMINDER
For a defined line equation: w ⋅ x + b = 0 (w and b fixed)
Insert a point p into the equation:
– the result is +ve if the point lies on the side the normal w points to
  (i.e. w ⋅ p + b > 0)
– the result is -ve if the point lies on the other side (w ⋅ p + b < 0)
– the result is the distance (+ve or -ve) of the point from the line,
  for: ‖w‖ = 1
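A small Python sketch of the same point-to-line test (the line and point values are assumed for illustration):

```python
# Sketch: the sign of w.x + b tells you which side of the line a point
# is on; with ||w|| = 1 it is also the signed distance. Values assumed.
import numpy as np

w = np.array([3.0, 4.0])       # normal to the line
b = -5.0                       # offset term
w_unit = w / np.linalg.norm(w)
b_unit = b / np.linalg.norm(w)

p = np.array([2.0, 1.0])
signed_distance = w_unit @ p + b_unit
print(np.sign(signed_distance), abs(signed_distance))  # side, distance
```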
25. Linear Separator
Instances (i.e. examples) {x_i, y_i}
– x_i = point in instance space (R^n) made up of n attributes
– y_i = class value for classification of x_i
Classification of example: function f(x) = y ∈ {+1, -1}, i.e. 2 classes
(y = +1 on one side of the boundary, y = -1 on the other)
Want a linear separator. Can view this as a constraint satisfaction
problem:
w ⋅ x_i + b ≥ +1 if y_i = +1
w ⋅ x_i + b ≤ −1 if y_i = −1
Equivalently, y_i (w ⋅ x_i + b) ≥ 1
N.B. we have a vector of weight coefficients, w
26. Linear Separator
If we define the distance of the nearest point to the boundary as 1
→ width of the margin is 2 / ‖w‖ (i.e. equal width each side)
We thus want to maximize 2 / ‖w‖ (equivalently, minimize ‖w‖),
finding the parameters: w and b
Classification of example: function f(x) = y ∈ {+1, -1}, i.e. 2 classes
29. So …
Find the “hyperplane” (i.e. boundary) with:
a) maximum margin
b) minimum number of (training) examples on the
wrong side of the chosen boundary
(i.e. minimal penalties due to C)
Solve via optimization (in polynomial time/complexity)
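For illustration, a sketch of solving this with scikit-learn's soft-margin SVC (assuming scikit-learn is available; the toy data below is made up):

```python
# Sketch: a soft-margin linear SVM. C trades off margin width against
# penalties for examples on the wrong side; the optimization underneath
# is solved for us by the library.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.coef_, clf.intercept_)       # the learned w and b
print(clf.predict([[0.5, 0.5]]))       # f(x) in {+1, -1}
```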
30. Find hyperplane separator (plane in 3D) via optimization
Example: non-linear separation (red / blue data items on a 2D plane).
Kernel projection to a higher-dimensional space: a non-linear boundary
in the original dimension (e.g. a circle in 2D) is defined by a planar
boundary (cut) in 3D.
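A sketch of that example with an explicit quadratic feature map (the map, data, and names are illustrative; a kernel performs this projection implicitly):

```python
# Sketch: points separable only by a circle in 2D become separable by a
# plane after the quadratic map (x1, x2) -> (x1^2, x2^2, sqrt(2) x1 x2).
import numpy as np

def quadratic_map(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1**2, x2**2, np.sqrt(2) * x1 * x2])

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (200, 2))
y = np.where(np.sum(X**2, axis=1) < 1.0, +1, -1)   # inside/outside a circle

Z = quadratic_map(X)
# In 3D the classes satisfy z1 + z2 < 1 vs >= 1: a planar cut.
print(np.all((Z[:, 0] + Z[:, 1] < 1.0) == (y == +1)))  # True
```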
32. Desirable Data Properties
Machine learning is a Data Driven Approach
The Data is important!
Ideally training/testing data used for learning must be:
– Unbiased
• towards any given subset of the space of examples ...
– Representative
• of the “real-world” data to be encountered in use/deployment
– Accurate
• inaccuracies in training/testing data produce inaccurate results
– Available
• the more training/testing data available the better the results
• greater confidence in the results can be achieved
33. Data Training Methodologies
Simple approach: Data Splits
– split overall data set into separate training and test sets
• No established rule, but 80%:20%, 70%:30% or ⅔:⅓ training-to-testing
splits are common
– Train on one, test on the other
– Test error = error on the test set
– Training error = error on training set
– Weakness: susceptible to bias in data sets or “over-fitting”
• Also less data available for training
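For example (assuming scikit-learn is available), an 80%:20% split; the toy data is made up:

```python
# Sketch: an 80%:20% train/test split.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)     # toy feature matrix
y = np.arange(50) % 2                 # toy labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# Training error = error on (X_train, y_train); test error = error on
# (X_test, y_test), measured after fitting on the training set only.
```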
34. Data Training Methodologies
More advanced (and robust): k-fold Cross Validation
– Randomly split (all) the data into k subsets
– For i = 1 to k:
• train using all the data not in the i-th subset
• test the resulting learned [classifier | function …] using the i-th
subset
– report the mean error over all k tests
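A sketch with scikit-learn (the classifier choice and toy data are illustrative):

```python
# Sketch: k-fold cross-validation with k = 5; scores are per-fold
# accuracies on the held-out subset (error = 1 - accuracy).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
print(scores.mean())    # mean accuracy over the k held-out folds
```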
35. Key Summary Statistics #1
tp = true positive / tn = true negative
fp = false positive / fn = false negative
Often quoted or plotted when comparing ML techniques
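The statistics themselves are not reproduced in the text; sketched below are the usual ones built from these counts (standard definitions, though not necessarily the slide's exact list):

```python
# Common summary statistics built from tp / tn / fp / fn counts; a
# sketch of the standard definitions.
def summary_stats(tp, tn, fp, fn):
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # of predicted positives, how many are real
    recall    = tp / (tp + fn)   # of real positives, how many are found
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(summary_stats(tp=40, tn=45, fp=5, fn=10))
```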
36. Kappa Statistic
Measure of classification of “N items into C mutually exclusive
categories”:
κ = (Pr(a) − Pr(e)) / (1 − Pr(e))
Pr(a) = probability of success of classification ( = accuracy)
Pr(e) = probability of success due to chance
– e.g. 2 categories = 50% (0.5), 3 categories = 33% (0.33), etc.
– Pr(e) can be replaced with Pr(b) to measure agreement between
classifiers/techniques a and b
[Cohen, 1960]
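A one-line sketch of the formula in code (the example numbers are assumed):

```python
# Sketch: Cohen's kappa from accuracy Pr(a) and chance agreement Pr(e);
# kappa = (Pr(a) - Pr(e)) / (1 - Pr(e)).
def kappa(pr_a, pr_e):
    return (pr_a - pr_e) / (1.0 - pr_e)

# e.g. 80% accuracy on a 2-category problem (chance = 0.5):
print(kappa(0.80, 0.5))   # 0.6
```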