18-20. Instance

A data sample consists of the raw data (here a 28 px × 28 px image, 784 pixels in total) and its class (label), here “7”.

How to represent it in a machine-readable form? Feature extraction: read the pixel intensities off the grid into a feature vector, e.g.

(0, 0, 0, …, 28, 65, 128, 255, 101, 38, … 0, 0, 0)

A dataset is a collection of such instances paired with their labels:

(0, 0, 0, …, 28, 65, 128, 255, 101, 38, … 0, 0, 0) → “7”
(0, 0, 0, …, 13, 48, 102, 0, 46, 255, … 0, 0, 0) → “2”
(0, 0, 0, …, 17, 34, 12, 43, 122, 70, … 0, 7, 0) → “8”
(0, 0, 0, …, 98, 21, 255, 255, 231, 140, … 0, 0, 0) → “2”
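A minimal sketch of this feature-extraction step, assuming the digit is already loaded as a 28×28 array of pixel intensities (numpy and the toy image below are illustrative stand-ins, not from the slides):

```python
import numpy as np

# Toy stand-in for a real 28x28 grayscale digit image (intensities 0-255).
image = np.zeros((28, 28), dtype=np.uint8)
image[10:18, 12:16] = 255  # fake some "ink" pixels

# Flatten the 28x28 grid row by row into one 784-dimensional feature vector.
feature_vector = image.flatten()
label = "7"  # the class attached to this instance

print(feature_vector.shape)  # (784,)
```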
21-23. The data is in the right format — what’s next?

Pick an algorithm:
• C4.5
• Random forests
• Bayesian networks
• Hidden Markov models
• Artificial neural network
• Data clustering
• Expectation-maximization algorithm
• Self-organizing map
• Radial basis function network
• Vector Quantization
• Generative topographic map
• Information bottleneck method
• IBSEAD
• Apriori algorithm
• Eclat algorithm
• FP-growth algorithm
• Single-linkage clustering
• Conceptual clustering
• K-means algorithm
• Fuzzy clustering
• Temporal difference learning
• Q-learning
• Learning Automata
• AODE
• Backpropagation
• Naive Bayes classifier
• Bayesian knowledge base
• Case-based reasoning
• Decision trees
• Inductive logic programming
• Gaussian process regression
• Gene expression programming
• Group method of data handling (GMDH)
• Learning Vector Quantization
• Logistic Model Tree
• Decision graphs
• Lazy learning
• Monte Carlo Method
• SARSA
• Instance-based learning
• Nearest Neighbor Algorithm
• Analogical modeling
• Probably approximately correct learning (PACL)
• Symbolic machine learning algorithms
• Subsymbolic machine learning algorithms
• Support vector machines
• Ensembles of classifiers
• Bootstrap aggregating (bagging)
• Boosting (meta-algorithm)
• Ordinal classification
• Regression analysis
• Information fuzzy networks (IFN)
• Linear classifiers
• Fisher's linear discriminant
• Logistic regression
• Perceptron
• Quadratic classifiers
• k-nearest neighbor
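Whichever you pick, most libraries expose the same fit/predict interface, so trying an algorithm is cheap. A sketch assuming scikit-learn and toy stand-ins for the digit feature vectors (neither the library nor these arrays is named in the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X = np.random.randint(0, 256, size=(100, 784))   # toy feature vectors
y = np.random.choice(["2", "7", "8"], size=100)  # toy labels

# Swapping algorithms is a one-line change: same fit/predict interface.
for model in (DecisionTreeClassifier(), KNeighborsClassifier(n_neighbors=3)):
    model.fit(X, y)
    print(type(model).__name__, model.predict(X[:1]))
```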
39-40. ACCURACY

acc = correctly classified / total number of samples

Confusion matrix: rows are the true class, columns the predicted class.

Beware of an imbalanced dataset! Consider the following model: “Always predict 2”. If 90% of the samples are 2s, it scores accuracy 0.9 while learning nothing.
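A quick sketch of the pitfall, assuming scikit-learn's metrics; the 90/10 class balance below is a hypothetical chosen to reproduce the slide's 0.9:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = ["2"] * 90 + ["7"] * 10  # imbalanced: 90% of samples are class "2"
y_pred = ["2"] * 100              # the model always predicts "2"

print(accuracy_score(y_true, y_pred))                       # 0.9
print(confusion_matrix(y_true, y_pred, labels=["2", "7"]))  # rows: true, cols: predicted
```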
42-43. DECISION TREE

“You said 100% accurate?! Every 10th digit your system detects is wrong!”
(Angry client)

We’ve trained our system on the data the client gave us. But our system has never seen the new data the client applied it to. And in real life, it never will…
56-62. THE WHOLE DATASET

Split the whole dataset into three parts:
• TRAINING SET (60%): fit various models and parameter combinations on this subset.
• VALIDATION SET (20%): evaluate the models created with different parameters; estimate overfitting.
• TEST SET (20%): use only once, to get the final performance estimate.

[Diagram: every model/parameter combination goes through its own train-on-TRAINING, score-on-VALIDATION cycle]
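A sketch of this 60/20/20 split, assuming scikit-learn's train_test_split (applied twice, since each call produces only two parts); the arrays are toy stand-ins:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 784)
y = np.random.choice(["2", "7", "8"], size=1000)

# First carve out the 20% TEST set, then split the remaining 80%
# in a 75/25 ratio so the overall split is 60/20/20.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```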
68-70. CROSS-VALIDATION

What if we got a too optimistic validation set?

Merge the TRAINING SET (60%) and the VALIDATION SET (20%) into a single TRAINING SET (80%). Fix the parameter value you need to evaluate, say msl=15, then repeatedly re-split the 80% into training and validation parts, holding out a different slice each time. Repeat 10 times.

Take the average validation score over the 10 runs: it is a more stable estimate.
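A sketch of one such evaluation, assuming scikit-learn. Note the assumption that “msl” stands for a decision tree's min_samples_leaf; the slides don't spell this out:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(800, 784)                    # the merged 80% TRAINING set (toy data)
y = np.random.choice(["2", "7", "8"], size=800)

# Fix the parameter value to evaluate (msl=15, assumed to be min_samples_leaf).
model = DecisionTreeClassifier(min_samples_leaf=15)

# cv=10 performs the 10 train/validation re-splits and scores each one.
scores = cross_val_score(model, X, y, cv=10)
print(scores.mean())  # average over 10 runs: a more stable estimate
```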
74-76. MACHINE LEARNING PIPELINE

Take raw data → Extract features → Split into TRAINING and TEST → Pick an algorithm and parameters → Train and evaluate on the TRAINING data with CV → (loop: try out different algorithms and parameters) → Fix the best parameters → Train on the whole TRAINING set → Evaluate on TEST → Report final performance to the client.

“So it is ~87%… erm… Could you do better?”
Yes.
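The whole pipeline fits in a few lines with scikit-learn's GridSearchCV, which runs the CV loop over parameters on the TRAINING data and refits the best model before the one-time TEST evaluation; the library choice, parameter grid, and toy data are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(1000, 784)                    # toy stand-in for extracted features
y = np.random.choice(["2", "7", "8"], size=1000)

# Split into TRAINING and TEST; TEST is touched exactly once, at the end.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# Try different parameters with 10-fold CV on the TRAINING data only;
# after the search, the best model is refit on all of TRAINING.
search = GridSearchCV(DecisionTreeClassifier(),
                      param_grid={"min_samples_leaf": [1, 5, 15, 50]},
                      cv=10)
search.fit(X_train, y_train)

# Evaluate on TEST once: this is the number reported to the client.
print(search.best_params_, search.score(X_test, y_test))
```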
77-78. Pick another algorithm: go back to the same long list of algorithms as on slides 21-23 and try the next candidate.