CHAPTER 2:
Supervised Learning
Machine Learning
 Machine learning is a subset of AI, which enables the
machine to automatically learn from data, improve
performance from past experiences, and make
predictions.
 Machine learning contains a set of algorithms that work on
a huge amount of data.
 Data is fed to these algorithms to train them; on the basis of this
training, they build a model and perform a specific task.
 These ML algorithms help solve different business problems such as
regression, classification, forecasting, clustering, and association.
 Based on the methods and way of learning,
machine learning is divided into mainly four types,
which are:
 Supervised Machine Learning
 Unsupervised Machine Learning
 Semi-Supervised Machine Learning
 Reinforcement Learning
Types of Machine Learning
Outline of this chapter
 Learning a Class from Examples,
 Vapnik-Chervonenkis Dimension,
 Probably Approximately Correct Learning,
 Noise,
 Learning Multiple Classes,
 Regression,
 Model Selection and Generalization,
 Dimensions of a Supervised Machine Learning Algorithm
Learning a Class from Examples
 Let us say we want to learn the class, C, of a “family
car.”
 We have a set of examples of cars, and we have a
group of people that we survey to whom we show
these cars.
 The people look at the cars and label them; the cars
that they believe are family cars are positive
examples, and the other cars are negative examples.
 Class learning is finding a description that is shared
by all the positive examples and none of the negative
examples.
 Doing this, we can make a prediction: Given a car that we
have not seen before, by checking with the description
learned, we will be able to say whether it is a family car or
not.
 Or we can do knowledge extraction: This study may be
sponsored by a car company, and the aim may be to
understand what people expect from a family car.
 After some discussions with experts in the field, let us say
that we reach the conclusion that, among all the features a car
may have, the features that separate a family car from
other types of cars are its price and engine power.
 These two attributes are the inputs to the class recognizer.
 Note that when we decide on this particular input
representation, we are ignoring various other attributes as
irrelevant.
 Though one may think of other attributes such as seating
capacity and color that might be important for distinguishing
among car types, we will consider only price and engine
power to keep this example simple.
 Class C of a “family car”
 Prediction: Is car x a family car?
 Knowledge extraction: What do people expect from a
family car?
 Input representation:
x1: price, x2: engine power
 Output:
Positive (+) and negative (–) examples
 Training set for the class of a “family car.”
 Each data point corresponds to one example car, and the
coordinates of the point indicate the price and engine
power of that car.
 ‘+’ denotes a positive example of the class (a family car),
and
 ‘−’ denotes a negative example (not a family car); it is
another type of car.
 Let us denote price as the first input attribute x1 and
engine power as the second attribute x2.
 Thus we represent each car using two numeric values.
 Our training data can now be plotted in the two-dimensional
(x1, x2) space, where each instance t is a data point at
coordinates (x1^t, x2^t) and its type, namely positive versus
negative, is given by r^t.
 After further discussions with the expert and the analysis of
the data, we may have reason to believe that for a car to be
a family car, its price and engine power should be in a
certain range.
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
for suitable values of p1, p2, e1, and e2.
 The above equation fixes H, the hypothesis class from which we
believe C is drawn, namely, the set of rectangles.
 The learning algorithm then finds the particular hypothesis,
h ∈ H, specified by a particular quadruple (ph1, ph2, eh1, eh2),
that approximates C as closely as possible.
Training set X:
X = { (x^t, r^t) }, t = 1, ..., N
where the label is
r^t = 1 if x^t is a positive example
r^t = 0 if x^t is a negative example
and each input is the vector of the two attributes
x = [x1, x2]^T
Class C:
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
Hypothesis class H:
h(x) = 1 if h classifies x as a positive example
h(x) = 0 if h classifies x as a negative example
Error of h on the training set X:
E(h | X) = Σ_{t=1}^{N} 1( h(x^t) ≠ r^t )
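To make the rectangle hypothesis and its empirical error concrete, here is a minimal Python sketch; the attribute values, labels, and rectangle bounds below are made-up toy data, not from the slides.

def h(x, p1, p2, e1, e2):
    # return 1 if x = (price, engine_power) falls inside the rectangle, else 0
    price, power = x
    return 1 if (p1 <= price <= p2) and (e1 <= power <= e2) else 0

def empirical_error(X, r, p1, p2, e1, e2):
    # E(h | X): number of training examples that h misclassifies
    return sum(1 for x_t, r_t in zip(X, r) if h(x_t, p1, p2, e1, e2) != r_t)

# hypothetical training set: (price in $1000s, engine power in hp), label 1/0
X = [(22, 110), (30, 150), (12, 70), (55, 300), (28, 130)]
r = [1, 1, 0, 0, 1]
print(empirical_error(X, r, p1=18, p2=35, e1=90, e2=180))  # -> 0 on this toy data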
S, G, and the Version Space (Mitchell, 1997): any h ∈ H between the most
specific hypothesis S and the most general hypothesis G is consistent with
the training set, and together such h make up the version space.
 One possibility is to find the most specific hypothesis, S,
that is the tightest rectangle that includes all the positive
examples and none of the negative examples.
 The most general hypothesis, G, is the largest rectangle
we can draw that includes all the positive examples and
none of the negative examples.
 Any h∈H between S and G is a valid hypothesis with no
error, said to be consistent with the training set, and such h
make up the version space.
Candidate Elimination Algorithm
 The candidate elimination algorithm incrementally builds the
version space given a hypothesis space H and a set E of
examples.
 The examples are added one by one; each example
possibly shrinks the version space by removing the
hypotheses that are inconsistent with the example.
 The candidate elimination algorithm does this by updating
the general and specific boundary for each new example.
Terms Used:
Concept learning:
• Concept learning is the basic learning task of the machine
(learning from the training data).
General hypothesis:
• Does not specify feature values.
• G = {'?', '?', '?', '?', ...}: one '?' per attribute.
Specific hypothesis:
• Specifies concrete feature values.
• S = {'pi', 'pi', 'pi', ...}: the number of pi values depends on the
number of attributes.
Version space:
• The set of hypotheses intermediate between the general hypothesis
and the specific hypothesis.
Algorithm:
Step 1: Load the data set.
Step 2: Initialize the general hypothesis and the specific hypothesis.
Step 3: For each training example:
Step 4: If the example is positive:
    if attribute_value == hypothesis_value:
        do nothing
    else:
        replace the attribute value with '?'
        (basically generalizing it)
Step 5: If the example is negative:
    make the general hypothesis more specific.
Example:
 Consider the dataset given below:
Sky Temperature Humid Wind Water Forest Output
Sunny Warm Normal Strong Warm Same Yes
Sunny Warm High Strong Warm Same Yes
Rainy Cold High Strong Warm Change No
Sunny Warm High Strong Cool Change Yes
Algorithmic steps:
Initially :
G = [[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?],
[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?]]
S = [Null, Null, Null, Null, Null, Null]
For instance 1: <'Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'>
and positive output.
G1 = G
S1 = ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
For instance 2: <'Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'>
and positive output.
G2 = G
S2 = ['Sunny', 'Warm', ?, 'Strong', 'Warm', 'Same']
For instance 3: <'Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'>
and negative output.
G3 = [['Sunny', ?, ?, ?, ?, ?], [?, 'Warm', ?, ?, ?, ?],
[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?],
[?, ?, ?, ?, ?, 'Same']]
S3 = S2
For instance 4: <'Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'>
and positive output.
G4 = G3
S4 = ['Sunny', 'Warm', ?, 'Strong', ?, ?]
 Finally, by synchronizing G4 and S4, the algorithm produces the output.
Output:
 G = [['Sunny', ?, ?, ?, ?, ?], [?, 'Warm', ?, ?, ?, ?]]
 S = ['Sunny', 'Warm', ?, 'Strong', ?, ?]
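A minimal Python sketch can reproduce this trace; it follows the simplified procedure given on these slides rather than the full Mitchell algorithm, and the helper name and the final "synchronize" filter are assumptions made for this illustration.

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = ['Null'] * n                      # most specific hypothesis
    G = [['?'] * n for _ in range(n)]     # one general hypothesis per attribute

    for x, label in examples:
        if label == 'Yes':                # positive example: generalize S
            for i in range(n):
                if S[i] == 'Null':
                    S[i] = x[i]
                elif S[i] != x[i]:
                    S[i] = '?'
        else:                             # negative example: specialize G
            for i in range(n):
                if S[i] != '?' and S[i] != x[i]:
                    G[i][i] = S[i]

    # "synchronize" G with S: keep only specialized hypotheses that still
    # generalize the final specific hypothesis S
    G = [g for g in G
         if g != ['?'] * n and all(v == '?' or v == S[i] for i, v in enumerate(g))]
    return S, G

data = [
    (['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'],   'Yes'),
    (['Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'],   'Yes'),
    (['Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'], 'No'),
    (['Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'], 'Yes'),
]
S, G = candidate_elimination(data)
print('S =', S)   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
print('G =', G)   # [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]

Running this reproduces the S and G reported in the output above.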
Vapnik-Chervonenkis Dimension
 Let us say we have a dataset containing N points.
 These N points can be labeled in 2^N ways as positive and
negative.
 Therefore, 2^N different learning problems can be defined by
N data points.
 If for any of these problems, we can find a hypothesis h∈H
that separates the positive examples from the negative,
then we say H shatters N points.
 That is, any learning problem definable by N examples can
be learned with no error by a hypothesis drawn from H.
 The maximum number of points that can be shattered by H is
called the Vapnik-Chervonenkis (VC) dimension of H, denoted
VC(H); it measures the capacity of H.
VC Dimension
 N points can be labeled in 2^N ways as +/–.
 H shatters the N points if, for every such labeling, there
exists an h ∈ H consistent with it; then VC(H) ≥ N.
An axis-aligned rectangle can shatter at most 4 points, so its VC dimension is 4.
 How many training examples N should we have, such that with
probability at least 1 − δ, h has error at most ε?
(Blumer et al., 1989)
 Each strip is at most ε/4.
 Pr that we miss a strip: 1 − ε/4
 Pr that N instances miss a strip: (1 − ε/4)^N
 Pr that N instances miss 4 strips: 4(1 − ε/4)^N
 We require 4(1 − ε/4)^N ≤ δ, and since (1 − x) ≤ exp(−x),
 4 exp(−εN/4) ≤ δ, which gives N ≥ (4/ε) log(4/δ)
Probably Approximately Correct
(PAC) Learning
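As a quick numeric illustration of the bound N ≥ (4/ε) log(4/δ), here is a small Python sketch; the ε and δ values below are arbitrary choices, not from the slides.

import math

def required_examples(epsilon, delta):
    # PAC bound for the axis-aligned rectangle: N >= (4 / epsilon) * ln(4 / delta)
    return math.ceil((4.0 / epsilon) * math.log(4.0 / delta))

# error at most 0.1 with probability at least 0.95 (delta = 0.05)
print(required_examples(0.1, 0.05))  # -> 176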
Noise
 Noise is any unwanted anomaly in the data; due to noise, the class
may be more difficult to learn, and zero error may be infeasible with
a simple hypothesis class (see figure 2.8).
 There are several interpretations of noise:
 There may be imprecision in recording the input attributes, which
may shift the data points in the input space.
 There may be errors in labeling the data points, which may relabel
positive instances as negative and vice versa. This is sometimes
called teacher noise.
 There may be additional attributes, which we have not taken into
account, that affect the label of an instance. Such attributes may
be hidden or latent in that they may be unobservable. The effect of
these neglected attributes is thus modeled as a random
component and is included in “noise.”
 As can be seen in figure 2.8, when there is noise, there is
not a simple boundary between the positive and negative
instances and to separate them, one needs a complicated
hypothesis that corresponds to a hypothesis class with
larger capacity.
 A rectangle can be defined by four numbers, but to define
a more complicated shape one needs a more complex
model with a much larger number of parameters.
 With a complex model one can make a perfect fit to the
data and attain zero error; see the wiggly shape in figure
2.8.
Use the simpler one because
 Simpler to use
(lower computational
complexity)
 Easier to train (lower
space complexity)
 Easier to explain
(more interpretable)
 Generalizes better (lower
variance - Occam’s razor)
Noise and Model Complexity
Learning Multiple Classes
 In our example of learning a family car, we have positive
examples belonging to the class family car and the
negative examples belonging to all other cars.
 This is a two-class problem.
 In the general case, we have K classes denoted as Ci, i =
1, . . . , K, and an input instance belongs to one and
exactly one of them.
Multiple Classes, Ci, i = 1,...,K
Training set:
X = { (x^t, r^t) }, t = 1, ..., N
r_i^t = 1 if x^t ∈ C_i
r_i^t = 0 if x^t ∈ C_j, j ≠ i
Train hypotheses h_i(x), i = 1,...,K:
h_i(x^t) = 1 if x^t ∈ C_i
h_i(x^t) = 0 if x^t ∈ C_j, j ≠ i
 In machine learning for classification, we would like to
learn the boundary separating the instances of one class
from the instances of all other classes.
 Thus we view a K-class classification problem as K two-
class problems.
 The training examples belonging to Ci are the positive
instances of hypothesis hi and the examples of all other
classes are the negative instances of hi.
 The total empirical error takes a sum over the predictions
for all classes over all instances:
E({h_i}, i = 1,...,K | X) = Σ_{t=1}^{N} Σ_{i=1}^{K} 1( h_i(x^t) ≠ r_i^t )
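The one-vs-all construction and the total empirical error can be sketched in Python as follows; the threshold "hypotheses" and the toy data are invented purely for illustration.

def one_vs_all_labels(r, K):
    # r[t] in {0,...,K-1}; returns r_i^t = 1 if x^t belongs to C_i, else 0
    return [[1 if r_t == i else 0 for r_t in r] for i in range(K)]

def total_empirical_error(h_list, X, r, K):
    # sum over all classes and all instances of 1(h_i(x^t) != r_i^t)
    R = one_vs_all_labels(r, K)
    return sum(int(h_list[i](x_t) != R[i][t])
               for i in range(K)
               for t, x_t in enumerate(X))

# toy example: K = 3 classes on a 1-D input, one threshold hypothesis per class
X = [0.5, 1.5, 2.5, 2.8]
r = [0, 1, 2, 2]
h_list = [lambda x: int(x < 1.0),           # h_0
          lambda x: int(1.0 <= x < 2.0),    # h_1
          lambda x: int(x >= 2.0)]          # h_2
print(total_empirical_error(h_list, X, r, K=3))  # -> 0 on this toy data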
Regression
 In classification, given an input, the output that is
generated is Boolean; it is a yes/no answer.
 When the output is a numeric value, what we would like
to learn is not a class, C(x) ∈ {0, 1}, but is a numeric
function.
 In machine learning, the function is not known, but we
have a training set of examples drawn from it.
Regression
Training set:
X = { (x^t, r^t) }, t = 1, ..., N, with r^t ∈ ℝ and r^t = f(x^t) + ε
Linear model:
g(x) = w1 x + w0
Quadratic model:
g(x) = w2 x^2 + w1 x + w0
Empirical error:
E(g | X) = (1/N) Σ_{t=1}^{N} [ r^t − g(x^t) ]^2
E(w1, w0 | X) = (1/N) Σ_{t=1}^{N} [ r^t − (w1 x^t + w0) ]^2
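The linear case E(w1, w0 | X) has a closed-form least-squares minimizer; the Python sketch below illustrates that fit (the closed form itself is standard, not derived on these slides) on made-up noisy samples.

def fit_line(x, r):
    # least-squares estimates of w1 and w0 for g(x) = w1*x + w0
    N = len(x)
    mean_x = sum(x) / N
    mean_r = sum(r) / N
    w1 = (sum((xt - mean_x) * (rt - mean_r) for xt, rt in zip(x, r))
          / sum((xt - mean_x) ** 2 for xt in x))
    w0 = mean_r - w1 * mean_x
    return w1, w0

def empirical_error(w1, w0, x, r):
    # E(w1, w0 | X) = (1/N) * sum over t of (r^t - (w1*x^t + w0))^2
    return sum((rt - (w1 * xt + w0)) ** 2 for xt, rt in zip(x, r)) / len(x)

# hypothetical noisy samples of r = f(x) + noise, with f roughly 2x + 1
x = [0.0, 1.0, 2.0, 3.0, 4.0]
r = [1.1, 2.9, 5.2, 6.8, 9.1]
w1, w0 = fit_line(x, r)
print(w1, w0, empirical_error(w1, w0, x, r))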
Model Selection & Generalization
 Learning is an ill-posed problem; data is not sufficient to
find a unique solution
 The need for inductive bias, assumptions about H
 Generalization: How well a model performs on new data
 Overfitting: H more complex than C or f
 Underfitting: H less complex than C or f
Triple Trade-Off
 There is a trade-off between three factors (Dietterich,
2003):
1. Complexity of H, c (H),
2. Training set size, N,
3. Generalization error, E, on new data
 As N increases, E decreases
 As c(H) increases, E first decreases and then increases
Cross-Validation
 To estimate generalization error, we need data unseen
during training. We split the data as
 Training set (50%)
 Validation set (25%)
 Test (publication) set (25%)
 Resampling when there is little data
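A minimal Python sketch of the 50/25/25 split described above (the seed and the toy data are arbitrary; any random, disjoint split works):

import random

def split_data(data, seed=0):
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)        # shuffle indices so the subsets are random and disjoint
    n_train = len(data) // 2
    n_valid = len(data) // 4
    train = [data[i] for i in idx[:n_train]]
    valid = [data[i] for i in idx[n_train:n_train + n_valid]]
    test  = [data[i] for i in idx[n_train + n_valid:]]
    return train, valid, test

train, valid, test = split_data(list(range(100)))
print(len(train), len(valid), len(test))    # -> 50 25 25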
Dimensions of a Supervised
Learner
1. Model: g(x | θ)
2. Loss function: E(θ | X) = Σ_t L( r^t, g(x^t | θ) )
3. Optimization procedure: θ* = arg min_θ E(θ | X)
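To tie the three dimensions together, here is a minimal Python sketch on the linear model g(x | θ) = w1 x + w0, with a squared-error loss and plain gradient descent as the optimization procedure; the learning rate, step count, and data are arbitrary choices for illustration.

def g(x, theta):                            # 1. model g(x | theta)
    w1, w0 = theta
    return w1 * x + w0

def E(theta, X, r):                         # 2. loss function E(theta | X)
    return sum((rt - g(xt, theta)) ** 2 for xt, rt in zip(X, r)) / len(X)

def optimize(X, r, lr=0.01, steps=2000):    # 3. optimization: theta* = argmin E
    w1, w0 = 0.0, 0.0
    N = len(X)
    for _ in range(steps):
        dw1 = -2.0 / N * sum((rt - g(xt, (w1, w0))) * xt for xt, rt in zip(X, r))
        dw0 = -2.0 / N * sum((rt - g(xt, (w1, w0))) for xt, rt in zip(X, r))
        w1 -= lr * dw1
        w0 -= lr * dw0
    return (w1, w0)

X = [0.0, 1.0, 2.0, 3.0, 4.0]
r = [1.0, 3.1, 4.9, 7.2, 9.0]
theta_star = optimize(X, r)
print(theta_star, E(theta_star, X, r))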