FEATURE SELECTION
Presented by Priagung Khusumanegara
Prof. Soo-Hyung Kim
OUTLINES
• CLASS SEPARABILITY MEASURES
 Divergence
 Bhattacharyya Distance and Chernoff Bound
 Measures Based on Scatter Matrices
• FEATURE SUBSET SELECTION
 Scalar Feature Selection
 Feature Vector Selection
CLASS SEPARABILITY MEASURES
This section focuses on combinations of features (i.e., feature
vectors) and describes measures that quantify class separability
in the respective feature space.
Three class-separability measures are considered:
1. Divergence
2. Bhattacharyya Distance and Chernoff Bound
3. Scatter matrices
Divergence
The divergence is based on a measure of the difference between pairs of
classes. For two classes it is defined as shown below, where Si is the
covariance matrix and mi, i = 1, 2, is the mean vector of the ith class,
and I is the l × l identity matrix.
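The defining formula appeared as an image on the original slide; for Gaussian
class distributions it is, in the book's standard notation (reproduced here as
an assumption):
d_{1,2} = \frac{1}{2}\,\mathrm{trace}\left\{S_1^{-1}S_2 + S_2^{-1}S_1 - 2I\right\} + \frac{1}{2}(m_1 - m_2)^T\left(S_1^{-1} + S_2^{-1}\right)(m_1 - m_2)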
Example 4.7.1
We consider two classes and assume the features to be independent
and normally distributed in each one. Specifically, the first (second)
class is modelled by the Gaussian distribution.
m1 = [3,3]T , S1 = 0.2I
m2 = [2.3,2.3]T , S2 = 1.9I
Generate 100 data points from each class. Then compute the
divergence between the two classes and plot the data.
Matlab Code
% 1. Generate data for the two classes
m1=[3 3]';S1=0.2*eye(2);
m2=[2.3 2.3]';S2=1.9*eye(2);
randn('seed',0);
class1=mvnrnd(m1,S1,100)';
randn('seed',100);
class2=mvnrnd(m2,S2,100)';
% 2. Compute the divergence. Employ the divergence function
Distance=divergence(class1,class2)
% Use function plotData to plot the data
featureNames={'1','2'};
plotData(class1,class2,1:2,featureNames);
Result
Divergence Distance = 5.7233
[Scatter plot of the two classes; horizontal axis: Feature 1, vertical axis: Feature 2]
Although the mean values of the two
classes are very close, the
variances of the features differ
significantly, so the divergence
measure results in a large value.
Bhattacharyya Distance and Chernoff Bound
• The Bhattacharyya distance bears a close relationship to the error of
the Bayesian classifier. The Bhattacharyya distance B1,2 between two
classes is defined as shown below, where |·| denotes the determinant of
the respective matrix.
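The defining formula was an image on the original slide; for two Gaussian
classes it is, up to notation (reproduced here as an assumption):
B_{1,2} = \frac{1}{8}(m_1 - m_2)^T\left(\frac{S_1 + S_2}{2}\right)^{-1}(m_1 - m_2) + \frac{1}{2}\ln\frac{\left|\frac{S_1 + S_2}{2}\right|}{\sqrt{|S_1|\,|S_2|}}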
• The Chernoff bound is an upper bound on the optimal Bayesian error, given
below, where P(ω1), P(ω2) are the a priori class probabilities.
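The bound itself was shown as an image on the original slide; reconstructed
here from the standard expression (an assumption):
P_e \le \sqrt{P(\omega_1)P(\omega_2)}\; e^{-B_{1,2}}
For equiprobable classes, P(ω1) = P(ω2) = 0.5, this reduces to 0.5·exp(−B1,2),
which is exactly what the code in Example 4.7.2 computes.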
Example 4.7.2
Consider a 2-class 2-dimensional classification problem with two
equiprobable classes ω1 and ω2 modeled by the Gaussian distributions
N (m1,S1) and N (m2,S2), respectively, where
m1 = [3,3]T , S1 = 0.2I
m2 = [2.3,2.3]T , S2 = 1.9I
This is the case of small differences between mean values and large
differences between variances. Compute the Bhattacharyya distance
and the Chernoff bound between the two classes.
Matlab Code
% 1. Generate data for the two classes
m1=[3 3]';S1=0.2*eye(2);
m2=[2.2 2.2]';S2=1.9*eye(2);
randn('seed',0);
class1=mvnrnd(m1,S1,100)';
randn('seed',100);
class2=mvnrnd(m2,S2,100)';
% 2. Compute the Bhattacharyya distance and the Chernoff bound
BhattaDistance=divergenceBhata(class1,class2)
ChernoffBound=0.5*exp(-BhattaDistance)
% Use function plotData to plot the data
featureNames={'1','2'};
plotData(class1,class2,1:2,featureNames);
Result
• Bhattacharyya Distance = 0.3730
• Chernoff Bound = 0.3443
Another Case
m1 = [1,1]T, m2 = [4,4]T ,S1 = 0.2I, S2 = 0.2I
• Bhattacharyya Distance = 5.9082
• Chernoff Bound = 0.0014
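Arithmetic check (not on the original slides): 0.5·exp(−0.3730) ≈ 0.3443 and
0.5·exp(−5.9082) ≈ 0.0014, i.e., the Chernoff bound values follow directly from
the Bhattacharyya distances through the equiprobable-class bound given earlier.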
[Scatter plots for the two cases; horizontal axis: Feature 1, vertical axis: Feature 2]
Measures Based on Scatter Matrices
Scatter matrices are measures for quantifying the way feature vectors
“scatter” in the feature space. Three criteria built from them are given
below, where Sw is the within-class scatter matrix, Sb is the between-class
scatter matrix, and Sm is the mixture scatter matrix.
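The three criteria appeared as an image on the original slide; in the book's
notation they are (reproduced here as an assumption):
J_1 = \frac{\mathrm{trace}\{S_m\}}{\mathrm{trace}\{S_w\}}, \qquad J_2 = \frac{|S_m|}{|S_w|}, \qquad J_3 = \mathrm{trace}\{S_w^{-1} S_m\}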
1. Within-class scatter matrix (Sw )
Average feature variance per class
2. Between-class scatter matrix (Sb)
Average variance of class means vs. global mean
3. Mixture scatter matrix (Sm)
Feature covariance with respect to global mean
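The slide formulas for the three matrices are missing from this transcription;
a reconstruction in the book's notation (an assumption), with Pi the a priori
probability of class ωi, Si its covariance matrix, mi its mean vector, and
m0 = Σi Pi mi the global mean vector:
S_w = \sum_{i} P_i S_i, \qquad S_b = \sum_{i} P_i (m_i - m_0)(m_i - m_0)^T, \qquad S_m = S_w + S_b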
Large values of J1, J2, and J3 indicate that data points in the respective
feature space have small within-class variance and large between-class
distance.
Example 4.7.3
In this example, the previously defined J3 measure will be used to
choose the best l features out of m > l originally generated features. To
be more realistic, we will consider Example 4.6.2, where four features
(mean, standard deviation, skewness, and kurtosis) were used. From
Table 4.3 we have ten values for each feature and for each class. The
goal is to select three out of the four features that will result in the
best J3 value.
1. Normalize the values of each feature to have zero mean and unit
variance.
2. Select three out of the four features that result in the best J3 value.
3. Plot the data points in the 3-dimensional space for the best feature
combination.
Matlab Code
% 1. Load files “cirrhoticLiver.dat” and “fattyLiver.dat”
class1=load('cirrhoticLiver.dat')';
class2=load('fattyLiver.dat')';
% Normalize features
[class1,class2]=normalizeStd(class1,class2);
% 2. Evaluate J3 for the three-feature combination [1, 2, 3], where 1 stands for
% the mean, 2 for std, and so on
[J]=ScatterMatrices(class1([1 2 3],:),class2([1 2 3],:))
% 3. Plot the results of the feature combination [1, 2, 3]. Use function plotData
featureNames = {'mean','standard dev','skewness','kurtosis'};
plotData(class1,class2,[1 2 3],featureNames);
Result
J [1 2 3] = 3.9742
Another Result
J [1 2 4] = 3.9741
J [1 3 4] = 3.6195
J [2 3 4] = 1.3972
The combination [1 2 3] gives the largest J3 value, indicating that in the
corresponding feature space the data points have small within-class variance
and large between-class distance.
Feature Subset Selection
• Problem
Select a subset of l features out of m originally available, with the
goal of maximizing class separation.
• Approaches:
1. Scalar Feature Selection: treats features individually (ignores
feature correlations)
2. Feature Vector Selection: considers feature sets and feature
correlations
Scalar Feature Selection
Procedure:
1. Select a class separability criterion C and compute its
values for all the available features xk, k = 1, 2, . . . , m. Rank
them in descending order and choose the one with the
best C value. Let us say that this is xi1.
2. To select the second feature, compute the cross-correlation
coefficient between the chosen xi1 and each of the
remaining m − 1 features.
3. Choose the feature xi2 that maximizes a weighted trade-off between its
criterion value and its correlation with xi1 (see the formulas after this
list), where a1, a2 are weighting factors that determine the relative
importance given to the two terms.
4. Select xik, k = 3, . . . , l, in the same way, so that the average
correlation with all previously selected features is taken into account.
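The selection rules, which appeared as formulas on the original slides, are in
the book's notation (reproduced here as an assumption), with ρij the
cross-correlation coefficient between features xi and xj:
i_2 = \arg\max_{j \ne i_1}\left\{ a_1 C(j) - a_2 |\rho_{i_1 j}| \right\}
i_k = \arg\max_{j}\left\{ a_1 C(j) - \frac{a_2}{k-1}\sum_{r=1}^{k-1} |\rho_{i_r j}| \right\}, \quad j \ne i_r,\ r = 1, \dots, k-1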
Example 4.8.1
This example demonstrates the ranking of a given number of features
according to their class-discriminatory power. We will use the same data as
in Example 4.6.2. Recall that there are two classes and four features (mean,
standard deviation, skewness, and kurtosis); their respective values for the
two classes are summarized in Table 4.3.
1. Normalize the features so as to have zero mean and unit variance. Then
use the FDR criterion to rank the features by considering each one
independently (see Example 4.6.2).
2. Use the scalar feature selection technique described before to rank the
features. Use the FDR in place of the criterion C. Set a1 = 0.2 and a2 =
0.8.
3. Comment on the results
Matlab Code
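The code slide for this example is missing from the transcription. A minimal
sketch, assuming the same cirrhoticLiver.dat / fattyLiver.dat data used in
Example 4.7.3 and the ranking functions that appear later in this deck:
% Load and normalize the Table 4.3 features (assumed data files)
class1=load('cirrhoticLiver.dat')';
class2=load('fattyLiver.dat')';
[class1,class2]=normalizeStd(class1,class2);
% 1. Rank the features independently with the FDR criterion
[T]=ScalarFeatureSelectionRanking(class1,class2,'Fisher');
% 2. Rank again taking cross-correlations into account, with a1=0.2 and a2=0.8
[p]=compositeFeaturesRanking(class1,class2,0.2,0.8,T)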
Result
Scalar Feature Ranking
1. (mean)
2. (st. dev.)
3. (skewness)
4. (kurtosis)
Scalar Feature Ranking with correlation
1. (mean)
2. (kurtosis)
3. (st. dev.)
4. (skewness)
The rank order provided by the FDR class-
separability criterion differs from the ranking that
results when both the FDR and the cross-
correlations are considered. It must be emphasized
that different values for coefficients a1 and a2 may
change the ranking.
Feature Vector Selection
Feature vector selection: consider feature sets and feature correlation.
The goal now is to find the “best” combination of features.
Techniques:
• Exhaustive search
• Sequential forward and backward selection
• Forward floating search selection
Exhaustive search
According to this technique, all possible combinations will be
“exhaustively” formed and for each combination its class separability
will be computed. Different class separability measures will be used.
Example 4.8.2
We will demonstrate the procedure for selecting the “best” feature
combination in the context of computer-aided diagnosis in mammography.
In digital X-ray breast imaging, microcalcifications are concentrations of
small white spots representing calcium deposits in breast tissue (see Figure
4.6). Often they are assessed as a noncancerous sign, but depending on their
shape and concentration patterns, they may indicate cancer. Their
significance in breast cancer diagnosis tasks makes it important to have a
pattern recognition system that detects the presence of microcalcifications
in a mammogram. To this end, a procedure similar to the one used in
Example 4.6.2 will be adopted here. Four features are derived for each ROI:
mean, standard deviation, skewness, and kurtosis. Figure 4.6 shows the
selected ROIs for the two classes (10 for normal tissue; 11 for abnormal
tissue with microcalcifications).
1. Employ the exhaustive search method to select the best
combination of three features out of the four previously
mentioned, according to the divergence, the Bhattacharyya
distance, and the J3 measure associated with the scatter matrices.
2. Plot the data in the subspace spanned by the features selected in
step 1.
Matlab Code
% 1. Load files breastMicrocalcifications.dat and breastNormalTissue.dat
class1=load('breastMicrocalcifications.dat')';
class2=load('breastNormalTissue.dat')';
% Normalize features so that all of them have zero mean and unit variance
[class1,class2]=normalizeStd(class1,class2);
% Select the best combination of features using the J3 criterion
costFunction='ScatterMatrices';
[cLbest,Jmax]=exhaustiveSearch(class1,class2, costFunction,[3])
% 2. Form the classes c1 and c2 using the best feature combination
c1 = class1(cLbest,:);
c2 = class2(cLbest,:);
% Plot the data using the plotData function
featureNames = {'mean','st. dev.','skewness','kurtosis'};
plotData(c1,c2,cLbest,featureNames);
Result
cLbest =
1 2 3
Jmax =
1.9625
[3-D scatter plot of the two classes in the subspace of the selected features; axes: mean, st. dev., skewness]
Suboptimal Searching Techniques
Backward Selection
1. Start with all m features in a single vector.
2. Iteratively eliminate one feature at a time and compute the class
separability criterion for each resulting combination.
3. Keep the combination with the highest criterion value.
4. Repeat from step 2 with the chosen combination until a vector of size l remains.
Number of Combinations Generated
1 + ½ ((m +1)m - l (l +1))
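A minimal hand-rolled sketch of this procedure, using the J3 criterion via the
ScatterMatrices function seen in Example 4.7.3 (the toolbox provides its own
sequential selection functions; this is only illustrative):
% Sequential backward selection sketch; class1, class2 hold one feature per
% row and one sample per column, l is the desired number of features
function remaining = simpleBackwardSelection(class1, class2, l)
remaining = 1:size(class1,1);              % start with all m features
while numel(remaining) > l
    bestJ = -inf; bestK = 1;
    for k = 1:numel(remaining)             % try removing each feature in turn
        cand = remaining([1:k-1 k+1:end]);
        J = ScatterMatrices(class1(cand,:), class2(cand,:));
        if J > bestJ, bestJ = J; bestK = k; end
    end
    remaining(bestK) = [];                 % drop the feature whose removal keeps J largest
end
end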
Sequential Forward Selection
1. Compute the criterion value for each individual feature.
2. Select the feature with the best value.
3. Form all possible pairings of the best vector with each unused feature.
4. Evaluate each pairing using the criterion and select the best vector.
5. Repeat steps 3 and 4 until we have a vector of size l.
Number of Combinations Generated
lm − l(l − 1)/2
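As an illustration (numbers not on the original slides): with m = 20 candidate
features, as in the texture example later in this deck, and l = 3, exhaustive
search evaluates C(20,3) = 1140 combinations, sequential forward selection
evaluates 3·20 − 3·2/2 = 57, and backward selection evaluates
1 + ½(21·20 − 3·4) = 205.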
Forward Floating Search Selection
The idea of forward floating search selection is to add a feature (forward
step) and then check smaller feature sets to see whether the criterion improves
when this feature replaces a previously selected one (backward step). The
procedure terminates when the desired number of features has been selected.
Example 4.8.4
The goal of this example is to demonstrate the (possible) differences
in accuracy between the suboptimal feature selection techniques SFS,
SBS, and SFFS and the optimal exhaustive search technique, which
will be considered as the “gold standard.”
We assume that we are working in a 2-dimensional feature space, and
we evaluate the performance of the four feature selection methods in
determining the best 2-feature combination, employing the exhaustiveSearch
function together with the toolbox's sequential forward, sequential backward,
and sequential forward floating selection functions.
Matlab Code
Result
Exhaustive Search -> Best of two: (1) (6)
Sequential Forward Selection -> Best of two: (1) (5)
Sequential Backward Selection -> Best of two: (2) (9)
Floating Search Method -> Best of two: (1) (6)
Example 4.8.4. Designing a Classification
System
The goal of this example is to demonstrate the various stages involved
in the design of a classification system. We will adhere to the 2-class
texture classification task of Figure 4.8. The following five stages will be
explicitly considered:
• Data collection for training and testing
• Feature generation
• Feature selection
• Classifier design
• Performance evaluation
Data collection for training and testing
• A number of different images must be chosen for both the training
and the test set. To form the training data set, 25 ROIs are selected
from each image representing the two classes
Feature generation
• From each of the selected ROIs, a number of features is generated.
We have decided on 20 different texture-related features.
• Feature selection
In this case, we use scalar feature selection (function
compositeFeaturesRanking), which employs the FDR criterion as well as a
cross-correlation measure between pairs of features in order to rank them.
• Classifier design
In this stage different classifiers are employed in the selected feature
space; the one that results in the best performance is chosen. In this
example we use the k-nearest neighbour (k-NN) classifier.
• Performance evaluation
The performance of the classifier, in terms of its error rate, is
measured against the test data set. However, in order to cover the
case where the same data must be used for both training and testing,
the leave-one-out (LOO) method will be employed. LOO is particularly
useful in cases where only a limited data set is available.
Matlab Code
% 1. Read the training data and normalize the feature values.
c1_train=load('trainingClass1.dat')';
c2_train=load('trainingClass2.dat')';
% Normalize dataset
superClass=[c1_train c2_train];
for i=1:size(superClass,1)
m(i)=mean(superClass(i,:)); % mean value of feature
s(i)=std (superClass(i,:)); % std of feature
superClass(i,:)=(superClass(i,:)-m(i))/s(i);
end
c1_train=superClass(:,1:size(c1_train,2));
c2_train=superClass(:,size(c1_train,2)+1:size(superClass,2));
% 2. Rank the features using the normalized training dataset. We have adopted
% the scalar feature ranking technique which employs Fisher's Discrimination Ratio
% in conjunction with a feature correlation measure. The ranking results are
% returned in variable p.
[T]=ScalarFeatureSelectionRanking(c1_train,c2_train,'Fisher');
[p]= compositeFeaturesRanking (c1_train,c2_train,0.2,0.8,T);
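% 3. (This step is missing from the transcription; the reconstruction below is
% an assumption.) Keep only the seven best-ranked features, preserving their
% original order, so that the index vector 'inds' used later is defined.
inds=sort(p(1:7));
c1_train=c1_train(inds,:);
c2_train=c2_train(inds,:);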
% 4. Choose the best feature combination consisting of three features (out of the
% previously selected seven), using the exhaustive search method.
[cLbest,Jmax]= exhaustiveSearch(c1_train,c2_train,'ScatterMatrices',[3]);
% 5. Form the resulting training dataset (using the best feature combination),
% along with the corresponding class labels.
trainSet=[c1_train c2_train];
trainSet=trainSet(cLbest,:);
trainLabels=[ones(1,size(c1_train,2)) 2*ones(1,size(c2_train,2))];
% 6. Load the test dataset and normalize it using the mean and std computed over
% the training dataset. Form the vector of the corresponding test labels.
c1_test=load('testClass1.dat')';
c2_test=load('testClass2.dat')';
for i=1:size(c1_test,1)
c1_test(i,:)=(c1_test(i,:)-m(i))/s(i);
c2_test(i,:)=(c2_test(i,:)-m(i))/s(i);
end
c1_test=c1_test(inds,:);
c2_test=c2_test(inds,:);
testSet=[c1_test c2_test];
testSet=testSet(cLbest,:);
testLabels=[ones(1,size(c1_test,2)) 2*ones(1,size(c2_test,2))];
% 7. Plot the test dataset by means of function plotData. Provide names for the features.
featureNames={'mean','stand dev','skewness','kurtosis','Contrast 0','Contrast 90',...
'Contrast 45','Contrast 135','Correlation 0','Correlation 90','Correlation 45',...
'Correlation 135','Energy 0','Energy 90','Energy 45','Energy 135',...
'Homogeneity 0','Homogeneity 90','Homogeneity 45','Homogeneity 135'};
fNames=featureNames(inds);
fNames=fNames(cLbest);
plotData(c1_test(cLbest,:),c2_test(cLbest,:),1:3,fNames);
% 8. Classify the feature vectors of the test data using the k-NN classifier
[classified]=k_nn_classifier(trainSet,trainLabels,3,testSet);
[classif_error]=compute_error(testLabels,classified)
% Leave-one-out (LOO) method (a) Load all training data, normalize them and create class labels.
c1_train=load('trainingClass1.dat')';
c1=c1_train;
Labels1=ones(1,size(c1,2));
c2_train=load('trainingClass2.dat')';
c2=c2_train;
Labels2=2*ones(1,size(c2,2));
[c1,c2]=normalizeStd(c1,c2);
AllDataset=[c1 c2];
Labels=[Labels1 Labels2];
% (b) Keep features of the best feature combination (determined previously) and
% discard all the rest.
AllDataset=AllDataset(inds,:);
AllDataset=AllDataset(cLbest,:);
% (c) Apply the Leave One Out method on the k-NN classifier (k = 3) and compute the error.
[M,N]=size(AllDataset);
for i=1:N
dec(i)=k_nn_classifier([AllDataset(:,1:i-1) AllDataset(:,i+1:N)],...
[Labels(1,1:i-1) Labels(1,i+1:N)],3,AllDataset(:,i));
end
LOO_error=sum((dec~=Labels))/N
Result
Classification Error: 0.0200
The leave-one-out (LOO) error is 1%.
Conclusion
Feature selection: select the most effective features
according to a given optimality criterion, that is, rank the
features according to their contribution to the recognition
results.
Reference
• Sergios Theodoridis, Konstantinos Koutroumbas: An Introduction to
Pattern Recognition: A MATLAB Approach
