
Classification of chestnuts with feature selection by noise resilient classifiers


Published in: European Symposium on Artificial Neural Networks (ESANN 2008)


  1. Classification of chestnuts with feature selection by noise resilient classifiers. Elena Roglia, Rossella Cancelliere, Rosa Meo. University of Turin, Department of Computer Science, Italy.
  2. Classifying chestnut plants according to their place of origin. Which features? Which classifiers? What to do with noise?
  3. Predicting chestnut origin from fruit and plant properties is important! Industrial applications include, for example, verification of certificates of product origin. Papaya from Italy?! Think before you eat!
  4. Why feature selection? Botanic features are extracted, collected, and stored in a data set by human agents. The process is lengthy, costly, and error prone.
  5. Decision tree. Its nodes perform a test on a data attribute: the outcome partitions the training set into smaller partitions until the class value becomes homogeneous. The class value is the prediction of the decision tree for the set of data records that reach that final node.
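The node test and the homogeneity stopping condition described on this slide can be sketched generically in Python. The feature name `ament_length`, the sample records, and the helper names are illustrative, not taken from the paper:

```python
def partition(records, attribute, threshold):
    """A node's test on one data attribute: the outcome splits
    the records into two smaller partitions."""
    left = [r for r in records if r[attribute] <= threshold]
    right = [r for r in records if r[attribute] > threshold]
    return left, right

def is_homogeneous(records, class_key="origin"):
    """A branch stops growing once the class value is homogeneous;
    that value becomes the leaf's prediction."""
    return len({r[class_key] for r in records}) <= 1

# Tiny made-up sample: one attribute test already separates the zones.
samples = [
    {"ament_length": 8.0, "origin": "zone_1"},
    {"ament_length": 12.5, "origin": "zone_2"},
    {"ament_length": 7.5, "origin": "zone_1"},
]
left, right = partition(samples, "ament_length", 10.0)
print(is_homogeneous(left), is_homogeneous(right))  # True True
```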
  6. Tree induction by entropy. The C4.5 algorithm induces the form of the decision tree using the entropy of the class value. It grows the tree until the entropy reduction falls under a user-defined threshold.
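The entropy criterion C4.5 relies on can be written down directly. This is a generic sketch of Shannon entropy over class labels, not code from the paper:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# A pure partition has zero entropy; a 50/50 split carries one full bit,
# so the tree prefers splits that drive entropy toward zero.
print(entropy(["zone_1"] * 4) == 0.0)                     # True
print(entropy(["zone_1", "zone_1", "zone_2", "zone_2"]))  # 1.0
```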
  7. Random forest is a special ensemble learner. A large number of decision trees is grown, and each tree depends on the values of a random vector. Predictions are usually combined using the technique of majority voting, so that the most popular class among the trees is predicted.
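The two ingredients the slide names, per-tree resampled training sets and majority voting, can be sketched generically (the function names are mine, not the authors'):

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Each tree's training set: random selection of samples
    with replacement, same size as the original data."""
    return [rng.choice(data) for _ in data]

def majority_vote(tree_predictions):
    """Combine the trees' predictions: the most popular class wins."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(42)
resampled = bootstrap_sample(["s1", "s2", "s3", "s4"], rng)
print(len(resampled))                                 # 4
print(majority_vote(["zone_3", "zone_1", "zone_3"]))  # zone_3
```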
  8. Which classifiers? Initial data set: 1600 samples, 19 features from fruit peculiarities. Accuracies: MLP 58.12%, RBF 47.97%, C4.5 49.81%, RF 55.06%, SMO 52.50%. New features?
  9. New data set: 1600 samples, 37 features from fruit and plant peculiarities. Feature ranking methods: Symmetrical Uncertainty, Chi-Square Statistic, Gain Ratio, Information Gain, OneR methodology. Final data set: 1600 samples, 6 features selected by the entropy-based information gain criterion: number of chestnuts per kg, diameter of the trunk, number of female inflorescences per ament, ament length, length of the leaf limb, height of the plant.
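The entropy-based information gain criterion used for the final selection can be sketched as follows. This is generic code with a made-up binary feature, not the actual 37-feature data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete
    feature; features with higher gain are ranked as more informative."""
    n = len(labels)
    conditional = sum(
        (feature_values.count(v) / n)
        * entropy([l for fv, l in zip(feature_values, labels) if fv == v])
        for v in set(feature_values)
    )
    return entropy(labels) - conditional

labels = ["zone_1", "zone_1", "zone_2", "zone_2"]
print(information_gain(["low", "low", "high", "high"], labels))  # 1.0: fully informative
print(information_gain(["low", "high", "low", "high"], labels))  # 0.0: uninformative
```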
  10. Some details. Random forest: a forest of trees built with all 6 features, each tree trained on a different training set built by random selection of samples with replacement. Training set (70%): 1120 samples; test set (30%): 480 samples; target classes: 8. MLP: 6, 12, and 8 neurons (one per geographic zone) in the input, hidden, and output layers respectively; training phase of 100 iterations. Decision tree: used the default settings of the Weka tool; obtained a binary tree with 15 leaves and 6 levels.
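The slide gives only the MLP's layer sizes (6, 12, 8). A single forward pass with those sizes can be sketched as below; the random weights and the activation choices (sigmoid hidden layer, softmax output) are my assumptions, since the slide does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the slide: 6 inputs (the selected features),
# 12 hidden neurons, 8 outputs (one per geographic zone).
W1, b1 = rng.normal(size=(6, 12)), np.zeros(12)
W2, b2 = rng.normal(size=(12, 8)), np.zeros(8)

def forward(x):
    """One forward pass: sigmoid hidden layer, softmax output
    (activation choices assumed, not stated on the slide)."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = forward(rng.normal(size=6))
print(probs.shape)  # (8,): one probability per zone
```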
  11. ACCURACIES ON TEST SET: MLP 97.91%, C4.5 100%, RF 100%. Are these classifiers robust? What is their performance on a noisy data set? Noise is injected by perturbing each attribute value: i'(A) = i(A) ± 0.05 · i(A), where i(A) is the value of attribute A in the i-th instance.
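The perturbation i'(A) = i(A) ± 0.05 · i(A) can be sketched as below. Whether the ±5% factor is a fixed shift or drawn at random is not clear from the slide, so the uniform draw here is an assumption:

```python
import random

def add_noise(value, rng, level=0.05):
    """Perturb one attribute value by up to +/-5% of itself,
    mirroring the slide's i'(A) = i(A) +/- 0.05 * i(A)
    (uniform draw within the band is an assumption)."""
    return value + rng.uniform(-level, level) * value

rng = random.Random(0)
noisy = add_noise(10.0, rng)
print(9.5 <= noisy <= 10.5)  # True: within the 5% band
```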
  12. ACCURACIES ON TEST SET (without noise / with noise): MLP 97.91% / 93.12%; C4.5 100% / 82.29%; RF 100% / 90.62%. The multilayer perceptron is quite stable: its class prediction is only marginally affected by noise. The decision tree and the random forest are more sensitive.
  13. Conclusions. The results, in the context of this peculiar domain, confirm the robustness of neural network classification techniques and their reliability for treating noisy data. Even though decision trees and random forests reach higher accuracy rates on clean test data, when noise is present they prove less robust and stable. Further work coming soon!
