A Comparative Study of Machine Learning Techniques for Caries Prediction
Robson D. Montenegro, Adriano L. I. Oliveira, George G. Cabral
Department of Computing and Systems, Polytechnic School, Pernambuco State University
Rua Benfica, 455, Madalena, Recife PE, Brazil, 50.750-410
{adriano,rdm,ggc}@dsc.upe.br
Cintia R. T. Katz, Aronita Rosenblatt
Department of Preventive and Social Dentistry, Faculty of Dentistry, Pernambuco State University
Av. Gal. Newton Cavalcanti, 1.650 - Camaragibe, PE, Brazil, 54.753-220
cintiakatz@uol.com.br, rosen@reitoria.upe.br
Abstract
There are striking disparities in the prevalence of dental disease by income. Poor children suffer twice as much dental caries as their more affluent peers, but are less likely to receive treatment. This paper presents an experimental study of the application of machine learning methods to the problem of caries prediction. For this paper, a data set was built from interviews conducted in 2006 with children under five years of age in Recife, the capital of Pernambuco, a state in northeast Brazil. Four different data mining techniques were applied to this problem and their results were compared in terms of the classification error and the area under the ROC curve (AUC). Results showed that the MLP neural network classifier outperformed the other machine learning methods employed in the experiments, followed by the support vector machine (SVM) predictor. In addition, the results also show that some rules (extracted by decision trees) may be useful for understanding the most important factors that influence the occurrence of caries in children.
1 Introduction
Early childhood caries is a disease that occurs in young children and is associated with malnutrition and inadequate eating habits during weaning. Dental caries is the single most common chronic childhood disease: five times more common than asthma and seven times more common than hay fever. This disease is considered a public health problem due to its impact on quality of life; it affects, almost exclusively, children of less privileged socioeconomic groups in developed and developing countries. Preceded by enamel defects, early childhood caries may have its progress limited if detected early [22][21].
The increasingly widespread use of information systems in health care and the considerable growth of databases require traditional manual data analyses to be replaced by new, efficient computational models [13]; manual processes easily break down as the size of the data grows and the number of dimensions increases. Data mining is a research method that has been used to benefit a large number of fields of medicine, including diagnosis, prognosis and the treatment of diseases [2][3][17]. It encompasses techniques such as machine learning and artificial neural networks (ANNs), which have been successfully applied to medical problems to predict clinical results [2][17].
In recent years, there has been a significant increase in the use of technology in medicine and related areas. The complexity and sophistication of these technologies often require decision problems to be solved using combinatorics and optimization methods [3]. Despite the importance of data mining and machine learning techniques, there remains little application of these techniques to the field of dentistry. Recently, Oliveira et al. applied machine learning techniques to dentistry, aiming to predict the success of dental implants [6][18].
The purpose of this paper is to build robust models for predicting the presence of caries in preschool children under five years of age in state schools (attended by the low-income population) in Recife, the capital of Pernambuco, in the northeastern region of Brazil. This paper also aims to extract and display, in a more friendly form, the rules, or factors, associated with caries prediction in this specific case.
2 Data Set Characteristics
A databank was constructed with information collected from 3,864 Brazilian preschool children under five years of age. A cross-sectional study was conducted in state schools (attended by the low-income population) in Recife, the capital of Pernambuco, in the northeastern region of Brazil.
Recife is one of the three most important urban centers of the northeastern region of Brazil. The population of the city and its surrounding area is over 3 million people. The city is divided into six administrative regions and has 153 schools run by the municipality, attended by 4,787 four-year-old children.
The questionnaires were completed during personal interviews with each child’s mother. In every case, the examiner was blind to the child’s questionnaire data. Examinations were performed under natural light, in the classroom environment, using tongue blades, gloves and masks, in compliance with the infection control protocol (Ministry of Health, Brazil).
For each child, 193 (one hundred and ninety-three) features were collected in the questionnaire. Of this total, only sixteen features were considered significant to the problem of caries prediction.
As shown in Table 1, there is a significantly greater occurrence of healthy samples, making the data set unbalanced [14]. For this reason, only 998 samples were considered in the caries prediction experiments. These 998 samples are equally divided between caries and healthy samples.
Table 1. Distribution of caries in the whole dataset.
Class    Number of samples
Caries   499
Healthy  3365
Total    3864
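The balancing step described above amounts to randomly undersampling the majority (healthy) class down to the size of the minority (caries) class. A minimal sketch in Python, using hypothetical stand-in records:

```python
import random

def undersample(majority, minority, seed=0):
    """Randomly discard majority-class samples until both classes have the
    same size, yielding a balanced dataset for training."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)) + list(minority)

# Hypothetical stand-ins for the 3365 healthy and 499 caries records:
healthy = [("healthy", i) for i in range(3365)]
caries = [("caries", i) for i in range(499)]
balanced = undersample(healthy, caries)
print(len(balanced))  # 998
```

This reproduces the 499 + 499 = 998 balanced sample used in the experiments.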
The input variables (attributes) considered in our problem are:
1. Gender: male/female.
2. Age in months.
3. Parent’s opinion about the oral health of the child (excellent, good, regular, bad, very bad).
4. Has the child already had a toothache? (yes/no)
5. Family income (1 to 7, or more) in minimum wages.
6. Child has already gone to the dentist and a caries was
diagnosed (yes/no)
7. Child has never gone to the dentist for another reason
(yes/no)
8. Child has already gone to the dentist (yes/no)
9. Child has already visited the dentist for having a
toothache (yes/no)
10. Presence of failure in the enamel (yes/no)
11. Presence of fistula (yes/no)
12. Political-administrative region (from 1 to 6)
13. Child has never gone to the dentist for reasons of access (yes/no)
14. Child has already gone to the dentist for prevention (yes/no)
15. Child has never gone to the dentist for financial reasons (yes/no)
The output variable is:
1. Presence of caries (yes/no)
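For illustration, questionnaire records like those above could be encoded numerically along these lines (a hypothetical sketch; the field names are invented for the example, and only the first five attributes are shown):

```python
# Ordinal scale for the parent's opinion (attribute 3):
OPINION = {"excellent": 0, "good": 1, "regular": 2, "bad": 3, "very bad": 4}

def encode(record):
    """Map one (hypothetical) questionnaire record to a numeric vector:
    yes/no answers become 0/1, the opinion becomes a small integer."""
    return [
        1 if record["gender"] == "female" else 0,   # 1. gender
        record["age_months"],                       # 2. age in months
        OPINION[record["parent_opinion"]],          # 3. parent's opinion
        1 if record["had_toothache"] else 0,        # 4. toothache (yes/no)
        record["family_income"],                    # 5. income in minimum wages
    ]  # ...the remaining yes/no attributes would be encoded the same way

sample = {"gender": "female", "age_months": 40, "parent_opinion": "regular",
          "had_toothache": True, "family_income": 2}
print(encode(sample))  # [1, 40, 2, 1, 2]
```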
3 The Classifiers Evaluated
In this section we briefly review the four classification techniques used in this work, namely, (1) decision trees, (2) MLP neural networks, (3) kNN, and (4) support vector machines.
Decision trees are statistical models for classification and data prediction. These models take a "divide-and-conquer" approach: a complex problem is decomposed into simpler sub-problems and, recursively, this technique is applied to each sub-problem [10].
For this work we have chosen one of the most popular algorithms for building decision trees, C4.5 [20]. C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan to address some issues not dealt with by ID3, such as avoiding overfitting the data, determining how deeply to grow a decision tree, and improving computational efficiency. Quinlan's C4.5 has a parameter named the confidence factor, denoted by C, that is used for pruning. In general, smaller values of C yield more pruning. For the experiments we varied the value of the confidence factor to obtain a more accurate classification model.
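The divide-and-conquer scheme can be sketched as a short recursion. The following is a minimal ID3-style sketch on invented toy data, not the actual C4.5 implementation, which additionally selects the split attribute by information-gain ratio and applies confidence-factor pruning:

```python
from collections import Counter

def majority(ys):
    return Counter(ys).most_common(1)[0][0]

def build_tree(rows, ys, attrs):
    """Recursive divide-and-conquer: pick an attribute, split the data on
    its values, and recurse on each subset until the labels are pure."""
    if len(set(ys)) == 1 or not attrs:
        return majority(ys)                  # leaf: predicted class
    a = attrs[0]                             # toy choice; C4.5 picks by gain ratio
    branches = {}
    for v in set(r[a] for r in rows):
        sub = [(r, y) for r, y in zip(rows, ys) if r[a] == v]
        branches[v] = build_tree([r for r, _ in sub], [y for _, y in sub],
                                 attrs[1:])
    return (a, branches)

def classify(tree, row):
    while isinstance(tree, tuple):           # descend until a leaf is reached
        a, branches = tree
        tree = branches[row[a]]
    return tree

# Invented data: two binary attributes -> caries/healthy label.
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
ys = ["caries", "caries", "healthy", "caries"]
tree = build_tree(rows, ys, attrs=[0, 1])
print(classify(tree, (0, 0)))  # healthy
```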
The MLP (Multi-Layer Perceptron) neural network derives from the perceptron model. Unlike the basic perceptron, MLPs are able to solve non-linearly separable problems. For this work we have chosen the backpropagation learning algorithm for training MLP neural networks.
The MLP network is trained by adapting its weights. During training, the network output is compared with a desired output; the error, that is, the difference between these two signals, is used to adapt the weights. The rate of adaptation is controlled by the learning rate. A high learning rate makes the network adapt its weights quickly, but potentially makes it unstable. It is therefore recommended to use small learning rates in practical applications.
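The error-driven update can be illustrated on a single sigmoid unit; this is a minimal sketch, with backpropagation applying the same correction layer by layer in a full MLP. The input and target are invented; the learning rate of 0.01 and 500 epochs mirror the MLP settings used in the experiments:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, x, target, lr):
    """One delta-rule update: compare output to the desired output and move
    the weights in proportion to the error, scaled by the learning rate."""
    out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    grad = (target - out) * out * (1.0 - out)   # error times sigmoid derivative
    return [wi + lr * grad * xi for wi, xi in zip(w, x)]

# Hypothetical toy task: drive the unit's output toward 1 for input (1, 1).
w = [0.0, 0.0]
for _ in range(500):                            # 500 epochs, lr = 0.01
    w = train_step(w, [1.0, 1.0], 1.0, lr=0.01)
out = sigmoid(w[0] + w[1])
print(0.5 < out < 1.0)  # True: output moved from 0.5 toward the target
```

A larger learning rate would move the output faster per step, at the cost of potential oscillation, which is the instability noted above.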
kNN is a classical prototype-based (or memory-based) classifier, which is often used in real-world applications due to its simplicity [24]. Despite its simplicity, it has achieved considerable classification accuracy on a number of tasks and is therefore quite often used as a basis for comparison with novel classifiers.
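A minimal kNN sketch on invented toy data (two numeric features, majority vote among the k nearest training points):

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points
    (Euclidean distance)."""
    neighbours = sorted(train, key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Invented samples: (age in months, enamel defect) -> class label.
train = [((30, 0), "healthy"), ((32, 0), "healthy"),
         ((48, 1), "caries"), ((50, 1), "caries"), ((47, 1), "caries")]
print(knn_predict(train, (49, 1), k=3))  # caries
```

In practice the features should be normalized so that no single attribute dominates the distance.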
Support vector machine (SVM) is a recent technique for classification and regression which has achieved remarkable accuracy in a number of important problems [4], [23], [5], [1]. SVM is based on the principle of structural risk minimization (SRM), which states that, in order to achieve good generalization performance, a machine learning algorithm should attempt to minimize the structural risk instead of the empirical risk [9], [1]. The empirical risk is the error on the training set, whereas the structural risk considers both the error on the training set and the complexity of the class of functions used to fit the data. Despite SVM's popularity in the machine learning and pattern recognition communities, a recent study has shown that simpler methods, such as kNN and neural networks, can achieve performance comparable to or even better than SVMs in some classification and regression problems [16].
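The empirical-versus-structural-risk trade-off can be illustrated with a linear SVM trained by subgradient descent on the regularized hinge loss. This is a simplified sketch on invented data; the experiments here used an RBF-kernel SVM via Weka, not this procedure:

```python
def svm_train(xs, ys, lam=0.1, lr=0.01, epochs=200):
    """Subgradient descent on the regularized hinge loss. The hinge term is
    the empirical risk (training error); the lam*||w||^2 penalty limits model
    complexity, a minimal stand-in for structural risk minimization."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):                      # labels y are -1 or +1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                            # inside margin: hinge active
                w = [wi + lr * (y * xi - 2 * lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:                                     # correct side: only shrink w
                w = [wi * (1 - 2 * lr * lam) for wi in w]
    return w, b

# Invented 2-D toy data, linearly separable:
xs = [(0.0, 0.0), (0.3, 0.1), (2.0, 2.0), (1.8, 2.2)]
ys = [-1, -1, +1, +1]
w, b = svm_train(xs, ys)
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
print([predict(x) for x in xs])
```

Larger `lam` weighs complexity more heavily (more structural risk minimization, wider margin); smaller `lam` fits the training set more aggressively.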
4 Experiments
The simulations were carried out using the Weka data
mining tool, which includes several pre-processing and
classification methods [25].
We used 10-fold cross-validation to assess generalization performance and to compare the classifiers considered in this article. In 10-fold cross-validation (CV), a given dataset is divided into ten subsets. A classifier is trained on a subset formed by joining nine of these subsets and tested on the one left aside [7]. This is done ten times, each time employing a different subset as the test set and computing the test set error Ei. Finally, the cross-validation error is computed as the mean over the ten errors Ei, 1 ≤ i ≤ 10. It is important to emphasize that all the simulations reported here used stratified CV, whereby the subsets are formed using the same class frequency distribution as the original dataset [25].
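The stratified fold construction and the averaging of the fold errors Ei can be sketched as follows (the per-fold error values at the end are hypothetical, purely to show the averaging step):

```python
import random
from statistics import mean

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve the class frequency
    distribution of the original dataset (stratified CV)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)        # deal indices round-robin per class
    return folds

labels = ["caries"] * 499 + ["healthy"] * 499      # the balanced dataset
folds = stratified_folds(labels)
print(sum(len(f) for f in folds))                  # 998: each sample in one fold

# The CV error is the mean of the ten test-fold errors E_i
# (hypothetical per-fold error values shown here):
fold_errors = [0.21, 0.24, 0.23, 0.22, 0.25, 0.20, 0.24, 0.23, 0.22, 0.21]
print(round(mean(fold_errors), 3))                 # 0.225
```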
The performance measures used to compare the classifiers are (1) the classification error and (2) the area under the ROC curve (AUC) [9], [11], [12]. ROC curves originated in signal detection theory and are most frequently used for one-class classification or classification with two classes, which is the case of our problem [18][8]. In the ROC curve, the x-axis represents the PFA (Probability of False Alarm), the fraction of normal patterns wrongly classified as novelties; the y-axis represents the PD (Probability of Detection), the likelihood of patterns of the novelty class being recognized correctly. The area under the ROC curve (AUC) summarizes the ROC curve and is another way to compare classifiers besides accuracy, according to Huang and Ling [11]. When comparing classifiers, the best classifier is the one whose AUC is closest to 1.
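The AUC can be computed directly via a standard equivalence: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch with hypothetical classifier scores:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative,
    counting ties as 1/2."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical scores for caries (positive) and healthy (negative) samples:
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.3, 0.2, 0.1]
print(auc(pos, neg))  # 0.9375 - close to 1 means good class separation
```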
Aiming to select the attributes from the dataset with the greatest significance to the problem, we used InfoGainAttributeEval as the attribute evaluator, with the search method Ranker. InfoGainAttributeEval evaluates the worth of an attribute by measuring the information gain with respect to the class. Ranker ranks attributes by their individual evaluations, using a threshold below which attributes are discarded. For our experiments we varied this threshold from 10^-4 to 10^-1.
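The evaluator-plus-ranker scheme can be sketched in a few lines. This is a simplified sketch on invented toy data, assuming discrete attribute values (Weka's implementation handles numeric attributes as well):

```python
from math import log2
from collections import Counter

def entropy(ys):
    """Shannon entropy of a list of class labels."""
    n = len(ys)
    return -sum(c / n * log2(c / n) for c in Counter(ys).values())

def info_gain(column, ys):
    """Information gain of one attribute with respect to the class
    (what InfoGainAttributeEval measures)."""
    groups = {}
    for v, y in zip(column, ys):
        groups.setdefault(v, []).append(y)
    return entropy(ys) - sum(len(g) / len(ys) * entropy(g)
                             for g in groups.values())

def rank_attributes(columns, ys, threshold):
    """Rank attributes by gain and discard those at or below the threshold
    (the role played by the Ranker search method)."""
    gains = [(i, info_gain(c, ys)) for i, c in enumerate(columns)]
    gains.sort(key=lambda t: -t[1])
    return [(i, g) for i, g in gains if g > threshold]

ys = ["caries", "caries", "caries", "healthy", "healthy", "healthy"]
cols = [
    [1, 1, 1, 0, 0, 0],   # perfectly informative attribute
    [1, 1, 0, 0, 1, 0],   # weakly informative attribute
    [1, 1, 1, 1, 1, 1],   # constant attribute: zero gain, discarded
]
print([i for i, g in rank_attributes(cols, ys, threshold=1e-4)])  # [0, 1]
```

Raising the threshold keeps only the strongest attributes, which is how the experiments arrived at just two inputs for the 10^-4 setting.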
4.1 Results and Discussion
We carried out experiments aiming to analyze the performance obtained with the different selected attributes (see Tables 2 and 3). Table 2 shows the results obtained using the whole input feature vector (15 input variables), that is, without feature selection. In this experiment we achieved the best result with the MLP method, followed by the SVM (in terms of 10-fold cross-validation error). In terms of AUC, MLP achieved the best result, followed by kNN.
For the decision trees, the results demonstrate that the parameter C has a great influence on the performance of the classifier: the error increased by 5.01% from C = 0.25 to C = 0.001. For C = 0.25, the decision tree created 78 nodes, while the decision tree using C = 0.001 created only 5 nodes. Fig. 1 shows the simple model created by the C4.5 algorithm for C = 0.001 without feature selection. These results match the AUC results: for C = 0.25 the AUC is better than for C = 0.001.
Among all the experiments carried out using feature selection, the best results were found with the InfoGainAttributeEval threshold of 10^-4, which means using only two input attributes. The two attributes selected by InfoGainAttributeEval were the age in months and the parent's opinion about the oral health of the child.
Table 2. Caries prediction results without feature selection (15 input attributes)
Classifier  10-fold cross-validation error  AUC
kNN (k = 19)  26.75%  0.8178
C4.5 (C = 0.25)  25.95%  0.7985
C4.5 (C = 0.001)  30.96%  0.7193
MLP (hidden layer units = 2, learning rate = 0.01, epochs = 500)  22.75%  0.8452
SVM (C = 1, σ = 0.1)  23.65%  0.7635
Figure 1. Decision Tree for C = 0.001.
Table 3 shows the results obtained using the InfoGainAttributeEval threshold of 10^-4. With only two attributes we improved the results obtained by kNN and the decision trees. Conversely, the results of the MLP and SVM methods were inferior to those with 15 input variables. In these experiments, as in the experiments without feature selection, MLP achieved the best AUC; in terms of the 10-fold cross-validation error, however, kNN achieved a slightly lower error than MLP.
Using feature selection, both decision tree models achieved a slight performance improvement. As a multidisciplinary work, this paper chose decision trees as one of the methods for this problem because of their ability to extract rules. For a dentist, it is easier to use the results provided by decision trees than the results of classifiers such as MLPs, which are harder to interpret.
5 Conclusion
Early childhood caries is considered a public health problem that occurs most often in children of less privileged socioeconomic groups. In this work we compared the performance of four different classifiers applied to the problem of caries prediction. For this problem, we also performed feature selection on the dataset, aiming to retrieve the attributes most relevant to the task of caries prediction.
The results have shown that the best model for caries prediction was obtained by MLP neural networks, which achieved a 10-fold cross-validation error rate of 22.75% without feature selection. Using InfoGainAttributeEval as the feature selection method, the MLP and SVM methods had a slight performance loss, whereas the decision trees (C = 0.001 and C = 0.25) and kNN achieved a slight improvement in their performance.
From the results obtained in this work we can see that children from twenty-three months of age onward are more caries prone. The results also show that family income, whether the child has already had a toothache, and whether the child has already received a caries diagnosis influence the occurrence of the disease. The results also show that children already diagnosed as caries carriers have presented recurrence; this suggests that the treatment is not sufficiently effective in reeducating children's oral hygiene.
References
[1] V. D. Sánchez A. Advanced support vector machines and kernel
methods. Neurocomputing, 55(1-2):5–20, 2003.
[2] S. R. Bhatikar, C. DeGroff, and R. L. Mahajan. A classi-
fier based on the artificial neural network approach for car-
diologic auscultation in pediatrics. Artificial Intelligence in
Medicine, 33(3):251–260, 2005.
[3] T.-C. Chen and T.-C. Hsu. A GAs based approach for min-
ing breast cancer pattern. Expert Syst. Appl, 30(4):674–681,
2006.
[4] C. Cortes and V. Vapnik. Support vector networks. Machine
Learning, 20:1–25, 1995.
[5] N. Cristianini and J. Shawe-Taylor. An Introduction to Sup-
port Vector Machines. Cambridge University Press, 2000.
[6] A. L. I. de Oliveira, C. Baldisserotto, and J. Baldisserotto. A
comparative study on machine learning techniques for pre-
diction of success of dental implants. In A. F. Gelbukh,
A. de Albornoz, and H. Terashima-Marín, editors, MICAI,
volume 3789 of Lecture Notes in Computer Science, pages
939–948. Springer, 2005.
[7] D. Delen, G. Walker, and A. Kadam. Predicting breast can-
cer survivability: a comparison of three data mining meth-
ods. Artificial Intelligence in Medicine, 34(2):113–127,
2005.
[8] N. M. Farsi and F. S. Salama. Sucking habits in saudi chil-
dren: prevalence, contributing factors and effects on the pri-
mary dentition. Pediatr Dent, 19(1):28–33, 1997.
[9] T. Fawcett. An introduction to ROC analysis. Pattern Recog-
nition Letters, 27(8):861–874, June 2006.
[10] J. Gama. Functional trees. Machine Learning, 55(3):219–
250, 2004.
Table 3. Caries prediction results for InfoGainAttributeEval threshold = 10^-4 (2 input attributes).
Classifier  10-fold cross-validation error  AUC
kNN (k = 11)  24.65%  0.8136
C4.5 (C = 0.25)  25.15%  0.8011
C4.5 (C = 0.001)  29.76%  0.7458
MLP (hidden layer units = 2, learning rate = 0.01, epochs = 500)  24.75%  0.8223
SVM (C = 100, σ = 0.1)  25.05%  0.7495
Figure 2. Decision Tree for C = 0.25 with feature selection and InfoGainAttributeEval threshold = 10^-4.
[11] J. Huang and C. X. Ling. Using AUC and accuracy in eval-
uating learning algorithms. IEEE Trans. Knowl. Data Eng,
17(3):299–310, 2005.
[12] T. A. Lasko, J. G. Bhagwat, K. H. Zou, and L. Ohno-
Machado. The use of receiver operating characteristic
curves in biomedical informatics. Journal of Biomedical In-
formatics, 38(5):404–415, 2005.
[13] N. Lavrač. Machine learning for data mining in medicine. In
W. Horn, Y. Shahar, G. Lindberg, S. Andreassen, and J. Wy-
att, editors, Proceedings of the Joint European Conference
on Artificial Intellingence in Medicine and Medical Decision
Making (AIMDM-99), volume 1620 of LNAI, pages 47–64,
Berlin, June 20–24 1999. Springer.
[14] Y. Lu, H. Guo, and L. Feldkamp. Robust neural learning
from unbalanced data samples. In IEEE International Con-
ference on Neural Networks (IJCNN’98), volume III, pages
III–1816–III–1821, Anchorage, AK, July 1998. IEEE.
[15] W. S. McCulloch and W. Pitts. A logical calculus of the ideas
immanent in nervous activity. Bulletin of Mathematical Bio-
physics, 5:115–133, 1943.
[16] D. Meyer, F. Leisch, and K. Hornik. The support vector ma-
chine under test. Neurocomputing, 55(1-2):169–186, 2003.
[17] B. A. Mobley, E. Schechter, W. E. Moore, P. A. McKee,
and J. E. Eichner. Neural network predictions of significant
coronary artery stenosis in men. Artificial Intelligence in
Medicine, 34(2):151–161, 2005.
[18] A. L. I. Oliveira, C. Baldisserotto, and J. Baldisserotto. A
comparative study on support vector machine and construc-
tive RBF neural network for prediction of success of den-
tal implants. In A. Sanfeliu and M. Lazo-Cortés, editors,
CIARP, volume 3773 of Lecture Notes in Computer Science,
pages 1015–1026. Springer, 2005.
[19] J. R. Quinlan. Induction of decision trees. In J. W. Shavlik
and T. G. Dietterich, editors, Readings in Machine Learning.
Morgan Kaufmann, 1990. Originally published in Machine
Learning 1:81–106, 1986.
[20] J. R. Quinlan. C4.5: Programs for Machine Learning. Mor-
gan Kaufmann, San Mateo, CA., 1993.
[21] S. Reisine and J. M. Douglass. Psychosocial and behavioral
issues in early childhood caries, 1998.
[22] A. Rosenblatt and A. Zarzar. Breast feeding and early child-
hood caries: an assessment among Brazilian infants. Inter-
national Journal of Paediatric Dentistry, 14:439–450, 2004.
[23] J. S. Taylor and N. Cristianini. Kernel Methods for Pattern
Analysis. Cambridge University Press, 2004.
[24] A. Webb. Statistical Pattern Recognition. Wiley, 2002.
[25] I. H. Witten and E. Frank. Data Mining: Practical Machine
Learning Tools and Techniques with Java Implementations.
Morgan Kaufmann, San Francisco, 2000.
[26] I. H. Witten and E. Frank. Data mining: practical machine
learning tools and techniques with Java implementations.
SIGMOD, 31(1):76–77, Mar. 2002.

 
Improved vision-based diagnosis of multi-plant disease using an ensemble of d...
Improved vision-based diagnosis of multi-plant disease using an ensemble of d...Improved vision-based diagnosis of multi-plant disease using an ensemble of d...
Improved vision-based diagnosis of multi-plant disease using an ensemble of d...IJECEIAES
 
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...Machine Learning Based Approaches for Cancer Classification Using Gene Expres...
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...mlaij
 
Risk stratification and school readiness
Risk stratification and school readinessRisk stratification and school readiness
Risk stratification and school readinessPredictX
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
 
An approach of cervical cancer diagnosis using class weighting and oversampli...
An approach of cervical cancer diagnosis using class weighting and oversampli...An approach of cervical cancer diagnosis using class weighting and oversampli...
An approach of cervical cancer diagnosis using class weighting and oversampli...TELKOMNIKA JOURNAL
 
HLT 362 V GCU Quiz 11. When a researcher uses a random sam
HLT 362 V GCU Quiz 11. When a researcher uses a random samHLT 362 V GCU Quiz 11. When a researcher uses a random sam
HLT 362 V GCU Quiz 11. When a researcher uses a random samSusanaFurman449
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
 
FACIAL AGE ESTIMATION USING TRANSFER LEARNING AND BAYESIAN OPTIMIZATION BASED...
FACIAL AGE ESTIMATION USING TRANSFER LEARNING AND BAYESIAN OPTIMIZATION BASED...FACIAL AGE ESTIMATION USING TRANSFER LEARNING AND BAYESIAN OPTIMIZATION BASED...
FACIAL AGE ESTIMATION USING TRANSFER LEARNING AND BAYESIAN OPTIMIZATION BASED...sipij
 

Semelhante a Machine Learning Techniques for Caries Prediction in Children (20)

Breast cancer diagnosis via data mining performance analysis of seven differe...
Breast cancer diagnosis via data mining performance analysis of seven differe...Breast cancer diagnosis via data mining performance analysis of seven differe...
Breast cancer diagnosis via data mining performance analysis of seven differe...
 
Cervical Cancer Detection: An Enhanced Approach through Transfer Learning and...
Cervical Cancer Detection: An Enhanced Approach through Transfer Learning and...Cervical Cancer Detection: An Enhanced Approach through Transfer Learning and...
Cervical Cancer Detection: An Enhanced Approach through Transfer Learning and...
 
An overview on data mining designed for imbalanced datasets
An overview on data mining designed for imbalanced datasetsAn overview on data mining designed for imbalanced datasets
An overview on data mining designed for imbalanced datasets
 
An overview on data mining designed for imbalanced datasets
An overview on data mining designed for imbalanced datasetsAn overview on data mining designed for imbalanced datasets
An overview on data mining designed for imbalanced datasets
 
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
 
Prediciting happiness from mobile app survey data
Prediciting happiness from mobile app survey dataPrediciting happiness from mobile app survey data
Prediciting happiness from mobile app survey data
 
Supervised deep learning_embeddings_for_the_predic
Supervised deep learning_embeddings_for_the_predicSupervised deep learning_embeddings_for_the_predic
Supervised deep learning_embeddings_for_the_predic
 
Cancer prognosis prediction using balanced stratified sampling
Cancer prognosis prediction using balanced stratified samplingCancer prognosis prediction using balanced stratified sampling
Cancer prognosis prediction using balanced stratified sampling
 
Evaluation of Student's Perception in Using Electronic Dental Records at Riya...
Evaluation of Student's Perception in Using Electronic Dental Records at Riya...Evaluation of Student's Perception in Using Electronic Dental Records at Riya...
Evaluation of Student's Perception in Using Electronic Dental Records at Riya...
 
Walden University NURS 6050 Polic
 Walden University   NURS 6050 Polic Walden University   NURS 6050 Polic
Walden University NURS 6050 Polic
 
Improved vision-based diagnosis of multi-plant disease using an ensemble of d...
Improved vision-based diagnosis of multi-plant disease using an ensemble of d...Improved vision-based diagnosis of multi-plant disease using an ensemble of d...
Improved vision-based diagnosis of multi-plant disease using an ensemble of d...
 
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...Machine Learning Based Approaches for Cancer Classification Using Gene Expres...
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...
 
Risk stratification and school readiness
Risk stratification and school readinessRisk stratification and school readiness
Risk stratification and school readiness
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
 
An approach of cervical cancer diagnosis using class weighting and oversampli...
An approach of cervical cancer diagnosis using class weighting and oversampli...An approach of cervical cancer diagnosis using class weighting and oversampli...
An approach of cervical cancer diagnosis using class weighting and oversampli...
 
HLT 362 V GCU Quiz 11. When a researcher uses a random sam
HLT 362 V GCU Quiz 11. When a researcher uses a random samHLT 362 V GCU Quiz 11. When a researcher uses a random sam
HLT 362 V GCU Quiz 11. When a researcher uses a random sam
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
 
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONSVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
 
FACIAL AGE ESTIMATION USING TRANSFER LEARNING AND BAYESIAN OPTIMIZATION BASED...
FACIAL AGE ESTIMATION USING TRANSFER LEARNING AND BAYESIAN OPTIMIZATION BASED...FACIAL AGE ESTIMATION USING TRANSFER LEARNING AND BAYESIAN OPTIMIZATION BASED...
FACIAL AGE ESTIMATION USING TRANSFER LEARNING AND BAYESIAN OPTIMIZATION BASED...
 

Machine Learning Techniques for Caries Prediction in Children

A Comparative Study of Machine Learning Techniques for Caries Prediction

Robson D. Montenegro, Adriano L. I. Oliveira, George G. Cabral
Department of Computing and Systems, Polytechnic School, Pernambuco State University
Rua Benfica, 455, Madalena, Recife PE, Brazil, 50.750-410
{adriano,rdm,ggc}@dsc.upe.br

Cintia R. T. Katz, Aronita Rosenblatt
Department of Preventive and Social Dentistry, Faculty of Dentistry, Pernambuco State University
Av. Gal. Newton Cavalcanti, 1.650 - Camaragibe, PE, Brazil, 54.753-220
cintiakatz@uol.com.br, rosen@reitoria.upe.br

Abstract

There are striking disparities in the prevalence of dental disease by income. Poor children suffer twice as much dental caries as their more affluent peers, but are less likely to receive treatment. This paper presents an experimental study of the application of machine learning methods to the problem of caries prediction. For this paper, a data set was built from interviews conducted in 2006 with children under five years of age in Recife, the capital of Pernambuco, a state in northeast Brazil. Four different data mining techniques were applied to this problem and their results were compared in terms of classification error and area under the ROC curve (AUC). Results showed that the MLP neural network classifier outperformed the other machine learning methods employed in the experiments, followed by the support vector machine (SVM) predictor. In addition, the results show that some rules (extracted by decision trees) may be useful for understanding the most important factors that influence the occurrence of caries in children.

1 Introduction

Early childhood caries is a disease that occurs in young children and is associated with malnutrition and inadequate eating habits during weaning. Dental caries is the single most common chronic childhood disease - 5 times more common than asthma and 7 times more common than hay fever.
This disease is considered a public health problem due to its impact on quality of life; it affects, almost exclusively, children from less privileged socioeconomic groups in developed and developing countries. Early childhood caries is preceded by enamel defects, and its progress may be limited if it is detected early [22][21].

The increasingly widespread use of information systems in health care and the considerable growth of databases require traditional manual data analyses to be replaced by new, efficient computational models [13]: manual processes easily break down as the size of the data grows and the number of dimensions increases. Data mining is a research method that has provided benefits to a large number of fields of medicine, including diagnosis, prognosis, and the treatment of diseases [2][3][17]. It encompasses techniques such as machine learning and artificial neural networks (ANNs), which have been successfully applied to medical problems to predict clinical results [2][17].

In recent years, there has been a significant increase in the use of technology in medicine and related areas. The complexity and sophistication of these technologies often require decision problems to be solved with combinatorics and optimization methods [3]. Despite the importance of data mining and machine learning techniques, they have seen little application in the field of dentistry. Recently, Oliveira et al. applied machine learning techniques in this field, aiming to predict the success of dental implants [6][18].

The purpose of this paper is to build robust models for predicting the presence of caries in preschool children under five years of age in state schools (attended by the low-income population) in Recife, the capital of Pernambuco in the northeastern region of Brazil.
This paper also aims to extract and display, in a more user-friendly form, the rules, or factors, associated with caries prediction in this specific case.
2 Data Set Characteristics

A databank was constructed with information collected from 3,864 Brazilian preschool children under five years of age. A cross-sectional study was conducted in state schools (attended by the low-income population) in Recife, the capital of Pernambuco in the northeastern region of Brazil. Recife is one of the three most important urban centers of the northeastern region of Brazil. The population of the city and its surrounding area is over 3 million people. The city is divided into six administrative regions and has 153 schools run by the municipality, attended by 4,787 four-year-old children.

The questionnaires were completed during personal interviews with each child's mother. In every case, the examiner was blind to the child's questionnaire data. Examinations were performed under natural light, in the classroom environment, using tongue blades, gloves and masks, in compliance with the infection control protocol (Ministry of Health, Brazil).

For each child, 193 (one hundred and ninety-three) features were collected in the questionnaire. From this total, only sixteen features (the fifteen input variables and the output variable listed below) were considered significant to the problem of caries prediction.

As shown in Table 1, there is a significantly greater occurrence of healthy samples, making the data set unbalanced [14]. For this reason, only 998 samples were considered in the caries prediction experiments. These 998 samples are equally divided into caries and healthy samples.

Table 1. Distribution of caries in the whole dataset.

  Class     Number of samples
  Caries      499
  Healthy    3365
  Total      3864

The input variables (attributes) considered in our problem are:

1. Gender: male/female.
2. Age in months.
3. Parent's opinion about the oral health of the child (excellent, good, regular, bad, very bad).
4. Has the child already had a toothache? (yes/no)
5. Family income, in minimum wages (1 to 7, or more).
6.
Child has already gone to the dentist and a caries was diagnosed (yes/no).
7. Child has never gone to the dentist, for another reason (yes/no).
8. Child has already gone to the dentist (yes/no).
9. Child has already visited the dentist for having a toothache (yes/no).
10. Presence of failure in the enamel (yes/no).
11. Presence of fistula (yes/no).
12. Political-administrative region (from 1 to 6).
13. Child has never gone to the dentist, for access reasons (yes/no).
14. Child has already gone to the dentist, for prevention reasons (yes/no).
15. Child has never gone to the dentist, for financial reasons (yes/no).

The output variable is:

1. Presence of caries (yes/no).

3 The Classifiers Evaluated

In this section we briefly review the four classification techniques used in this work, namely, (1) decision trees, (2) MLP neural networks, (3) kNN, and (4) support vector machines.

Decision trees are statistical models for classification and data prediction. These models take a "divide-and-conquer" approach: a complex problem is decomposed into simpler sub-problems, and the technique is applied recursively to each sub-problem [10].

For this work we have chosen one of the most popular algorithms for building decision trees, C4.5 [20]. C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan to address some issues not dealt with by ID3, such as avoiding overfitting of the data, determining how deeply to grow a decision tree, and improving computational efficiency. Quinlan's C4.5 has a parameter named the confidence factor, denoted by C, that is used for pruning; in general, smaller values of C yield more pruning. In the experiments we varied the value of the confidence factor to obtain a more accurate classification model.

The MLP (Multi-Layer Perceptron) neural network derives from the perceptron model of neural networks. Unlike the basic perceptron, MLPs are able to solve non-linearly
separable problems. For this work we have chosen the backpropagation learning algorithm for training MLP neural networks.

The MLP network is trained by adapting its weights. During training, the network output is compared with a desired output; the error, that is, the difference between these two signals, is used to adapt the weights. The rate of adaptation is controlled by the learning rate. A high learning rate will make the network adapt its weights quickly, but may make it unstable. Therefore, small learning rates are recommended in practical applications.

kNN is a classical prototype-based (or memory-based) classifier, often used in real-world applications due to its simplicity [24]. Despite its simplicity, it has achieved considerable classification accuracy on a number of tasks and is therefore quite often used as a baseline for comparison with novel classifiers.

The support vector machine (SVM) is a more recent technique for classification and regression which has achieved remarkable accuracy in a number of important problems [4][23][5][1]. SVM is based on the principle of structural risk minimization (SRM), which states that, in order to achieve good generalization performance, a machine learning algorithm should attempt to minimize the structural risk instead of the empirical risk [9][1]. The empirical risk is the error on the training set, whereas the structural risk considers both the error on the training set and the complexity of the class of functions used to fit the data. Despite the popularity of SVMs in the machine learning and pattern recognition communities, a recent study has shown that simpler methods, such as kNN and neural networks, can achieve performance comparable to or even better than SVMs in some classification and regression problems [16].

4 Experiments

The simulations were carried out using the Weka data mining tool, which includes several pre-processing and classification methods [25].
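The influence of the learning rate on the stability of gradient-based weight updates can be seen even in a one-weight toy problem. The sketch below is purely illustrative (a quadratic loss chosen for the example, not the paper's MLP or the backpropagation algorithm itself): the same update rule converges with a small learning rate and diverges with a large one.

```python
# Gradient descent on the one-dimensional loss L(w) = (w - 3)^2,
# whose gradient is dL/dw = 2 * (w - 3); the minimum is at w = 3.
def gradient_descent(learning_rate, steps=50, w=0.0):
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)        # gradient of the loss at the current weight
        w = w - learning_rate * grad  # weight update, scaled by the learning rate
    return w

w_small = gradient_descent(0.1)  # small rate: iterates contract toward w = 3
w_large = gradient_descent(1.1)  # rate too high: iterates overshoot and diverge

print(w_small)                    # approximately 3.0
print(abs(w_large - 3.0) > 1000)  # True: training became unstable
```

With learning rate 0.1 each step multiplies the distance to the optimum by 0.8; with 1.1 the multiplier is -1.2, so the weight oscillates with growing amplitude, which is exactly the instability the text warns about.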
We used 10-fold cross-validation to assess generalization performance and to compare the classifiers considered in this article. In 10-fold cross-validation (CV), a given dataset is divided into ten subsets. A classifier is trained on the set formed by joining nine of these subsets and tested on the one left aside [7]. This is done ten times, each time employing a different subset as the test set and computing the test set error Ei. Finally, the cross-validation error is computed as the mean of the ten errors Ei, 1 ≤ i ≤ 10. It is important to emphasize that all the simulations reported here used stratified CV, whereby the subsets are formed with the same frequency distribution of patterns as the original data set [25].

The performance measures used to compare the classifiers are (1) the classification error and (2) the area under the ROC curve (AUC) [9][11][12]. ROC curves originated in signal detection theory and are most frequently used for one-class classification or classification with two classes, which is the case of our problem [18][8]. In the ROC curve, the x-axis represents the PFA (probability of false alarm), the likelihood of normal patterns being wrongly classified as novelties; the y-axis represents the PD (probability of detection), the likelihood of patterns of the novelty class being recognized correctly. The area under the ROC curve (AUC) summarizes the ROC curve and is another way, besides accuracy, to compare classifiers, according to Huang and Ling [11]. When comparing classifiers, the best one is that whose AUC is closest to 1.

Aiming to select the attributes with the greatest significance to the problem, we used InfoGainAttributeEval as the attribute evaluator, with the search method Ranker. InfoGainAttributeEval evaluates the worth of an attribute by measuring the information gain with respect to the class.
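The information-gain score that InfoGainAttributeEval computes, and the threshold-based filtering that Ranker applies to it, can be sketched as follows. This is a minimal illustration, not the Weka implementation; the attribute names and the six-child toy data are invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attribute, labels):
    """Class entropy minus the entropy remaining after splitting on the
    attribute's values (the quantity InfoGainAttributeEval ranks by)."""
    n = len(labels)
    remainder = 0.0
    for value in set(attribute):
        subset = [lab for a, lab in zip(attribute, labels) if a == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: caries status for six children.
caries    = ["yes", "yes", "yes", "no", "no", "no"]
toothache = ["yes", "yes", "yes", "no", "no", "no"]  # perfectly predictive
gender    = ["m", "m", "f", "m", "m", "f"]           # uninformative split

gains = {"toothache": information_gain(toothache, caries),
         "gender": information_gain(gender, caries)}

# Ranker-style filtering: discard attributes whose gain falls below a threshold.
selected = [name for name, g in gains.items() if g > 1e-4]
print(gains["toothache"])  # 1.0 bit: the split removes all class uncertainty
print(selected)            # only 'toothache' survives the threshold
```

The 1e-4 threshold mirrors the smallest threshold value used in the paper's experiments; an uninformative attribute scores (numerically) zero gain and is discarded.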
Ranker ranks attributes by their individual evaluations and applies a threshold below which attributes are discarded. In our experiments we varied this threshold from 10^-4 to 10^-1.

4.1 Results and Discussion

We carried out experiments aiming to analyze performance for the different selected attributes (see Tables 2 and 3). Table 2 shows the results obtained using the whole input feature vector (15 input variables), that is, without feature selection. In this experiment the best result was achieved by the MLP method, followed by the SVM, in terms of 10-fold cross-validation error. In terms of AUC, MLP also achieved the best result, followed by kNN.

Table 2. Caries prediction results without feature selection (15 input attributes).

  Classifier                                             10-fold cross-validation error   AUC
  kNN (k = 19)                                           26.75%                           0.8178
  C4.5 (C = 0.25)                                        25.95%                           0.7985
  C4.5 (C = 0.001)                                       30.96%                           0.7193
  MLP (2 hidden layer units, learning rate = 0.01, 500 epochs)  22.75%                    0.8452
  SVM (C = 1, σ = 0.1)                                   23.65%                           0.7635

For the decision trees, the results demonstrate that the parameter C has a great influence on the performance of the classifier: the error increased by 5.01% from C = 0.25 to C = 0.001. For C = 0.25, the decision tree created 78 nodes, while the tree with C = 0.001 created only 5 nodes. Fig. 1 shows the simple model created by the C4.5 algorithm for C = 0.001 without feature selection. These results are consistent with the AUC results: the AUC for C = 0.25 is better than the AUC for C = 0.001.

Figure 1. Decision Tree for C = 0.001.

Among all the experiments carried out using feature selection, the best results were found with an InfoGainAttributeEval threshold of 10^-4, which means using only two input attributes. The two attributes selected by InfoGainAttributeEval were the age in months and the parent's opinion about the oral health of the child.

Table 3 shows the results obtained using the InfoGainAttributeEval threshold of 10^-4. With only two attributes, the results obtained by kNN and the decision trees improved. Conversely, the results of the MLP and SVM methods were inferior to those with 15 input variables. In these experiments, as in the experiments without feature selection, the best result was achieved by the MLP method, followed by kNN, in terms of both performance criteria, namely the classification error and the AUC value.

With feature selection, both decision tree models achieved a slight performance improvement. As a multidisciplinary work, this paper has chosen decision trees as one of the methods for this problem because of their ability to extract rules from the data. For a dentist, it is easier to use the results provided by decision trees than the results of classifiers such as MLPs, which are harder to interpret.

5 Conclusion

Early childhood caries is considered a public health problem which occurs most often in children from less privileged socioeconomic groups. In this work we have compared the performance of four different classifiers applied to the problem of caries prediction. We also performed feature selection on the dataset, aiming to retrieve the attributes most relevant to the task of caries prediction.

The results have shown that the best model for caries prediction was obtained by the MLP neural network, which achieved a 10-fold cross-validation error rate of 22.75% without feature selection.
Using InfoGainAttributeEval as the feature selection method, the MLP and SVM methods had a slight performance loss, whereas the decision trees (C = 0.001 and C = 0.25) and kNN achieved a slight improvement in their performance.

From the results obtained in this work we can see that children aged twenty-three months or older are more prone to caries. The results also show that the family income, whether the child has already had a toothache, and whether the child has already had a caries diagnosis influence the occurrence of the disease. The results also show that children already diagnosed as caries carriers presented recurrence; this leads us to conclude that treatment is not being sufficiently effective in reeducating children's oral hygiene.

References

[1] V. D. S. A. Advanced support vector machines and kernel methods. Neurocomputing, 55(1-2):5-20, 2003.
[2] S. R. Bhatikar, C. DeGroff, and R. L. Mahajan. A classifier based on the artificial neural network approach for cardiologic auscultation in pediatrics. Artificial Intelligence in Medicine, 33(3):251-260, 2005.
[3] T.-C. Chen and T.-C. Hsu. A GAs based approach for mining breast cancer pattern. Expert Systems with Applications, 30(4):674-681, 2006.
[4] C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20:1-25, 1995.
[5] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
[6] A. L. I. de Oliveira, C. Baldisserotto, and J. Baldisserotto. A comparative study on machine learning techniques for prediction of success of dental implants. In A. F. Gelbukh, A. de Albornoz, and H. Terashima-Marín, editors, MICAI, volume 3789 of Lecture Notes in Computer Science, pages 939-948. Springer, 2005.
[7] D. Delen, G. Walker, and A. Kadam. Predicting breast cancer survivability: a comparison of three data mining methods. Artificial Intelligence in Medicine, 34(2):113-127, 2005.
[8] N. M. Farsi and F. S. Salama.
Sucking habits in Saudi children: prevalence, contributing factors and effects on the primary dentition. Pediatric Dentistry, 19(1):28-33, 1997.
[9] T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861-874, June 2006.
[10] J. Gama. Functional trees. Machine Learning, 55(3):219-250, 2004.
Table 3. Caries prediction results for InfoGainAttributeEval threshold = 10^-4 (2 input attributes).

  Classifier                                             10-fold cross-validation error   AUC
  kNN (k = 11)                                           24.65%                           0.8136
  C4.5 (C = 0.25)                                        25.15%                           0.8011
  C4.5 (C = 0.001)                                       29.76%                           0.7458
  MLP (2 hidden layer units, learning rate = 0.01, 500 epochs)  24.75%                    0.8223
  SVM (C = 100, σ = 0.1)                                 25.05%                           0.7495

Figure 2. Decision Tree for C = 0.25 with feature selection and InfoGainAttributeEval threshold = 10^-4.

[11] J. Huang and C. X. Ling. Using AUC and accuracy in evaluating learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 17(3):299-310, 2005.
[12] T. A. Lasko, J. G. Bhagwat, K. H. Zou, and L. Ohno-Machado. The use of receiver operating characteristic curves in biomedical informatics. Journal of Biomedical Informatics, 38(5):404-415, 2005.
[13] N. Lavrač. Machine learning for data mining in medicine. In W. Horn, Y. Shahar, G. Lindberg, S. Andreassen, and J. Wyatt, editors, Proceedings of the Joint European Conference on Artificial Intelligence in Medicine and Medical Decision Making (AIMDM-99), volume 1620 of LNAI, pages 47-64, Berlin, June 1999. Springer.
[14] Y. Lu, H. Guo, and L. Feldkamp. Robust neural learning from unbalanced data samples. In IEEE International Conference on Neural Networks (IJCNN'98), volume III, pages 1816-1821, Anchorage, AK, July 1998. IEEE.
[15] W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115-133, 1943.
[16] D. Meyer, F. Leisch, and K. Hornik. The support vector machine under test. Neurocomputing, 55(1-2):169-186, 2003.
[17] B. A. Mobley, E. Schechter, W. E. Moore, P. A. McKee, and J. E. Eichner. Neural network predictions of significant coronary artery stenosis in men. Artificial Intelligence in Medicine, 34(2):151-161, 2005.
[18] A. L. I. Oliveira, C. Baldisserotto, and J. Baldisserotto.
A comparative study on support vector machine and constructive RBF neural network for prediction of success of dental implants. In A. Sanfeliu and M. Lazo-Cortés, editors, CIARP, volume 3773 of Lecture Notes in Computer Science, pages 1015-1026. Springer, 2005.
[19] J. R. Quinlan. Induction of decision trees. In J. W. Shavlik and T. G. Dietterich, editors, Readings in Machine Learning. Morgan Kaufmann, 1990. Originally published in Machine Learning, 1:81-106, 1986.
[20] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
[21] S. Reisine and J. M. Douglass. Psychosocial and behavioral issues in early childhood caries, 1998.
[22] A. Rosenblatt and A. Zarzar. Breast feeding and early childhood caries: an assessment among Brazilian infants. International Journal of Paediatric Dentistry, 14:439-450, 2004.
[23] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[24] A. Webb. Statistical Pattern Recognition. Wiley, 2002.
[25] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, 2000.
[26] I. H. Witten and E. Frank. Data mining: practical machine learning tools and techniques with Java implementations. SIGMOD Record, 31(1):76-77, March 2002.