Integrated Intelligent Research (IIR), International Journal of Business Intelligent, Volume 04, Issue 02, December 2015, Pages 62-68, ISSN: 2278-2400
Performance Comparison of Decision Tree Algorithms to Find Out the Reason for Students' Absenteeism at the Undergraduate Level in a College for an Academic Year
G. Suresh¹, K. Arunmozhi Arasan², S. Muthukumaran³
¹Assistant Professor, PG and Research Dept. of Computer Applications, St. Joseph's College of Arts and Science (Autonomous), Cuddalore.
²HOD and Assistant Professor, Department of Computer Science and Applications, Siga College of Management and Computer Science, Villupuram.
³Assistant Professor, Department of Computer Science and Applications, Siga College of Management and Computer Science, Villupuram.
Email: sureshg2233@yahoo.co.in, arunlucks@yahoo.co.in, muthulecturer@rediffmail.com
Abstract- Educational data mining studies the data available in the educational field and brings out the hidden knowledge in it. Classification methods such as decision trees and rule mining can be applied to educational data to predict student behaviour. This paper focuses on finding the algorithm that yields the best result in identifying the reason behind students' absenteeism in an academic year. The first step in this process is to gather student data using a questionnaire. The data were collected from 123 undergraduate students of a private college situated in a semi-rural area. The second step is to clean the data so that it is appropriate for mining and to choose the relevant attributes. In the final step, three different decision tree induction algorithms, namely ID3 (Iterative Dichotomiser), C4.5 and CART (Classification and Regression Tree), were applied to the same data sample collected using the questionnaire. The results were compared to find the algorithm that best predicts the reason for students' absenteeism.
Keywords: Data Mining, Decision Tree Induction, ID3, C4.5 and CART Algorithm.
I. INTRODUCTION
Currently many educational institutions, especially small and medium education institutions, are facing problems with lack of attendance among students [1]. Students who possess less than 80% attendance are not permitted by the concerned universities to appear for the semester exams. In recent years all educational institutions have been facing this lack-of-attendance problem [2]. Hence, this research aims to find the decision tree algorithm best suited to predicting the reason for students' lack of attendance.
II. LITERATURE SURVEY
Ekkachai Naenudorn and Jatsada Singthongchai presented [3,4] their study on student recruiting in higher education institutions. The objectives of their study were to test the validity of a model derived from decision rules and to find the right algorithm for the data classification task, by comparing four algorithms (J48, ID3, Naive Bayes and OneR) with the goal of predicting the features of students who are likely to undergo the student admission process. Sunita B. Aher and Lobo L.M.R.J. [5] presented a paper in which they compare five classification algorithms to choose the best one for a Course Recommendation system: ADTree, Simple CART, J48, ZeroR and the Naive Bayes classification algorithm. They compared these algorithms using the open-source data mining tool Weka and presented the results [6]. Dursun Delen, Glenn Walker and Amit Kadam [7] presented a paper on predicting breast cancer survivability; they used two popular data mining algorithms (artificial neural networks and decision trees) along with the most commonly used logistic regression method to develop prediction models using a large data set [8]. They also used 10-fold cross-validation for performance comparison purposes, and the results indicated that the decision tree (C5) was the best predictor, with 93.6% accuracy on the holdout sample.
III. BACKGROUND KNOWLEDGE
Decision tree induction is the learning of decision trees from class-labeled training tuples [9]. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node.
A. Decision Tree Induction Algorithm
During the late 1970s and early 1980s, J. Ross Quinlan, a researcher in machine learning, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser) [10]. ID3 adopts a greedy (i.e., non-backtracking) approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner. A basic decision tree algorithm is summarized below.
Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data partition D.
Input:
Data partition D, a set of training tuples and their associated class labels;
attribute_list, the set of candidate attributes;
Attribute_selection_method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and possibly either a split point or a splitting subset.
Output: A decision tree
Method:
1. Create a node N;
2. If tuples in D are all of the same class C, then
3.    Return N as a leaf node labeled with the class C;
4. If attribute_list is empty then
5.    Return N as a leaf node labeled with the majority class in D; // majority voting
6. Apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;
7. Label node N with splitting_criterion;
8. If splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
9.    attribute_list ← attribute_list − splitting_attribute; // remove splitting_attribute
10. For each outcome j of splitting_criterion: // partition the tuples and grow subtrees for each partition
11.    Let Dj be the set of data tuples in D satisfying outcome j; // a partition
12.    If Dj is empty then
13.       Attach a leaf labeled with the majority class in D to node N;
14.    Else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
End for
15. Return N;
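To make the pseudocode concrete, here is a minimal Python sketch (not from the paper). It reuses the Node structure sketched above, represents D as a list of (attribute-dictionary, class label) pairs, and takes select_attribute as a stand-in for Attribute_selection_method:

from collections import Counter

def generate_decision_tree(D, attribute_list, select_attribute):
    # select_attribute: scoring procedure, e.g. information gain (ID3),
    # gain ratio (C4.5) or Gini reduction (CART).
    node = Node()
    classes = [label for _, label in D]
    # Steps 2-3: all tuples belong to one class -> leaf.
    if len(set(classes)) == 1:
        node.label = classes[0]
        return node
    # Steps 4-5: no candidate attributes left -> majority-class leaf.
    if not attribute_list:
        node.label = Counter(classes).most_common(1)[0][0]
        return node
    # Steps 6-7: choose and record the "best" splitting attribute.
    best = select_attribute(D, attribute_list)
    node.attribute = best
    # Step 9: remove the attribute for a discrete multiway split.
    remaining = [a for a in attribute_list if a != best]
    # Steps 10-14: grow one subtree per observed outcome of the split.
    # (Iterating over observed values means no partition is empty here;
    # step 13's majority-class leaf would cover unseen outcomes.)
    for value in {x[best] for x, _ in D}:
        Dj = [(x, y) for x, y in D if x[best] == value]
        node.children[value] = generate_decision_tree(Dj, remaining, select_attribute)
    return node  # Step 15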
B. Information Gain
Entropy:
ID3 uses information gain, an entropy-based measure [11], as its attribute selection measure. The attribute with the highest information gain is chosen as the splitting attribute for node N. The expected information needed to classify an example in dataset D is given by

Info(D) = - Σ_{i=1}^{m} p_i log2(p_i)
where p_i is the probability that an example in D belongs to class C_i. The information still needed to arrive at an exact classification after partitioning D on attribute A into v partitions is

Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)

The term |D_j|/|D| acts as the weight of the j-th partition, and Info_A(D) is the expected information required to classify an example from dataset D based on the partitioning by A [12]. The information gain is defined as the difference between the original information requirement and the new requirement, that is,
Gain(A) = Info(D) - Info_A(D)
The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N.
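These formulas have a direct Python rendering. The sketch below (an illustration, using the same (attributes, label) pair representation of D as the earlier sketch) computes entropy and information gain:

import math
from collections import Counter

def entropy(labels):
    # Info(D) = - sum_i p_i * log2(p_i)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(D, attribute):
    # Gain(A) = Info(D) - Info_A(D), where Info_A(D) is the entropy of the
    # partitions induced by attribute A, weighted by |Dj| / |D|.
    n = len(D)
    info_A = 0.0
    for value in {x[attribute] for x, _ in D}:
        Dj = [y for x, y in D if x[attribute] == value]
        info_A += (len(Dj) / n) * entropy(Dj)
    return entropy([y for _, y in D]) - info_A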
C. C4.5 Algorithm
Gain Ratio:
The gain ratio [13] applies a kind of normalization to information gain, using a "split information" value defined analogously to Info(D) as

SplitInfo_A(D) = - Σ_{j=1}^{v} (|D_j| / |D|) × log2(|D_j| / |D|)
The gain ratio differs from information gain, which measures the information with respect to classification acquired based on the same partitioning. The gain ratio is defined as

GainRatio(A) = Gain(A) / SplitInfo(A)

The attribute with the maximum gain ratio is selected as the splitting attribute.
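Continuing the sketch (and reusing info_gain from above), the gain ratio can be written as:

import math
from collections import Counter

def split_info(D, attribute):
    # SplitInfo_A(D) = - sum_j (|Dj|/|D|) * log2(|Dj|/|D|)
    n = len(D)
    counts = Counter(x[attribute] for x, _ in D)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain_ratio(D, attribute):
    # GainRatio(A) = Gain(A) / SplitInfo(A)
    return info_gain(D, attribute) / split_info(D, attribute)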
D. C4.5
C4.5 is an extension of the ID3 algorithm [15]. It uses the gain ratio as additional information to select the best attribute, taking the attribute with the maximum gain ratio as the root node. The general algorithm for building the decision tree is:
Algorithm
1. Check for the base cases.
2. For each attribute a, find the normalized information gain ratio from splitting on a.
3. Let a_best be the attribute with the highest normalized information gain.
4. Create a decision node that splits on a_best.
5. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of the node.
E. Classification and Regression Tree (CART)
Classification and Regression Trees (CART) was introduced by Breiman et al. It is a statistical technique that can select, from a large number of explanatory variables (x), those that are most important in determining the response variable (y) to be explained [16]. The CART steps can be summarized as follows.
1. Assign all objects to the root node.
2. Split each explanatory variable at all its possible split points (that is, between all the values observed for that variable in the considered node).
3. For each split point, split the parent node into two child nodes by separating the objects with values lower and higher than the split point for the considered explanatory variable.
4. Select the variable and split point with the highest reduction of impurity.
5. Perform the split of the parent node into two child nodes according to the selected split point.
6. Repeat steps 2-5, using each node as a new parent node, until the tree has maximum size.
7. Prune the tree back using cross-validation to select the optimal-sized tree.
For a node with n objects the impurity (for a numeric response) is defined as:

Impurity = Σ_{i=1}^{n} (y_i - ȳ)²

The Gini index of a node with n objects and c possible classes is defined as:

Gini = 1 - Σ_{j=1}^{c} (n_j / n)²
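For the classification case, the Gini index and the impurity reduction of a candidate split translate directly into Python. This is a sketch only; note that classic CART considers binary splits, while the worked example in Section IV below weights all five answer categories of an attribute, which is what this helper does:

from collections import Counter

def gini(labels):
    # Gini = 1 - sum_j (n_j / n)^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_reduction(D, attribute):
    # Impurity reduction for the partition induced by the attribute.
    n = len(D)
    weighted = 0.0
    for value in {x[attribute] for x, _ in D}:
        Dj = [y for x, y in D if x[attribute] == value]
        weighted += (len(Dj) / n) * gini(Dj)
    return gini([y for _, y in D]) - weighted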
IV. DATA COLLECTION AND SYSTEM FRAMEWORK
The data were collected from a private college at Ulundurpet in Villupuram district. Among the 123 students, 85 were male and 38 were female candidates.
Figure 1: System Framework
A. Experimental Setup and Evaluation
Tanagra is an open-source software tool for data mining. It supports all the basic data formats such as *.xls and *.txt, and it is very user friendly [17]. The data set was analysed in Tanagra. The analysis of the data is done on the basis of gender, i.e., the reasons why male students and female students take leave from college, by applying entropy and information gain.
Table 1: Total records in our dataset
MALE   FEMALE   TOTAL
85     38       123
Info(D) = - Σ_{i=1}^{m} p_i log2(p_i)
        = - (85/123) log2(85/123) - (38/123) log2(38/123)
        = 0.892
Table 2: Total records for the attribute Job
JOB                 MALE   FEMALE   TOTAL
Strongly agree      25     1        26
Agree               28     2        30
Neutral             6      8        14
Disagree            14     13       27
Strongly disagree   12     14       26
TOTAL               85     38       123
The gain for the attribute part-time job is

Info_Job(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)
  = (26/123) [-(25/26) log2(25/26) - (1/26) log2(1/26)]
  + (30/123) [-(28/30) log2(28/30) - (2/30) log2(2/30)]
  + (14/123) [-(6/14) log2(6/14) - (8/14) log2(8/14)]
  + (27/123) [-(14/27) log2(14/27) - (13/27) log2(13/27)]
  + (26/123) [-(12/26) log2(12/26) - (14/26) log2(14/26)]
  = 0.050 + 0.086 + 0.112 + 0.219 + 0.210
  = 0.678

Gain(Job) = Info(D) - Info_Job(D) = 0.892 - 0.678 = 0.214
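These numbers are easy to check; a short Python snippet (an illustration, using the male/female counts from Table 2) reproduces them:

import math

job_counts = {"strongly agree": (25, 1), "agree": (28, 2), "neutral": (6, 8),
              "disagree": (14, 13), "strongly disagree": (12, 14)}  # (male, female)

def H(*counts):
    # Entropy of a vector of class counts.
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

info_D = H(85, 38)
info_job = sum((m + f) / 123 * H(m, f) for m, f in job_counts.values())
print(round(info_D, 3), round(info_job, 3), round(info_D - info_job, 3))
# prints: 0.892 0.678 0.214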
The ID3 algorithm uses the entropy and information gain values to select the root node. The information gain for the attribute job has the highest value in the dataset, and the rest of the attributes have lower values.
Figure 2: Decision Tree (ID3) in Tanagra
The job attribute is taken as the root node of the tree, and the 123 records are split by the branches of job into Strongly agree (26 records), Agree (30 records), Neutral (14 records), Disagree (27 records) and Strongly disagree (26 records); the process is applied to each resulting table, and the decision tree shown in Figure 3 is formed. The C4.5 algorithm uses an extension of information gain known as the gain ratio [18], which applies a kind of normalization to information gain using a split information value. The split information is calculated as below:
SplitInfo_Job(D) = - Σ_{j=1}^{v} (|D_j| / |D|) × log2(|D_j| / |D|)
  = - (26/123) log2(26/123) - (30/123) log2(30/123) - (14/123) log2(14/123)
    - (27/123) log2(27/123) - (26/123) log2(26/123)
  = 0.4740 + 0.4965 + 0.3568 + 0.4802 + 0.4740
  = 2.2815
The gain ratio for the part-time job attribute is

GainRatio(Job) = Gain(Job) / SplitInfo(Job) = 0.214 / 2.2815 = 0.0938
The result obtained from Tanagra using the C4.5 algorithm is shown in the figure below. Here the job attribute is taken as the root node of the decision tree because it has the maximum gain ratio.
Figure 3: Decision Tree (C4.5) in Tanagra
The Gini index is used in the CART algorithm. The Gini index value for the job attribute is calculated as below.

Gini(D) = 1 - Σ_{j=1}^{c} (n_j / n)²
        = 1 - (85/123)² - (38/123)²
        = 0.428
Figure 4: Decision Tree (CART) in Tanagra
The weighted Gini index for the attribute part-time job is

Gini_Job(D) = (26/123) [1 - (25/26)² - (1/26)²]
  + (30/123) [1 - (28/30)² - (2/30)²]
  + (14/123) [1 - (6/14)² - (8/14)²]
  + (27/123) [1 - (14/27)² - (13/27)²]
  + (26/123) [1 - (12/26)² - (14/26)²]
  = 0.0156 + 0.0302 + 0.0553 + 0.1096 + 0.1050
  = 0.3157

The reduction in impurity for the job attribute is therefore 0.428 - 0.3157 = 0.1123.
The attribute that maximizes the reduction in impurity (or, equivalently, has the minimum weighted Gini index) is selected as the splitting attribute. Here, the job attribute has the minimum weighted Gini index value in the dataset.
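As with the information gain above, these values can be verified with a few lines of Python using the Table 2 counts (a sketch, not from the paper):

import math

pairs = [(25, 1), (28, 2), (6, 8), (14, 13), (12, 14)]  # (male, female) per Job answer
n = 123

split_info = -sum(((m + f) / n) * math.log2((m + f) / n) for m, f in pairs)
print(round(split_info, 4), round(0.214 / split_info, 4))  # 2.2815 0.0938

def gini(*counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

gini_D = gini(85, 38)
gini_job = sum(((m + f) / n) * gini(m, f) for m, f in pairs)
# Exact values are 0.427, 0.3164 and a reduction of 0.1106; the paper's
# 0.428, 0.3157 and 0.1123 come from rounding each term before summing.
print(round(gini_D, 3), round(gini_job, 4), round(gini_D - gini_job, 4))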
B. Performance Measures
The confusion matrix is a tool for the analysis of a classifier. Given m classes, a confusion matrix is a table of at least size m by m. An entry CM_{i,j} in the first m rows and m columns indicates the number of tuples of class i that were labelled by the classifier as class j. For a classifier to have good accuracy, ideally most of the tuples would lie along the diagonal of the confusion matrix, from entry CM_{1,1} to entry CM_{m,m}, with the rest of the entries being close to zero.

Table 3: Confusion Matrix
             Predicted C1       Predicted C2
Actual C1    True Positives     False Negatives
Actual C2    False Positives    True Negatives

The performance of the ID3, C4.5 and CART decision tree classification algorithms was evaluated by the accuracy percentage on the student leave data set [19]. The accuracy of a classifier on a given test set is the percentage of test set tuples that are correctly classified by the classifier.

Table 4: Confusion Matrix of C4.5 for our dataset
          Male   Female   Sum
Male      81     4        85
Female    9      29       38
Sum       90     33       123
Accuracy(M) = (TP + TN) / (TP + TN + FP + FN)

For our data set, using the C4.5 algorithm we have

Accuracy(M) = (81 + 29) / (81 + 29 + 9 + 4) = 0.8943

A true positive (TP) occurs when the classifier correctly classifies a class 1 tuple, and a true negative (TN) occurs when it correctly classifies a class 2 tuple. A false positive (FP) occurs when a class 2 tuple is incorrectly labelled as class 1, and a false negative (FN) occurs when a class 1 tuple is incorrectly labelled as class 2.

Error Rate = 1 - Accuracy(M)

For our data set, using the C4.5 algorithm we have

Error Rate = 1 - 0.8943 = 0.1057
The sensitivity measures the proportion of the actual positives which are correctly identified:

Sensitivity = TP / (TP + FN)

For our data set, using the C4.5 algorithm we have

Sensitivity = 81 / (81 + 4) = 0.9529
The specificity measures the proportion of negatives which are correctly identified:

Specificity = TN / (TN + FP)

For our data set, using the C4.5 algorithm we have

Specificity = 29 / (29 + 9) = 0.7632
Precision is the percentage of true positives (TP) relative to the total number of cases classified as positive events [13]:

Precision = TP / (TP + FP)
1 - Precision = 1 - TP / (TP + FP)

For our data set, using the C4.5 algorithm we have

Precision = 81 / (81 + 9) = 0.9000
1 - Precision = 1 - 0.9000 = 0.1000
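All of these measures follow mechanically from the confusion matrix. A small Python helper (a sketch, using the Table 4 counts with male as the positive class) reproduces the C4.5 column of the comparison table below:

def metrics(tp, fn, fp, tn):
    # Performance measures of Section IV.B for a two-class problem.
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {"accuracy": accuracy,
            "error rate": 1 - accuracy,
            "sensitivity": tp / (tp + fn),   # recall
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp)}

# Table 4: 81 males and 29 females classified correctly by C4.5.
for name, value in metrics(tp=81, fn=4, fp=9, tn=29).items():
    print(name, round(value, 4))
# accuracy 0.8943, error rate 0.1057, sensitivity 0.9529,
# specificity 0.7632, precision 0.9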
The following table values are calculated using the confusion matrices obtained from the Tanagra tool.

Table 5: Comparison values of the three Decision Tree Algorithms
Measures      ID3      C-RT     C4.5
Accuracy      0.7317   0.8211   0.8943
Error Rate    0.2683   0.1789   0.1057
Recall        0.8706   0.9176   0.9529
Specificity   0.4211   0.6053   0.7632
Precision     0.7708   0.8387   0.9000
1-Precision   0.2292   0.1613   0.1000
Table 5 shows that the C4.5 algorithm has the highest accuracy rate (89.43%) and the lowest error rate (10.57%) among the three algorithms; ID3 and CART have accuracy rates of 73.17% and 82.11% respectively.
The C4.5 algorithm gave the highest specificity value, correctly classifying 76.32% of the negative-class tuples, and the highest sensitivity value, correctly classifying 95.29% of the positive-class tuples.
C. Rule Extraction From C4.5 Decision Tree
To extract rules from a decision tree, one rule is created for
each path from the root to a leaf node[20]. Each splitting
criterion along a given path is logically ANDed to form the
rule antecedent (IF part). The leaf node holds the class
prediction forming the rule consequent (THEN part).
IF condition THEN conclusion
Using this method, we extract the following rules from the decision tree produced by the C4.5 algorithm.
R1: If (job = strongly agree) → (gender = male) [26/123]
Coverage(R1) = 26/123 = 21.14%
R2: If (job = agree) → (gender = male) [30/123]
Coverage(R2) = 30/123 = 24.39%
R3: If (job = neutral) ∧ (staff problem) → (gender = male) [6/123]
Coverage(R3) = 6/123 = 4.88%
R4: If (job = neutral) ∧ (staff problem) → (gender = female) [8/123]
Coverage(R4) = 8/123 = 6.50%
R5: If (job = disagree) ∧ (occasionally) → (gender = male) [12/123]
Coverage(R5) = 12/123 = 9.76%
R6: If (job = disagree) ∧ (occasionally) → (gender = female) [15/123]
Coverage(R6) = 15/123 = 12.20%
R7: If (job = strongly disagree) ∧ (accident) → (gender = male) [16/123]
Coverage(R7) = 16/123 = 13.01%
R8: If (job = strongly disagree) ∧ (accident) → (gender = female) [10/123]
Coverage(R8) = 10/123 = 8.13%
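One convenient machine-readable encoding of such rules is a list of (antecedent, consequent) pairs. The short Python sketch below (with hypothetical attribute names, not the paper's exact schema) shows rules R1 and R3 and how coverage is computed:

# Each rule: (predicate over a student record, predicted gender).
rules = [
    (lambda s: s["job"] == "strongly agree", "male"),                  # R1
    (lambda s: s["job"] == "neutral" and s["staff_problem"], "male"),  # R3
]

def coverage(rule, dataset):
    # Coverage = fraction of records whose attributes satisfy the antecedent.
    antecedent, _ = rule
    return sum(1 for s in dataset if antecedent(s)) / len(dataset)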
D. Decision Tree Obtained Using Tanagra Tool
Table 6: Decision trees obtained from the Tanagra tool

1. ID3
Job in [B]
o examstudy in [A] then Gender = A (85.71 % of 7 examples)
o examstudy in [C] then Gender = A (0.00 % of 0 examples)
o examstudy in [B] then Gender = A (100.00 % of 10 examples)
o examstudy in [D] then Gender = A (100.00 % of 11 examples)
o examstudy in [E] then Gender = A (50.00 % of 2 examples)
Job in [E]
o Fess in [B] then Gender = B (100.00 % of 2 examples)
o Fess in [E] then Gender = A (54.55 % of 11 examples)
o Fess in [A] then Gender = B (54.55 % of 11 examples)
o Fess in [C] then Gender = A (50.00 % of 2 examples)
o Fess in [D] then Gender = A (0.00 % of 0 examples)
Job in [A]
o Dept in [BCA] then Gender = A (93.33 % of 15 examples)
o Dept in [BBA] then Gender = A (100.00 % of 10 examples)
o Dept in [BCOM] then Gender = A (100.00 % of 1 examples)
Job in [C] then Gender = B (57.14 % of 14 examples)
Job in [D] then Gender = A (51.85 % of 27 examples)
2. C4.5
Job in [B] then Gender = A (93.33 % of 30 examples)
Job in [E]
o Accident in [E] then Gender = B (85.71 % of 7 examples)
o Accident in [B] then Gender = A (100.00 % of 2 examples)
o Accident in [D] then Gender = B (100.00 % of 1 examples)
o Accident in [A] then Gender = A (64.29 % of 14 examples)
o Accident in [C] then Gender = B (100.00 % of 2 examples)
Job in [A] then Gender = A (96.15 % of 26 examples)
Job in [C]
o Stafproblm in [E] then Gender = A (83.33 % of 6 examples)
o Stafproblm in [A] then Gender = B (100.00 % of 1 examples)
o Stafproblm in [C] then Gender = A (0.00 % of 0 examples)
o Stafproblm in [D] then Gender = B (85.71 % of 7 examples)
o Stafproblm in [B] then Gender = A (0.00 % of 0 examples)
Job in [D]
o Occasionaly in [E] then Gender = A (100.00 % of 6 examples)
o Occasionaly in [C] then Gender = A (100.00 % of 3 examples)
o Occasionaly in [D] then Gender = B (86.67 % of 15 examples)
o Occasionaly in [A] then Gender = A (100.00 % of 2 examples)
o Occasionaly in [B] then Gender = A (100.00 % of 1 examples)
3. CART
Job in [B,A] then Gender = A (92.86 % of 26 examples)
Job in [E,C,D]
o misbus in [B,D] then Gender = B (83.33 % of 3 examples)
o misbus in [E,C,A] then Gender = A (63.64 % of 12 examples)
V. RESULTS AND DISCUSSION
From the above computation and the decision trees obtained using the Tanagra tool, the following results were obtained using the C4.5 decision tree algorithm.
- 56 male students strongly agree that they take leave for the reason of going to a job.
- 6 male students and 8 female students who were neutral about going to a job agree that they take leave because of problems with the staff.
- 12 male and 15 female students who disagree about going to a job agree that they take leave occasionally for no reason.
- 16 male and 10 female students who strongly disagree about going to a job agree that they take leave if they meet with an accident.
When the same procedure was carried out to predict the reason for student absenteeism using the ID3 and CART decision tree algorithms, C4.5 proved to have relatively high performance in comparison.
VI. CONCLUSION
The results obtained from Tanagra clearly show that the C4.5 algorithm has relatively high performance when compared with the other two decision tree algorithms, ID3 and CART. Hence, to achieve the best results when classifying a small dataset, the C4.5 algorithm is the best choice among the three decision tree algorithms.
REFERENCES
[1] Waheed, T., Bonnell, R. B., Prasher, S. O., & Paulet, E., "Measuring performance in precision agriculture: CART—A decision tree approach", Agricultural Water Management, Vol. 84, Issue 1, 2006, pp. 173-185.
[2] Naenudorn, E., Singthongchai, J., Kerdprasop, N., & Kerdprasop, K., "Classification model induction for student recruiting", Latest Advances in Educational Technologies, 2012, pp. 117-122.
[3] Aher, S. B., & Lobo, L. M. R. J., "Comparative Study of Classification Algorithms", International Journal of Information Technology, Vol. 5, Issue 2, 2012, pp. 239-243.
[4] Hsieh, A. R., Hsiao, C. L., Chang, S. W., Wang, H. M., & Fann, C. S., "On the use of multifactor dimensionality reduction (MDR) and classification and regression tree (CART) to identify haplotype–haplotype interactions in genetic studies", Genomics, Vol. 9, Issue 2, 2012, pp. 77-85.
[5] Delen, D., Walker, G., & Kadam, A., "Predicting breast cancer survivability: a comparison of three data mining methods", Artificial Intelligence in Medicine, Vol. 34, Issue 2, 2005, pp. 113-127.
[6] Kumar, S. A., & Vijayalakshmi, M. N., "Efficiency of decision trees in predicting student's academic performance", First International Conference on Computer Science, Engineering and Applications, CS and IT, Vol. 2, 2011, pp. 335-343.
[7] Han, J., Kamber, M., & Pei, J., "Data Mining: Concepts and Techniques", Southeast Asia Edition, Morgan Kaufmann, 2006.
[8] Sun, H., "Research on Student Learning Result System based on Data Mining", IJCSNS, Vol. 10, Issue 4, 2010, p. 203.
[9] Sehgal, L., Mohan, N., & Sandhu, P. S., "Quality prediction of function based software using decision tree approach", International Conference on Computer Engineering and Multimedia Technologies (ICCEMT), 2012, pp. 43-47.
[10] Hsu, Chung-Chian, Yan-Ping Huang, and Keng-Wei Chang, "Extended Naive Bayes classifier for mixed data", Expert Systems with Applications, Vol. 35, Issue 3, 2008, pp. 1080-1083.
[11] Questier, F., "The use of CART and multivariate regression trees for supervised and unsupervised feature selection", Chemometrics and Intelligent Laboratory Systems, Vol. 76, Issue 1, 2005, pp. 45-54.
[12] Zighed, Djamel A., "Decision trees with optimal joint partitioning", International Journal of Intelligent Systems, Vol. 20, Issue 7, 2005, pp. 693-718.
[13] Deconinck, Eric, "Classification trees based on infrared spectroscopic data to discriminate between genuine and counterfeit medicines", Journal of Pharmaceutical and Biomedical Analysis, Vol. 57, 2012, pp. 68-75.
[14] Sanders, Peter, "Analysis of nearest neighbor load balancing algorithms for random loads", Parallel Computing, Vol. 25, Issue 8, 1999, pp. 1013-1033.
[15] García, Salvador, "Evolutionary-based selection of generalized instances for imbalanced classification", Knowledge-Based Systems, Vol. 25, Issue 1, 2012, pp. 3-12.
[16] Mohamed, Marghny H., "Rules extraction from constructively trained neural networks based on genetic algorithms", Neurocomputing, Vol. 74, Issue 17, 2011, pp. 3180-3192.
[17] Wang, L. M., Li, X. L., Cao, C. H., & Yuan, S. M., "Combining decision tree and Naive Bayes for classification", Knowledge-Based Systems, Vol. 19, Issue 7, 2006, pp. 511-515.
[18] Talaie, A., Esmaili, N., Taguchi, T., Romagnoli, J. A., & Talaie, F., "Application of an engineering algorithm with software code (C4.5) for specific ion detection in the chemical industries", Advances in Engineering Software, Vol. 30, Issue 1, 1999, pp. 13-19.
[19] Lu, S. H., Chiang, D. A., Keh, H. C., & Huang, H. H., "Chinese text classification by the Naïve Bayes Classifier and the associative classifier with multiple confidence threshold values", Knowledge-Based Systems, Vol. 23, Issue 6, 2010, pp. 598-604.
[20] Vlahou, Antonia, "Diagnosis of ovarian cancer using decision tree classification of mass spectral data", BioMed Research International, Vol. 5, 2003, pp. 308-314.