Classification Algorithms
Decision Tree Induction
Bayesian Classification
Decision Tree Induction
• A decision tree is a flowchart-like structure in which each
internal (non-leaf) node denotes a test on an attribute.
• Each branch represents an outcome of the test.
• Each leaf (terminal) node holds a class label.
• The topmost node in a tree is the root node.
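The structure described above can be sketched in a few lines of Python. The tree below is hand-written and hypothetical (the attribute names, outcomes, and labels are assumptions chosen for illustration): internal nodes are dictionaries that name the attribute being tested, branches are keyed by test outcome, and leaves are plain class labels.

```python
# A hand-written, hypothetical tree used only to illustrate the structure.
# Internal node: {"attribute": name, "branches": {outcome: subtree}}
# Leaf node: a plain string holding the class label.
tree = {
    "attribute": "age",  # root node: test on the attribute "age"
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does not buy"},
        },
        "middle": "buys",
        "senior": "does not buy",
    },
}


def classify(node, record):
    """Walk from the root to a leaf, following the branch for each test outcome."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node


label = classify(tree, {"age": "youth", "student": "no"})
print(label)
```

Classifying a record is a single walk from the root to a leaf, which is one reason the classification step is fast.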
Decision Tree Induction
Why are decision tree classifiers so
popular?
• They do not require any domain knowledge.
• They can handle multi-dimensional data.
• They are easy to comprehend.
• The learning and classification steps of a decision tree are
simple and fast.
Applications:
Decision tree induction is used in astronomy, financial analysis,
medical diagnosis, manufacturing and production, and molecular biology.
Decision Tree Algorithms
• CART (Classification And Regression Trees)
• ID3 (Iterative Dichotomiser)
In the late 1970s and early 1980s, J. Ross Quinlan, a researcher
in machine learning, developed a decision tree algorithm known
as ID3.
He later presented C4.5, the successor of ID3.
ID3, C4.5, and CART adopt a greedy (non-backtracking)
approach in which decision trees are constructed in a
top-down, recursive, divide-and-conquer manner.
Decision Tree Algorithm
The strategy for the algorithm is as follows:
(1) The algorithm is called with three parameters: the attribute list, the attribute
selection method, and the data partition.
(2) Initially, the data partition is the complete set of training tuples and their
associated class labels. The attribute list describes the attributes of the
training set tuples.
RID  Age     Student  Credit_rating  Buys (class label)
1    Youth   Yes      Fair           Yes
2    Youth   Yes      Fair           Yes
3    Youth   Yes      Fair           No
4    Youth   No       Fair           No
5    Middle  No       Excellent      Yes
6    Senior  Yes      Fair           No
Decision Tree Algorithm
(3) The attribute selection method specifies how to select the attribute that
best discriminates among the tuples. Information Gain and the Gini Index are
common choices. The attribute selection method also determines the structure
of the tree (binary or non-binary).
(4) The tree starts as a single node representing the training tuples in the data
partition.
Example: splitting on Age partitions the training tuples by RID and class:
Age = youth:   1 (Yes), 2 (Yes), 3 (No), 4 (No)
Age = middle:  5 (Yes)
Age = senior:  6 (No)
Decision Tree Induction
(5) If the tuples in the data partition are all of the same class, the node
becomes a leaf and is labeled with that class (a terminating condition).
(6) Otherwise, the attribute selection method is called to determine the
splitting criterion.
(7) The algorithm uses the same process recursively to form a decision tree
for the tuples at each resulting partition.
(8) The recursive partitioning stops when any one of the following
terminating conditions is true:
Decision Tree Induction
(i) All the tuples in the partition belong to the same class.
(ii) There are no remaining attributes on which the tuples
may be further partitioned. In this case, majority voting is
employed: the node is converted into a leaf and
labeled with the most common class in the partition.
(iii) There are no tuples for a given branch. In this case too,
a leaf is created, labeled with the majority class of the parent partition.
(9) The resulting decision tree is returned.
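Steps (1) to (9) can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not a reference implementation of ID3, C4.5, or CART: the names (`build_tree`, `information_gain`, `majority_class`) and the dictionary-based tuple format are invented for the example, and Information Gain is used as the attribute selection method. Because the sketch only creates branches for attribute values that occur in the partition, terminating condition (iii) never arises here.

```python
from collections import Counter
from math import log2

# The six example tuples as dictionaries (format assumed for this sketch).
ROWS = [
    {"age": "youth", "student": "yes", "credit_rating": "fair", "class": "yes"},
    {"age": "youth", "student": "yes", "credit_rating": "fair", "class": "yes"},
    {"age": "youth", "student": "yes", "credit_rating": "fair", "class": "no"},
    {"age": "youth", "student": "no", "credit_rating": "fair", "class": "no"},
    {"age": "middle", "student": "no", "credit_rating": "excellent", "class": "yes"},
    {"age": "senior", "student": "yes", "credit_rating": "fair", "class": "no"},
]


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute):
    """Attribute selection method: entropy reduction achieved by the split."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row["class"])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row["class"] for row in rows]) - remainder


def majority_class(rows):
    return Counter(row["class"] for row in rows).most_common(1)[0][0]


def build_tree(rows, attribute_list, select=information_gain):
    """Greedy, top-down, recursive induction (no backtracking)."""
    classes = {row["class"] for row in rows}
    if len(classes) == 1:          # (i) all tuples belong to the same class
        return classes.pop()
    if not attribute_list:         # (ii) no attributes left: majority voting
        return majority_class(rows)
    best = max(attribute_list, key=lambda a: select(rows, a))
    remaining = [a for a in attribute_list if a != best]
    branches = {}
    # Only observed values get a branch, so no branch is ever empty.
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, select)
    return {"attribute": best, "branches": branches}


tree = build_tree(ROWS, ["age", "student", "credit_rating"])
print(tree)
```

On the example data the root test is on age, and the youth partition is split further on student. The tuples with RID 1 to 3 share all attribute values but not the class, so that branch ends in a majority vote.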
Decision Tree Algorithm
Tree Pruning
• An attempt to improve accuracy.
• Tree pruning removes unwanted branches of the
tree, such as those that reflect anomalies in the
training data. This reduces the complexity of
the tree and helps in effective predictive
analysis. It also reduces overfitting, because
the unimportant branches are removed from
the tree.
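The slides do not name a specific pruning method. As one possible illustration, the sketch below uses reduced-error pruning: working bottom-up, a subtree is replaced by a leaf whenever doing so does not increase the error on a separate validation set. The tree, the validation tuples, and the `majority` field stored in each internal node are all hypothetical, chosen only to make the example self-contained.

```python
# Hypothetical unpruned tree. Each internal node also stores the majority
# class of its training partition (an assumption of this sketch).
TREE = {
    "attribute": "age",
    "majority": "yes",
    "branches": {
        "youth": {
            "attribute": "student",
            "majority": "no",
            "branches": {"yes": "yes", "no": "no"},
        },
        "middle": "yes",
        "senior": "no",
    },
}

# Hypothetical validation tuples, kept separate from the training data.
VALIDATION = [
    {"age": "youth", "student": "yes", "class": "no"},
    {"age": "youth", "student": "no", "class": "no"},
    {"age": "middle", "student": "no", "class": "yes"},
    {"age": "middle", "student": "yes", "class": "yes"},
    {"age": "senior", "student": "yes", "class": "no"},
]


def classify(node, record):
    """Follow branches to a leaf; fall back to the majority class if a value is unseen."""
    while isinstance(node, dict):
        node = node["branches"].get(record[node["attribute"]], node["majority"])
    return node


def errors(node, rows):
    """Number of tuples in rows that the (sub)tree misclassifies."""
    return sum(classify(node, row) != row["class"] for row in rows)


def prune(node, rows):
    """Reduced-error pruning: prune the children first, then this node."""
    if not isinstance(node, dict):
        return node
    attribute = node["attribute"]
    pruned = dict(node, branches={
        value: prune(child, [r for r in rows if r[attribute] == value])
        for value, child in node["branches"].items()
    })
    leaf = node["majority"]
    # Keep the simpler leaf unless the subtree is strictly better on validation data.
    if errors(leaf, rows) <= errors(pruned, rows):
        return leaf
    return pruned


pruned_tree = prune(TREE, VALIDATION)
print(pruned_tree)
```

In this example the test on student under the youth branch is replaced by the leaf "no", while the root test on age is kept because removing it would increase the validation error.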
Bayesian Classification
• Bayesian classifiers are statistical classifiers.
• They can predict class membership probabilities such as the
probability that a given tuple belongs to a particular class.
• Bayesian classification is based on Bayes’ Theorem.
• Bayesian classifiers have also exhibited high accuracy and
speed when applied to large databases.
Bayes’ Theorem
• Bayes’ theorem is named after Thomas Bayes, who did early work in probability
and decision theory during the 18th century.
• Let X be a data tuple. In Bayesian terms, X is considered the “evidence”. Let H
be the hypothesis that the data tuple X belongs to a specified class C.
• P(H|X) is the posterior probability that the hypothesis H holds given the evidence
(data tuple) X, that is, the probability that X belongs to the specified class C.
e.g. Data tuples are described by the attributes age and income. X is a 35-year-old
customer with an income of $40,000.
H is the hypothesis that X will buy a computer.
P(H|X) is the probability that X will buy a computer given his age and income.
• P(H) is the prior probability of H.
e.g. the probability that X will buy a computer, regardless of age and income.
i.e., P(H) is independent of X.
Bayes’ Theorem
• P(X|H) is the likelihood: the probability of X conditioned on H, i.e. the probability that the customer X is 35 years old and earns
$40,000 given that we know that X will buy a computer.
• P(X) is the prior probability of X (the marginal).
e.g. the probability that X is 35 years old and earns $40,000, regardless of whether he will buy a computer.
Bayes’ Theorem is given by
P(H|X) = P(X|H) P(H) / P(X)
e.g. P(Queen|Face) = P(Face|Queen) P(Queen) / P(Face)
= (1 * 4/52) / (12/52)
= 1/3
= 33.33%
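The card example can be checked with exact fractions. The helper name `posterior` is an assumption made for this sketch; it simply applies Bayes' theorem to a likelihood, a prior, and the probability of the evidence.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence


p_face_given_queen = Fraction(1)   # every queen is a face card
p_queen = Fraction(4, 52)          # 4 queens in a 52-card deck
p_face = Fraction(12, 52)          # 12 face cards (J, Q, K in 4 suits)

p_queen_given_face = posterior(p_face_given_queen, p_queen, p_face)
print(p_queen_given_face)          # 1/3
```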
