ISSUES IN DECISION TREE LEARNING
Practical issues in learning decision trees include:
1. Determining how deeply to grow the decision tree,
2. Handling continuous attributes,
3. Choosing an appropriate attribute selection measure,
4. Handling training data with missing attribute values,
5. Handling attributes with differing costs, and
6. Improving computational efficiency.
1. AVOIDING OVERFITTING THE DATA
• When we design a machine learning model, we call it a good
model if it generalizes properly to new input data from the
problem domain.
• This lets us make predictions on future data that the
model has never seen.
Underfitting
• A machine learning algorithm is said to underfit when it
cannot capture the underlying trend of the data.
• Underfitting reduces the accuracy of our machine learning
model.
• Its occurrence simply means that our model or algorithm
does not fit the data well enough.
• It can happen when we have too little data to build an
accurate model, and typically when we try to fit a linear model
to non-linear data.
Overfitting
• A machine learning algorithm is said to overfit when it fits
the training data too closely, for example because the model is
too complex for the amount of training data available.
• Such a model starts learning from the noise and inaccurate
data entries in our data set, not just the underlying pattern.
• As a result, the model fails to categorize new data correctly,
because it has captured too many details and too much noise.
• One way to avoid overfitting is to use a linear algorithm if
we have linear data, or to limit parameters such as the
maximum depth if we are using decision trees.
• Definition (Overfit): Given a hypothesis space H, a
hypothesis h ∈ H is said to overfit the training data if there
exists some alternative hypothesis h’ ∈ H, such that h has
smaller error than h’ over the training examples, but h’ has
a smaller error than h over the entire distribution of
instances.
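The same definition can be written compactly. Here error_train(h) denotes the error of h over the training examples and error_𝒟(h) its error over the entire distribution 𝒟 of instances; this notation is ours rather than the slide's:

```latex
h \text{ overfits} \iff \exists\, h' \in H :\;
\operatorname{error}_{\mathrm{train}}(h) < \operatorname{error}_{\mathrm{train}}(h')
\;\land\;
\operatorname{error}_{\mathcal{D}}(h') < \operatorname{error}_{\mathcal{D}}(h)
```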
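• Let's try to understand the effect of adding the following
positive training example, incorrectly labeled as negative, to
the training examples in the Table:
• <Sunny, Hot, Normal, Strong, ->. The example is noisy because
its correct label is +.
Given the original error-free data, ID3 produces the
decision tree shown in Figure. Adding the noisy example forces ID3
to grow a more complex tree that also fits this example. The new
tree fits the training data perfectly, but we expect the simpler
original tree to perform better on subsequent instances, so the new
tree overfits in the sense of the definition above.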
AVOIDING OVERFITTING
• There are several approaches to avoiding overfitting in
decision tree learning. These can be grouped into two
classes:

- Pre-pruning (avoidance): Stop growing the tree
early, before it reaches the point where it perfectly
classifies the training data.

- Post-pruning (recovery): Allow the tree to overfit the
data, and then post-prune the tree.
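A minimal sketch of pre-pruning, assuming a plain ID3-style learner over discrete attributes, with examples stored as Python dicts and leaves stored as class labels. The names `id3`, `max_depth` and `min_gain` are hypothetical, not taken from any library; the depth limit and the minimum-gain threshold are two simple stopping rules of the kind described above.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Information gain from splitting the examples on a discrete attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs, max_depth=None, min_gain=0.0, depth=0):
    """Grow an ID3-style tree, stopping early (pre-pruning) on depth or low gain.

    A leaf is a class label; a decision node is (attr, {value: subtree}, majority).
    With the default min_gain=0.0, splits with zero gain are never made.
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    if max_depth is not None and depth >= max_depth:
        return majority  # pre-pruning: depth limit reached
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    if info_gain(rows, labels, best) <= min_gain:
        return majority  # pre-pruning: the best split gains too little
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [a for a in attrs if a != best],
                              max_depth, min_gain, depth + 1)
    return (best, branches, majority)


def classify(tree, row):
    """Follow branches to a leaf; unseen values fall back to the node's majority."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        tree = branches.get(row[attr], majority)
    return tree


rows = [{"Outlook": "Sunny", "Wind": "Weak"}, {"Outlook": "Sunny", "Wind": "Strong"},
        {"Outlook": "Rain", "Wind": "Weak"}, {"Outlook": "Rain", "Wind": "Strong"}]
labels = ["No", "No", "Yes", "No"]
full_tree = id3(rows, labels, ["Outlook", "Wind"])           # grows until leaves are pure
stump = id3(rows, labels, ["Outlook", "Wind"], max_depth=0)  # majority class only
```

The unrestricted tree classifies all four (hypothetical) training examples correctly, while `max_depth=0` collapses to the majority class "No"; values in between trade training fit for a simpler tree.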
• Criteria used to determine the correct final tree size:
- Use a separate set of examples, distinct from the
training examples, to evaluate the utility of post-pruning
nodes from the tree.
- Use all the available data for training, but apply a
statistical test to estimate whether expanding (or
pruning) a particular node is likely to produce an
improvement beyond the training set.
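One classic choice for the statistical test is a chi-square test: compare the class counts in each branch of a proposed split with the counts expected if the attribute were irrelevant to the class. The sketch below is one possible version under that assumption; the function names are hypothetical, branches are assumed non-empty, and the critical values are the standard chi-square table entries at the 5% significance level.

```python
from collections import Counter

# Upper 5% points of the chi-square distribution, by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_split(labels_by_branch):
    """Chi-square statistic for a proposed split.

    labels_by_branch holds the class labels reaching each (non-empty) branch.
    Observed class counts per branch are compared with the counts expected
    if the attribute were irrelevant, i.e. the parent's class proportions.
    """
    all_labels = [lab for branch in labels_by_branch for lab in branch]
    n = len(all_labels)
    totals = Counter(all_labels)
    stat = 0.0
    for branch in labels_by_branch:
        observed = Counter(branch)
        for cls, total in totals.items():
            expected = total * len(branch) / n
            stat += (observed[cls] - expected) ** 2 / expected
    dof = (len(labels_by_branch) - 1) * (len(totals) - 1)
    return stat, dof


def split_is_significant(labels_by_branch):
    """Expand the node only if the split is significant at the 5% level.
    Only the degrees of freedom listed in CHI2_CRITICAL_05 are supported."""
    stat, dof = chi_square_split(labels_by_branch)
    return dof > 0 and stat > CHI2_CRITICAL_05[dof]
```

For example, a split that sends ten positives one way and ten negatives the other gives a statistic of 20 with one degree of freedom, well above 3.841, while a split whose branches mirror the parent's class mix gives 0 and would not be expanded.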
1. REDUCED ERROR PRUNING
• How exactly might we use a validation set to prevent
overfitting? One approach, called reduced-error pruning
(Quinlan 1987), is to consider each of the decision nodes in the
tree to be candidates for pruning.
• Pruning a decision node consists of removing the subtree
rooted at that node, making it a leaf node, and assigning it the
most common classification of the training examples affiliated
with that node.
• Nodes are removed only if the resulting pruned tree performs
no worse than the original over the validation set.
• Pruning proceeds iteratively, always choosing the node whose
removal most increases accuracy over the validation set, and
continues until further pruning is harmful.
• Reduced-error pruning has the effect that any leaf node added
due to coincidental regularities in the training set is likely to be
pruned, because these same coincidences are unlikely to occur
in the validation set.
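A minimal sketch of reduced-error pruning, assuming a tree stored as nested dicts: each decision node records its test attribute, its branches, and the majority class of the training examples that reached it (`majority`); a leaf is just a class label. The tree and validation examples are hypothetical, with a spurious `Temperature` test under `Overcast` playing the role of a coincidental regularity.

```python
import copy


def classify(tree, row):
    """A decision node is a dict; a leaf is a plain class label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["majority"])
    return tree


def accuracy(tree, rows, labels):
    return sum(classify(tree, r) == l for r, l in zip(rows, labels)) / len(labels)


def decision_node_paths(tree, path=()):
    """Yield the sequence of branch values leading to each decision node."""
    if isinstance(tree, dict):
        yield path
        for value, subtree in tree["branches"].items():
            yield from decision_node_paths(subtree, path + (value,))


def prune_at(tree, path):
    """Return a copy of tree where the node at `path` becomes a leaf
    labeled with the majority class of its training examples."""
    if not path:
        return tree["majority"]
    new_tree = copy.deepcopy(tree)
    parent = new_tree
    for value in path[:-1]:
        parent = parent["branches"][value]
    parent["branches"][path[-1]] = parent["branches"][path[-1]]["majority"]
    return new_tree


def reduced_error_prune(tree, val_rows, val_labels):
    """Repeatedly prune the node whose removal gives the best validation
    accuracy, as long as the pruned tree is no worse than the current one."""
    while isinstance(tree, dict):
        current = accuracy(tree, val_rows, val_labels)
        candidates = [prune_at(tree, p) for p in decision_node_paths(tree)]
        best = max(candidates, key=lambda t: accuracy(t, val_rows, val_labels))
        if accuracy(best, val_rows, val_labels) < current:
            break  # every possible pruning would hurt validation accuracy
        tree = best
    return tree


# Hypothetical tree: the Temperature test under Overcast fits training noise.
tree = {"attr": "Outlook", "majority": "Yes", "branches": {
    "Sunny": {"attr": "Humidity", "majority": "No",
              "branches": {"High": "No", "Normal": "Yes"}},
    "Overcast": {"attr": "Temperature", "majority": "Yes",
                 "branches": {"Hot": "No", "Mild": "Yes", "Cool": "Yes"}},
    "Rain": {"attr": "Wind", "majority": "Yes",
             "branches": {"Strong": "No", "Weak": "Yes"}}}}
val_rows = [
    {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak", "Temperature": "Hot"},
    {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak", "Temperature": "Mild"},
    {"Outlook": "Overcast", "Humidity": "High", "Wind": "Strong", "Temperature": "Hot"},
    {"Outlook": "Rain", "Humidity": "High", "Wind": "Strong", "Temperature": "Mild"},
    {"Outlook": "Rain", "Humidity": "Normal", "Wind": "Weak", "Temperature": "Cool"},
]
val_labels = ["No", "Yes", "Yes", "No", "Yes"]
pruned = reduced_error_prune(tree, val_rows, val_labels)
```

On this data the original tree scores 0.8 on the validation set; replacing the spurious `Overcast` subtree with its majority leaf raises that to 1.0, after which every further pruning would lower accuracy, so the loop stops.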
2. RULE POST-PRUNING
• Rule post-pruning involves the following steps:
1. Infer the decision tree from the training set, growing the
tree until the training data is fit as well as possible and
allowing overfitting to occur.
2. Convert the learned tree into an equivalent set of rules by
creating one rule for each path from the root node to a
leaf node.
3. Prune (generalize) each rule by removing any
preconditions that result in improving its estimated
accuracy.
4. Sort the pruned rules by their estimated accuracy, and
consider them in this sequence when classifying
subsequent instances.
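A sketch of the four steps on a small nested-dict tree (decision nodes are dicts, leaves are class labels). As an assumption, a rule's "estimated accuracy" is measured here on a held-out validation set, as the fraction of covered examples it classifies correctly; C4.5 instead uses a pessimistic estimate computed from the training data. All function names and data are hypothetical.

```python
def tree_to_rules(tree, preconditions=()):
    """Step 2: one rule (preconditions, label) per root-to-leaf path."""
    if not isinstance(tree, dict):
        return [(list(preconditions), tree)]
    rules = []
    for value, subtree in tree["branches"].items():
        rules += tree_to_rules(subtree, preconditions + ((tree["attr"], value),))
    return rules


def matches(preconditions, row):
    return all(row[attr] == value for attr, value in preconditions)


def rule_accuracy(rule, rows, labels):
    """Assumed estimate: fraction of covered held-out examples the rule gets right."""
    preconditions, label = rule
    covered = [l for r, l in zip(rows, labels) if matches(preconditions, r)]
    return sum(l == label for l in covered) / len(covered) if covered else 0.0


def prune_rule(rule, rows, labels):
    """Step 3: greedily drop the precondition whose removal most improves accuracy."""
    preconditions, label = rule
    while preconditions:
        best_acc = rule_accuracy((preconditions, label), rows, labels)
        best = None
        for i in range(len(preconditions)):
            candidate = preconditions[:i] + preconditions[i + 1:]
            acc = rule_accuracy((candidate, label), rows, labels)
            if acc > best_acc:
                best_acc, best = acc, candidate
        if best is None:
            break  # no single removal improves the estimate
        preconditions = best
    return (preconditions, label)


def rule_post_prune(tree, rows, labels):
    """Steps 2-4: convert to rules, prune each one, sort by estimated accuracy."""
    rules = [prune_rule(r, rows, labels) for r in tree_to_rules(tree)]
    return sorted(rules, key=lambda r: rule_accuracy(r, rows, labels), reverse=True)


def classify_with_rules(rules, row, default):
    """Use the first (most accurate) rule whose preconditions all hold."""
    for preconditions, label in rules:
        if matches(preconditions, row):
            return label
    return default


tree = {"attr": "Outlook", "branches": {
    "Sunny": {"attr": "Humidity", "branches": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": {"attr": "Wind", "branches": {"Strong": "No", "Weak": "Yes"}}}}
val_rows = [dict(zip(("Outlook", "Humidity", "Wind"), v)) for v in [
    ("Sunny", "High", "Weak"), ("Sunny", "High", "Strong"), ("Sunny", "High", "Weak"),
    ("Rain", "High", "Strong"), ("Rain", "High", "Strong"), ("Overcast", "Normal", "Weak"),
    ("Rain", "Normal", "Weak"), ("Sunny", "Normal", "Weak")]]
val_labels = ["No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes"]
rules = rule_post_prune(tree, val_rows, val_labels)
```

Here the rule "IF Outlook = Sunny AND Humidity = High THEN No" is generalized to "IF Humidity = High THEN No" (estimated accuracy rises from 2/3 to 0.8), and because 0.8 is the lowest estimate it is sorted last.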
THERE ARE THREE MAIN ADVANTAGES OF CONVERTING
THE DECISION TREE TO RULES BEFORE PRUNING
1. Converting to rules allows distinguishing among the
different contexts in which a decision node is used.
Because each distinct path through a decision tree
node produces a distinct rule, the pruning decision
regarding that attribute test can be made differently for
each path.
2. Converting to rules removes the distinction between
attribute tests that occur near the root of the tree and
those that occur near the leaves.
This avoids messy bookkeeping issues, such as how to
reorganize the tree if the root node is pruned while
retaining part of the subtree below this test.
3. Converting to rules improves readability. Rules are often
easier for people to understand.
2. INCORPORATING CONTINUOUS-VALUED
ATTRIBUTES
• Our initial definition of ID3 is restricted to attributes
that take on a discrete set of values:
1. The target attribute whose value is predicted by the
learned tree must be discrete valued.
2. The attributes tested in the decision nodes of the
tree must also be discrete valued.
• This second restriction can easily be removed so
that continuous-valued decision attributes can be
incorporated into the learned tree. For a continuous attribute A,
the algorithm can dynamically create a new Boolean attribute A_c
that is true if A < c and false otherwise, choosing the threshold c
that produces the greatest information gain.
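A sketch of the threshold search, following the usual approach: sort the examples by the attribute's value, consider candidate thresholds midway between adjacent values whose class labels differ, and keep the one with the highest information gain. The function name and the `temperature` / `play_tennis` data are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Pick c for the Boolean attribute (A < c) with the highest information gain.

    Candidate thresholds lie midway between adjacent sorted values whose
    class labels differ. Returns (c, gain), or None if there is no candidate.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = None
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if l1 == l2 or v1 == v2:
            continue
        c = (v1 + v2) / 2
        below = [l for v, l in pairs if v < c]
        above = [l for v, l in pairs if v >= c]
        gain = base - len(below) / n * entropy(below) - len(above) / n * entropy(above)
        if best is None or gain > best[1]:
            best = (c, gain)
    return best


temperature = [40, 48, 60, 72, 80, 90]
play_tennis = ["No", "No", "Yes", "Yes", "Yes", "No"]
threshold, gain = best_threshold(temperature, play_tennis)
```

Here the only candidates are 54 and 85; the test Temperature < 54 wins with a gain of about 0.459 bits (versus about 0.191 for 85).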
3. ALTERNATIVE MEASURES FOR SELECTING
ATTRIBUTES
• There is a natural bias in the information gain measure
that favours attributes with many values over those with
few values.
• As an extreme example, consider the attribute Date, which
has a very large number of possible values. What is wrong
with the attribute Date?
• Simply put, it has so many possible values that it is bound
to separate the training examples into very small subsets.
• Because of this, it will have a very high information gain
relative to the training examples.
• However, despite its very high information gain, it is a very
poor predictor of the target function over unseen
instances.
• One alternative measure is the gain ratio, which penalizes such
attributes by dividing the information gain by the split
information, a term that grows with the number and uniformity
of the subsets the attribute creates.
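A sketch of gain ratio, assuming SplitInformation(S, A) is the entropy of S with respect to the values of A, so that GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A). The toy `date` and `wind` columns are hypothetical: both separate the classes perfectly, so they have equal information gain, but the gain ratio prefers the two-valued attribute.

```python
import math
from collections import Counter


def entropy(items):
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """Gain(S, A) for attribute values `values` and class labels `labels`."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def split_information(values):
    """Entropy of the examples with respect to the attribute's own values."""
    return entropy(values)


def gain_ratio(values, labels):
    si = split_information(values)
    return information_gain(values, labels) / si if si > 0 else 0.0  # undefined if si == 0


labels = ["Yes", "No", "Yes", "No"]
date = ["d1", "d2", "d3", "d4"]                 # a unique value per example
wind = ["Weak", "Strong", "Weak", "Strong"]     # two values
```

Both attributes have a gain of 1 bit, but Date's split information is log2(4) = 2 bits, so its gain ratio is 0.5 against 1.0 for Wind.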
4. HANDLING MISSING ATTRIBUTE VALUES
• In certain cases, the available data may be missing
values for some attributes.
• For example, in a medical domain in which we wish
to predict patient outcome based on various
laboratory tests, it may be that Blood-Test-Result
is available only for a subset of the patients.
• In such cases, it is common to estimate the missing
attribute value based on other examples for which
this attribute has a known value.
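A sketch of the simplest such strategy: replace a missing value with the value most common among the other training examples, optionally counting only examples with the same class label. The function name and patient data are hypothetical. Another strategy, used in C4.5, assigns a fractional weight to each possible value instead of a single guess; it is not shown here.

```python
from collections import Counter


def fill_missing(rows, labels, attr, by_class=False):
    """Return a new list in which missing (None) values of `attr` are replaced
    by the most common known value, optionally among examples of the same
    class. The input rows are not modified."""
    filled = []
    for row, label in zip(rows, labels):
        if row[attr] is None:
            known = [r[attr] for r, l in zip(rows, labels)
                     if r[attr] is not None and (not by_class or l == label)]
            if known:  # leave the value missing if nothing can be estimated
                row = {**row, attr: Counter(known).most_common(1)[0][0]}
        filled.append(row)
    return filled


patients = [{"BloodTestResult": "High"}, {"BloodTestResult": "High"},
            {"BloodTestResult": "Low"}, {"BloodTestResult": None}]
outcomes = ["Sick", "Sick", "Healthy", "Healthy"]
overall = fill_missing(patients, outcomes, "BloodTestResult")
same_class = fill_missing(patients, outcomes, "BloodTestResult", by_class=True)
```

Filling from all patients gives "High", while restricting to patients with the same outcome ("Healthy") gives "Low", so the choice of reference examples matters.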
5. HANDLING ATTRIBUTES WITH DIFFERING
COSTS
• In some learning tasks the instance attributes may
have associated costs.
• For example, in learning to classify medical
diseases we might describe patients in terms of
attributes such as Temperature, BiopsyResult,
Pulse, BloodTestResults, etc.
• These attributes vary significantly in their costs,
both in terms of monetary cost and cost to patient
comfort.
• In such tasks, we prefer decision trees that use low-cost
attributes where possible, relying on high-cost attributes
only when needed to produce reliable classifications.
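One way to build this preference into attribute selection is to replace plain information gain with a cost-sensitive measure. Two measures that appear in the literature are Gain²(S, A) / Cost(A), attributed to Tan and Schlimmer, and (2^Gain(S, A) - 1) / (Cost(A) + 1)^w with w in [0, 1], attributed to Nunez. The sketch below compares them; the function names are hypothetical and the gains and costs are made-up numbers for illustration only.

```python
def tan_schlimmer(gain, cost):
    """Gain^2 / Cost: strongly penalizes expensive attributes (cost must be > 0)."""
    return gain ** 2 / cost


def nunez(gain, cost, w=1.0):
    """(2^Gain - 1) / (Cost + 1)^w, where w in [0, 1] sets the weight of cost."""
    return (2 ** gain - 1) / (cost + 1) ** w


def choose_attribute(gains, costs, measure, **kwargs):
    """Return the attribute that maximizes the given cost-sensitive measure."""
    return max(gains, key=lambda a: measure(gains[a], costs[a], **kwargs))


# Made-up gains and costs, for illustration only.
gains = {"Temperature": 0.2, "BiopsyResult": 0.5}
costs = {"Temperature": 1.0, "BiopsyResult": 100.0}
```

With the default w = 1 both measures prefer the cheap Temperature test; with w = 0 the Nunez measure ignores cost entirely and picks the higher-gain BiopsyResult.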

Uploaded by Ramakrishna Reddy Bijjam
