Murphy Choy, University College Dublin
Building a decision tree from decision stumps
Contents
• Introduction to decision trees
• What is a decision tree stump?
• CART vs. CHAID
• Criterion for splitting
• Building a decision tree stump macro
• Linking the tree up
• Conclusion
Introduction to decision trees
Decision tree stump
CART vs. CHAID
• Easier-to-understand splits
  o Binary splits are easier to understand
  o Each split can be phrased as an either/or statement
• Able to handle different data types
  o Unlike CHAID, CART can handle numeric and categorical inputs, as well as missing values, simultaneously.
CART vs. CHAID (continued)
• More robust statistics
  o CHAID uses the chi-square test, which depends on sample size and suffers from the multiple comparisons problem.
  o The Bonferroni adjustment does not fully compensate for this deficiency.
• Fewer dispersion effects
  o Multiway splits at a single node produce smaller child nodes, which may cause severe skewness in validation.
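The sample-size dependence in the chi-square bullet can be made concrete with a short Python sketch of the Pearson chi-square statistic. The function name `pearson_chi_square` and the counts are hypothetical, chosen only for illustration; this is not code from the slides. Both tables describe the same candidate split (a 60/40 versus 40/60 class mix across two child nodes); the second simply has ten times as many observations.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if branch and class were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows are the two child nodes of a candidate split; columns are the two classes.
small = [[30, 20], [20, 30]]
large = [[300, 200], [200, 300]]  # same proportions, ten times the sample

print(pearson_chi_square(small))  # 4.0
print(pearson_chi_square(large))  # 40.0
```

With one degree of freedom the 5% critical value is about 3.84, so the smaller table is only borderline significant while the larger one is overwhelmingly so, even though the split separates the classes equally well in both.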
Splitting criterion
• Gini impurity is a measure of how often a randomly chosen element from a set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset.
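The definition above translates directly into a formula. If p_k is the proportion of class k in a node, a randomly drawn element belongs to class k with probability p_k and is given label k with probability p_k, so it is labeled correctly with probability sum_k p_k^2 and incorrectly with probability G = 1 - sum_k p_k^2. A minimal Python sketch (the function name `gini_impurity` is hypothetical):

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k**2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention: an empty node has no impurity
    counts = Counter(labels)
    # sum_k p_k**2 is the chance that a random element receives the correct random label.
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


print(gini_impurity(["yes", "yes", "no", "no"]))    # 0.5 (evenly mixed, two classes)
print(gini_impurity(["yes", "yes", "yes", "yes"]))  # 0.0 (pure node)
```

A split is then judged by the size-weighted average of the impurities of its child nodes; lower is better.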
Building the decision tree stump SAS macro
(Diagram: three "Gini" boxes leading to a "Selection" box)
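The three steps named in the speaker notes for this slide (pre-summarization, Gini calculation, split selection) can be sketched in Python. This is not the author's SAS macro: the function names (`summarize`, `best_split_for_var`, `decision_stump`) and the toy rows are hypothetical, and for brevity every input is treated as ordered, so each candidate split has the form `var <= threshold`. Categorical inputs would need a different way of enumerating candidate splits. In SAS, the pre-summarization step could be done with PROC FREQ or a PROC SQL GROUP BY query.

```python
from collections import Counter, defaultdict


def gini_from_counts(counts):
    """Gini impurity computed from a Counter of class counts."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def summarize(rows, var, target):
    """Pre-summarization: class counts for each distinct value of `var`."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[var]][row[target]] += 1
    return table


def best_split_for_var(rows, var, target):
    """Return (weighted_gini, threshold) of the best `var <= threshold` split, or None."""
    table = summarize(rows, var, target)
    values = sorted(table)
    total = Counter()
    for v in values:
        total.update(table[v])
    n = sum(total.values())
    left = Counter()
    best = None
    # The largest value is skipped: splitting there would leave the right side empty.
    for v in values[:-1]:
        left.update(table[v])
        right = total - left
        n_left = sum(left.values())
        score = (n_left * gini_from_counts(left) + (n - n_left) * gini_from_counts(right)) / n
        if best is None or score < best[0]:
            best = (score, v)
    return best


def decision_stump(rows, variables, target):
    """Selection: the (weighted_gini, variable, threshold) with the lowest impurity."""
    best = None
    for var in variables:
        candidate = best_split_for_var(rows, var, target)
        if candidate is not None and (best is None or candidate[0] < best[0]):
            best = (candidate[0], var, candidate[1])
    return best


rows = [
    {"age": 22, "income": 30, "buy": "no"},
    {"age": 25, "income": 60, "buy": "yes"},
    {"age": 35, "income": 80, "buy": "yes"},
    {"age": 45, "income": 40, "buy": "no"},
    {"age": 52, "income": 90, "buy": "yes"},
    {"age": 60, "income": 35, "buy": "no"},
]
print(decision_stump(rows, ["age", "income"], "buy"))  # (0.0, 'income', 40)
```

Because of the pre-summarization, the threshold sweep works on one entry per distinct value rather than one per observation, which is what keeps a stump cheap on large samples.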
Building the linkage for a tree
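The speaker notes describe the linkage as iterative calling of the decision stumps. One way to sketch that in Python is recursively: fit a stump on a node, split its rows, and call the same routine on each child until a stopping rule fires. The stopping rules (`max_depth`, `min_size`, no impurity reduction), the function names, and the toy points are assumptions for illustration, not the author's design; the block includes its own brute-force stump so it runs on its own.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def stump(rows, variables, target):
    """Brute-force stump: lowest weighted Gini over all `var <= threshold` splits."""
    best = None
    for var in variables:
        for t in sorted({r[var] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[var] <= t]
            right = [r[target] for r in rows if r[var] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, var, t)
    return best


def build_tree(rows, variables, target, depth=0, max_depth=3, min_size=2):
    """Link stumps into a tree by calling the stump again on each child node."""
    labels = [r[target] for r in rows]
    leaf = {"leaf": Counter(labels).most_common(1)[0][0]}  # majority class
    if depth >= max_depth or len(rows) < min_size:
        return leaf
    split = stump(rows, variables, target)
    # Stop if no split exists or the best split does not reduce impurity.
    if split is None or split[0] >= gini(labels):
        return leaf
    _, var, t = split
    left_rows = [r for r in rows if r[var] <= t]
    right_rows = [r for r in rows if r[var] > t]
    return {
        "var": var,
        "threshold": t,
        "left": build_tree(left_rows, variables, target, depth + 1, max_depth, min_size),
        "right": build_tree(right_rows, variables, target, depth + 1, max_depth, min_size),
    }


def predict(tree, row):
    """Follow the either/or questions from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["var"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


points = [
    {"x": 1, "y": 1, "label": "A"},
    {"x": 2, "y": 1, "label": "A"},
    {"x": 1, "y": 2, "label": "A"},
    {"x": 3, "y": 1, "label": "B"},
    {"x": 4, "y": 1, "label": "B"},
    {"x": 3, "y": 2, "label": "C"},
    {"x": 4, "y": 2, "label": "C"},
]
tree = build_tree(points, ["x", "y"], "label")
print(predict(tree, {"x": 4, "y": 2}))  # C
```

On these toy points the root stump splits on x <= 2 and a second stump on the right child splits on y <= 1, so two levels of linked stumps separate three classes that a single stump could not.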
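Conclusion
• Decision stumps are useful for a variety of purposes, such as segmenting large samples and simple prediction on small samples
• Stumps can be linked to build a full decision tree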

Speaker notes

  1. Decision tree stump: a single-layer decision tree. Often used for segmentation of large samples, and also for simple prediction on small samples. Easy to manage in terms of coding.
  2. Stump macro: pre-summarize the data, calculate the Gini impurity, then select the split.
  3. Linkage: call the decision stumps iteratively to build a tree.