Decision Stream:
Cultivating Deep Decision Trees
D. Yu. Ignatov
Huawei Technologies
A. D. Ignatov
ETH Zurich
IEEE ICTAI 2017
Code: github.com/aiff22/decision-stream
Decision tree
[Figure: binary decision tree of depth 4 that uses features f1 to f4 to predict the label]
ICTAI 2017. Ignatov D. Yu., Ignatov A. D. Decision Stream: Cultivating Deep Decision Trees
Depth is limited by the number of nodes and training samples:
Depth   # nodes    # samples
5       31         32
10      1023       1024
100     ~10^30     ~10^30
1000    ~10^301    ~10^301
Using more features can yield better prediction accuracy, but the depth of a decision tree is limited!
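The counts in the depth table follow from the exponential growth of a complete binary tree. The Python sketch below reproduces them under one interpretation that the slide does not spell out and is our assumption: depth counts the levels of splits from root to leaf, "# nodes" counts internal splitting nodes (2^d - 1), and "# samples" is the minimum needed to put one sample in each of the 2^d leaves. The function names are illustrative only.

```python
import math


def split_nodes(depth: int) -> int:
    """Internal (splitting) nodes of a complete binary tree with `depth` levels of splits."""
    return 2 ** depth - 1


def min_samples(depth: int) -> int:
    """Minimum samples so that each of the 2**depth leaves receives at least one sample."""
    return 2 ** depth


def magnitude(n: int) -> str:
    """Show small counts exactly and large counts as an order of magnitude."""
    return str(n) if n < 10 ** 6 else f"~10^{int(math.log10(n))}"


for d in (5, 10, 100, 1000):
    print(d, magnitude(split_nodes(d)), magnitude(min_samples(d)))
```

Python integers are arbitrary-precision, so 2**1000 is computed exactly before being summarized as a power of ten.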
Complexity Reduction by Merging Similar Nodes
Merge leaves that are similar according to two-sample test statistics; the merged leaves are replaced with new nodes.
[Figure: Tree vs. Stream; in the Stream, similar leaves are merged into new nodes]
Advantages:
- Complexity reduction
- Prevention of overfitting
- High precision
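A minimal sketch of the leaf-similarity check, assuming the unpaired two-sample Z-test (one of the tests listed in the speaker notes) and a significance level alpha = 0.05 chosen here purely for illustration. `z_test_pvalue` and `leaves_similar` are hypothetical helpers, not functions from the authors' repository; two leaves are treated as mergeable when the test cannot reject equal mean labels. Each leaf is assumed to hold at least two samples so that a sample variance exists.

```python
from statistics import NormalDist, fmean, variance


def z_test_pvalue(a: list[float], b: list[float]) -> float:
    """Two-sided p-value of an unpaired two-sample Z-test on the means of a and b."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    diff = fmean(a) - fmean(b)
    if se == 0.0:
        # Both samples are constant: equal means are indistinguishable, others clearly differ.
        return 1.0 if diff == 0.0 else 0.0
    z = diff / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def leaves_similar(a: list[float], b: list[float], alpha: float = 0.05) -> bool:
    """Hypothetical merge criterion: merge two leaves when equal means cannot be rejected."""
    return z_test_pvalue(a, b) > alpha


# Labels of samples that reached three leaves on the same level.
leaf_a = [0, 1, 0, 1, 1, 0, 1, 0]
leaf_b = [1, 0, 1, 0, 0, 1, 0, 1]
leaf_c = [1] * 8
print(leaves_similar(leaf_a, leaf_b), leaves_similar(leaf_a, leaf_c))
```

For small leaves a Z-test is only an approximation; Student's t-test or the Mann-Whitney U test (for example `scipy.stats.ttest_ind` or `scipy.stats.mannwhitneyu`) may be a better fit there.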
Node Splitting for Large-Scale Datasets
1. First, split the N samples into n groups of equal size.
2. Merge groups with similar label distributions (according to two-sample test statistics).
3. Replace the previously merged groups with new nodes.
[Figure: N samples split into n groups; merged groups become new leaves]
Advantage: speed-up of learning on distributed systems
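The three steps on this slide can be sketched for a single continuous feature as follows. This is one possible reading, not the authors' implementation: it assumes an unpaired two-sample Z-test with alpha = 0.05, at least two samples per group, and interprets "group pairs nearest by mean value of the feature" as neighbouring groups after sorting, merged in one left-to-right pass. `similar` and `split_and_merge` are hypothetical names.

```python
from statistics import NormalDist, fmean, variance


def similar(a, b, alpha=0.05):
    """Unpaired two-sample Z-test: True if equal label means cannot be rejected."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    diff = fmean(a) - fmean(b)
    if se == 0.0:
        return diff == 0.0
    p = 2.0 * (1.0 - NormalDist().cdf(abs(diff / se)))
    return p > alpha


def split_and_merge(x, y, n_groups):
    """Sort samples by feature x, cut them into n_groups equal-size groups, then fuse
    neighbouring groups whose labels y are statistically similar.
    Returns lists of sample indices, one list per new node."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = len(order) // n_groups  # any remainder goes to the last group
    groups = [order[k * size:(k + 1) * size] for k in range(n_groups - 1)]
    groups.append(order[(n_groups - 1) * size:])

    merged = [groups[0]]
    for g in groups[1:]:
        if similar([y[i] for i in merged[-1]], [y[i] for i in g]):
            merged[-1] = merged[-1] + g  # fuse with the neighbouring group
        else:
            merged.append(g)
    return merged


print(split_and_merge(list(range(12)), [0] * 6 + [1] * 6, 4))
```

Here four groups of three samples collapse into two nodes, because the label distribution changes only once along the sorted feature.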
Decision Stream Learning Algorithm
[Flowchart] Samples → select feature for splitting → branch on feature type:
- Categorical: apply the Merging rule to groups.
- Continuous: split into groups of the same size, then apply the Merging rule to group pairs nearest by mean value of the feature.
Quantity of groups > 1? If yes, for every group, add a splitting rule to the node (new node); if no, End.
For merged group: merge the group with leaves by the Merging rule; if it is not merged, add the group to leaves.
Merging rule (group clustering): beginning from groups with the nearest mean values of label, merge groups whose labels are similar according to unpaired two-sample test statistics.
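The Merging rule can be read as greedy agglomerative clustering of label samples. The sketch below is one possible reading, assuming the same unpaired two-sample Z-test with alpha = 0.05 and at least two labels per group; `similar` and `merging_rule` are hypothetical names, and the authors' repository may order or test candidate pairs differently.

```python
from itertools import combinations
from statistics import NormalDist, fmean, variance


def similar(a, b, alpha=0.05):
    """Unpaired two-sample Z-test: True if equal label means cannot be rejected."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    diff = fmean(a) - fmean(b)
    if se == 0.0:
        return diff == 0.0
    p = 2.0 * (1.0 - NormalDist().cdf(abs(diff / se)))
    return p > alpha


def merging_rule(groups, alpha=0.05):
    """Repeatedly take candidate pairs nearest by mean label and merge the first pair
    the test finds similar; stop when no pair is similar. `groups` holds label lists."""
    groups = [list(g) for g in groups]
    while len(groups) > 1:
        # Candidate pairs ordered by distance between mean labels, nearest first.
        pairs = sorted(
            combinations(range(len(groups)), 2),
            key=lambda p: abs(fmean(groups[p[0]]) - fmean(groups[p[1]])),
        )
        for i, j in pairs:
            if similar(groups[i], groups[j], alpha):
                groups[i] = groups[i] + groups[j]
                del groups[j]  # combinations yields i < j, so index i stays valid
                break
        else:
            break  # no similar pair left: the remaining groups are statistically distinct
    return groups


print(merging_rule([[0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]))
```

Re-sorting all pairs after every merge costs O(k^2 log k) per step for k groups, which is acceptable for the handful of groups produced per leaf and keeps the "nearest first" order exact after each fusion.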
Decision Stream: Merge of Nodes during the Training Process
Up to 30-55% of leaves are merged at one step
Depth reaches 400 levels
Precision: Decision Stream vs. Decision Tree
Reduction of error: average 16%, maximum 37%
Benefits of Decision Stream
• High accuracy due to precise splitting of statistically representative data with unpaired two-sample test statistics
• Reduced overfitting due to partitioning of data into statistically different groups
• Complexity reduction
• No manual regulation of depth
• Speed-up of learning on big distributed data
Benefits of Decision Stream in Classification and Regression
Binary classification with 31 nodes: Decision Tree reaches 5 levels, Decision Stream reaches 10 levels.


Speaker Notes

  1. We present a new supervised learning method for classification and regression: Decision Stream.
  2. Decision Stream is built on the Decision Tree, a widely used machine learning technique. A classical Decision Tree takes an object's features as input and, based on learned thresholds, predicts the object's label. A well-known problem of Decision Trees is their limited accuracy, which stems from their rather simple structure. The tree depth determines the number of features used for prediction: for instance, if the tree has 4 levels, only 4 features are used to detect objects of one class. However, an object is usually characterized by many features, and it can be classified precisely only by taking all of them into account and identifying their combinations. The depth of a Decision Tree is limited by the number of nodes and samples. The number of nodes grows geometrically with tree depth; for example, a binary tree with a depth of 100 levels consists of approximately 10^30 nodes. In that case, getting at least one sample into every terminal leaf node would require a dataset with about 10^30 samples. Clearly, training such deep trees is not realistic.
  3. To solve this problem, we propose a novel method for reducing Decision Tree complexity, whose key step is merging similar leaves on each level of the tree. The similarity of nodes is estimated with two-sample test statistics, such as the Z-test, Student's t-test, the Kolmogorov-Smirnov test, or the Mann-Whitney U test. Fusing leaf nodes reduces the model width and enables the generation of very deep predictive models. The proposed model, Decision Stream, offers the following advantages: Reduced model complexity, which makes it possible to increase the depth of the predictive model. Prevention of overfitting, due to a statistically representative number of samples in the nodes during training. High precision, due to the use of all efficient feature combinations for prediction in a deep decision graph.
  4. For large-scale datasets, training speed becomes critical. To speed up learning in this case, we propose a modification of the splitting rule used in the Decision Tree. First, in each node the samples are sorted by the value of the selected feature and split into n groups of equal size. Then, groups with similar label distributions are fused into new nodes. Group similarity is estimated with the same two-sample test statistics as in the previous case. This multiple splitting gives a significant speed-up when training on big data distributed across a computer cluster. In our experiments, learning time decreased several-fold.
  5. Let's now look at an overview of the Decision Stream learning algorithm. Classical decision tree training is extended with feature-based partitioning of the data into multiple groups of equal size within every leaf node, and with fusion of similar groups and similar leaves after every splitting cycle. In all cases, fusion is performed according to the merging rule, which can be viewed as statistically based label clustering: beginning from the groups with the nearest mean label values, we merge groups whose labels are similar according to unpaired two-sample test statistics. Learning continues as long as the samples in the leaves can be split into new statistically different groups.
  6. Splitting of leaves precedes their merging, and this figure shows the number of leaves after splitting (blue) and merging (red) at each iteration of the training process for five common machine learning problems. The percentage of nodes fused according to the merging rule increases progressively, up to 30-55%, as the number of leaves grows, and decreases at the end of learning due to the formation of statistically distinguishable groups of samples. Merging leaves and reducing model width enforce the generation of extremely deep graphs; in MNIST classification, for example, the depth reaches 400 levels. As shown previously, a Decision Tree cannot reach such depth due to the geometric growth in the number of leaves.
  7. Switching from Decision Tree to deep Decision Stream models decreases the prediction error by 16% on average, with a maximum advantage of 37%. On all datasets, a single Decision Stream significantly outperforms a Decision Tree. The best improvement is obtained on the Ailerons dataset, where Decision Stream demonstrates 35% lower error than Decision Tree. Among ensembles, the Extremely Randomized Trees technique gives the best result for Decision Stream, and Random Forest gives the best result for Decision Tree. Thus, by avoiding data exhaustion, preserving statistically representative data samples in the nodes, and exploring all fruitful feature splits, Decision Stream significantly increases prediction accuracy in classification and regression tasks.
  8. The key advantage of Decision Stream is the efficient use of every node. With the same number of nodes, it achieves greater depth than a Decision Tree, splitting and merging the data multiple times with different features. The predictive model grows until no further improvement is achievable, considering different data recombinations, which results in a deep directed acyclic graph architecture and a statistically significant data partition. This technique can produce extremely deep predictive models with hundreds of levels, where decision branches are loosely split and merged like natural streams of a waterfall. The main benefits of Decision Stream are: - High accuracy due to the precise splitting of statistically representative data with unpaired two-sample test statistics. - Reduced overfitting due to partitioning of data into statistically different groups. - Reduced complexity on every level of the predictive model. - No manual regulation of depth. - Speed-up of learning on big distributed data.
  9. The code of the presented model can be downloaded from our GitHub repository. Thank you for your attention.