Ignatov, D.Yu., Ignatov, A.D. Decision Stream: Cultivating Deep Decision Trees. IEEE ICTAI 2017, pp. 905–912, doi:10.1109/ICTAI.2017.00140
1. Decision Stream:
Cultivating Deep Decision Trees
D. Yu. Ignatov
Huawei Technologies
A. D. Ignatov
ETH Zurich
IEEE ICTAI 2017
Code: github.com/aiff22/decision-stream
2. Decision tree
[Figure: binary decision tree of depth 4; features f1–f4 are tested along the path to a label]

Depth is limited by the quantities of nodes and training samples:

Depth    # nodes    # samples
5        31         32
10       1023       1024
100      ~10^30     ~10^30
1000     ~10^301    ~10^301

Using more features can yield better prediction accuracy, but the depth of a decision tree is limited!
3. Advantages:
- Complexity reduction
- Prevention of overfitting
- High precision
Complexity Reduction by Merging Similar Nodes

Merge leaves that are similar according to two-sample test statistics; the merged leaves are replaced with new nodes.

[Figure: Decision Tree vs. Decision Stream; at each level, similar leaves are fused into new nodes]
4. Node Splitting for Large-Scale Datasets
1. Split N samples into n groups of the same size
2. Merge groups with similar label distributions (according to two-sample test statistics)
3. Replace the merged groups with new nodes

[Figure: N samples in a leaf are split into n equal-size groups; similar groups are merged into new nodes]

Advantage: speed-up of learning on distributed systems
5. Decision Stream Learning Algorithm

[Flowchart: for each leaf, select a feature for splitting and split the samples into groups of the same size. Depending on the feature type, apply the Merging rule to all groups (categorical) or to group pairs nearest by the mean value of the feature (continuous). If the quantity of groups > 1, add the splitting rule to the node and create a new node for every group; otherwise merge the group with the leaves by the Merging rule, or add it to the leaves. Merging rule (group clustering): beginning from the groups with the nearest mean values of the label, merge groups whose labels are similar according to unpaired two-sample test statistics. End when no further splits are possible.]
6. Decision Stream:
Merge of Nodes during the Training Process

[Figure: number of leaves after splitting and merging at each training iteration]

Up to 30–55 % of leaves are merged at one step
Depth reaches 400 levels
7. Precision: Decision Stream vs. Decision Tree
Reduction of error:
average 16 %
maximal 37 %
8. Benefits of Decision Stream
• High accuracy due to precise splitting of statistically representative data with unpaired two-sample test statistics
• Reduced overfitting due to the partition of data into statistically different groups
• Complexity reduction
• No manual regulation of depth
• Speed-up of learning on big distributed data
Benefits of Decision Stream in Classification and Regression

[Figure: binary classification with 31 nodes: a Decision Tree reaches 5 levels, a Decision Stream reaches 10 levels]
9. Decision Stream:
Cultivating Deep Decision Trees
D. Yu. Ignatov
Huawei Technologies
A. D. Ignatov
ETH Zurich
IEEE ICTAI 2017
Code: github.com/aiff22/decision-stream
Editor's Notes
We are presenting a new supervised learning method for classification and regression: Decision Stream.
Decision Stream is designed on the basis of the Decision Tree, a widely used machine learning technique. A classical Decision Tree takes an object's features as input and, based on learned thresholds, predicts the object's label. A well-known problem of Decision Trees is their limited accuracy, which stems from their rather simple structure: the tree depth determines the number of features used for prediction. For instance, if the tree consists of 4 levels, then only 4 features are used to detect the objects of one class. However, an object is usually characterized by many features, and it can be classified precisely only by taking all of them into account and identifying certain combinations of them.
The Decision Tree depth is limited by the number of nodes and samples. The quantity of nodes grows geometrically with the tree depth: for example, a binary tree with a depth of 100 levels consists of approximately 10^30 nodes. In this case, to get at least one sample in every terminal leaf node, we would need a dataset with at least 10^30 samples. Training such deep trees is therefore clearly unrealistic.
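As a quick check of these magnitudes, here is an illustrative back-of-the-envelope sketch in Python (not part of the method itself); it reproduces the orders of magnitude from the table on slide 2:

import math

# A binary tree of depth d has 2**d - 1 internal (decision) nodes and
# 2**d leaves, so at least 2**d training samples are needed to put one
# sample in every terminal leaf.
for d in (5, 10, 100, 1000):
    nodes, leaves = 2 ** d - 1, 2 ** d
    print(f"depth {d:>4}: ~10^{int(math.log10(nodes))} nodes, "
          f"~10^{int(math.log10(leaves))} samples")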
To solve this problem, we propose a novel method for Decision Tree complexity reduction whose key step is merging similar leaves at each level of the tree. The similarity of nodes is estimated with two-sample test statistics, such as the Z-test, Student's t-test, the Kolmogorov-Smirnov test, or the Mann-Whitney U test. Fusing leaf nodes reduces the model width and enables the generation of very deep predictive models.
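As an illustration, here is a minimal Python/SciPy sketch of such a similarity check. The choice of the Mann-Whitney U test and the alpha threshold are assumptions for the example (the method equally admits the Z-test, t-test, or Kolmogorov-Smirnov test), and this is a sketch rather than the repository's implementation:

import numpy as np
from scipy import stats

def leaves_similar(labels_a, labels_b, alpha=0.05):
    """Treat two leaves as mergeable when an unpaired two-sample test
    cannot distinguish their label samples (p-value above alpha)."""
    _, p = stats.mannwhitneyu(labels_a, labels_b, alternative="two-sided")
    return p > alpha

# Two leaves with nearly identical label distributions -> merge them.
rng = np.random.default_rng(0)
a = rng.normal(1.00, 0.5, size=200)
b = rng.normal(1.02, 0.5, size=200)
print(leaves_similar(a, b))  # expected True -> the leaves would be fused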
The proposed model, Decision Stream, offers the following advantages:
- Complexity reduction, which makes it possible to increase the depth of the predictive model.
- Prevention of overfitting, since the nodes keep a statistically representative quantity of samples during training.
- High precision, thanks to the use of all efficient combinations of features for prediction in a deep decision graph.
For large-scale datasets, training speed becomes critical. To speed up learning in this case, we propose a modification of the splitting rule used in the Decision Tree. First, in each node the samples are sorted by the value of the selected feature and split into n groups of the same size. Then, groups with similar label distributions are fused into new nodes; group similarity is estimated with the same two-sample test statistics as before. This multi-way splitting gives a significant speed-up when training on big data distributed over a computer cluster: in our experiments, learning time decreased severalfold.
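A simplified Python/NumPy sketch of this multi-way split follows. Fusing only neighbouring groups in feature order is a simplification of the full merging rule, and n_groups is an illustrative parameter:

import numpy as np
from scipy import stats

def split_and_merge(feature, labels, n_groups=8, alpha=0.05):
    """Sort samples by the selected feature, cut them into equal-size
    groups, then fuse neighbouring groups whose label distributions
    are statistically indistinguishable."""
    labels = np.asarray(labels)
    order = np.argsort(feature)
    groups = [labels[idx] for idx in np.array_split(order, n_groups)]
    merged = [groups[0]]
    for g in groups[1:]:
        _, p = stats.mannwhitneyu(merged[-1], g, alternative="two-sided")
        if p > alpha:
            # similar label distributions -> fuse into one child node
            merged[-1] = np.concatenate([merged[-1], g])
        else:
            # statistically different -> start a new child node
            merged.append(g)
    return merged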
Let us now look at an overview of the Decision Stream learning algorithm. Classical decision tree training is extended with feature-based partitioning of the data into multiple groups of the same size within every leaf node, and with fusion of similar groups and similar leaves after every splitting cycle. The fusion is always performed according to the merging rule, which can be seen as statistically based label clustering: beginning from the groups with the nearest mean values of the label, we merge groups whose labels are similar according to unpaired two-sample test statistics. Learning continues as long as the samples in the leaves can be split into new, statistically different groups.
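A compact sketch of this merging rule as statistically based label clustering, under the same assumptions as the snippets above (Mann-Whitney U as the interchangeable test, alpha as an illustrative threshold):

import numpy as np
from scipy import stats

def merging_rule(groups, alpha=0.05):
    """Starting from the pair of groups with the nearest mean labels,
    keep fusing pairs whose labels pass an unpaired two-sample test,
    until no similar pair remains."""
    groups = [np.asarray(g) for g in groups]
    while len(groups) > 1:
        means = np.array([g.mean() for g in groups])
        order = np.argsort(means)
        # candidate pairs: neighbours in mean-label order, nearest first
        pairs = sorted(zip(order[:-1], order[1:]),
                       key=lambda ij: means[ij[1]] - means[ij[0]])
        for i, j in pairs:
            _, p = stats.mannwhitneyu(groups[i], groups[j],
                                      alternative="two-sided")
            if p > alpha:  # statistically similar -> merge and restart
                fused = np.concatenate([groups[i], groups[j]])
                groups = [g for k, g in enumerate(groups)
                          if k not in (i, j)]
                groups.append(fused)
                break
        else:
            break  # every remaining pair is statistically different
    return groups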
Splitting of leaves precedes their merging, and this figure shows the number of leaves after splitting (blue) and merging (red) at each iteration of the training process for five common machine learning problems. The percentage of nodes fused according to the merging rule progressively increases up to 30–55 % as the number of leaves grows, and decreases at the end of learning as statistically distinguishable groups of samples form. The merging of leaves and the reduction of model width enable the generation of extremely deep graphs, which in the case of MNIST classification reach a depth of 400 levels. As shown earlier, a Decision Tree cannot reach such depth because its number of leaves grows geometrically.
Switching from the Decision Tree to deep Decision Stream models decreases the prediction error by 16 % on average, with a maximal advantage of 37 %. On all datasets, a single Decision Stream significantly outperforms a Decision Tree. The best improvement is obtained on the Ailerons dataset, where Decision Stream demonstrates 35 % lower error than Decision Tree. Among ensembles, the Extremely Randomized Trees technique demonstrates the best result for Decision Stream, while Random Forest does for Decision Tree. Thus, by avoiding the problem of data exhaustion, preserving statistically representative samples in the nodes, and moving through all fruitful feature splits, Decision Stream significantly increases prediction accuracy in classification and regression tasks.
The key advantage of Decision Stream is the efficient use of every node. With the same quantity of nodes, it provides greater depth than a Decision Tree by splitting and merging the data multiple times with different features. The predictive model grows, considering different data recombinations, until no further improvement is achievable, resulting in a deep directed acyclic graph architecture and a statistically significant data partition. This technique can produce extremely deep predictive models with hundreds of levels, where decision branches are loosely split and merged like the natural streams of a waterfall.
The main benefits of Decision Stream are:
- High accuracy due to the precise splitting of statistically representative data with unpaired two-sample test statistics.
- Reduced overfitting due to the partition of data into statistically different groups.
- Reduction of complexity at every level of the predictive model.
- No manual regulation of depth.
- Speed-up of learning on big distributed data.
The code of the presented model can be downloaded from our GitHub repository.
Thanks for your attention.