Ignatov, D.Yu., Ignatov, A.D. Decision Stream: Cultivating Deep Decision Trees. IEEE ICTAI 2017, pp. 905–912, doi:10.1109/ICTAI.2017.00140
1. Decision Stream:
Cultivating Deep Decision Trees
D. Yu. Ignatov
Huawei Technologies
A. D. Ignatov
ETH Zurich
IEEE ICTAI 2017
Code: github.com/aiff22/decision-stream
2. Decision tree
[Figure: binary decision tree of depth 4; features f1–f4 are tested along the path to a label]

Depth is limited by the quantities of nodes and training samples:

Depth    # nodes    # samples
5        31         32
10       1023       1024
100      ~10^30     ~10^30
1000     ~10^301    ~10^301

Using more features can yield better prediction accuracy, but the depth of a decision tree is limited!
3. Advantages:
- Complexity reduction
- Prevention of overfitting
- High precision
Complexity Reduction by Merging Similar Nodes

Merge leaves that are similar according to two-sample test statistics; the merged leaves are replaced with new nodes.

[Figure: Decision Tree vs. Decision Stream; at each level, similar leaves are fused into new nodes]
4. Node Splitting for Large-Scale Datasets
1. Split N samples into n groups of the same size
2. Merge groups with similar label distributions (according to two-sample test statistics)
3. Replace the merged groups with new nodes

[Figure: N samples in a leaf are split into n equal-size groups; similar groups are merged into new nodes]

Advantage: speed-up of learning on distributed systems
5. Decision Stream Learning Algorithm

[Flowchart: for each leaf, select a feature for splitting and split the samples into groups of the same size. Depending on the feature type, apply the Merging rule to all groups (categorical) or to group pairs nearest by the mean value of the feature (continuous). If the quantity of groups > 1, add the splitting rule to the node and create a new node for every group; otherwise merge the group with the leaves by the Merging rule, or add it to the leaves. Merging rule (group clustering): beginning from the groups with the nearest mean values of the label, merge groups whose labels are similar according to unpaired two-sample test statistics. End when no further splits are possible.]
6. Decision Stream:
Merge of Nodes during the Training Process

[Figure: number of leaves after splitting and merging at each training iteration]

Up to 30–55 % of leaves are merged at one step
Depth reaches 400 levels
7. Precision: Decision Stream vs. Decision Tree
Reduction of error:
average 16 %
maximal 37 %
8. Benefits of Decision Stream
• High accuracy due to precise splitting of statistically representative data with unpaired two-sample test statistics
• Reduced overfitting due to the partition of data into statistically different groups
• Complexity reduction
• No manual regulation of depth
• Speed-up of learning on big distributed data
Benefits of Decision Stream in Classification and Regression

[Figure: binary classification with 31 nodes: a Decision Tree reaches 5 levels, a Decision Stream reaches 10 levels]
9. Decision Stream:
Cultivating Deep Decision Trees
D. Yu. Ignatov
Huawei Technologies
A. D. Ignatov
ETH Zurich
IEEE ICTAI 2017
Code: github.com/aiff22/decision-stream
Editor's Notes
We are presenting a new supervised learning method for classification and regression: Decision Stream.
Decision Stream is designed on the basis of the Decision Tree, a widely used machine learning technique. A classical Decision Tree takes an object's features as input and, based on learned thresholds, predicts the object's label. A well-known problem of Decision Trees is their limited accuracy, which stems from their rather simple structure: the tree depth determines the number of features used for prediction. For instance, if the tree consists of 4 levels, then only 4 features are used to detect the objects of one class. However, an object is usually characterized by many features, and it can be classified precisely only by taking all of them into account and identifying certain combinations of them.
The Decision Tree depth is limited by the number of nodes and samples. The quantity of nodes grows geometrically with the tree depth: for example, a binary tree with a depth of 100 levels consists of approximately 10^30 nodes. In this case, to get at least one sample in every terminal leaf node, we would need a dataset with at least 10^30 samples. Training such deep trees is therefore clearly unrealistic.
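As a quick check of these magnitudes, here is an illustrative back-of-the-envelope sketch in Python (not part of the method itself); it reproduces the orders of magnitude from the table on slide 2:

import math

# A binary tree of depth d has 2**d - 1 internal (decision) nodes and
# 2**d leaves, so at least 2**d training samples are needed to put one
# sample in every terminal leaf.
for d in (5, 10, 100, 1000):
    nodes, leaves = 2 ** d - 1, 2 ** d
    print(f"depth {d:>4}: ~10^{int(math.log10(nodes))} nodes, "
          f"~10^{int(math.log10(leaves))} samples")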
To solve this problem, we propose a novel method for Decision Tree complexity reduction whose key step is merging similar leaves at each level of the tree. The similarity of nodes is estimated with two-sample test statistics, such as the Z-test, Student's t-test, the Kolmogorov-Smirnov test, or the Mann-Whitney U test. Fusing leaf nodes reduces the model width and enables the generation of very deep predictive models.
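As an illustration, here is a minimal Python/SciPy sketch of such a similarity check. The choice of the Mann-Whitney U test and the alpha threshold are assumptions for the example (the method equally admits the Z-test, t-test, or Kolmogorov-Smirnov test), and this is a sketch rather than the repository's implementation:

import numpy as np
from scipy import stats

def leaves_similar(labels_a, labels_b, alpha=0.05):
    """Treat two leaves as mergeable when an unpaired two-sample test
    cannot distinguish their label samples (p-value above alpha)."""
    _, p = stats.mannwhitneyu(labels_a, labels_b, alternative="two-sided")
    return p > alpha

# Two leaves with nearly identical label distributions -> merge them.
rng = np.random.default_rng(0)
a = rng.normal(1.00, 0.5, size=200)
b = rng.normal(1.02, 0.5, size=200)
print(leaves_similar(a, b))  # expected True -> the leaves would be fused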
The proposed model, Decision Stream, offers the following advantages:
- Complexity reduction, which makes it possible to increase the depth of the predictive model.
- Prevention of overfitting, since the nodes keep a statistically representative quantity of samples during training.
- High precision, thanks to the use of all efficient combinations of features for prediction in a deep decision graph.
For large-scale datasets, training speed becomes critical. To speed up learning in this case, we propose a modification of the splitting rule used in the Decision Tree. First, in each node the samples are sorted by the value of the selected feature and split into n groups of the same size. Then, groups with similar label distributions are fused into new nodes; group similarity is estimated with the same two-sample test statistics as before. This multi-way splitting gives a significant speed-up when training on big data distributed over a computer cluster: in our experiments, learning time decreased severalfold.
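A simplified Python/NumPy sketch of this multi-way split follows. Fusing only neighbouring groups in feature order is a simplification of the full merging rule, and n_groups is an illustrative parameter:

import numpy as np
from scipy import stats

def split_and_merge(feature, labels, n_groups=8, alpha=0.05):
    """Sort samples by the selected feature, cut them into equal-size
    groups, then fuse neighbouring groups whose label distributions
    are statistically indistinguishable."""
    labels = np.asarray(labels)
    order = np.argsort(feature)
    groups = [labels[idx] for idx in np.array_split(order, n_groups)]
    merged = [groups[0]]
    for g in groups[1:]:
        _, p = stats.mannwhitneyu(merged[-1], g, alternative="two-sided")
        if p > alpha:
            # similar label distributions -> fuse into one child node
            merged[-1] = np.concatenate([merged[-1], g])
        else:
            # statistically different -> start a new child node
            merged.append(g)
    return merged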
Let us now look at an overview of the Decision Stream learning algorithm. Classical decision tree training is extended with feature-based partitioning of the data into multiple groups of the same size within every leaf node, and with fusion of similar groups and similar leaves after every splitting cycle. The fusion is always performed according to the merging rule, which can be seen as statistically based label clustering: beginning from the groups with the nearest mean values of the label, we merge groups whose labels are similar according to unpaired two-sample test statistics. Learning continues as long as the samples in the leaves can be split into new, statistically different groups.
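A compact sketch of this merging rule as statistically based label clustering, under the same assumptions as the snippets above (Mann-Whitney U as the interchangeable test, alpha as an illustrative threshold):

import numpy as np
from scipy import stats

def merging_rule(groups, alpha=0.05):
    """Starting from the pair of groups with the nearest mean labels,
    keep fusing pairs whose labels pass an unpaired two-sample test,
    until no similar pair remains."""
    groups = [np.asarray(g) for g in groups]
    while len(groups) > 1:
        means = np.array([g.mean() for g in groups])
        order = np.argsort(means)
        # candidate pairs: neighbours in mean-label order, nearest first
        pairs = sorted(zip(order[:-1], order[1:]),
                       key=lambda ij: means[ij[1]] - means[ij[0]])
        for i, j in pairs:
            _, p = stats.mannwhitneyu(groups[i], groups[j],
                                      alternative="two-sided")
            if p > alpha:  # statistically similar -> merge and restart
                fused = np.concatenate([groups[i], groups[j]])
                groups = [g for k, g in enumerate(groups)
                          if k not in (i, j)]
                groups.append(fused)
                break
        else:
            break  # every remaining pair is statistically different
    return groups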
Splitting of leaves precedes their merging, and this figure shows the number of leaves after splitting (blue) and merging (red) at each iteration of the training process for five common machine learning problems. The percentage of nodes fused according to the merging rule progressively increases up to 30–55 % as the number of leaves grows, and decreases at the end of learning as statistically distinguishable groups of samples form. The merging of leaves and the reduction of model width enable the generation of extremely deep graphs, which in the case of MNIST classification reach a depth of 400 levels. As shown earlier, a Decision Tree cannot reach such depth because its number of leaves grows geometrically.
Switching from the Decision Tree to deep Decision Stream models decreases the prediction error by 16 % on average, with a maximal advantage of 37 %. On all datasets, a single Decision Stream significantly outperforms a Decision Tree. The best improvement is obtained on the Ailerons dataset, where Decision Stream demonstrates 35 % lower error than Decision Tree. Among ensembles, the Extremely Randomized Trees technique demonstrates the best result for Decision Stream, while Random Forest does for Decision Tree. Thus, by avoiding the problem of data exhaustion, preserving statistically representative samples in the nodes, and moving through all fruitful feature splits, Decision Stream significantly increases prediction accuracy in classification and regression tasks.
The key advantage of Decision Stream is the efficient use of every node. With the same quantity of nodes, it provides greater depth than a Decision Tree by splitting and merging the data multiple times with different features. The predictive model grows, considering different data recombinations, until no further improvement is achievable, resulting in a deep directed acyclic graph architecture and a statistically significant data partition. This technique can produce extremely deep predictive models with hundreds of levels, where decision branches are loosely split and merged like the natural streams of a waterfall.
The main benefits of Decision Stream are:
- High accuracy due to the precise splitting of statistically representative data with unpaired two-sample test statistics.
- Reduced overfitting due to the partition of data into statistically different groups.
- Reduction of complexity at every level of the predictive model.
- No manual regulation of depth.
- Speed-up of learning on big distributed data.
The code of the presented model can be downloaded from our GitHub repository.
Thanks for your attention.