In many healthcare settings, intuitive decision rules for risk stratification can support effective hospital resource allocation. This paper introduces a novel variant of decision tree algorithms that produces a chain of decisions, not a general tree. Our algorithm, $\alpha$-Carving Decision Chain (ACDC), sequentially carves out ``pure'' subsets of the majority-class examples. The resulting chain of decision rules yields a pure subset of the minority-class examples. Our approach is particularly effective for exploring large and class-imbalanced health datasets. Moreover, ACDC supports interactive interpretation in conjunction with visual performance metrics such as the Receiver Operating Characteristic (ROC) curve and the Lift chart.
4. Is DC More Interpretable than DT?
• In Decision Chain (DC),
• Risk is proportional to the number of rules a sample satisfies
• Fewer rules to memorize for filtering out the low-risk population (or samples)
• More rules to memorize for capturing the high-risk population
• Using DC, one can implement an economically efficient business process based on job maturity level
• While in Decision Tree (DT),
• The number of rules is agnostic to risk
• A low-risk sample may be captured by a single rule or by hundreds of rules
• Thus, DC may be helpful for some applications
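The contrast above can be made concrete with a minimal sketch: in a decision chain, the number of rules a sample survives is itself a risk proxy. The rules and patient fields below are hypothetical, purely for illustration.

```python
# Minimal sketch: in a decision chain, each rule carves out a low-risk
# subset, so the depth a sample reaches in the chain reflects its risk.

def chain_depth(sample, rules):
    """Return how many chain rules the sample survives.

    A sample that fails rule i exits to the low-risk leaf at depth i;
    samples that survive more rules carry higher risk.
    """
    depth = 0
    for rule in rules:
        if not rule(sample):   # rule carves this sample out as low-risk
            break
        depth += 1             # sample stays on the chain: higher risk
    return depth

# Hypothetical rules on a patient record represented as a dict.
rules = [
    lambda s: s["age"] > 50,
    lambda s: s["bp"] > 140,
    lambda s: s["glucose"] > 200,
]

low_risk = {"age": 30, "bp": 120, "glucose": 90}
high_risk = {"age": 70, "bp": 160, "glucose": 250}
```

A low-risk sample exits after checking only the first rule, while a high-risk sample must be checked against every rule, which is the "less to memorize for low risk, more for high risk" asymmetry noted above.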
ICML WHI 2016
6. Question is How
• We will use a greedy approach
• Note that a decision tree is also grown greedily
• Pick a splitting feature that maximizes {information gain, purity score, etc.}
• Split the dataset into parts based on the value of the splitting feature
• Repeat from the beginning on each resulting partition
• We will grow a decision chain as follows
• Pick a splitting feature that carves out the largest number of majority-class samples
• Split the dataset into parts based on the value of the splitting feature
• Repeat from the beginning on only the one partition that has more positive-class examples
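The chain-growing loop above can be sketched as follows, assuming binary 0/1 features and taking "carves out the most majority samples" to simply mean the count of negative examples removed by the split. The data layout and scoring are illustrative, not the paper's exact procedure (which uses Alpha-Divergence, discussed next).

```python
# Sketch of the greedy decision-chain growth described above.
# X: list of dicts mapping feature name -> 0/1; y: list of labels,
# where 1 marks the positive (minority) class.

def grow_chain(X, y, features, max_depth=5):
    chain = []
    for _ in range(max_depth):
        if not features or sum(y) == 0 or sum(y) == len(y):
            break  # nothing left to carve, or the node is already pure

        def split_stats(f):
            # Keep the branch with more positives; the other branch is carved out.
            pos1 = sum(yi for xi, yi in zip(X, y) if xi[f] == 1)
            pos0 = sum(y) - pos1
            keep_val = 1 if pos1 >= pos0 else 0
            carved = [yi for xi, yi in zip(X, y) if xi[f] != keep_val]
            return len(carved) - sum(carved), keep_val  # negatives carved out

        scored = {f: split_stats(f) for f in features}
        best = max(features, key=lambda f: scored[f][0])
        keep_val = scored[best][1]
        chain.append((best, keep_val))

        # Recurse only on the partition richer in positive examples.
        kept = [(xi, yi) for xi, yi in zip(X, y) if xi[best] == keep_val]
        X = [xi for xi, _ in kept]
        y = [yi for _, yi in kept]
        features = [f for f in features if f != best]
    return chain

# Toy data: positives concentrate where a=1 and b=1.
X = [
    {"a": 1, "b": 1}, {"a": 1, "b": 1}, {"a": 0, "b": 1},
    {"a": 0, "b": 0}, {"a": 1, "b": 0}, {"a": 0, "b": 0},
]
y = [1, 1, 0, 0, 0, 0]
chain = grow_chain(X, y, ["a", "b"])
```

On this toy dataset the loop first carves out the three negatives with a = 0, then the remaining negative with b = 0, leaving a pure positive subset, so the chain terminates.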
7. More Details on How
• Selecting the best splitting feature
• We will use Alpha-Divergence
• Alpha-Divergence is the same as KL-Divergence when Alpha=1
• Alpha-Divergence is proportional to the squared Hellinger distance when Alpha = 0.5
• Alpha-Divergence interpolates among a family of divergences depending on the value of Alpha
• We will change the value of Alpha adaptively (with a simple strategy) to achieve our goal
• More details are in the paper
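As a reference point, a minimal sketch of the (Amari-style) Alpha-Divergence for discrete distributions is given below, showing the two special cases named above: the KL limit at Alpha = 1 and the squared-Hellinger case at Alpha = 0.5. This is a generic formula, not the paper's adaptive-Alpha splitting criterion itself.

```python
import math

def alpha_divergence(p, q, alpha, eps=1e-12):
    """Amari alpha-divergence D_alpha(p || q) for discrete distributions.

    Recovers KL(p || q) in the limit alpha -> 1 and equals
    4 * (squared Hellinger distance) at alpha = 0.5.
    """
    if abs(alpha - 1.0) < 1e-9:   # limit case: KL divergence
        return sum(pi * math.log((pi + eps) / (qi + eps))
                   for pi, qi in zip(p, q) if pi > 0)
    if abs(alpha) < 1e-9:         # limit case: reverse KL
        return sum(qi * math.log((qi + eps) / (pi + eps))
                   for pi, qi in zip(p, q) if qi > 0)
    s = sum((pi ** alpha) * (qi ** (1 - alpha)) for pi, qi in zip(p, q))
    return (1.0 - s) / (alpha * (1.0 - alpha))
```

In a split-selection loop, `p` and `q` would be the class distributions of the two partitions induced by a candidate feature, and the feature with the largest divergence is the most discriminative.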