This document discusses using random forests to predict customer conversion for a bank marketing campaign. It begins by introducing the research question of which customers to target for offers. Next, it describes the dataset and explores decision trees, which have high variance and may overfit. Then it introduces random forests, an ensemble method that grows many decision trees to reduce variance and avoid overfitting. It shows that random forests significantly outperform a single decision tree on this dataset and require little tuning.
1. Predicting Customer Conversion with Random Forests
A Decision Trees Case Study
Daniel Gerlanc, Principal
Enplus Advisors, Inc.
www.enplusadvisors.com
dgerlanc@enplusadvisors.com
2. Topics
Objectives: Research Question
Data: Bank Prospect Conversion
Methods: Decision Trees, Random Forests
Results
11. Decision Tree Code
tree.1 <- rpart(takes.loan ~ ., data=bank)
• See the 'rpart' and 'rpart.plot' R packages.
• Many parameters available to control the fit.
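As a minimal sketch of what "parameters available to control the fit" can look like, the example below passes an `rpart.control()` object to `rpart()`. The `bank` data from the slides is not reproduced here, so `bank_demo` is a hypothetical simulated stand-in, and the specific control values are illustrative rather than the ones used in the talk.

```r
library(rpart)

# Hypothetical stand-in for the `bank` data: two predictors and a yes/no outcome.
set.seed(42)
n <- 500
bank_demo <- data.frame(
  age     = round(runif(n, 18, 80)),
  balance = round(rnorm(n, mean = 1500, sd = 1000))
)
# The outcome loosely depends on balance so the tree has something to find.
p_yes <- plogis((bank_demo$balance - 1500) / 500 - 1)
bank_demo$takes.loan <- factor(ifelse(runif(n) < p_yes, "yes", "no"),
                               levels = c("no", "yes"))

# rpart.control() gathers the main fit parameters (values are illustrative).
ctrl <- rpart.control(
  minsplit = 20,   # minimum observations in a node before a split is attempted
  cp       = 0.01, # splits that improve the fit by less than this are not attempted
  maxdepth = 5     # maximum depth of any node, with the root counted as depth 0
)

tree_demo <- rpart(takes.loan ~ ., data = bank_demo, method = "class",
                   control = ctrl)

printcp(tree_demo)                          # complexity table, useful when pruning
pred <- predict(tree_demo, type = "class")  # fitted class for each training row
```

Loosening `cp` or raising `maxdepth` grows a deeper tree, which is where the high variance of a single tree tends to show up.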
16. Building RF
• Sample from the data
• At each split, sample from the available variables
• Repeat for each tree
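The two sampling steps above can be sketched in base R. This only illustrates the sampling, not the tree growing; it assumes the usual bootstrap (rows drawn with replacement), and the row count, variable names, and tree count are all hypothetical.

```r
set.seed(1)

n_obs  <- 10                          # rows in a toy dataset (illustrative)
vars   <- paste0("x", 1:9)            # hypothetical predictor names
mtry   <- floor(sqrt(length(vars)))   # variables tried per split
n_tree <- 3

for (b in seq_len(n_tree)) {
  # Step 1: bootstrap sample of the rows, drawn with replacement.
  in_bag <- sample(n_obs, size = n_obs, replace = TRUE)
  # Rows never drawn are "out-of-bag" for this tree.
  oob <- setdiff(seq_len(n_obs), in_bag)

  # Step 2: candidate variables for one split, drawn without replacement.
  # A real forest redraws this subset at every split of the tree.
  candidates <- sample(vars, size = mtry)

  cat("tree", b, "| OOB rows:", length(oob),
      "| split candidates:", candidates, "\n")
}
```

Because each tree sees a different sample of rows and a different subset of variables at each split, the trees end up less alike than they would if all were grown on the full data.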
17. Why more than 1?
• Create uncorrelated trees
• Reduce variance of predictor
• Continual cross-validation from the rows left out of each tree's sample
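One way to see why uncorrelated trees matter is the standard result for the variance of an average of B identically distributed predictors, each with variance sigma^2 and pairwise correlation rho: rho * sigma^2 + (1 - rho) * sigma^2 / B. The helper name `var_of_average` below is hypothetical, and the inputs are illustrative numbers, not estimates from the bank data.

```r
# Variance of the average of B identically distributed predictors,
# each with variance sigma2 and pairwise correlation rho.
var_of_average <- function(sigma2, rho, B) {
  rho * sigma2 + (1 - rho) * sigma2 / B
}

var_of_average(sigma2 = 1, rho = 0.0, B = 500)  # 0.002: uncorrelated trees
var_of_average(sigma2 = 1, rho = 0.5, B = 500)  # 0.501: correlation sets a floor
var_of_average(sigma2 = 1, rho = 0.5, B = 1)    # 1: a single tree
```

Adding trees shrinks only the second term, so lowering the correlation between trees is what moves the floor.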
18. Random Forests
rffit.1 <- randomForest(takes.loan ~ ., data=bank)
Most important parameters are:
Variable | Description | Default
ntree | Number of trees | 500
mtry | Number of variables to randomly select at each node | Square root of # predictors for classification; # predictors / 3 for regression
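The `mtry` defaults in the table can be written out as a small helper. `default_mtry` is a hypothetical name, not a function in the randomForest package; it rounds down as the package does, and the predictor count of 16 is an assumption for illustration.

```r
# Hypothetical helper mirroring the documented mtry defaults.
default_mtry <- function(p, type = c("classification", "regression")) {
  type <- match.arg(type)
  if (type == "classification") {
    floor(sqrt(p))        # square root of the number of predictors
  } else {
    max(floor(p / 3), 1)  # a third of the predictors, but at least one
  }
}

default_mtry(16, "classification")  # 4
default_mtry(16, "regression")      # 5

# Overriding the defaults in the call would look like this
# (left commented out because `bank` is not defined here):
# rffit.2 <- randomForest(takes.loan ~ ., data = bank, ntree = 1000, mtry = 4)
```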
19. How'd it do?
Guessing Precision: 11.7%
Random Forest: 64.5%
Confusion matrix (rows: actual, columns: predicted)
Actual | Predicted no | Predicted yes
no | (1) 38,526 | (3) 1,396
yes | (2) 2,748 | (4) 2,541
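The two precision figures can be recomputed from the four cell counts. The sketch below assumes the rows of the table are the actual outcomes and the columns are the predictions, which is the reading that reproduces the 11.7% and 64.5% figures on the slide.

```r
# Cell counts from the slide; matrix() fills column by column.
conf <- matrix(c(38526, 2748, 1396, 2541), nrow = 2,
               dimnames = list(actual    = c("no", "yes"),
                               predicted = c("no", "yes")))

# Guessing: precision when every customer is predicted "yes"
# equals the share of customers who actually convert.
base_rate <- sum(conf["yes", ]) / sum(conf)

# Random forest: of the customers predicted "yes", the share who convert.
precision <- conf["yes", "yes"] / sum(conf[, "yes"])

# Of the customers who convert, the share the model finds.
recall <- conf["yes", "yes"] / sum(conf["yes", ])

round(100 * base_rate, 1)  # 11.7
round(100 * precision, 1)  # 64.5
round(100 * recall, 1)
```

Targeting only the predicted "yes" group reaches fewer than half of the eventual converters, but the offers sent are far more likely to land than offers sent at random.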
20. Benefits of RF
• Don't need a lot of tuning
• Don't need an extra cross-validation step
• Many implementations
• R, Weka, RapidMiner, Mahout
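The point about not needing an extra cross-validation step refers to the out-of-bag (OOB) error estimate, which the R implementation computes while fitting. A minimal sketch on simulated data follows; `demo` and `rf_demo` are hypothetical names, and the data are not the bank data from the talk.

```r
library(randomForest)

# Hypothetical simulated data with a yes/no outcome.
set.seed(7)
n <- 400
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
demo$y <- factor(ifelse(demo$x1 + demo$x2 + rnorm(n, sd = 0.5) > 0, "yes", "no"),
                 levels = c("no", "yes"))

rf_demo <- randomForest(y ~ ., data = demo, ntree = 200)

# OOB error rate after the last tree: each row is scored only by
# the trees whose bootstrap sample did not contain it.
oob_error <- rf_demo$err.rate[rf_demo$ntree, "OOB"]

# Confusion matrix built from the same OOB predictions.
rf_demo$confusion

# predict() with no new data returns the OOB predictions.
oob_pred <- predict(rf_demo)
```

The OOB estimate is not identical to k-fold cross-validation, but it plays the same role and comes at no extra fitting cost.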
21. References
• Breiman, Leo. Classification and Regression Trees. Belmont, Calif:
Wadsworth International Group, 1984. Print.
• Breiman, Leo and Adele Cutler. Random forests.
http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm
• S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank
Direct Marketing: An Application of the CRISP-DM Methodology. In P.
Novais et al. (Eds.), Proceedings of the European Simulation and
Modelling Conference - ESM'2011, pp. 117-121, Guimarães, Portugal,
October, 2011. EUROSIS.
Editor's Notes
Tools that help you decide how to spend those limited resources.