This document summarizes a presentation on maximizing profit from customer churn prediction models using cost-sensitive machine learning techniques. It discusses how traditional evaluation measures like accuracy do not account for different costs of prediction errors. It then covers cost-sensitive approaches like cost-proportionate sampling, Bayes minimum risk, and cost-sensitive decision trees. The results show these cost-sensitive methods improve savings over traditional models and sampling approaches when the business costs are incorporated into the predictive modeling.
3. Churn Modeling
• Detect which customers are likely to abandon
Voluntary churn
Involuntary churn
4. Customer Churn Management Campaign
Inflow
New
Customers
Customer
Base
Active
Customers
*Verbraken et. al (2013). A novel profit maximizing metric for measuring classification performance of customer churn
prediction models.
Predicted Churners
Predicted Non-Churners
TP: Actual Churners
FP: Actual Non-Churners
FN: Actual Churners
TN: Actual Non-Churners
Outflow
Effective
Churners
Churn Model Prediction
1
1
1 − 𝛾𝛾
1
6. Evaluation of a Campaign
However these measures assign the same weight to different errors
Not the case in a Churn model since
Failing to predict a churner carries a different cost than wrongly
predicting a non-churner
Churners have different financial impact
7. Financial Evaluation of a Campaign
Inflow
New
Customers
Customer
Base
Active
Customers
*Verbraken et. al (2013). A novel profit maximizing metric for measuring classification performance of customer churn
prediction models.
Predicted Churners
Predicted Non-Churners
TP: Actual Churners
FP: Actual Non-Churners
FN: Actual Churners
TN: Actual Non-Churners
Outflow
Effective
Churners
Churn Model Prediction
0
𝐶𝐿𝑉
𝐶𝐿𝑉 + 𝐶 𝑎𝐶 𝑜 + 𝐶 𝑎
𝐶 𝑜 + 𝐶 𝑎
8. Financial Evaluation of a Campaign
Cost Matrix
where:
True Class (𝑦𝑖)
Churner (𝑦𝑖=1)
Non-
Churner(𝑦𝑖=0)
Predicte
d class
(𝑐𝑖)
Churner (𝑐𝑖=1)
Non-Churner
(𝑐𝑖=0)
𝐶 𝑎 = Administrative cost 𝐶𝐿𝑉𝑖 = Client Lifetime Value of
customer 𝑖
𝐶𝑜 𝑖
= Cost of the offer made to
customer 𝑖
𝛾𝑖 = Probability that customer 𝑖 accepts
the offer
𝐶 𝑇𝑃 𝑖
= 𝛾𝑖 𝐶 𝑜 𝑖
+ 1 − 𝛾𝑖 𝐶𝐿𝑉𝑖 + 𝐶 𝑎
𝐶 𝐹𝑁 𝑖
= 𝐶𝐿𝑉𝑖 𝐶 𝑇𝑁 𝑖
= 0
𝐶 𝐹𝑃 𝑖
= 𝐶 𝑜 𝑖
+ 𝐶 𝑎
9. Financial Evaluation of a Campaign
Using the cost matrix the total cost is calculated as:
𝐶 = 𝑦𝑖 𝑐𝑖 ∙ 𝐶 𝑇𝑃 𝑖 + 1 − 𝑐𝑖 𝐶 𝐹𝑁 𝑖 + 1 − 𝑦𝑖 𝑐𝑖 ∙ 𝐶 𝐹𝑃 𝑖 + 1 − 𝑐𝑖 𝐶 𝑇𝑁 𝑖
Additionally the savings are defined as:
𝐶𝑠 =
𝐶0 − 𝐶
𝐶0
where 𝐶0 is the cost when all the customers are predicted as non-churners
10. Financial Evaluation of a Campaign
Customer Lifetime Value
*Glady et al. (2009). Modeling churn using customer lifetime value.
11. Offers
Same offer may not apply to all customers (eg. Already
have premium channels)
An offer should be made such that it maximizes the
probability of acceptance (𝛾)
16. Predictive Modeling - Results
0%
2%
4%
6%
8%
10%
12%
14%
Decision
Trees
Logistic
Regression
Random
Forest
F1-Score
Training Under-Sampling SMOTE
0%
1%
2%
3%
4%
5%
6%
7%
8%
Decision
Trees
Logistic
Regression
Random
Forest
Savings
Training Under-Sampling SMOTE
17. Predictive Modeling
Sampling techniques helps to improve models’ predictive
power however not necessarily the savings
There is a need for methods that aim to increase savings
19. Cost-Sensitive Predictive Modeling
Traditional methods assume the same cost for different
errors
Not the case in Churn modeling
Some cost-sensitive methods assume a constant cost
difference between errors
Example-Dependent Cost-Sensitive Predictive Modeling
20. Cost-Sensitive Predictive Modeling
Changing class distribution
Cost Proportionate Rejection Sampling
Cost Proportionate Over Sampling
Direct Cost
Bayes Minimum Risk
Modifying a learning algorithm
CS – Decision Tree
25. Bayes Minimum Risk
Decision model based on quantifying tradeoffs between various decisions
using probabilities and the costs that accompany such decisions
Risk of classification
𝑅 𝑐𝑖 = 0|𝑥𝑖 = 𝐶 𝑇𝑁 𝑖
1 − 𝑝𝑖 + 𝐶 𝐹𝑁 𝑖
∙ 𝑝𝑖
𝑅 𝑐𝑖 = 1|𝑥𝑖 = 𝐶 𝐹𝑃 𝑖
1 − 𝑝𝑖 + 𝐶 𝑇𝑃 𝑖
∙ 𝑝𝑖
Using the different risks the prediction is made based on the following
condition:
𝑐𝑖 =
0 𝑅 𝑐𝑖 = 0|𝑥𝑖 ≤ 𝑅 𝑐𝑖 = 1|𝑥𝑖
1 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
28. Bayes Minimum Risk
Bayes Minimum Risk increases the savings by using a cost-
insensitive method and then introducing the costs
Why not introduce the costs during the estimation of the
methods?
29. CS – Decision Trees
Decision trees
Classification model that iteratively creates binary
decision rules 𝑥 𝑗, 𝑙 𝑗
𝑚 that maximize certain criteria
Where 𝑥 𝑗, 𝑙 𝑗
𝑚 refers to making a rule using feature 𝑗 on
value 𝑚
30. Comparison of Models
0%
10%
20%
30%
40%
50%
Random Forest
Train
Logistic
Regression
CSRejection
Logistic
Regression BMR
Train
Decision Tree
CostPruning
CSRejection
CS-Decision Tree
Train
Savings F1-Score
31. Conclusions
Selecting models based on traditional statistics does not
gives the best results measured by savings
Incorporating the costs into the modeling helps to achieve
higher savings