1. Copyright © 2014 SAS Institute Inc. All rights reserved. #analytics2014
Maximizing a Churn Campaign’s
Profitability With Cost-Sensitive
Predictive Analytics
Alejandro Correa Bahnsen, University of Luxembourg
Andres Felipe Gonzalez Montoya, DIRECTV
2. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Agenda
• Churn modeling
• Evaluation Measures
• Offers
• Predictive modeling
• Cost-Sensitive Predictive Modeling
Cost Proportionate Sampling
Bayes Minimum Risk
CS – Decision Trees
• Conclusions
3. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Churn Modeling
• Detect which customers are likely to abandon
Voluntary churn
Involuntary churn
4. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Customer Churn Management Campaign
[Figure: customer churn management campaign flow. New customers flow into the customer base; the churn model splits the active customers into predicted churners (TP: actual churners, FP: actual non-churners) and predicted non-churners (FN: actual churners, TN: actual non-churners). A targeted actual churner accepts the retention offer and stays with probability $\gamma$, and leaves the base as an effective churner with probability $1 - \gamma$.]
*Verbraken et al. (2013). A novel profit maximizing metric for measuring classification performance of customer churn prediction models.
5. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Evaluation of a Campaign
• Confusion Matrix (true class $y_i$ vs. predicted class $c_i$):

                                  | Churner ($y_i = 1$) | Non-Churner ($y_i = 0$)
Predicted Churner ($c_i = 1$)     | TP                  | FP
Predicted Non-Churner ($c_i = 0$) | FN                  | TN

• Accuracy $= \frac{TP + TN}{TP + TN + FP + FN}$
• Recall $= \frac{TP}{TP + FN}$
• Precision $= \frac{TP}{TP + FP}$
• F1-Score $= 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$
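A minimal sketch of these measures in Python (numpy assumed; function and variable names are illustrative, not from the slides):

```python
import numpy as np

def churn_metrics(y_true, y_pred):
    # y_true, y_pred: binary arrays (1 = churner, 0 = non-churner)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)       # fraction of churners caught
    precision = tp / (tp + fp)    # fraction of flagged customers who churn
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}
```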
6. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Evaluation of a Campaign
• However, these measures assign the same weight to every type of error
• This is not the case in a churn model, since:
 Failing to predict a churner carries a different cost than wrongly predicting a non-churner
 Each churner has a different financial impact
7. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Financial Evaluation of a Campaign
[Figure: the same campaign flow annotated with costs: a true negative costs 0, a false negative costs $CLV$, a false positive costs $C_o + C_a$, and a true positive costs $C_o + C_a$ if the offer is accepted or $CLV + C_a$ if it is declined.]
*Verbraken et al. (2013). A novel profit maximizing metric for measuring classification performance of customer churn prediction models.
8. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Financial Evaluation of a Campaign
• Cost Matrix (true class $y_i$ vs. predicted class $c_i$):

                                  | Churner ($y_i = 1$) | Non-Churner ($y_i = 0$)
Predicted Churner ($c_i = 1$)     | $C_{TP_i}$          | $C_{FP_i}$
Predicted Non-Churner ($c_i = 0$) | $C_{FN_i}$          | $C_{TN_i}$

with:
$C_{TP_i} = \gamma_i C_{o_i} + (1 - \gamma_i) CLV_i + C_a$
$C_{FP_i} = C_{o_i} + C_a$
$C_{FN_i} = CLV_i$
$C_{TN_i} = 0$

where:
$C_a$ = administrative cost
$CLV_i$ = customer lifetime value of customer $i$
$C_{o_i}$ = cost of the offer made to customer $i$
$\gamma_i$ = probability that customer $i$ accepts the offer
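As a sketch, the per-customer cost matrix can be assembled directly from these definitions, assuming numpy arrays with one entry per customer (names are illustrative, not from the slides):

```python
import numpy as np

def churn_cost_matrix(clv, c_offer, gamma, c_admin):
    """Per-customer costs, columns [C_TP, C_FP, C_FN, C_TN].

    clv, c_offer, gamma: arrays with one value per customer;
    c_admin: scalar administrative cost.
    """
    c_tp = gamma * c_offer + (1 - gamma) * clv + c_admin  # offer accepted vs. declined
    c_fp = c_offer + c_admin     # offer wasted on a non-churner
    c_fn = clv                   # missed churner: the full CLV is lost
    c_tn = np.zeros_like(clv)    # correctly ignored non-churner
    return np.column_stack([c_tp, c_fp, c_fn, c_tn])
```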
9. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Financial Evaluation of a Campaign
• Using the cost matrix, the total cost is calculated as:
$C = \sum_{i=1}^{N} \left[ y_i \left( c_i \, C_{TP_i} + (1 - c_i) \, C_{FN_i} \right) + (1 - y_i) \left( c_i \, C_{FP_i} + (1 - c_i) \, C_{TN_i} \right) \right]$
• Additionally, the savings are defined as:
$C_s = \frac{C_0 - C}{C_0}$
where $C_0$ is the cost when all the customers are predicted as non-churners.
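A direct translation of the two formulas, assuming the cost-matrix layout from the previous sketch:

```python
import numpy as np

def total_cost(y, c, cost_mat):
    # cost_mat columns: [C_TP, C_FP, C_FN, C_TN], one row per customer
    c_tp, c_fp, c_fn, c_tn = cost_mat.T
    return np.sum(y * (c * c_tp + (1 - c) * c_fn)
                  + (1 - y) * (c * c_fp + (1 - c) * c_tn))

def savings(y, c, cost_mat):
    # C_0: cost of predicting every customer as a non-churner
    c_0 = total_cost(y, np.zeros_like(y), cost_mat)
    return (c_0 - total_cost(y, c, cost_mat)) / c_0
```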
10. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Financial Evaluation of a Campaign
• Customer Lifetime Value
*Glady et al. (2009). Modeling churn using customer lifetime value.
11. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Agenda
• Churn modeling
• Evaluation Measures
• Offers
• Predictive modeling
• Cost-Sensitive Predictive Modeling
Cost Proportionate Sampling
Bayes Minimum Risk
CS – Decision Trees
• Conclusions
12. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Offers
• The same offer may not apply to all customers (e.g., customers who already have premium channels)
• An offer should be made such that it maximizes both the probability of acceptance ($\gamma$) and the CLV
13. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Offer Clusters
14. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Offers Analysis
• Offer types: improve to HD DVR, monthly discount, premium channels
• Evaluate the offers' performance
15. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Offers Analysis
[Chart: churn rate (left axis, 0%–6%) and $\gamma$ (right axis, 88%–100%) for Clusters 1–4.]
$\gamma$ = probability that a customer accepts the offer
16. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Predictive Modeling
• Use predictive analytics to detect the behavioral patterns of the customers who defected in the past
17. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Predictive Modeling
• Then check which of the current customers share the same
patterns
18. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Predictive Modeling
• Dataset

Dataset        | N    | Churn  | $C_0$ (Euros)
Total          | 9410 | 4.83%  | 580,884
Training       | 3758 | 5.05%  | 244,542
Validation     | 2824 | 4.77%  | 174,171
Testing        | 2825 | 4.42%  | 162,171
Under-Sampling | 374  | 50.80% | 244,542
19. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Predictive Modeling
• Algorithms
Decision Trees
Logistic Regression
Random Forest
20. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Predictive Modeling - Results
[Charts: F1-Score (0–0.14) and Savings (0%–8%) for Decision Trees, Logistic Regression, and Random Forest, trained on the full training set vs. the under-sampled set.]
21. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Predictive Modeling - SMOTE
• Synthetic Minority Over-sampling Technique
[Figure: synthetic minority samples interpolated between neighboring minority-class examples in a two-dimensional feature space (Dim 1 vs. Dim 2).]
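The slides do not name an implementation; as one assumption, the imbalanced-learn package provides SMOTE (the toy data here is illustrative only):

```python
# Assumption: imbalanced-learn is used; the slides do not name a library.
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))              # toy features
y = (rng.random(500) < 0.05).astype(int)   # ~5% churners, as in the data
X_res, y_res = SMOTE(k_neighbors=5, random_state=42).fit_resample(X, y)
# Each synthetic churner is interpolated between a minority example
# and one of its k nearest minority-class neighbors.
```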
22. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Predictive Modeling - SMOTE
• Dataset

Dataset        | N    | Churn  | $C_0$ (Euros)
Total          | 9410 | 4.83%  | 580,884
Training       | 3758 | 5.05%  | 244,542
Validation     | 2824 | 4.77%  | 174,171
Testing        | 2825 | 4.42%  | 162,171
Under-Sampling | 374  | 50.80% | 244,542
SMOTE          | 6988 | 48.94% | 4,273,083
23. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Predictive Modeling - SMOTE
[Charts: F1-Score (0–0.14) and Savings (0%–8%) for Decision Trees, Logistic Regression, and Random Forest, comparing training, under-sampling, and SMOTE.]
24. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Predictive Modeling - SMOTE
• Sampling techniques help to improve the models' predictive power, but not necessarily the savings
• There is a need for methods that aim to increase the savings
25. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Agenda
• Churn modeling
• Evaluation Measures
• Offers
• Predictive modeling
• Cost-Sensitive Predictive Modeling
Cost Proportionate Sampling
Bayes Minimum Risk
CS – Decision Trees
• Conclusions
26. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Cost-Sensitive Predictive Modeling
• Traditional methods assume the same cost for the different errors
• This is not the case in churn modeling
• Some cost-sensitive methods assume a constant cost difference between
errors
• Example-Dependent Cost-Sensitive Predictive Modeling
27. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Cost-Sensitive Predictive Modeling
• Changing class distribution
Cost Proportionate Rejection Sampling
Cost Proportionate Over Sampling
• Direct Cost
Bayes Minimum Risk
• Modifying a learning algorithm
CS – Decision Tree
28. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Cost Proportionate Sampling
• Cost-based weight and its normalization:
$w_i = \begin{cases} C_{FP_i} & \text{if } y_i = 0 \\ C_{FN_i} & \text{if } y_i = 1 \end{cases}$
$\bar{w}_i = \frac{w_i}{\max_j w_j}$
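A small sketch of the weighting, reusing the cost-matrix columns from earlier (an assumption for illustration, not slide code):

```python
import numpy as np

def cost_weights(y, c_fp, c_fn):
    # w_i = C_FN_i for churners (y = 1), C_FP_i for non-churners (y = 0)
    w = np.where(y == 1, c_fn, c_fp)
    return w, w / w.max()   # raw weights and normalized weights in [0, 1]
```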
29. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Cost Proportionate Sampling
• Cost Proportionate Over Sampling: each example is replicated in proportion to its weight, and the copies carry weight 1

Initial dataset (example, $y_i$, $w_i$):
(1, 0, 1), (2, 1, 10), (3, 0, 2), (4, 1, 20), (5, 0, 1)

Cost-proportionate dataset:
(1, 0, 1)
(2, 1, 1) repeated 10 times
(3, 0, 1) repeated 2 times
(4, 1, 1) repeated 20 times
(5, 0, 1)

*Elkan, C. (2001). The Foundations of Cost-Sensitive Learning.
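A minimal sketch of cost-proportionate over-sampling, assuming integer-valued weights as in the example above:

```python
import numpy as np

def cost_over_sample(X, y, w):
    # Replicate each example round(w_i) times; the copies are then
    # treated as ordinary (weight-1) examples by the learner.
    reps = np.maximum(np.rint(w).astype(int), 1)
    idx = np.repeat(np.arange(len(y)), reps)
    return X[idx], y[idx]
```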
30. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Cost Proportionate Sampling
• Cost Proportionate Rejection Sampling: each example is kept with probability $\bar{w}_i = w_i / \max_j w_j$

Initial dataset (example, $y_i$, $w_i$) with acceptance probability $\bar{w}_i$:
(1, 0, 1) → 0.05
(2, 1, 10) → 0.5
(3, 0, 2) → 0.1
(4, 1, 20) → 1
(5, 0, 1) → 0.05

One possible cost-proportionate dataset (weights reset to 1):
(2, 1, 1), (4, 1, 1), (5, 0, 1)

*Zadrozny et al. (2003). Cost-sensitive learning by cost-proportionate example weighting.
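A sketch of the rejection step (a minimal sketch, not the authors' code): each example is accepted independently with probability $\bar{w}_i$.

```python
import numpy as np

def cost_rejection_sample(X, y, w, seed=0):
    # Accept each example with probability w_i / max_j w_j; the
    # accepted examples form a smaller, cost-balanced training set.
    rng = np.random.default_rng(seed)
    keep = rng.random(len(y)) < w / w.max()
    return X[keep], y[keep]
```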
31. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Cost Proportionate Sampling
• Dataset

Dataset                 | N    | Churn  | $C_0$ (Euros)
Total                   | 9410 | 4.83%  | 580,884
Training                | 3758 | 5.05%  | 244,542
Validation              | 2824 | 4.77%  | 174,171
Testing                 | 2825 | 4.42%  | 162,171
Under-Sampling          | 374  | 50.80% | 244,542
SMOTE                   | 6988 | 48.94% | 4,273,083
CS – Rejection-Sampling | 428  | 41.35% | 231,428
CS – Over-Sampling      | 5767 | 31.24% | 2,350,285
32. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Cost Proportionate Sampling
[Charts: Savings (0%–25%) and F1-Score (0–0.14) for Decision Trees, Logistic Regression, and Random Forest, comparing training, under-sampling, SMOTE, CS-Rejection, and CS-Over sampling.]
33. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Bayes Minimum Risk
• Decision model based on quantifying the trade-offs between the possible decisions, using probabilities and the costs that accompany such decisions
• Risk of each classification:
$R(c_i = 0 \mid x_i) = C_{TN_i} (1 - p_i) + C_{FN_i} \, p_i$
$R(c_i = 1 \mid x_i) = C_{FP_i} (1 - p_i) + C_{TP_i} \, p_i$
where $p_i$ is the estimated probability that customer $i$ churns.
34. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Bayes Minimum Risk
• Using the different risks, the prediction is made based on the following condition:
$c_i = \begin{cases} 0 & \text{if } R(c_i = 0 \mid x_i) \le R(c_i = 1 \mid x_i) \\ 1 & \text{otherwise} \end{cases}$
• Equivalently, an example-dependent threshold on $p_i$:
$t_{BMR_i} = \frac{C_{FP_i} - C_{TN_i}}{C_{FN_i} - C_{TN_i} - C_{TP_i} + C_{FP_i}}$
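A sketch of the decision rule, assuming the cost-matrix layout from earlier and churn probability estimates p from any classifier:

```python
import numpy as np

def bayes_minimum_risk(p, cost_mat):
    # cost_mat columns: [C_TP, C_FP, C_FN, C_TN]; p: estimated churn
    # probabilities (calibration matters, since p enters the risks directly).
    c_tp, c_fp, c_fn, c_tn = cost_mat.T
    risk_0 = c_tn * (1 - p) + c_fn * p    # risk of predicting non-churner
    risk_1 = c_fp * (1 - p) + c_tp * p    # risk of predicting churner
    return (risk_1 < risk_0).astype(int)  # c_i = 0 when risk_0 <= risk_1
```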
35. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Bayes Minimum Risk
[Chart: Savings (0%–35%) for Decision Trees, Logistic Regression, and Random Forest, each without and with BMR, across training, under-sampling, SMOTE, CS-Rejection, and CS-Over.]
36. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Bayes Minimum Risk
[Chart: F1-Score (0–0.14) for Decision Trees, Logistic Regression, and Random Forest, each without and with BMR, across the same training sets.]
37. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Bayes Minimum Risk
• Bayes Minimum Risk increases the savings by taking a cost-insensitive method and introducing the costs only at prediction time
• Why not introduce the costs during the estimation of the methods?
38. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
CS – Decision Trees
• Decision trees
 Classification model that iteratively creates binary decision rules $(x^j, l^j_m)$ that maximize a certain splitting criterion,
 where $(x^j, l^j_m)$ refers to making a rule on feature $j$ using the threshold value $l^j_m$
39. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
CS – Decision Trees
• Decision trees – construction
• A rule $(x^j, l^j_m)$ splits the set $S$ into
$S_l = \{ x_i \in S \mid x^j_i \le l^j_m \}$ and $S_r = \{ x_i \in S \mid x^j_i > l^j_m \}$
• Then the impurity of each leaf is calculated using:
 Misclassification: $I_m(\pi_1) = 1 - \max(\pi_1, 1 - \pi_1)$
 Entropy: $I_e(\pi_1) = -\pi_1 \log \pi_1 - (1 - \pi_1) \log(1 - \pi_1)$
 Gini: $I_g(\pi_1) = 2 \pi_1 (1 - \pi_1)$
where $\pi_1$ is the percentage of positives in the leaf.
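The three impurity measures as a short sketch (the log base is an assumption; the slide does not specify it):

```python
import numpy as np

def misclassification(pi1):
    return 1 - max(pi1, 1 - pi1)

def entropy(pi1):
    if pi1 in (0.0, 1.0):   # a pure leaf has zero entropy
        return 0.0
    return -pi1 * np.log2(pi1) - (1 - pi1) * np.log2(1 - pi1)

def gini(pi1):
    return 2 * pi1 * (1 - pi1)
```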
40. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
CS – Decision Trees
• Decision trees – construction
• The gain of applying a rule $(x^j, l^j_m)$ to the set $S$ is:
$Gain(x^j, l^j_m) = I(\pi_1) - \frac{|S_l|}{|S|} I(\pi^l_1) - \frac{|S_r|}{|S|} I(\pi^r_1)$
with $S_l$ and $S_r$ defined as before.
41. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
CS – Decision Trees
• Decision trees – construction
• The rule that maximizes the gain is selected:
$(best_x, best_l) = \operatorname*{argmax}_{(j, m)} \; Gain(x^j, l^j_m)$
• The process is repeated on each resulting subset until a stopping criterion is met
[Diagram: the set $S$ is recursively split into a binary tree.]
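A brute-force sketch of one split search, using the Gini impurity on a label array (illustrative only, not an efficient implementation):

```python
import numpy as np

def gini_of(y):
    pi1 = y.mean()
    return 2 * pi1 * (1 - pi1)

def best_split(X, y):
    # Exhaustively evaluate every (feature j, threshold l) rule and
    # return the one maximizing the gain over the parent impurity.
    base = gini_of(y)
    best_j, best_l, best_gain = None, None, 0.0
    for j in range(X.shape[1]):
        for l in np.unique(X[:, j])[:-1]:  # largest value splits nothing off
            left = X[:, j] <= l
            gain = base - left.mean() * gini_of(y[left]) \
                        - (~left).mean() * gini_of(y[~left])
            if gain > best_gain:
                best_j, best_l, best_gain = j, l, gain
    return best_j, best_l, best_gain
```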
42. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
CS – Decision Trees
• Decision trees – pruning
• Calculate the error of the full tree, $\epsilon(Tree)$, and of each pruned tree, $\epsilon(EB(Tree, branch))$, where $EB(Tree, branch)$ is the tree with that branch eliminated
• Pruning criterion:
$PC = \frac{\epsilon(EB(Tree, branch)) - \epsilon(Tree)}{|Tree| - |EB(Tree, branch)|}$
• After calculating the pruning criterion for all possible branches, the one with the maximum improvement is selected and the tree is pruned
• The process is repeated until there is no further improvement
[Diagram: successive prunings of the tree.]
43. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
CS – Decision Trees
• Maximizing the accuracy is not the same as minimizing the cost
• To solve this, several studies have proposed methods that introduce cost-sensitivity into the algorithms
• However, research has focused on class-dependent methods; instead, we use:
 An example-dependent cost-based impurity measure
 An example-dependent cost-based pruning criterion
44. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
CS – Decision Trees
• Cost-based impurity measure
• The impurity of a leaf $S$ is the minimum of $C_0(S)$ and $C_1(S)$, the total costs of classifying every example in $S$ as non-churner or churner, respectively:
$I_c(S) = \min(C_0(S), C_1(S))$
$f(S) = \begin{cases} 0 & \text{if } C_0(S) \le C_1(S) \\ 1 & \text{otherwise} \end{cases}$
where $f(S)$ is the class predicted by the leaf.
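A sketch of the cost-based impurity of a leaf, using the per-example cost columns defined earlier:

```python
import numpy as np

def cost_impurity(y, cost_mat):
    # cost_mat columns: [C_TP, C_FP, C_FN, C_TN] for the examples in the leaf
    c_tp, c_fp, c_fn, c_tn = cost_mat.T
    cost_0 = np.sum(y * c_fn + (1 - y) * c_tn)  # label the whole leaf 0
    cost_1 = np.sum(y * c_tp + (1 - y) * c_fp)  # label the whole leaf 1
    label = 0 if cost_0 <= cost_1 else 1        # f(S)
    return min(cost_0, cost_1), label           # I_c(S), f(S)
```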
45. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
CS – Decision Trees
• Cost-sensitive pruning
• A new pruning criterion that evaluates the improvement in cost of eliminating a particular branch:
$PC_c = \frac{C(EB(Tree, branch)) - C(Tree)}{|Tree| - |EB(Tree, branch)|}$
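The criterion itself is a one-liner; a sketch, with the tree traversal and node counting left out:

```python
def cost_pruning_criterion(cost_full, cost_pruned, size_full, size_pruned):
    # PC_c compares the total cost of the tree with a branch eliminated
    # (cost_pruned) against the full tree, per node removed; evaluate it
    # for every branch and prune the one with the best improvement.
    return (cost_pruned - cost_full) / (size_full - size_pruned)
```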
46. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
CS – Decision Trees
[Chart: Savings (0%–50%) for decision trees with error-based vs. cost-based pruning, and for cost-sensitive decision trees, across training, under-sampling, SMOTE, CS-Rejection, and CS-Over.]
47. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
CS – Decision Trees
[Chart: F1-Score (0–0.3) for the same configurations, across training, under-sampling, SMOTE, CS-Rejection, and CS-Over.]
48. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Comparison of Models
[Chart: Savings (0%–50%) and F1-Score for the best configuration of each model: Random Forest (training), Logistic Regression (CS-Rejection), Logistic Regression with BMR (training), Decision Tree with cost-based pruning (CS-Rejection), and CS-Decision Tree (training).]
49. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Conclusions
• Selecting models based on traditional statistical measures does not give the best results as measured by savings
• Incorporating the costs into the modeling helps to achieve
higher savings
50. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Other Applications
• Fraud Detection
 Correa Bahnsen et al. (2013). Cost Sensitive Credit Card Fraud Detection using Bayes Minimum Risk.
 Correa Bahnsen et al. (2014). Improving Credit Card Fraud Detection with Calibrated Probabilities.
• Credit Scoring
 Correa Bahnsen et al. (2014). Example-Dependent Cost-Sensitive Credit Scoring using Bayes Minimum Risk.
• Direct Marketing
 Correa Bahnsen et al. (2014). Example-Dependent Cost-Sensitive Decision Trees.
51. Copyright © 2014, SAS Institute Inc. All rights reserved. #analytics2014
Contact Information
Alejandro Correa Bahnsen
University of Luxembourg
Luxembourg
al.bahnsen@gmail.com
http://www.linkedin.com/in/albahnsen
http://www.slideshare.net/albahnsen
Andres Gonzalez Montoya
DIRECTV
Colombia
andrezfg@gmail.com
52. Copyright © 2014 SAS Institute Inc. All rights reserved. #analytics2014
Thank you!
Alejandro Correa Bahnsen, University of Luxembourg
Andres Felipe Gonzalez Montoya, DIRECTV