Bank market classification

Classification of Bank Marketing Dataset
using Decision Tree Induction

Sunil Kumar P (A13020)
Maruthi Nataraj K (A13009)
Praxis Business School, Kolkata
31-Oct-2013

Agenda
• Introduction
• Objective
• Dataset
• Attribute Selection
• Approach
• Evaluation
• Problems
• Conclusions
• Future Direction

Introduction
Problem Statement
Marketing campaign strategy of XYZ International Bank
Increase in the number of marketing campaigns
• Economic pressure and competition
• Product promotion
- Mass campaigns
- Directed marketing
• Reduction in cost and time
• Improvement in efficiency
- Fewer contacts, more successes

Objective
• Classify potential customers
- Those likely to subscribe to the term deposit offered to them
• Decision tree algorithm
- Rules based on criteria or characteristics of the customer

Bank Marketing Dataset

The data relates to direct marketing campaigns of a Portuguese banking institution.
• Number of instances: 4521
• Number of attributes: 16 + output attribute
• Campaign window: May – Nov (attractive term deposits with good interest rates)

Bank Marketing Dataset

• Class distribution (y): No (88.48%), Yes (11.52%)
• Missing attribute values: None
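The class distribution alone fixes the ID3 expected information, Info(D) = -Σ p_i log2(p_i). The sketch below recomputes it, assuming the 11.52% "yes" share corresponds to 521 of the 4521 instances (a count we back-computed from the rounded percentage; it is not stated on the slide). The helper name `expected_information` is our own.

```python
import math


def expected_information(class_counts):
    """ID3 expected information Info(D) = -sum(p_i * log2(p_i)), in bits."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


# Assumed counts back-computed from the slide: 4000 "no" and 521 "yes" (4521 in total).
info_d = expected_information([4000, 521])
print(f"Info(D) = {info_d:.4f} bits")  # Info(D) = 0.5155 bits
```

Under that assumption the result agrees with the 0.515522 bits reported for attribute selection.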

Attribute Selection – Highest Information Gain (IG)
Expected information needed to classify a tuple in the training set (ID3 measure): 0.515522 bits
Rank  Attribute  Information Gain
1     duration   0.072523
2     poutcome   0.037581
3     job        0.009991
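The ranking uses ID3's information gain, Gain(A) = Info(D) - Info_A(D), where Info_A(D) is the size-weighted entropy of the partitions that attribute A induces. A minimal sketch for categorical attributes follows; the function names and the toy rows are hypothetical. A numeric attribute such as duration would first need a split point (for example a binary test duration <= v), which this sketch does not search for.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Info(D): entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target="y"):
    """ID3 gain: Info(D) minus the weighted entropy of the partitions on `attribute`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attribute]].append(row[target])
    info_a = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([row[target] for row in rows]) - info_a


# Hypothetical toy rows using attribute names from the bank dataset.
rows = [
    {"poutcome": "success", "job": "services", "y": "yes"},
    {"poutcome": "success", "job": "admin.", "y": "yes"},
    {"poutcome": "failure", "job": "services", "y": "no"},
    {"poutcome": "failure", "job": "admin.", "y": "no"},
]
ranking = sorted(
    ((attr, information_gain(rows, attr)) for attr in ("poutcome", "job")),
    key=lambda item: item[1],
    reverse=True,
)
for rank, (attr, gain) in enumerate(ranking, start=1):
    print(rank, attr, round(gain, 6))
```

On these toy rows poutcome separates the classes perfectly (gain 1.0) while job carries no information (gain 0.0), so poutcome ranks first.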

Evaluation – Confusion Matrix (Test data)

Predicted \ Actual   yes         no          Total
yes                  TP = 61     FP = 50     P' = 111
no                   FN = 106    TN = 1140   N' = 1246
Total                P = 167     N = 1190    P + N = 1357

Accuracy (recognition rate): $\frac{TP + TN}{P + N} = 0.885041$
Error rate (misclassification rate): $\frac{FP + FN}{P + N} = 0.114959$
Sensitivity (TPR, recall): $\frac{TP}{P} = \frac{TP}{TP + FN} = 0.365269$, i.e. what % of positive tuples are labeled as such
Specificity (TNR): $\frac{TN}{N} = \frac{TN}{FP + TN} = 0.957983$
Precision: $\frac{TP}{TP + FP} = 0.549550$, i.e. what % of tuples labeled positive are actually positive
F score: $\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = 0.438849$

Case of class-imbalanced data: only 11.52% of tuples are labeled “Yes”, so sensitivity, precision and F score are more informative than accuracy alone.
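Every measure above follows from the four cells of the confusion matrix. This sketch (the function name `confusion_metrics` is our own) recomputes them from TP = 61, FP = 50, FN = 106 and TN = 1140.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification measures from confusion-matrix counts."""
    p, n = tp + fn, fp + tn  # actual positives and actual negatives
    precision = tp / (tp + fp)
    recall = tp / p
    return {
        "accuracy": (tp + tn) / (p + n),
        "error_rate": (fp + fn) / (p + n),
        "sensitivity": recall,
        "specificity": tn / n,
        "precision": precision,
        "f_score": 2 * precision * recall / (precision + recall),
    }


m = confusion_metrics(tp=61, fp=50, fn=106, tn=1140)
for name, value in m.items():
    print(f"{name:12s} {value:.6f}")
```

The printed values match the slide: 0.885041, 0.114959, 0.365269, 0.957983, 0.549550 and 0.438849.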

Evaluation – ROC

• Area under the ROC curve (AUC): 0.7992
• The larger the area, the better the model
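AUC can be read as the probability that a randomly chosen positive tuple receives a higher score than a randomly chosen negative one, with ties counting one half. The sketch computes it that way by comparing every positive/negative pair; the labels and scores are hypothetical, since reproducing the 0.7992 would require the tree's actual predicted probabilities on the test set.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = "yes") and predicted probabilities of "yes".
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(roc_auc(labels, scores), 4))  # 7 of 9 pairs ranked correctly -> 0.7778
```

The pairwise loop is O(P·N) and meant only to show the definition; for a full test set, a rank-based formula or a library routine is faster.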

Problems
• Missing values
• Pruning (noise/outliers)
• Imbalanced dataset
- Bias in prediction
- Overfitting / underfitting (too many / too few variables in the test set)

Conclusions
• The bank should target potential customers who spent considerable time on the bank's call, with call durations between 212 and 638 seconds, and who responded positively in the previous campaign. This segment is small (2%) but has a 75% hit rate.
• The bank can also target customers whose call lasted more than 802 seconds (4%), with a 60% hit rate, since these customers are likely to be genuinely interested in the deposit product.
• Another set of potential customers have call durations between 638 and 802 seconds (1%) and job categories such as housemaid, services and technician; these people tend to be risk-averse and look for a safe deposit of their savings with fixed returns (62% hit rate).
We would go ahead with further analysis that could improve the profitability of the client’s business.
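The three target segments can be read as if-then rules taken from the tree. The sketch below encodes them using only the thresholds and hit rates stated in the conclusions; it is not the full tree. The function name `target_segment`, the inclusive/exclusive boundaries, and the test `poutcome == "success"` for a positive previous response are our assumptions, and the job set holds only the three categories named, because the slide's list ends with "etc.".

```python
from typing import Optional, Tuple

# Only the job categories named on the slide; the real list is longer ("etc.").
LOW_RISK_JOBS = {"housemaid", "services", "technician"}


def target_segment(duration: float, poutcome: str, job: str) -> Optional[Tuple[str, float]]:
    """Return (segment, reported hit rate), or None if no stated rule applies."""
    if 212 <= duration <= 638 and poutcome == "success":
        return ("212-638 s call, previous campaign successful", 0.75)
    if duration > 802:
        return ("call longer than 802 s", 0.60)
    if 638 < duration <= 802 and job in LOW_RISK_JOBS:
        return ("638-802 s call, risk-averse job", 0.62)
    return None  # outside the three segments described in the conclusions


print(target_segment(300, "success", "admin."))
print(target_segment(700, "unknown", "services"))
print(target_segment(100, "failure", "services"))
```

Writing the leaves as explicit rules makes it easy to hand the segments to a campaign team, at the cost of hiding the branches that were not reported.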

Future Direction
• Increase the overall accuracy of the classifier using ensemble methods:
- Bagging
- Boosting
- Random forests
• Strategy for the class imbalance problem (e.g., 1000 N vs. 100 Y):
- Oversampling
- Undersampling, etc.
• Experiment with other classification methods, such as Naïve Bayesian and rule-based classification.
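The two resampling strategies can be sketched in plain Python on the 1000 N / 100 Y example: random oversampling duplicates minority-class rows, and random undersampling drops majority-class rows. The function names and the `seed` parameter are illustrative assumptions. Libraries such as imbalanced-learn provide tested implementations. Resampling is normally applied to the training set only, so the test set keeps its real class distribution.

```python
import random
from collections import Counter


def _group_by_class(rows, target):
    """Split rows into lists keyed by their class label."""
    groups = {}
    for row in rows:
        groups.setdefault(row[target], []).append(row)
    return groups


def random_oversample(rows, target="y", seed=0):
    """Duplicate randomly chosen rows of each smaller class up to the majority count."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, target)
    majority = max(len(g) for g in groups.values())
    out = []
    for members in groups.values():
        out.extend(members)
        out.extend(rng.choices(members, k=majority - len(members)))
    rng.shuffle(out)
    return out


def random_undersample(rows, target="y", seed=0):
    """Keep a random subset of each larger class, down to the minority count."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, target)
    minority = min(len(g) for g in groups.values())
    out = []
    for members in groups.values():
        out.extend(rng.sample(members, minority))
    rng.shuffle(out)
    return out


# The slide's example: 1000 "no" tuples and 100 "yes" tuples.
data = [{"id": i, "y": "no"} for i in range(1000)] + [{"id": 1000 + i, "y": "yes"} for i in range(100)]
print(sorted(Counter(r["y"] for r in random_oversample(data)).items()))   # [('no', 1000), ('yes', 1000)]
print(sorted(Counter(r["y"] for r in random_undersample(data)).items()))  # [('no', 100), ('yes', 100)]
```

Oversampling keeps every original row but can encourage overfitting to repeated minority tuples. Undersampling avoids duplicates but discards most of the "no" data.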

References
• Paper: “Bank Direct Marketing Using Rule Based Classification”
• Paper: “A Comparison of Different Classification Techniques for Bank Direct Marketing”
• Classification PPT, Dalhousie University
• Dataset: UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Bank+Marketing)
