2. Introduction
The capstone project is a machine learning application that builds a model for a well-known bank in New Jersey.
It analyzes the bank's clients who took out loans, based on various parameters.
3. Problem Statement
A bank in New Jersey wants to analyze different aspects of its clients. It also wants built-in models that segment the clients based on various parameters.
The models must be able to predict a client's loan status and to identify distinct groups of customers.
4. Applied Strategy
Building a machine learning classification model that predicts clients' loan status from Duration, Amount and Payments, using the Random Forest Classifier algorithm.
Clustering clients into groups based on Amount and Balance, using the K-Means clustering algorithm.
6. Proposed method and Architecture
Figure: proposed pipeline, Raw Data → Data Processing → Sampling → Training Set and Validation Set → Build Model → Validate → New Data
7. Architecture
Raw Data: The data is collected as Excel files.
Data Processing: The input files are preprocessed and joined on common key fields to create a master table.
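The join step can be sketched in plain Python. This is an illustration under assumptions, not the project's code: the field names (account_id, amount, duration, status, balance), the sample rows and the choice of an inner join are all hypothetical, and a real pipeline reading Excel files would more likely use a library such as pandas.

```python
def inner_join(left, right, key):
    """Inner-join two lists of row dicts on a shared key field."""
    # Index the right-hand table by key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # Rows with no match on the right are dropped (inner-join semantics).
        for match in index.get(row[key], []):
            joined.append({**row, **match})  # right-hand values win on name clashes
    return joined


# Hypothetical extracts of two input files that share an account_id column.
loans = [
    {"account_id": 1, "amount": 80000, "duration": 24, "status": "A"},
    {"account_id": 2, "amount": 200000, "duration": 60, "status": "C"},
]
accounts = [
    {"account_id": 1, "balance": 25000},
    {"account_id": 2, "balance": 40000},
    {"account_id": 3, "balance": 15000},  # no loan, so it will not reach the master table
]

master = inner_join(loans, accounts, "account_id")
print(master)
```

An inner join keeps only clients present in both files; a left join would instead keep every loan even when its account row is missing.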
Sampling: The processed master table is split into train and test datasets; one is used to train the model and the other to validate it.
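A random train/test split can be sketched as follows. The helper name split_train_test, the 80/20 ratio and the fixed seed are assumptions for illustration; the source does not state the proportions that were used.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then cut it into (train, test)."""
    shuffled = list(rows)  # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for the 100 rows of a hypothetical master table
train, test = split_train_test(rows)
print(len(train), len(test))
```

Shuffling before cutting avoids a split that simply follows the order of the source files.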
Algorithm: K-Means clustering groups the clients based on Amount and Balance, and the Random Forest Classifier classifies them based on Duration, Amount and Payments.
8. Architecture
Build Model: K-Means is used for clustering. Its main hyperparameter, the number of clusters (n_clusters), was tuned with the Elbow method, which gave an optimum of 2 clusters.
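The Elbow method can be illustrated with a small dependency-free K-Means (Lloyd's algorithm). This is a sketch, not the project's code: the project presumably used a library implementation, and the points are hypothetical, already-scaled (Amount, Balance) pairs. The deterministic farthest-first initialisation is an assumption chosen so the example is reproducible; library implementations typically use k-means++ with several restarts.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_first(points, k):
    """Start at points[0], then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    # Inertia = within-cluster sum of squared distances, the quantity the elbow plot tracks.
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Hypothetical scaled (amount, balance) pairs: a low-amount and a high-amount group,
# each with balances spread from low to high.
points = (
    [(0.10 + 0.02 * (i % 3), 0.05 * i) for i in range(10)]
    + [(0.90 - 0.02 * (i % 3), 0.05 * i) for i in range(10)]
)

# Elbow method: compute the inertia for a range of k and look for the bend.
inertias = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

For these points the inertia falls sharply from k=1 to k=2; the bend after that drop is what the Elbow method looks for.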
The Random Forest Classifier is used to classify the clients. Validated on the test dataset, it achieved the same accuracy as on the train dataset.
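The build-and-validate step can be illustrated with a toy random forest in plain Python: each tree is trained on a bootstrap sample of the rows using a random subset of the features, and predictions are made by majority vote. This is a simplified sketch, not the project's model: the trees are depth-1 decision stumps rather than full decision trees, the two status labels and all data are hypothetical, and n_trees and max_features are illustrative values.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(X, y, features):
    """Depth-1 tree: pick the (feature, threshold) with the lowest weighted Gini."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # every candidate feature is constant: predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Bagging plus random feature subsets, one stump per bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        features = sorted(rng.sample(range(len(X[0])), max_features))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict(forest, row):
    """Majority vote of all stumps for one row."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


def accuracy(forest, X, y):
    return sum(predict(forest, row) == lab for row, lab in zip(X, y)) / len(y)


# Hypothetical (duration, amount, payments) rows with two illustrative status labels:
# short, small loans are "A", long, large loans are "B"; payments overlap on purpose.
gen = random.Random(1)
X, y = [], []
for _ in range(40):
    if gen.random() < 0.5:
        X.append((gen.randint(12, 24), gen.randint(10_000, 100_000), gen.randint(2_000, 6_000)))
        y.append("A")
    else:
        X.append((gen.randint(48, 60), gen.randint(200_000, 250_000), gen.randint(2_000, 6_000)))
        y.append("B")

X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]
forest = fit_forest(X_train, y_train)
print("train accuracy:", accuracy(forest, X_train, y_train))
print("test accuracy:", accuracy(forest, X_test, y_test))
```

Because Duration and Amount separate the two hypothetical classes cleanly, train and test accuracy match here. On real loan data, similar train and test accuracy suggests the model is not badly overfitting, but it does not by itself show that the model is accurate.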
9. Methodology
Creating a rough plan for approaching the solution.
Understanding the relationships among the input files and identifying the keys to join them.
Joining the files with the type of join that best fits the problem statement.
Cleaning the data: checking for missing values and removing duplicates.
Obtaining useful insights from the preprocessed data, using graphs to bring out important information about it.
Applying scaling, splitting and feature extraction so the data is ready to fit the model.
Building the model.
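The cleaning and scaling steps can be sketched in plain Python. This is a hedged illustration: the field names and rows are hypothetical, min-max scaling is an assumption (the source only says "scaling"), and a real project would more likely use pandas (DataFrame.dropna, DataFrame.drop_duplicates) and a scikit-learn scaler.

```python
def clean(rows, required):
    """Drop rows missing any required field, then drop exact duplicate rows
    (the first occurrence is kept and the original order is preserved)."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(row.get(field) is None for field in required):
            continue  # missing value
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key in seen:
            continue  # duplicate
        seen.add(key)
        cleaned.append(row)
    return cleaned


def min_max_scale(values, lo, hi):
    """Map values to [0, 1] using bounds (lo, hi) learned on the training data."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


# Hypothetical rows after the join step.
rows = [
    {"account_id": 1, "amount": 80000, "balance": 25000},
    {"account_id": 1, "amount": 80000, "balance": 25000},  # exact duplicate
    {"account_id": 2, "amount": None, "balance": 40000},   # missing amount
    {"account_id": 3, "amount": 200000, "balance": 15000},
]
train_rows = clean(rows, required=["amount", "balance"])

# Learn the scaling bounds on the training rows only, then reuse them for new rows.
train_amounts = [r["amount"] for r in train_rows]
lo, hi = min(train_amounts), max(train_amounts)
print(min_max_scale(train_amounts, lo, hi))
print(min_max_scale([140000], lo, hi))
```

Fitting the bounds on the training rows alone keeps information from the test set out of the model.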
10. Analysis and Implementation
Loan Amount vs Duration
From the figure, clients with smaller loan amounts repaid them in less time, whereas clients with larger loan amounts took longer to pay back the debt.
11. Loan Amount Vs Balance
From the figure, most clients fall in the loan amount ranges of 50K-1L and 1.5L-2.25L (L = lakh, i.e. 100,000).
Most of them have a balance in the range of 10K-60K.
12. Loan Amount Vs Status
From the figure, Status A clients fall in the loan amount range of 10K-1.2L.
Most Status B and C clients fall in the range of 1.7L-2.2L.
Status D clients are very few in number.
13. Status Wise Plot
It is clear from the figure that Status A has the highest count of clients and Status D the lowest.
Status B and C have intermediate counts.
14. Conclusion
As the figure shows, all the clients were classified into four segments.
The classification is based on the historical behavior in the training dataset used to train the model.
15. Amount Vs Balance Clustering
As the figure shows, the clients were clustered (grouped) into two categories, 0 and 1.
The first cluster contains customers with a low loan_amount and balances ranging from low to high.
The second cluster contains clients with a high loan_amount and balances ranging from low to high.