SlideShare uma empresa Scribd logo
1 de 14
Baixar para ler offline
Bank Loan Case Study
Submitted By
Vaibhavi Khedekar
Bank loan Case study
Project description:
This case study attempts to demonstrate the application of EDA in a real-world
business environment. In this case study, in addition to using the techniques learned in
the EDA module, you will gain a basic grasp of risk analytics in banking and financial
services, as well as how data is utilized to reduce the risk of losing money when
lending to consumers.
Approach:
This case study has two enormous data sets: the current application and the previous
application. Each included several unneeded columns that would be useless for risk
assessments, as well as many blank data. I started by cleaning.
To evaluate this enormous set of data, I first cleaned the data, located some outliers
and deleted them, and then began performing univariate and bivariate analysis using
pivot tables and charts.
The following technology stack was used:
MySQL Workbench 8.0 CE, Microsoft Excel 2010
Results:
I went through the risk analytics process step by step, task after task. The project
outcomes are as follows:
1. Overall Method to Analysis: The bank's problem statement is to identify the major
causes of bank loan default. The knowledge will be used for risk assessment by the
company. We have provided two enormous data sets here.
1. 'application data.csv' contains all of the client's information at the time of
application. The information pertains to whether or not a client is having
financial issues.
2. 'previous application.csv' provides data from the client's previous loans. It
indicates if the prior application was Accepted, Cancelled, Refused, or Unused.
Both sets of data contained many undesired columns that will not be used for risk
analytics, as well as many blanks. So I cleaned up the data.
Following the data cleaning procedure, I split columns in the dataset based on two
categories of variables.
1) Categorical variables
2) Numerical variables
lOMoARcPSD|23272963
Categorical variables (non-numerical variables)- person's occupation, education
status.
Numerical variables - income, credit etc.,
The following are some of the categorical and numerical variables from the provided
data set.
Categorical variables Numeric variables
Gender Age
Name contract type Days employed
Income type Amount Income
Education Amount Annuity
Housing type Amount Credit
I completed full EDA on the present application and then on the previous application.
Then, in this report, I summarised the results of both applications and provided
business insights.
Current application.csv
Task 2 (Find Missing Data): The existing application sheet included 161 columns.
1) I deleted columns with more than 5% blank data.
2) I deleted a large number of useless columns.
To eliminate blank values, I used the COUNTBLANK function
Task 3: Outliers can only be identified on Numeric variables.
Box plotted Target column vs
1) Amount credit
2) Amount Income
3) Amount Annuity
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
Task 4 (Data imbalance): Data imbalance occurs when data is disseminated in an
unequal manner. I plotted data imbalance using Pivot charts.
NAME CONTRACT TYPE
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
Task 5 (EDA):
Univariate Analysis:
INFERENCE
Individuals with higher incomes are less likely to apply for loans. The credit amount
of a bank loan is typically in the range of 45000 to 1045000. The majority of loan
applications have come from people between the ages of 35 and 50. Those with 0 to 8
years of work experience are the most likely to seek for loans. Individuals who own
homes are more likely to apply for loans than others. Those who are married have
taken out more loans. More loans have been requested by working people.
Unaccompanied minors have requested for extra loans.
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
Amount Income Amount Credit
AGE
Name suite type
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
Bivariate Analysis:
INFERENCE
Customers who live in low-rating areas will have higher defaults. Individuals with
lower incomes are more likely to default. Young people are more likely to default,
and the trend of defaulters declines with age. Ladies are less inclined than males to
have defaults. More defaults are predicted due to maternity leave and
unemployment. Customers with more than five family members are more likely to
default on their bank loan. Customers with fewer educational qualifications are more
likely to fail on a bank loan. Customers with hardly work experience are more likely
to have defaults.
Region Rating Client vs Target Amount Income vs Target
Age vs Target Gender vs Target
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
INCOME TYPE VS TARGET FAMILY MEMBER VS TARGET
EDUCATION TYPE VS TARGET MONTHS EMPLOYED VS TARGET
Task 6 (Finding top 10 correlations):
Top 10 driving factors in current application.csv
1. Income type
2. Count of Family Members
3. Children count
4. External source
5. Region rating of client
6. Age
7. Months Employed
8. Amount credit
9. Amount Goods Price
10. Amount total income
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
Earlier Application.csv
Task 2 (Data Cleaning): Column removal: I used the COUNTBLANK function to
determine the number of blanks in a column, and if it exceeded 5%, I eliminated it. I
removed a couple columns that were of no use to the analysis.
Task 3 (Finding Outliers):
Task 4(Data Imbalance):
Below are the columns where data is unevenly distributed
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
Task 5 (EDA):
Univariate Analysis:
Inference
Customers have largely chosen cash and consumer loans. The majority of our clients
are repeat customers.
The majority of current loan applicants are individuals who applied for loans less
than ten months ago. More loans have been requested for consumer gadgets.
Bivariate Analysis:
Inference
Customers who applied for more than Rs. 350,000 will most likely be denied. The
majority of loans sought for through Credit and Cash agencies are cancelled.
New clients are overjoyed because the majority of their loans were approved. Thus
far, car loans have been denied. Loans made to MLM partner clients are likely to be
cancelled. Virtually 80% of the loans were authorised, with a steady stream of
rejections.
Consumer loans have nearly no cancellations and the greatest approval rate. Several
loans for the first Selling place area group were cancelled.
Clients who apply for another loan within 10 months of their previous loan are more
likely to have it cancelled. Walk-in loans have a higher refusal rate.
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
Task 6 (Finding Correlations): Top ten reasons for loan cancellation and refusal
1. Amount Application
2. Cash loan Purpose
3. Goods Category
4. Product Combination
5. Product type
6. Channel type
7. Months Decision
8. Contract type
9. Client type
10. Payment type
Task 7 (Combining two sheets): I then ran analysis on the common set of data by
joining the Target column with the previous application table. I used MySQL to join
them. I loaded the data into workbench and ran the following query.
Query:
SELECT
TARGET,
SK_ID_CURR,
NAME_CONTRACT_TYPE,
AMT_APPLICATION,
NAME_CASH_LOAN_PURPOSE,
NAME_CONTRACT_STATUS,
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
NAME_CLIENT_TYPE,
DAYS_DECISION,
CODE_REJECT_REASON,
NAME_SELLER_INDUSTRY,
NAME_PORTFOLIO,
NAME_PRODUCT_TYPE,
CHANNEL_TYPE,
SELLERPLACE_AREA,
NAME_YIELD_GROUP,
PRODUCT_COMBINATION
FROM application_data
JOIN previous_application ON SK_ID_CURR;
pivot table analysis
Clients who have applied for previous loans have no defaults in current
loans
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963
Summary:
Loan- Highly Recommended
Groups
Loan- High Risk Groups
1. Previous application approved
clients
2. Married clients
3. Senior clients
4. More educated clients
5. Customers with a High Income
6.Clients with a greater external
source
7. Females
8.Customers with strong work
experience
1. Clients that are unemployed
2. Youth clients 3. Customers
whose prior applications were
denied
4. Low-income clientele
5. Clients with insufficient
external sources
6.Customers with little work
experience
7.Customers on Maternity Leave
8. Clients with a larger number
of family members
Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com)
lOMoARcPSD|23272963

Mais conteúdo relacionado

Mais procurados

Exploratory Data Analysis Bank Fraud Case Study
Exploratory  Data Analysis Bank Fraud Case StudyExploratory  Data Analysis Bank Fraud Case Study
Exploratory Data Analysis Bank Fraud Case StudyLumbiniSardare
 
Credit eda case study presentation
Credit eda case study presentation  Credit eda case study presentation
Credit eda case study presentation DeboraJasmin S
 
Credit EDA Assignment (Tanvi Pradhan)
Credit EDA Assignment (Tanvi Pradhan)Credit EDA Assignment (Tanvi Pradhan)
Credit EDA Assignment (Tanvi Pradhan)TanviPradhan4
 
Exploratory Data Analysis For Credit Risk Assesment
Exploratory Data Analysis For Credit Risk AssesmentExploratory Data Analysis For Credit Risk Assesment
Exploratory Data Analysis For Credit Risk AssesmentVishalPatil527
 
Telecom Churn Prediction Presentation
Telecom Churn Prediction PresentationTelecom Churn Prediction Presentation
Telecom Churn Prediction PresentationPinintiHarishReddy
 
Data visualisation & analytics with Tableau
Data visualisation & analytics with Tableau Data visualisation & analytics with Tableau
Data visualisation & analytics with Tableau Outreach Digital
 
Project black book TYIT
Project black book TYITProject black book TYIT
Project black book TYITLokesh Singrol
 
SYNOPSIS ON BANK MANAGEMENT SYSTEM
SYNOPSIS ON BANK MANAGEMENT SYSTEMSYNOPSIS ON BANK MANAGEMENT SYSTEM
SYNOPSIS ON BANK MANAGEMENT SYSTEMNitish Xavier Tirkey
 
#10 pydata warsaw object detection with dn ns
#10   pydata warsaw object detection with dn ns#10   pydata warsaw object detection with dn ns
#10 pydata warsaw object detection with dn nsAndrew Brozek
 
SQL Joins.pptx
SQL Joins.pptxSQL Joins.pptx
SQL Joins.pptxAnkit Rai
 
Resume and CV Summarization using NLP Report
Resume and CV Summarization using NLP ReportResume and CV Summarization using NLP Report
Resume and CV Summarization using NLP Reportsneha indulkar
 
Proguard: detecting malicious accounts in social-network-based online promotions
Proguard: detecting malicious accounts in social-network-based online promotionsProguard: detecting malicious accounts in social-network-based online promotions
Proguard: detecting malicious accounts in social-network-based online promotionsVaishali Misra
 

Mais procurados (20)

Exploratory Data Analysis Bank Fraud Case Study
Exploratory  Data Analysis Bank Fraud Case StudyExploratory  Data Analysis Bank Fraud Case Study
Exploratory Data Analysis Bank Fraud Case Study
 
Credit eda case study presentation
Credit eda case study presentation  Credit eda case study presentation
Credit eda case study presentation
 
Capstone Project.pptx
Capstone Project.pptxCapstone Project.pptx
Capstone Project.pptx
 
Final PPT Imdb
Final PPT ImdbFinal PPT Imdb
Final PPT Imdb
 
Credit EDA Assignment (Tanvi Pradhan)
Credit EDA Assignment (Tanvi Pradhan)Credit EDA Assignment (Tanvi Pradhan)
Credit EDA Assignment (Tanvi Pradhan)
 
Exploratory Data Analysis For Credit Risk Assesment
Exploratory Data Analysis For Credit Risk AssesmentExploratory Data Analysis For Credit Risk Assesment
Exploratory Data Analysis For Credit Risk Assesment
 
Sih ppt
Sih pptSih ppt
Sih ppt
 
Credit EDA case study
Credit EDA case studyCredit EDA case study
Credit EDA case study
 
Telecom Churn Prediction Presentation
Telecom Churn Prediction PresentationTelecom Churn Prediction Presentation
Telecom Churn Prediction Presentation
 
Data visualisation & analytics with Tableau
Data visualisation & analytics with Tableau Data visualisation & analytics with Tableau
Data visualisation & analytics with Tableau
 
Credit eda case study
Credit eda case studyCredit eda case study
Credit eda case study
 
Project black book TYIT
Project black book TYITProject black book TYIT
Project black book TYIT
 
SYNOPSIS ON BANK MANAGEMENT SYSTEM
SYNOPSIS ON BANK MANAGEMENT SYSTEMSYNOPSIS ON BANK MANAGEMENT SYSTEM
SYNOPSIS ON BANK MANAGEMENT SYSTEM
 
E healthcare
E healthcare E healthcare
E healthcare
 
#10 pydata warsaw object detection with dn ns
#10   pydata warsaw object detection with dn ns#10   pydata warsaw object detection with dn ns
#10 pydata warsaw object detection with dn ns
 
SQL Joins.pptx
SQL Joins.pptxSQL Joins.pptx
SQL Joins.pptx
 
Resume and CV Summarization using NLP Report
Resume and CV Summarization using NLP ReportResume and CV Summarization using NLP Report
Resume and CV Summarization using NLP Report
 
Proguard: detecting malicious accounts in social-network-based online promotions
Proguard: detecting malicious accounts in social-network-based online promotionsProguard: detecting malicious accounts in social-network-based online promotions
Proguard: detecting malicious accounts in social-network-based online promotions
 
Dbms practical list
Dbms practical listDbms practical list
Dbms practical list
 
Ssis event handler
Ssis event handlerSsis event handler
Ssis event handler
 

Semelhante a project-6-bank-loan-case-study.pdf

Credit risk assessment with imbalanced data sets using SVMs
Credit risk assessment with imbalanced data sets using SVMsCredit risk assessment with imbalanced data sets using SVMs
Credit risk assessment with imbalanced data sets using SVMsIRJET Journal
 
credit scoring paper published in eswa
credit scoring paper published in eswacredit scoring paper published in eswa
credit scoring paper published in eswaAkhil Bandhu Hens, FRM
 
Loan Analysis Predicting Defaulters
Loan Analysis Predicting DefaultersLoan Analysis Predicting Defaulters
Loan Analysis Predicting DefaultersIRJET Journal
 
Personal Loan Risk Assessment
Personal Loan Risk Assessment Personal Loan Risk Assessment
Personal Loan Risk Assessment Kunal Kashyap
 
IFPRI's Poverty Scorecard
IFPRI's Poverty ScorecardIFPRI's Poverty Scorecard
IFPRI's Poverty ScorecardMTID
 
Credit Risk Evaluation Model
Credit Risk Evaluation ModelCredit Risk Evaluation Model
Credit Risk Evaluation ModelMihai Enescu
 
CUSTOMER CARE ADMINISTRATION-developer-2000 and oracle 9i
CUSTOMER CARE ADMINISTRATION-developer-2000 and oracle 9iCUSTOMER CARE ADMINISTRATION-developer-2000 and oracle 9i
CUSTOMER CARE ADMINISTRATION-developer-2000 and oracle 9iAkash Gupta
 
Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestHirak Sen Roy
 
Artificially Intelligent Investment Risk Calculation system based on Distribu...
Artificially Intelligent Investment Risk Calculation system based on Distribu...Artificially Intelligent Investment Risk Calculation system based on Distribu...
Artificially Intelligent Investment Risk Calculation system based on Distribu...iosrjce
 
IRJET- Consumer Complaint Data Analysis
IRJET-  	  Consumer Complaint Data AnalysisIRJET-  	  Consumer Complaint Data Analysis
IRJET- Consumer Complaint Data AnalysisIRJET Journal
 
Improving the credit scoring model of microfinance
Improving the credit scoring model of microfinanceImproving the credit scoring model of microfinance
Improving the credit scoring model of microfinanceeSAT Publishing House
 
Should this loan be approved or denied
Should this loan be approved or deniedShould this loan be approved or denied
Should this loan be approved or deniedPOOJA PATIL
 
In Banking Loan Approval Prediction Using Machine Learning
In Banking Loan Approval Prediction Using Machine LearningIn Banking Loan Approval Prediction Using Machine Learning
In Banking Loan Approval Prediction Using Machine LearningIRJET Journal
 
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING mlaij
 
Apanps5210 - final presentation
Apanps5210 - final presentationApanps5210 - final presentation
Apanps5210 - final presentationManasa Damera
 

Semelhante a project-6-bank-loan-case-study.pdf (20)

Credit iconip
Credit iconipCredit iconip
Credit iconip
 
Credit risk assessment with imbalanced data sets using SVMs
Credit risk assessment with imbalanced data sets using SVMsCredit risk assessment with imbalanced data sets using SVMs
Credit risk assessment with imbalanced data sets using SVMs
 
credit scoring paper published in eswa
credit scoring paper published in eswacredit scoring paper published in eswa
credit scoring paper published in eswa
 
Loan Analysis Predicting Defaulters
Loan Analysis Predicting DefaultersLoan Analysis Predicting Defaulters
Loan Analysis Predicting Defaulters
 
Personal Loan Risk Assessment
Personal Loan Risk Assessment Personal Loan Risk Assessment
Personal Loan Risk Assessment
 
Data mining on Financial Data
Data mining on Financial DataData mining on Financial Data
Data mining on Financial Data
 
IFPRI's Poverty Scorecard
IFPRI's Poverty ScorecardIFPRI's Poverty Scorecard
IFPRI's Poverty Scorecard
 
Credit Risk Evaluation Model
Credit Risk Evaluation ModelCredit Risk Evaluation Model
Credit Risk Evaluation Model
 
CUSTOMER CARE ADMINISTRATION-developer-2000 and oracle 9i
CUSTOMER CARE ADMINISTRATION-developer-2000 and oracle 9iCUSTOMER CARE ADMINISTRATION-developer-2000 and oracle 9i
CUSTOMER CARE ADMINISTRATION-developer-2000 and oracle 9i
 
Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random Forest
 
Credit iconip
Credit iconipCredit iconip
Credit iconip
 
Credit iconip
Credit iconipCredit iconip
Credit iconip
 
D017612529
D017612529D017612529
D017612529
 
Artificially Intelligent Investment Risk Calculation system based on Distribu...
Artificially Intelligent Investment Risk Calculation system based on Distribu...Artificially Intelligent Investment Risk Calculation system based on Distribu...
Artificially Intelligent Investment Risk Calculation system based on Distribu...
 
IRJET- Consumer Complaint Data Analysis
IRJET-  	  Consumer Complaint Data AnalysisIRJET-  	  Consumer Complaint Data Analysis
IRJET- Consumer Complaint Data Analysis
 
Improving the credit scoring model of microfinance
Improving the credit scoring model of microfinanceImproving the credit scoring model of microfinance
Improving the credit scoring model of microfinance
 
Should this loan be approved or denied
Should this loan be approved or deniedShould this loan be approved or denied
Should this loan be approved or denied
 
In Banking Loan Approval Prediction Using Machine Learning
In Banking Loan Approval Prediction Using Machine LearningIn Banking Loan Approval Prediction Using Machine Learning
In Banking Loan Approval Prediction Using Machine Learning
 
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING
DEVELOPING PREDICTION MODEL OF LOAN RISK IN BANKS USING DATA MINING
 
Apanps5210 - final presentation
Apanps5210 - final presentationApanps5210 - final presentation
Apanps5210 - final presentation
 

Último

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 

Último (20)

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 

project-6-bank-loan-case-study.pdf

  • 1. Bank Loan Case Study Submitted By Vaibhavi Khedekar
  • 2. Bank loan Case study Project description: This case study attempts to demonstrate the application of EDA in a real-world business environment. In this case study, in addition to using the techniques learned in the EDA module, you will gain a basic grasp of risk analytics in banking and financial services, as well as how data is utilized to reduce the risk of losing money when lending to consumers. Approach: This case study has two enormous data sets: the current application and the previous application. Each included several unneeded columns that would be useless for risk assessments, as well as many blank data. I started by cleaning. To evaluate this enormous set of data, I first cleaned the data, located some outliers and deleted them, and then began performing univariate and bivariate analysis using pivot tables and charts. The following technology stack was used: MySQL Workbench 8.0 CE, Microsoft Excel 2010 Results: I went through the risk analytics process step by step, task after task. The project outcomes are as follows: 1. Overall Method to Analysis: The bank's problem statement is to identify the major causes of bank loan default. The knowledge will be used for risk assessment by the company. We have provided two enormous data sets here. 1. 'application data.csv' contains all of the client's information at the time of application. The information pertains to whether or not a client is having financial issues. 2. 'previous application.csv' provides data from the client's previous loans. It indicates if the prior application was Accepted, Cancelled, Refused, or Unused. Both sets of data contained many undesired columns that will not be used for risk analytics, as well as many blanks. So I cleaned up the data. Following the data cleaning procedure, I split columns in the dataset based on two categories of variables. 1) Categorical variables 2) Numerical variables lOMoARcPSD|23272963
  • 3. Categorical variables (non-numerical variables)- person's occupation, education status. Numerical variables - income, credit etc., The following are some of the categorical and numerical variables from the provided data set. Categorical variables Numeric variables Gender Age Name contract type Days employed Income type Amount Income Education Amount Annuity Housing type Amount Credit I completed full EDA on the present application and then on the previous application. Then, in this report, I summarised the results of both applications and provided business insights. Current application.csv Task 2 (Find Missing Data): The existing application sheet included 161 columns. 1) I deleted columns with more than 5% blank data. 2) I deleted a large number of useless columns. To eliminate blank values, I used the COUNTBLANK function Task 3: Outliers can only be identified on Numeric variables. Box plotted Target column vs 1) Amount credit 2) Amount Income 3) Amount Annuity Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 4. Task 4 (Data imbalance): Data imbalance occurs when data is disseminated in an unequal manner. I plotted data imbalance using Pivot charts. NAME CONTRACT TYPE Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 5. Task 5 (EDA): Univariate Analysis: INFERENCE Individuals with higher incomes are less likely to apply for loans. The credit amount of a bank loan is typically in the range of 45000 to 1045000. The majority of loan applications have come from people between the ages of 35 and 50. Those with 0 to 8 years of work experience are the most likely to seek for loans. Individuals who own homes are more likely to apply for loans than others. Those who are married have taken out more loans. More loans have been requested by working people. Unaccompanied minors have requested for extra loans. Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 6. Amount Income Amount Credit AGE Name suite type Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 7. Bivariate Analysis: INFERENCE Customers who live in low-rating areas will have higher defaults. Individuals with lower incomes are more likely to default. Young people are more likely to default, and the trend of defaulters declines with age. Ladies are less inclined than males to have defaults. More defaults are predicted due to maternity leave and unemployment. Customers with more than five family members are more likely to default on their bank loan. Customers with fewer educational qualifications are more likely to fail on a bank loan. Customers with hardly work experience are more likely to have defaults. Region Rating Client vs Target Amount Income vs Target Age vs Target Gender vs Target Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 8. INCOME TYPE VS TARGET FAMILY MEMBER VS TARGET EDUCATION TYPE VS TARGET MONTHS EMPLOYED VS TARGET Task 6 (Finding top 10 correlations): Top 10 driving factors in current application.csv 1. Income type 2. Count of Family Members 3. Children count 4. External source 5. Region rating of client 6. Age 7. Months Employed 8. Amount credit 9. Amount Goods Price 10. Amount total income Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 9. Earlier Application.csv Task 2 (Data Cleaning): Column removal: I used the COUNTBLANK function to determine the number of blanks in a column, and if it exceeded 5%, I eliminated it. I removed a couple columns that were of no use to the analysis. Task 3 (Finding Outliers): Task 4(Data Imbalance): Below are the columns where data is unevenly distributed Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 10. Task 5 (EDA): Univariate Analysis: Inference Customers have largely chosen cash and consumer loans. The majority of our clients are repeat customers. The majority of current loan applicants are individuals who applied for loans less than ten months ago. More loans have been requested for consumer gadgets. Bivariate Analysis: Inference Customers who applied for more than Rs. 350,000 will most likely be denied. The majority of loans sought for through Credit and Cash agencies are cancelled. New clients are overjoyed because the majority of their loans were approved. Thus far, car loans have been denied. Loans made to MLM partner clients are likely to be cancelled. Virtually 80% of the loans were authorised, with a steady stream of rejections. Consumer loans have nearly no cancellations and the greatest approval rate. Several loans for the first Selling place area group were cancelled. Clients who apply for another loan within 10 months of their previous loan are more likely to have it cancelled. Walk-in loans have a higher refusal rate. Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 11. Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 12. Task 6 (Finding Correlations): Top ten reasons for loan cancellation and refusal 1. Amount Application 2. Cash loan Purpose 3. Goods Category 4. Product Combination 5. Product type 6. Channel type 7. Months Decision 8. Contract type 9. Client type 10. Payment type Task 7 (Combining two sheets): I then ran analysis on the common set of data by joining the Target column with the previous application table. I used MySQL to join them. I loaded the data into workbench and ran the following query. Query: SELECT TARGET, SK_ID_CURR, NAME_CONTRACT_TYPE, AMT_APPLICATION, NAME_CASH_LOAN_PURPOSE, NAME_CONTRACT_STATUS, Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 13. NAME_CLIENT_TYPE, DAYS_DECISION, CODE_REJECT_REASON, NAME_SELLER_INDUSTRY, NAME_PORTFOLIO, NAME_PRODUCT_TYPE, CHANNEL_TYPE, SELLERPLACE_AREA, NAME_YIELD_GROUP, PRODUCT_COMBINATION FROM application_data JOIN previous_application ON SK_ID_CURR; pivot table analysis Clients who have applied for previous loans have no defaults in current loans Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963
  • 14. Summary: Loan- Highly Recommended Groups Loan- High Risk Groups 1. Previous application approved clients 2. Married clients 3. Senior clients 4. More educated clients 5. Customers with a High Income 6.Clients with a greater external source 7. Females 8.Customers with strong work experience 1. Clients that are unemployed 2. Youth clients 3. Customers whose prior applications were denied 4. Low-income clientele 5. Clients with insufficient external sources 6.Customers with little work experience 7.Customers on Maternity Leave 8. Clients with a larger number of family members Downloaded by Vaibhavi Khedekar (vaibhi.khedekar@gmail.com) lOMoARcPSD|23272963