Default Payment Prediction System
Data Analysis and Predictive Analysis – R Programming and Azure ML
ASHISH ARORA
Introduction and Problem
• Banks play a significant role in providing financial services that help people and businesses achieve their goals and reach their potential.
• To protect its integrity, a bank must avoid extending credit to customers who are likely to default and cause losses to the institution.
Purpose and Process
• To build a predictive model that helps banks use their data efficiently to make better decisions.
• A predictive analytics application allows banks and other financial institutions to identify risks and address them in real time to reach better outcomes.
• A bank must be able to analyze the available customer data before deciding whether to issue a credit card.
• The model uses all available factors and data to predict, with reasonable accuracy, whether a customer will fail or succeed in making the next payment. This helps the bank before it makes any decision about that customer. The goal is to minimize the risk of loan losses.
Data Set
• https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
• 30,000 rows
• Columns in dataset = 25 (including the ID column and the target variable)
• This dataset contains information on default payments, demographic factors, credit data, payment history, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.
• There are no missing values.
R Code – Description and Results
• # Read the .csv file into the R environment
• creditcarddata <- read.csv("default of credit card clients.csv")
• dim(creditcarddata)  # number of rows and columns
• dim(creditcarddata)
Data Set Summary
• There are two main categories of variables in the dataset.
• Nominal variables include sex, education, marriage, repayment status (PAY_X), etc.
• Numeric variables include age, the amount of given credit (LIMIT_BAL), bill statement amounts (BILL_AMT), and previous payment amounts (PAY_AMT).
• The class variable (y) indicates whether the customer defaulted on the next month's payment: 1 if yes, 0 otherwise.
Structure of Data Before Adding New Variables and Tidying the Data
• This is the structure of the data before the reshaping and cleaning step.
• New variables can be derived to improve the prediction of defaulters.
• The SEX, EDUCATION and MARRIAGE variables can be converted from integers to categorical data.
Structure of Data After Adding New Variables
• Four new columns are added to make the dataset more meaningful: work_status, education_cat, MARRIAGE_cat and SEX_cat.
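A minimal R sketch of the integer-to-factor conversion, assuming the value codes from the UCI dataset description (SEX: 1 = male, 2 = female; EDUCATION: 1 = graduate school, 2 = university, 3 = high school, 4 = others; MARRIAGE: 1 = married, 2 = single, 3 = others). The toy rows are hypothetical, and work_status is left out because the slides do not say how it was derived.

```r
# Hypothetical toy rows using the raw integer codes
creditcarddata <- data.frame(
  SEX       = c(1L, 2L, 2L, 1L),
  EDUCATION = c(1L, 2L, 3L, 4L),
  MARRIAGE  = c(1L, 2L, 2L, 3L)
)

# SEX: 1 = male, 2 = female
creditcarddata$SEX_cat <- factor(creditcarddata$SEX,
                                 levels = c(1, 2),
                                 labels = c("male", "female"))

# EDUCATION: 1 = graduate school, 2 = university, 3 = high school, 4 = others
creditcarddata$education_cat <- factor(creditcarddata$EDUCATION,
                                       levels = 1:4,
                                       labels = c("graduate school", "university",
                                                  "high school", "others"))

# MARRIAGE: 1 = married, 2 = single, 3 = others
creditcarddata$MARRIAGE_cat <- factor(creditcarddata$MARRIAGE,
                                      levels = 1:3,
                                      labels = c("married", "single", "others"))

str(creditcarddata)
```

With this approach any code outside the listed levels becomes NA; the slides do not state how the original analysis handled such codes.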
Reshaping the Data
• The data is reshaped by converting quantitative variables into new factor variables.
• Factors are categorical variables that are very useful in summary statistics, plots, and regressions. In a model formula, R automatically codes a factor as a set of dummy variables.
• Variables that are not useful for the analysis are removed.
• The variables removed from the dataset are PAY_0, PAY_2, PAY_3, PAY_4, PAY_5 and PAY_6.
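The following sketch, on hypothetical toy data, shows two of the steps above in base R: dropping the repayment-status columns by name, and using model.matrix() to reveal the dummy columns R builds from a factor when it appears in a model formula.

```r
# Hypothetical toy data: a numeric column, a factor, and two PAY_ columns
credit <- data.frame(
  LIMIT_BAL     = c(20000, 120000, 90000),
  education_cat = factor(c("graduate school", "university", "high school"),
                         levels = c("graduate school", "university", "high school")),
  PAY_0 = c(2L, -1L, 0L),
  PAY_2 = c(2L, 2L, 0L)
)

# Drop the repayment-status columns listed on the slide (absent names are ignored)
pay_cols <- c("PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6")
credit <- credit[, !(names(credit) %in% pay_cols), drop = FALSE]

# model.matrix() expands the factor into 0/1 dummy columns;
# the first level ("graduate school") is the baseline and gets no column
X <- model.matrix(~ LIMIT_BAL + education_cat, data = credit)
print(X)
```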
Structure and Summary of Data After Tidying the Data
Exploring Data via Basic Visualization
• There are more female than male clients in the dataset.
• The largest education group is clients who finished university-level education.
• There are more single clients than married ones, but the numbers are quite close.
• Most clients are employed.
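The category counts behind these bar charts can be checked with table() and prop.table(). A minimal sketch on hypothetical toy vectors standing in for SEX_cat and MARRIAGE_cat:

```r
# Hypothetical toy factors standing in for the dataset's columns
SEX_cat <- factor(c("female", "female", "male", "female", "male"),
                  levels = c("male", "female"))
MARRIAGE_cat <- factor(c("single", "married", "single", "single", "married"),
                       levels = c("married", "single", "others"))

sex_counts <- table(SEX_cat)          # count per level
sex_share  <- prop.table(sex_counts)  # shares that sum to 1
print(sex_counts)
print(round(sex_share, 2))

# Levels with no observations are kept with a count of 0
marriage_counts <- table(MARRIAGE_cat)
print(marriage_counts)
```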
Limit Balance Distribution
Determining Balance Limit Variability by Gender, Education and Work Status
• The box plots suggest that gender has little or no effect on the balance limits the bank assigns.
• Education level and work status appear to be the most important factors banks consider when setting balance limits.
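A box plot's centre line is the group median, so the comparison above can also be checked numerically. A minimal sketch with aggregate() on hypothetical toy data; work_status is omitted because its derivation is not shown in the slides.

```r
# Hypothetical toy data: limits differ by education but not by gender
credit <- data.frame(
  LIMIT_BAL = c(50000, 50000, 200000, 200000, 60000, 60000, 210000, 210000),
  SEX_cat = factor(rep(c("male", "female"), times = 4)),
  education_cat = factor(rep(c("high school", "high school",
                               "graduate school", "graduate school"), times = 2))
)

# Median LIMIT_BAL per group, i.e. the centre line of each box
by_sex <- aggregate(LIMIT_BAL ~ SEX_cat, data = credit, FUN = median)
by_edu <- aggregate(LIMIT_BAL ~ education_cat, data = credit, FUN = median)
print(by_sex)
print(by_edu)

# The plot itself could be drawn with, e.g.:
# boxplot(LIMIT_BAL ~ education_cat, data = credit)
```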
Relationship Between Marital Status & Balance Limits Categorized by Gender
• For females, balance limits remain almost the same whether they are married or single. For males, balance limits change considerably with marital status, possibly because of extra expenditures, which would explain the increased balance limits.
Relationship Between Limit Balance & Default Payment
• Balance limits and the count of defaulted clients are almost the same for the university and graduate levels. In addition, the ratio of defaulted clients at the high school level appears to be about the same as at the university and graduate levels.
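Comparing default ratios across education levels amounts to taking row-wise proportions of a two-way table. A minimal sketch on hypothetical toy data, using y for the 0/1 class variable as on the Data Set Summary slide:

```r
# Hypothetical toy data: 4 clients per education level
education_cat <- factor(rep(c("graduate school", "university", "high school"), each = 4),
                        levels = c("graduate school", "university", "high school"))
y <- c(0, 0, 0, 1,   # graduate school
       0, 0, 1, 0,   # university
       0, 1, 0, 1)   # high school

counts <- table(education_cat, y)
# margin = 1 divides each row by its total, giving the share of
# non-defaulters (column "0") and defaulters (column "1") per level
rates <- prop.table(counts, margin = 1)
print(counts)
print(round(rates, 2))
```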
Balance Limits by Age Groups & Education
• These box plots show that, in the higher age groups, balance limits increase with education level.
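Age groups like these are typically built with cut(). A minimal sketch on hypothetical toy data; the break points are an assumption, since the slides do not list the age ranges used.

```r
# Hypothetical toy data
credit <- data.frame(
  AGE = c(23, 27, 35, 38, 45, 52, 58, 61),
  education_cat = factor(rep(c("high school", "graduate school"), times = 4)),
  LIMIT_BAL = c(30000, 50000, 60000, 150000, 80000, 300000, 90000, 360000)
)

# cut() bins AGE into labelled, right-closed intervals, e.g. (20, 30] -> "21-30"
credit$age_group <- cut(credit$AGE,
                        breaks = c(20, 30, 40, 50, Inf),
                        labels = c("21-30", "31-40", "41-50", "51+"))

# Median limit for each age group and education level (empty combinations are dropped)
med <- aggregate(LIMIT_BAL ~ age_group + education_cat, data = credit, FUN = median)
print(med)
```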
Correlations Between Limit Balance, Bill Amounts & Payments
• This correlation plot shows a low correlation between limit balances and both payments and bill amounts. However, the bill amounts are highly correlated with each other, as expected, since the bills reflect cumulative amounts.
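A correlation matrix like the one plotted can be computed with cor(). The sketch below uses hypothetical simulated columns in which one month's bill largely carries over into the next, mimicking the cumulative bills described above; it is not the project's data.

```r
set.seed(42)  # reproducible toy data
n <- 200

# Hypothetical simulated columns
LIMIT_BAL <- round(runif(n, 10000, 500000), -3)
BILL_AMT2 <- round(runif(n, 0, 100000))
BILL_AMT1 <- round(0.9 * BILL_AMT2 + rnorm(n, 0, 5000))  # carries over the previous bill
PAY_AMT1  <- round(runif(n, 0, 20000))                   # independent of the others

m <- cbind(LIMIT_BAL, BILL_AMT1, BILL_AMT2, PAY_AMT1)
corr <- cor(m)  # Pearson correlation matrix
print(round(corr, 2))
```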
Is There Any Variability in Default Payment Next Month Based on Gender, Education and Marital Status?
• Males appear to default more often than females, and among education levels, clients whose highest degree is high school default more often.
• Marital status does not show noticeable variability in default.
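The slides judge this variability visually. One way to check it formally, which the slides do not use, is Pearson's chi-squared test of independence via chisq.test(). A minimal sketch on hypothetical toy counts:

```r
# Hypothetical toy counts; not the figures from the dataset
counts <- matrix(c(400, 100,    # male:   not default, default
                   700, 100),   # female: not default, default
                 nrow = 2, byrow = TRUE,
                 dimnames = list(SEX_cat = c("male", "female"),
                                 y = c("0", "1")))

# Default rate per gender (row-wise proportions)
print(round(prop.table(counts, margin = 1), 3))

# Chi-squared test of independence between gender and default
# (for a 2x2 table R applies Yates' continuity correction by default)
result <- chisq.test(counts)
print(result)
```

A small p-value suggests the default rate differs by gender; the same call works for education or marital status against y.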
Model Building
• This section covers building the model that predicts the default payment outcome.
• Before building the model, the dataset was divided into training and test sets.
• Training set = 70%
• Test set = 30%
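A 70/30 split can be reproduced in base R with sample(). A minimal sketch, assuming a simple random (unstratified) split; a stand-in data frame with the dataset's 30,000 rows keeps the block self-contained. The resulting 9,000-row test set matches the total of the confusion-matrix counts (1329 + 275 + 662 + 6734).

```r
set.seed(123)  # make the random split reproducible

# Stand-in for creditcarddata with the same number of rows (30,000)
creditcarddata <- data.frame(ID = seq_len(30000))

n <- nrow(creditcarddata)
train_idx <- sample(seq_len(n), size = round(0.7 * n))  # 70% of row indices

train <- creditcarddata[train_idx, , drop = FALSE]   # 21,000 rows
test  <- creditcarddata[-train_idx, , drop = FALSE]  # remaining 9,000 rows
print(c(train = nrow(train), test = nrow(test)))
```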
Model Building Using Azure ML
• The model is trained using the Two-Class Decision Forest algorithm.
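The Two-Class Decision Forest is configured in Azure ML rather than written as code, so it is not reproduced here. The Conclusion also lists logistic regression in R; a minimal sketch of that step with glm() on hypothetical simulated data follows. The single predictor and the 0.5 cut-off are assumptions, not the project's settings.

```r
set.seed(1)
n <- 1000

# Hypothetical simulated data: the chance of default falls as LIMIT_BAL rises
LIMIT_BAL <- runif(n, 10000, 500000)
train <- data.frame(
  LIMIT_BAL = LIMIT_BAL,
  y = rbinom(n, size = 1, prob = plogis(1 - 0.00001 * LIMIT_BAL))
)

# Logistic regression: binomial family with its default logit link
fit <- glm(y ~ LIMIT_BAL, family = binomial, data = train)
print(summary(fit))

# Predicted probabilities of default (type = "response"), then 0/1 labels
# at a 0.5 cut-off; in practice, predict on the held-out test set instead
prob <- predict(fit, newdata = train, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)
print(table(actual = train$y, predicted = pred))
```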
The Classification Matrix (Confusion Matrix)
• The confusion matrix classifies our predictions as true positives, false positives, false negatives and true negatives.
• True Positive (TP): the actual value is 1 (the customer defaulted) and the predicted value is also 1.
• False Positive (FP): the predicted value is 1, but the actual value is 0. We predicted the customer would default, but they did not.
• False Negative (FN): the predicted value is 0, but the actual value is 1. We predicted the customer would not default, but they defaulted.
• True Negative (TN): the predicted value is 0 and the actual value is 0. We predicted the customer would not default, and they did not.
• Accuracy = the percentage of the whole test set that is predicted correctly.
• Accuracy = (TP + TN) / Total = (662 + 6734) / (1329 + 275 + 662 + 6734) = 7396 / 9000 ≈ 0.82
• Precision = how precise are the positive predictions?
• When the model predicts default, how likely is it to be correct?
• Precision = TP / (TP + FP) = 662 / (662 + 275) ≈ 0.707
• Recall = out of all customers who actually defaulted, the fraction the model correctly predicted as defaulters.
• Recall = TP / (TP + FN) = 662 / (662 + 1329) ≈ 0.332
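The three metrics can be recomputed from the four confusion-matrix counts used in the formulas above. A short R check:

```r
# Counts from the confusion matrix on the test set (9,000 clients)
TP <- 662   # predicted default, actually defaulted
FP <- 275   # predicted default, did not default
FN <- 1329  # predicted no default, actually defaulted
TN <- 6734  # predicted no default, did not default

accuracy  <- (TP + TN) / (TP + FP + FN + TN)
precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)

print(round(c(accuracy = accuracy, precision = precision, recall = recall), 3))
# accuracy 0.822, precision 0.707, recall 0.332
```

Because 7,009 of the 9,000 test clients (275 + 6734) did not default, always predicting "no default" would already give about 0.78 accuracy; the low recall means roughly two out of three actual defaulters (1329 of 1991) are missed.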
Conclusion
• This project predicts defaulters among the bank's credit card customers.
• R programming is used for exploratory data analysis and visualization.
• R and Azure ML are used for model building with logistic regression and the Two-Class Decision Forest algorithm.


Default payment prediction system

  • 1. Default Payment Prediction System Data Analysis and Predictive Analysis – R Programming and Azure ML ASHISH ARORA
  • 2. Introduction and Problem • Banks plays a significant role in providing financial services to help people and business to achieve their goals as well as reach their potential. • To keep the integrity Bank must avoid in investing wrong customers who can default and cause loss to the Financial Institution.
  • 3. Purpose and Process • To build a predictive model that can be used to help the Banks use their data efficiently to make better decisions. • A predictive analytics application allows the banks and other financial institutions to identify the risks and address them in real time to reach better outcomes. • Bank must able to analyze available data related to the customers before making the decision of issuing credit card. • The model developed will use all possible factors and data to predict whether the customer would fail or succeed in making the next payment with a rational accuracy. It would benefit the bank before they make any decisions against that customers. The target is to minimize the risk of having loan loss.
  • 4. Data Set • https://archive.ics.uci.edu/ml/ datasets/default+of+credit+ca rd+clients • 30000 rows • Features in dataset = 25 • This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005. • There are no missing data.
  • 5. R Code – Description And Results • # Read the .csv file in R envorinment • creditcarddata <- read.csv("default of credit card clients.csv") • dim(creditcarddata)
  • 6. Data Set Summary • There are two key variable categories in the dataset. • Nominal variables include sex, education, marriage, repayment statuses (PAY_X), etc. • Numeric variables contains age, amount of given credit (LIMIT_BAL), amount of bill statements (BILL_AMT), and amount of previous payments (PAY_AMT). • The class variable (y) indicates whether that customer had default payment the next month or not. If yes, it is labeled 1, otherwise, set to 0.
  • 7. Structure of Data Before Adding new variables and Tidying the Data • This is the structure of Data before reshaping and cleaning step. • New Variables can be created to give more possibility of predicting defaulters. • SEX, EDUCARION and MARRIAGE variable can be converted from integer to categorical data.
  • 8. Structure of Data after adding new variables • 4 new columns are added to make data set more meaningful. • The new columns being added are work_status, education_cat, MARRIAGE_cat and SEX_cat
  • 9. Reshaping the Data • The data is reshaped by converting quantitative variables into new factor variables. • Factors are categorical variables that are very useful in summary statistics, plots and regressions; in a regression, R codes them as dummy variables automatically. • Variables that are not useful for the analysis are removed. • The variables removed from the dataset are PAY_0, PAY_2, PAY_3, PAY_4, PAY_5 and PAY_6.
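The reshaping step (one quantitative variable turned into a factor, then the PAY_X columns dropped) might look like this. Binning AGE into ten-year groups with cut() is an assumption chosen for illustration, since the slides do not give the cut points; the function name is hypothetical.

```r
# Sketch: bin AGE into a factor and drop the repayment-status columns.
# The age breaks are illustrative, not taken from the original analysis.
reshape_credit_data <- function(df, age_breaks = c(20, 30, 40, 50, 60, Inf)) {
  # right = FALSE gives left-closed bins such as [20,30) and [30,40)
  df$age_group <- cut(df$AGE, breaks = age_breaks, right = FALSE)

  # PAY_0 and PAY_2 ... PAY_6 (the dataset has no PAY_1 column)
  pay_cols <- c("PAY_0", paste0("PAY_", 2:6))
  df[, setdiff(names(df), pay_cols), drop = FALSE]
}

# Usage:
# creditcarddata <- reshape_credit_data(creditcarddata)
# table(creditcarddata$age_group)
```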
  • 10. Structure And Summary of Data After Tidying the Data
  • 11. Exploring Data Via Basic Visualization • There are more female than male clients in the dataset. • University is the most common education level among clients. • There are more single clients than married ones, but the numbers are quite close. • More clients are employed than not employed.
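The proportions behind these bar charts can be checked directly with table(). A minimal sketch, assuming the factor columns SEX_cat, education_cat and MARRIAGE_cat named on slide 8; the helper name is hypothetical.

```r
# Sketch: share of clients in each level of a factor column,
# i.e. the proportions a bar chart of that column displays.
category_shares <- function(df, col) {
  counts <- table(df[[col]])   # number of clients per level
  prop.table(counts)           # counts divided by the total
}

# Usage:
# category_shares(creditcarddata, "SEX_cat")
# category_shares(creditcarddata, "MARRIAGE_cat")
```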
  • 13. Determining Balance Limit Variability by Gender, Education and Work Status • The box plots suggest that gender has no noticeable effect on the balance limit the bank assigns. • Education level and work status show the clearest differences in balance limits, which suggests they are the main factors banks consider when setting them.
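A box plot of LIMIT_BAL per group summarises each group by its quartiles, and the group medians alone already show whether a factor separates balance limits. A minimal sketch using base R graphics; the original plotting code is not shown, so the functions chosen here are assumptions.

```r
# Sketch: compare LIMIT_BAL across the levels of one grouping column.
limit_medians_by <- function(df, group_col) {
  # Median balance limit per group (the thick line inside each box)
  tapply(df$LIMIT_BAL, df[[group_col]], median)
}

plot_limit_by <- function(df, group_col) {
  # One box per level of group_col
  boxplot(df$LIMIT_BAL ~ df[[group_col]],
          xlab = group_col, ylab = "LIMIT_BAL")
}

# Usage:
# limit_medians_by(creditcarddata, "education_cat")
# plot_limit_by(creditcarddata, "SEX_cat")
```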
  • 14. Relationship Between Marital Status & Balance Limits Categorized by Gender • This graph shows that for females the balance limit stays almost the same whether they are married or single, whereas for males it changes considerably between the two groups, possibly because of extra expenditures, which would explain the increased balance limits.
  • 15. Relationship Between Limit Balance & Default Payment • Balance limits and the number of defaulted clients are almost the same for the university and graduate levels. In addition, the proportion of defaulted clients at the high school level appears close to that at the university and graduate levels.
  • 16. Balance Limits by Age Group & Education • These box plots show that, in the higher age groups, balance limits increase with the client's education level.
  • 17. Correlations Between Limit Balance, Bill Amounts & Payments • This correlation plot shows a low correlation between the limit balance and both the payments and the bill amounts. However, the bill amounts are highly correlated with each other, as expected, since each bill statement reflects a cumulative balance.
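The correlation matrix behind this plot can be computed with cor(). A sketch assuming the UCI column names LIMIT_BAL, BILL_AMT1 to BILL_AMT6 and PAY_AMT1 to PAY_AMT6; the plotting package used on the slide is not stated, so only the matrix is computed here.

```r
# Sketch: Pearson correlations among balance limit, bill amounts and payments.
correlation_matrix <- function(df) {
  cols <- c("LIMIT_BAL", paste0("BILL_AMT", 1:6), paste0("PAY_AMT", 1:6))
  cor(df[, cols])   # 13 x 13 matrix with 1s on the diagonal
}

# Usage:
# cm <- correlation_matrix(creditcarddata)
# round(cm[paste0("BILL_AMT", 1:6), paste0("BILL_AMT", 1:6)], 2)
```

Reading the BILL_AMT block separately from the LIMIT_BAL row makes the contrast described on the slide easy to see.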
  • 18. Is there any variability in default payment next month based on gender, education and marital status? • A larger share of males appears to default, and by education, clients whose highest degree is high school default more often. • The client's marital status does not show any noticeable variability.
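Because the class variable is coded 0/1, the default rate of a group is simply the mean of that variable within the group, which avoids comparing raw counts across groups of different sizes. A sketch assuming read.csv() named the target column default.payment.next.month; the helper name is hypothetical.

```r
# Sketch: proportion of clients in each group who default next month.
default_rate_by <- function(df, group_col,
                            target = "default.payment.next.month") {
  # Mean of a 0/1 variable = share of 1s (defaults) in each group
  tapply(df[[target]], df[[group_col]], mean)
}

# Usage:
# default_rate_by(creditcarddata, "SEX_cat")
# default_rate_by(creditcarddata, "education_cat")
```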
  • 19. Model Building • This section builds the model for predicting the default payment outcome. • Before building the model, the dataset was divided into a training set and a test set. • Training set = 70% • Test set = 30%
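The 70/30 split, together with the logistic regression model named in the conclusion, can be sketched in base R as follows. The seed, the function names and the use of glm() with family = binomial are assumptions, since the original model code is not shown. With 30000 rows this split yields 21000 training rows and 9000 test rows.

```r
# Sketch: random 70/30 train/test split plus a logistic regression fit.
split_train_test <- function(df, train_frac = 0.7, seed = 42) {
  set.seed(seed)  # make the random split reproducible
  n_train <- round(train_frac * nrow(df))
  idx <- sample(seq_len(nrow(df)), size = n_train)
  list(train = df[idx, , drop = FALSE],
       test  = df[-idx, , drop = FALSE])
}

fit_logistic <- function(train, target = "default.payment.next.month") {
  # Model the target on every other column; drop identifiers such as ID first
  form <- as.formula(paste(target, "~ ."))
  glm(form, data = train, family = binomial)
}

# Usage:
# parts <- split_train_test(creditcarddata)
# model <- fit_logistic(parts$train)
# p <- predict(model, newdata = parts$test, type = "response")
# predicted <- as.integer(p > 0.5)   # the 0.5 cut-off is an assumption
```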
  • 21. Model Building Using Azure ML • The model is trained using the Two-Class Decision Forest algorithm.
  • 22. The classification matrix or the confusion matrix • The confusion matrix classifies each prediction as a true positive, false positive, false negative or true negative. • True Positive (TP) = the actual value is 1 and the predicted value is also 1: the client defaulted and the model predicted a default. • False Positive (FP) = the predicted value is 1 but the actual value is 0: the model predicted a default, but the client did not default. • False Negative (FN) = the predicted value is 0 but the actual value is 1: the model predicted no default, but the client defaulted. • True Negative (TN) = the predicted and actual values are both 0: the model predicted no default, and the client did not default. • Accuracy = the percentage of the whole test set that is predicted correctly. • Accuracy = (TP + TN) / Total = (662 + 6734) / (1329 + 275 + 662 + 6734) = 7396 / 9000 ≈ 0.82 • Precision = how precise the positive predictions are: when the model predicts a default, how likely is it to be correct? • Precision = TP / (TP + FP) = 662 / (662 + 275) ≈ 0.707 • Recall = the fraction of clients who actually defaulted that the model correctly predicted would default. • Recall = TP / (TP + FN) = 662 / (662 + 1329) ≈ 0.332
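The three metrics can be recomputed from the four confusion-matrix counts with a few lines of R, and the counts themselves can be derived from vectors of actual and predicted labels. The helper names are hypothetical; the counts passed in are the ones reported on this slide.

```r
# Sketch: confusion-matrix counts and the metrics derived from them.
confusion_counts <- function(actual, predicted) {
  c(tp = sum(actual == 1 & predicted == 1),  # defaulted, predicted default
    fp = sum(actual == 0 & predicted == 1),  # did not default, predicted default
    fn = sum(actual == 1 & predicted == 0),  # defaulted, predicted no default
    tn = sum(actual == 0 & predicted == 0))  # did not default, predicted no default
}

classification_metrics <- function(tp, fp, fn, tn) {
  c(accuracy  = (tp + tn) / (tp + fp + fn + tn),
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn))
}

# Counts from the Azure ML confusion matrix on this slide
m <- classification_metrics(tp = 662, fp = 275, fn = 1329, tn = 6734)
print(round(m, 3))
# gives accuracy 0.822, precision 0.707, recall 0.332
```

A recall of about 0.332 means roughly two thirds of the clients who actually defaulted were predicted as non-defaulters, even though most default predictions that were made are correct.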
  • 23. Conclusion • This project predicts defaulters among credit card bank customers. • R programming is used for exploratory data analysis and visualization. • R and Azure ML are used for model building with the Logistic Regression and Two-Class Decision Forest algorithms.