3. Probability of Default Modeling 3
Probability of Default by Moody’s Grade
Importance of Calculating PD
Pricing loans
Investor return
Portfolio risk
4. Probability of Default Modeling 4
Deciding on Number of Factors for Scorecard
Developing a Model
[Figure: Marginal Contribution to Accuracy (Pos, 0, Neg) versus Number of Factors in Scorecard (4, 8, 12, 16, 20, n). Regions are labeled "Helps Accuracy" and "Hurts Accuracy", and a "Recommended Range" is marked.]
For model building purposes, we may want to have more factors initially, with the understanding that some will be discarded.
6. Probability of Default Modeling 6
Overview of Data Preparation
Data preparation involves collecting the required data and deciding which sources and systems to extract it from. It also involves cleansing the data by removing financial statements that do not satisfy the following criteria:
» Ratio checks: running the dataset through a series of data cleansing rules
» Default definition: a consistent definition of default has to be determined to properly classify the obligors in the underlying data into defaulters and non-defaulters
» Default horizon: determining a time window to classify the financial statements into defaults and non-defaults
These criteria ensure that the data contains information on all obligors and that the information is consistent with the business segment for which the model is being built.
7. Probability of Default Modeling 7
Defining Default
Methodology for tagging financial statements as default
» If a financial statement was dated less than 3 months before the default event, it was removed from model development
» If 2 statements were available from 4 to 21 months before the default event, the statement closer to the default event was kept and tagged as default, and the other statement was dropped
» If a defaulted obligor had a statement dated more than 21 months before the default event, that statement was tagged as non-default
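The tagging rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and tag labels are hypothetical, months are taken as whole numbers, a single statement inside the 4 to 21 month window is assumed to be tagged as default, and a statement exactly 3 months before default is flagged as "unspecified" because the rules do not cover it.

```python
def tag_defaulter_statements(months_before_default):
    """Tag each statement of ONE defaulted obligor.

    months_before_default: whole months between each statement date and
    the default event. Returns (months, tag) pairs in input order.
    """
    # Statements inside the 4-21 month window; only the closest one is kept.
    in_window = [m for m in months_before_default if 4 <= m <= 21]
    closest = min(in_window) if in_window else None

    tagged = []
    default_assigned = False
    for m in months_before_default:
        if m < 3:
            tag = "removed"          # too close to the default event
        elif m > 21:
            tag = "non-default"      # old enough to count as a healthy statement
        elif 4 <= m <= 21:
            if m == closest and not default_assigned:
                tag = "default"      # statement closest to the default event
                default_assigned = True
            else:
                tag = "dropped"      # other statements in the window
        else:
            tag = "unspecified"      # exactly 3 months: not covered by the rules
        tagged.append((m, tag))
    return tagged


if __name__ == "__main__":
    for months, tag in tag_defaulter_statements([2, 8, 20, 30]):
        print(months, tag)
```

Statements of obligors that never defaulted are outside this sketch; they would simply be tagged as non-default.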
8. Probability of Default Modeling 8
Basic Checks
All statements were passed through a series of filtering criteria
» Total Assets <= 0
» Total Liabilities <= 0
» Total Revenue <= 0
» Total assets do not match the sum of total liabilities and total equity reserves (a threshold of 2% was used)
» Cash and Equivalents < 0
» Total Current Assets < 0
» Total Non Current Assets < 0
» Depreciation and Amortization < 0
» Total Operating Expenses < 0
» Total Long Term Liabilities < 0
FINAL DATA SAMPLE (Before Basic Checks)
Total Statements: 868
Unique MFIs: 293
Defaults: 16 (1.84%)
Basic Checks¹: 34 (3.9%) statements dropped
FINAL SAMPLE FOR MODEL DEVELOPMENT
Total Statements: 834
Unique MFIs: 292
Defaults: 16 (1.92%)
1. Refer to appendix 11 for details of the basic check analysis
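As a rough illustration, the filtering criteria could be applied to a single statement as follows. The field names are hypothetical, and the 2% threshold is assumed to be measured relative to total assets, which the slide does not state.

```python
# Hypothetical field names; the slide only gives the line-item labels.
NON_POSITIVE_FIELDS = ("total_assets", "total_liabilities", "total_revenue")
NEGATIVE_FIELDS = (
    "cash_and_equivalents",
    "total_current_assets",
    "total_non_current_assets",
    "depreciation_and_amortization",
    "total_operating_expenses",
    "total_long_term_liabilities",
)


def basic_check_failures(stmt, tolerance=0.02):
    """Return the list of basic checks that a statement (a dict) fails."""
    failures = []
    for field in NON_POSITIVE_FIELDS:
        if stmt[field] <= 0:
            failures.append(f"{field} <= 0")
    for field in NEGATIVE_FIELDS:
        if stmt[field] < 0:
            failures.append(f"{field} < 0")
    # Balance check: assets vs liabilities + equity reserves.
    # Assumption: the 2% threshold is relative to total assets.
    assets = stmt["total_assets"]
    gap = abs(assets - (stmt["total_liabilities"] + stmt["total_equity_reserves"]))
    if assets > 0 and gap > tolerance * assets:
        failures.append("balance sheet mismatch")
    return failures


if __name__ == "__main__":
    example = {
        "total_assets": 100.0, "total_liabilities": 60.0,
        "total_equity_reserves": 30.0, "total_revenue": 20.0,
        "cash_and_equivalents": 5.0, "total_current_assets": 40.0,
        "total_non_current_assets": 60.0, "depreciation_and_amortization": 1.0,
        "total_operating_expenses": 8.0, "total_long_term_liabilities": 25.0,
    }
    print(basic_check_failures(example))
```

A statement would be dropped from the sample whenever the returned list is non-empty.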
10. The available data yields 46 potential factors for single factor analysis
Different sources were considered to come up with a list of candidate factors for model development
» Microfinance Handbook by Joanna Ledgerwood
» Microfinance Consensus Guidelines Published by CGAP/The World Bank Group, September 2003
Category Factor Name Calculation
Sustainability/Profitability
GrossMargin (Total_Revenue - Financial_Costs) / Total_Revenue
OperatingMargin (Total_Revenue - Financial_Costs - Loan_Loss_Provision - Operating_Expense)/Total_Revenue
ROE (Total_Revenue - Financial_Costs - Loan_Loss_Provision - Operating_Expense)/(Total_Assets - Total_Liabs)
ROA (Total_Revenue - Financial_Costs - Loan_Loss_Provision - Operating_Expense)/Total_Assets
Operational_self_sufficiency Total_Revenue/(Financial_Costs + Loan_Loss_Provision + Operating_Expense)
InterestCoverage Total_Revenue/Interest and fee expense on all funding liabilities (v3210)
CashtoLiabs Cash & Cash Equivalents – Audited (v1110)/Total_Liabs
Asset/Liability Management
Yield_on_Loan_Portfolio (Total_Revenue - Financial_Costs - Loan_Loss_Provision - Operating_Expense)/Gross_Loan_Portfolio
Gross_Yield_on_Loan_Portfolio (Total_Revenue + Non_Operating_Income - Financial_Costs - Loan_Loss_Provision - Operating_Expense - Non_Operating_Expense)/Gross_Loan_Portfolio
CurrentRatio Current_Assets/Current_Liabs
Funding_expense_ratio Interest and fee expense on all funding liabilities (v3210)/Gross_Loan_Portfolio
LiabtoNetWorth Total_Liabs/(Total_Assets-Total_Liabs)
LiabtoAssets Total_Liabs/Total_Assets
LiabtoEBITDA Total_Liabs/(Total_Revenue - Financial_Costs - Loan_Loss_Provision - Operating_Expense + Depreciation and Amortization (v3530))
RevenuetoTotalAsts Total_Revenue/Total_Assets
Growth
Total_RevenueGrowth (Total_Revenue-Total_Revenue_Prev)/Total_Revenue_Prev
GrossPortfolioGrowth (Gross_Loan_Portfolio-Gross_Loan_Portfolio_Prev)/Gross_Loan_Portfolio_Prev
Size
LoanPortfolio_CPIAdj (229.601/CPI_INDEX)*Gross_Loan_Portfolio
Total_Assets_CPIAdj (229.601/CPI_INDEX)*Total_Assets
Avg_outstanding_loansize (229.601/CPI_INDEX)*Gross_Loan_Portfolio/nb outstanding loans (v8040)
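To make the calculations above concrete, here is a minimal Python sketch of a few of them. The function names are hypothetical. The constant 229.601 is taken from the table and is assumed here to be a reference CPI level to which monetary amounts are rebased; the slide does not explain it.

```python
CPI_BASE = 229.601  # constant used in the factor table (assumed reference CPI level)


def cpi_adjust(amount, cpi_index, base=CPI_BASE):
    """Rebase a monetary amount: (229.601 / CPI_INDEX) * amount."""
    return (base / cpi_index) * amount


def gross_margin(total_revenue, financial_costs):
    """GrossMargin = (Total_Revenue - Financial_Costs) / Total_Revenue."""
    return (total_revenue - financial_costs) / total_revenue


def liab_to_net_worth(total_liabs, total_assets):
    """LiabtoNetWorth = Total_Liabs / (Total_Assets - Total_Liabs)."""
    return total_liabs / (total_assets - total_liabs)


def avg_outstanding_loansize(gross_loan_portfolio, n_outstanding_loans, cpi_index):
    """CPI-adjusted gross loan portfolio per outstanding loan."""
    return cpi_adjust(gross_loan_portfolio, cpi_index) / n_outstanding_loans


if __name__ == "__main__":
    print(gross_margin(200.0, 50.0))
    print(liab_to_net_worth(60.0, 100.0))
    print(avg_outstanding_loansize(1_000_000.0, 2_000, cpi_index=229.601))
```

In practice each ratio needs a guard for zero or negative denominators (for example negative net worth), which this sketch leaves out.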
11. The available data yields 46 potential factors for single factor analysis (cont’d)
Category Factor Name Calculation
Efficiency/Productivity
Loan_officer_productivity number of active borrowers (v8050)/ number of loan officers (v8010)
Personnel_productivity number of active borrowers (v8050)/ Number of employees (v8020)
Branch_Productivity number of active borrowers (v8050)/ Number of branches (v8030)
PBT_per_loan_officer (229.601/CPI_INDEX)*(Total_Revenue + Non_Operating_Income - Financial_Costs - Loan_Loss_Provision - Operating_Expense - Non_Operating_Expense)/number of loan officers (v8010)
PBT_per_employee (229.601/CPI_INDEX)*(Total_Revenue + Non_Operating_Income - Financial_Costs - Loan_Loss_Provision - Operating_Expense - Non_Operating_Expense)/Number of employees (v8020)
PBT_per_branch (229.601/CPI_INDEX)*(Total_Revenue + Non_Operating_Income - Financial_Costs - Loan_Loss_Provision - Operating_Expense - Non_Operating_Expense)/Number of branches (v8030)
loans_per_borrower Number of loans outstanding(v8040)/number of active borrowers (v8050)
Operating_expense_ratio Operating_Expense/Gross_Loan_Portfolio
Financial_Expense_ratio Financial_Costs/Gross_Loan_Portfolio
Cost_per_borrower (229.601/CPI_INDEX)*Operating_Expense/number of active borrowers (v8050)
Avg_portfolio_per_credit_officer (229.601/CPI_INDEX)*Gross_Loan_Portfolio/number of loan officers (v8010)
Portfolio quality
PAR_30_Ratio Portfolio at risk above 30 days (v7030)/Gross_Loan_Portfolio
PAR_180_Ratio Of which portfolio at risk above 180 days (v7100)/Gross_Loan_Portfolio
OnTime_Portfolio On-time portfolio (v7010)/Gross_Loan_Portfolio
Writeoff_Ratio Write offs (v7140)/Gross_Loan_Portfolio
Risk_coverage_ratio Loan loss reserve – Audited (v1220)/ Portfolio at risk above 30 days (v7030)
LoanLossReserve_Ratio Loan loss reserve – Audited (v1220)/Gross_Loan_Portfolio
Arrears_rate Portfolio in arrears (v7130)/Gross_Loan_Portfolio
Pct_Refinanced reprogrammed and refinanced loans (v7115)/Gross_Loan_Portfolio
Others
Avg_maturity_of_loans mean(v8174,v8184,v7914,v7924,v7934,v7944)
Pct_Urban_Clients_Volume sum(Urban clients - volume of portfolio (v8410), Semi-Urban clients - volume of portfolio (v8420), 0)/Gross_Loan_Portfolio
Pct_Female_Clients_Volume Female clients - volume of portfolio (v8320)/Gross_Loan_Portfolio
Pct_Revenue_From_Investments Financial revenue from investments – Audited (v3120)/Total_Revenue
Pct_Group_Loans sum(Self-help groups (v8250), Solidarity groups (v8260), Communal banks loans/Self-help groups – volume (v8270))/Gross_Loan_Portfolio
Type_Of_Loans 6-nmiss(v8110,v8120,v8130,v8140,v8150,v8160)
Loans_to_Ind_Types 10-nmiss(v8510,v8520,v8530,v8540,v8542,v8544,v8546,v8548,v8549,v8550)
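The last two factors count how many of a fixed set of fields are reported, written as the number of fields minus nmiss(...). Assuming nmiss returns the number of missing arguments, and representing a missing value as None (both assumptions), the calculation can be sketched as:

```python
def count_reported(values):
    """Number of non-missing entries, i.e. len(values) - nmiss(values).

    Assumption: a missing field is represented as None.
    """
    n_missing = sum(1 for v in values if v is None)
    return len(values) - n_missing


if __name__ == "__main__":
    # Hypothetical values for the six loan-type fields v8110 ... v8160.
    loan_type_fields = [1200.0, None, 300.0, None, None, 50.0]
    print(count_reported(loan_type_fields))  # Type_Of_Loans for this statement
```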
12. In general, factors are evaluated on the following set of criteria
» Position analysis: There must be enough observations. A factor with many missing values typically indicates information that is difficult to obtain, so it should not be included in the final model
» Factors must be intuitive. Experienced credit analysts should be familiar with the factor and its relationship with credit risk given the credit culture in which they operate
» Factors must be consistent with expectations. Factor behaviour should be consistent with business judgment, and any deviations from expectations should be easily explained
» Factors must be powerful. The final list of factors incorporated into the model should exhibit a high degree of discriminatory power on the basis of credit risk
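Discriminatory power is reported on the following slides as an Accuracy Ratio (AR). The deck does not show how its AR was computed; the sketch below uses the standard pairwise form, which equals 2*AUC - 1 and matches the CAP-curve definition. The function name is hypothetical, and scores are assumed to be oriented so that a higher score means higher risk.

```python
def accuracy_ratio(scores, defaulted):
    """Pairwise accuracy ratio (equivalent to 2*AUC - 1).

    scores: one risk score per observation, higher = riskier (assumption).
    defaulted: matching booleans, True for defaulters.
    """
    bad = [s for s, d in zip(scores, defaulted) if d]
    good = [s for s, d in zip(scores, defaulted) if not d]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    # Compare every defaulter with every non-defaulter; ties count for neither side.
    concordant = sum(1 for b in bad for g in good if b > g)
    discordant = sum(1 for b in bad for g in good if b < g)
    return (concordant - discordant) / (len(bad) * len(good))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.2, 0.1]
    print(accuracy_ratio(scores, [True, True, False, False]))   # perfect ranking
    print(accuracy_ratio(scores, [False, False, True, True]))   # reversed ranking
```

A negative value means the factor ranks defaulters as safer than non-defaulters, which corresponds to the "Counterintuitive" label in the single factor analysis tables.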
13. Single Factor Analysis Performance: 21 factors recommended for further exploration in MFA
Category Factor Name AR* Default Rate Relationship Missing % Recommendation Comments
*AR = Accuracy Ratio
Sustainability/Profitability
GrossMargin 36% Good 2%
OperatingMargin -13% Counterintuitive 2%
ROE -5% Counterintuitive 3%
ROA -7% Counterintuitive 3%
Operational_self_sufficiency -11% Counterintuitive 2%
InterestCoverage 37% Good 2%
Asset/Liability Management
Yield_on_Loan_Portfolio -5% Counterintuitive 2%
Gross_Yield_on_Loan_Portfolio -9% Counterintuitive 2%
CurrentRatio -28% Counterintuitive 2%
Funding_expense_ratio 39% Strong 1% High correlation with LiabtoAssets
Financial_Expense_ratio 46% Strong 1%
LiabtoNetWorth 12% Good 2% High correlation with LiabtoAssets
LiabtoAssets 13% Good 2%
LiabtoEBITDA -7% Counterintuitive 2%
CashtoLiabs 19% Good 0%
Growth
Total_RevenueGrowth 39% High missing %
GrossPortfolioGrowth 38% High missing %
Size
LoanPortfolio_CPIAdj -13% Counterintuitive 0%
Total_Assets_CPIAdj -14% Counterintuitive 2%
Avg_outstanding_loansize 4% Weak 5% Used as a proxy for Income level of the borrowers
14. Single Factor Analysis Performance: 21 factors recommended for further exploration in MFA (cont’d)
Category Factor Name AR* Default Rate Relationship Missing % Recommendation Comments
Efficiency/Productivity
Loan_officer_productivity 23% Good 5%
Personnel_productivity 27% Good 5%
Branch_Productivity 18% Good 6%
PBT_per_loan_officer -8% Counterintuitive 6%
PBT_per_employee -17% Counterintuitive 6%
PBT_per_branch 3% Moderate 7%
RevenuetoTotalAsts 12% Moderate 2%
Operating_expense_ratio 28% Good 0%
Cost_per_borrower 19% Good 5%
Avg_portfolio_per_credit_officer 6% Good 4%
Portfolio Quality
PAR_30_Ratio -8% Counterintuitive 4%
PAR_180_Ratio -32% Counterintuitive 8%
OnTime_Portfolio 1% Good 4%
Writeoff_Ratio 8% Moderate 7%
Risk_coverage_ratio 11% Moderate 6%
LoanLossReserve_Ratio -1% Moderate 2%
Arrears_rate -2% Weak 9%
Pct_Refinanced 14% High missing %
Others
Avg_maturity_of_loans 23% High missing %
loans_per_borrower 32% Strong 6% Used as a proxy for Debt to Income ratio of borrowers
Pct_Urban_Clients_Volume 23% Good 0%
Pct_Female_Clients_Volume 29% Good 5%
Pct_Revenue_From_Investments -1% Counterintuitive 1%
Pct_Group_Loans 20% High missing %
Type_Of_Loans 3% Moderate 0% Low diversity of responses and very low accuracy ratio
Loans_to_Ind_Types 10% Good 0% Used as a proxy for portfolio diversity
15. CAP Curve of PAR_30_Ratio
PAR 30 Ratio
Key statistics: Relative Entropy 0.96, Accuracy Ratio -8%
[Figure: CAP Curve of PAR_30_Ratio; % Default versus % Population]
[Figure: Frequencies and Default Rates for PAR_30_Ratio; Frequency and Default Rate by Answer bin: missing, 0.05 to High, 0.025 to 0.05, 0.01 to 0.025, 0 to 0.01]
» This factor performs inadequately, with no discriminatory power
» Counterintuitive relationship between the responses and the default rate
16. CAP Curve of PAR_180_Ratio
PAR 180 Ratio
Key statistics: Relative Entropy 0.96, Accuracy Ratio -32%
[Figure: CAP Curve of PAR_180_Ratio; % Default versus % Population]
[Figure: Frequencies and Default Rates for PAR_180_Ratio; Frequency and Default Rate by Answer bin: missing, 0.012 to High, 0.003 to 0.012, >0 to 0.003, 0 to 0]
» Counterintuitive relationship between the responses and the default rate
17. Avg_outstanding_loansize
Key statistics: Relative Entropy 0.95, Accuracy Ratio 4%
[Figure: CAP Curve of Avg_outstanding_loansize; % Default versus % Population]
[Figure: Frequencies and Default Rates for Avg_outstanding_loansize; Frequency and Default Rate by Answer bin: missing, < 500, 500 to 1500, 1500 to 2500, 2500 to 4000, 4000 to High]
» This factor performs inadequately, with low discriminatory power
» Weak relationship between the responses and the default rate, i.e. the higher the score, the lower the default rate
19. Starting with 21 Candidate Factors from SFA
Section Factor Name AR Default Rate Relationship Comments
Sustainability/Profitability
GrossMargin 36% Good
InterestCoverage 37% Good
Asset/Liability Management
Financial_Expense_ratio 46% Strong
LiabtoAssets 13% Good
CashtoLiabs 19% Good
Size Avg_outstanding_loansize 4% Weak Used as a proxy for Income level of the borrowers
Efficiency/Productivity
Loan_officer_productivity 23% Good
Personnel_productivity 27% Good
Branch_Productivity 18% Good
PBT_per_branch 3% Moderate
RevenuetoTotalAsts 12% Moderate
Operating_expense_ratio 28% Good
Cost_per_borrower 19% Good
Avg_portfolio_per_credit_officer 6% Good
Portfolio Quality
OnTime_Portfolio 1% Good
Writeoff_Ratio 8% Moderate
Risk_coverage_ratio 11% Moderate
Others
loans_per_borrower 32% Strong Used as a proxy for Debt to Income ratio of borrowers
Pct_Urban_Clients_Volume 23% Good
Pct_Female_Clients_Volume 29% Good
Loans_to_Ind_Types 10% Good Used as a proxy for portfolio diversity
» As the number of defaults is very low (16), we kept all the factors with a positive accuracy ratio for MFA
» Return ratios such as ROA and ROE are not present in the candidate factors list because MFIs typically operate on low returns over a larger base, i.e. large assets
20. Pct_Female_Clients_Volume
Key statistics: Relative Entropy 0.88, Accuracy Ratio 29%
[Figure: CAP Curve of Pct_Female_Clients_Volume; % Default versus % Population]
[Figure: Frequencies and Default Rates for Pct_Female_Clients_Volume; Frequency and Default Rate by Answer bin: missing, 0 to 0.35, 0.35 to High]
» This factor performs adequately, with moderate discriminatory power
» Good relationship between the responses and the default rate, i.e. the higher the score, the lower the default rate
22. Logistic Regression Models
Model Number of Factors Significance Level¹ AR² Comments
Model 1 5 P Value <= 0.05 69.5%
Model 2 8 P Value <= 0.1 77.8%
Model 3 6 P Value <= 0.1 73.4% Best model after dropping Pct_Urban_Clients_Volume
Model 4 4 P Value <= 0.05 65.5% Best model after dropping Avg_outstanding_loansize
Model 5 5 P Value <= 0.1 69.4% Best model after dropping Avg_outstanding_loansize
» Due to the low number of defaults, we also considered models whose estimated coefficients are significant at the 10% level (P Value <= 0.1), i.e. at 90% confidence
» Pct_Urban_Clients_Volume represents the percentage of urban and semi-urban borrowers in an MFI’s portfolio. Although this factor is significant at the 10% level, we recommend not including it in the model because MFIs typically have semi-urban and rural borrowers. The model should not penalize an MFI for having a large base of rural clients
» Avg_outstanding_loansize was used as a proxy for the income level of MFI borrowers. Given the low accuracy ratio of this factor, we also considered models without it, which lowered AR by about 6% in relative terms for Model 4 (65.5% versus 69.5% for Model 1) and by about 11% for Model 5 (69.4% versus 77.8% for Model 2)
1. For estimated coefficients and p values, refer to appendix 1
2. AR = Accuracy Ratio
23. Probability of Default Modeling 23
Beta Model – Factor Weights
Section Factor Name Factor AR Model 1 Model 2 Model 3 Model 4 Model 5
(A "-" marks a factor that is not in the model. Columns for partially filled rows are assigned so that each model has the stated number of factors and its weights sum to about 100%.)
Sustainability/Profitability
GrossMargin 36% - - - - -
InterestCoverage 37% - - - - -
Asset/Liability Management
Financial_Expense_ratio 46% 22.4% 14.0% 18.2% 32.1% 26.7%
LiabtoAssets 13% - - - - -
CashtoLiabs 19% - 7.9% 11.6% - -
Size
Avg_outstanding_loansize 4% 15.5% 13.7% 14.8% - -
Efficiency/Productivity
Loan_officer_productivity 23% - - - - -
Personnel_productivity 27% - - - - -
Branch_Productivity 18% - 8.6% - - -
PBT_per_branch 3% - - - - -
RevenuetoTotalAsts 12% - - - - -
Operating_expense_ratio 28% 17.8% 14.0% 16.0% 25.1% 21.2%
Cost_per_borrower 19% - - - - -
Avg_portfolio_per_credit_officer 6% - - - - -
Portfolio Quality
OnTime_Portfolio 1% - - - - -
Writeoff_Ratio 8% - - - - -
Risk_coverage_ratio 11% - - - - -
Others
loans_per_borrower 32% 17.6% 14.4% 15.2% 19.5% 18.2%
Pct_Urban_Clients_Volume 23% - 7.8% - - 14.6%
Pct_Female_Clients_Volume 29% 26.7% 19.5% 24.2% 23.4% 19.3%
Loans_to_Ind_Types 10% - - - - -
Number of Factors 5 8 6 4 5
Model AR 69.5% 77.8% 73.4% 65.5% 69.4%
» None of the models gives any weight to sustainability/profitability or portfolio quality factors
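The factor weights can be sanity-checked with a short Python sketch. The weights are copied from the table; for rows that show fewer than five values, the assignment to models is inferred (not stated on the slide) from the number of factors per model and from the requirement that each model's weights sum to about 100%.

```python
# Weights in percent. Column assignment of sparse rows is an inference.
MODEL_WEIGHTS = {
    "Model 1": {"Financial_Expense_ratio": 22.4, "Avg_outstanding_loansize": 15.5,
                "Operating_expense_ratio": 17.8, "loans_per_borrower": 17.6,
                "Pct_Female_Clients_Volume": 26.7},
    "Model 2": {"Financial_Expense_ratio": 14.0, "CashtoLiabs": 7.9,
                "Avg_outstanding_loansize": 13.7, "Branch_Productivity": 8.6,
                "Operating_expense_ratio": 14.0, "loans_per_borrower": 14.4,
                "Pct_Urban_Clients_Volume": 7.8, "Pct_Female_Clients_Volume": 19.5},
    "Model 3": {"Financial_Expense_ratio": 18.2, "CashtoLiabs": 11.6,
                "Avg_outstanding_loansize": 14.8, "Operating_expense_ratio": 16.0,
                "loans_per_borrower": 15.2, "Pct_Female_Clients_Volume": 24.2},
    "Model 4": {"Financial_Expense_ratio": 32.1, "Operating_expense_ratio": 25.1,
                "loans_per_borrower": 19.5, "Pct_Female_Clients_Volume": 23.4},
    "Model 5": {"Financial_Expense_ratio": 26.7, "Operating_expense_ratio": 21.2,
                "loans_per_borrower": 18.2, "Pct_Urban_Clients_Volume": 14.6,
                "Pct_Female_Clients_Volume": 19.3},
}


def weight_summary(models):
    """Return {model: (number of factors, total weight rounded to 0.1)}."""
    return {name: (len(w), round(sum(w.values()), 1)) for name, w in models.items()}


if __name__ == "__main__":
    for name, (n_factors, total) in weight_summary(MODEL_WEIGHTS).items():
        print(name, n_factors, total)
```

Totals differ from 100% by at most about 0.1 percentage point, which is consistent with rounding of the published weights.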
25. Probability of Default Modeling 25
New Data Preparation
Quantitative (non SPA Data)
Total Statements: 731
Unique MFIs: 249
Defaults: 16
Quantitative model prepared as before. Data for ‘Total Revenue Growth’ and ‘Gross Portfolio Growth’ updated for missing values
Qualitative (SPA Data)
Total Statements: 167
Unique MFIs: 167
Defaults: 10
6 MFIs dropped due to no exact match with quant data
Total Statements: 161
Unique MFIs: 161
Defaults: 10
Qualitative Model was prepared on this data
Merging two datasets
Remove statements from the quantitative data where MFIs are not common to SPA (Qualitative) data
225 (30.8%) statements dropped
Total Statements: 506
Unique MFIs: 161
Defaults: 10 (1.98%)
Combined Model has been estimated on this data
1. Quantitative Models have been estimated on 731 records and 16 defaults
2. Qualitative Models have been estimated on 161 records and 10 defaults
3. The combined model uses 506 records and 10 defaults
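The merging step, keeping only quantitative statements whose MFI also appears in the SPA data, might look like the sketch below. The record layout and the key name mfi_id are hypothetical; the slide does not describe how MFIs were matched beyond "exact match".

```python
def restrict_to_common_mfis(quant_statements, spa_mfi_ids):
    """Keep quantitative statements whose MFI is present in the SPA data.

    quant_statements: list of dicts with a hypothetical "mfi_id" key.
    spa_mfi_ids: iterable of MFI identifiers found in the SPA data.
    Returns (kept statements, number of statements dropped).
    """
    common = set(spa_mfi_ids)
    kept = [s for s in quant_statements if s["mfi_id"] in common]
    return kept, len(quant_statements) - len(kept)


if __name__ == "__main__":
    quant = [
        {"mfi_id": "A", "year": 2008},
        {"mfi_id": "A", "year": 2009},
        {"mfi_id": "B", "year": 2009},
        {"mfi_id": "C", "year": 2009},
    ]
    kept, dropped = restrict_to_common_mfis(quant, ["A", "C"])
    print(len(kept), dropped)
```

An MFI can contribute several statements to the quantitative data, which is why 161 common MFIs correspond to 506 retained statements.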
26. Candidate social factors were based on availability of reliable data. Data sourced from the MIX and analyzed with Moody’s SPA
Candidate Social Factors
Variable ProbChiSq AR
Pricing Transparency Practices 0.463 6%
Disclosure of components of pricing 0.383 9%
Manner of communication of pricing 0.106 16%
Debt Collection Practices 0.059 27%
Specific debt collection policies 0.218 17%
Definition of acceptable and unacceptable collection practices 0.218 17%
Voluntarily adopted consumer protection standards 0.060 27%
Range of Products offered 0.159 24%
Policies included in Code of Ethics 0.351 15%
Written policies on hiring women 0.111 18%
Corruption Score 0.098 19%
Callouts on the slide: "Low AR"; "Probability of chance occurrence is high"
27. Rejected Social Variables
Pricing Transparency
[Figure: Frequencies and Default Rates for Pricing Transparency Practices; Frequency and Default Rate by Answer bin: Less than equal to 0.5, 0.5 to 0.9, Greater than 0.9]
Code of Ethics
[Figure: Frequencies and Default Rates for Policies included in Code of Ethics; Frequency and Default Rate by Answer bin: Less than equal to 0.2, 0.2 to 0.6, 0.6 to 0.9, Greater than 0.9]
28. Accepted Social Variables
Debt Collection Practices
[Figure: Frequencies and Default Rates for Debt Collection Practices; Frequency and Default Rate by Answer bin: Less than equal to 0.1, 0.1 to 0.45, 0.45 to 0.9, Greater than 0.9]
[Figure: CAP Curve of Debt Collection Practices; % Default versus % Population]
Range of Products Offered
[Figure: Frequencies and Default Rates for Range of Products offered; Frequency and Default Rate by Answer bin: Less than equal to 0.2, 0.2 to 0.4, 0.4 to 0.6, 0.6 to 0.8, Greater than 0.8]
[Figure: CAP Curve of Range of Products offered; % Default versus % Population]
29. Combined Model
Combining the Quantitative and Qualitative factors gives an AR of 79.0%
Section Section Weight Factor Factor Weight Final Weight
Quantitative Score 64% Cash to Liabilities 13.77% 8.9%
Quantitative Score 64% Loans per borrower 16.48% 10.6%
Quantitative Score 64% Operating expense ratio 22.62% 14.6%
Quantitative Score 64% Financial Expense ratio 26.19% 16.9%
Quantitative Score 64% Percent Female Clients Volume 20.94% 13.5%
Qualitative Score 35.6% Debt Collection Practices 38.9% 13.9%
Qualitative Score 35.6% Range of Products offered 61.1% 21.8%
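The final weights appear to be the product of each section weight and the factor weight within that section. The sketch below checks this against the published figures. It assumes the quantitative section weight is 64.4% (100% minus the 35.6% qualitative weight, shown on the slide as 64%); names are illustrative only.

```python
# Section weights in percent. 64.4 is an assumption: 100 - 35.6.
SECTION_WEIGHTS = {"Quantitative": 64.4, "Qualitative": 35.6}

# (section, factor, factor weight %, final weight % reported on the slide)
FACTORS = [
    ("Quantitative", "Cash to Liabilities", 13.77, 8.9),
    ("Quantitative", "Loans per borrower", 16.48, 10.6),
    ("Quantitative", "Operating expense ratio", 22.62, 14.6),
    ("Quantitative", "Financial Expense ratio", 26.19, 16.9),
    ("Quantitative", "Percent Female Clients Volume", 20.94, 13.5),
    ("Qualitative", "Debt Collection Practices", 38.9, 13.9),
    ("Qualitative", "Range of Products offered", 61.1, 21.8),
]


def final_weight(section, factor_weight):
    """Final weight (%) = section weight (%) * factor weight (%) / 100."""
    return SECTION_WEIGHTS[section] * factor_weight / 100.0


def max_deviation(factors):
    """Largest gap between the computed and the reported final weight."""
    return max(abs(final_weight(s, w) - reported) for s, _, w, reported in factors)


if __name__ == "__main__":
    for section, name, weight, reported in FACTORS:
        print(f"{name}: computed {final_weight(section, weight):.2f}%, reported {reported}%")
    print("largest deviation:", round(max_deviation(FACTORS), 3))
```

Under that assumption every computed value agrees with the slide to within about 0.1 percentage point; the small gaps are consistent with the published weights having been rounded.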
31. Qualitative factors are not necessarily judgmental, but cannot be empirically confirmed by the data
Categories shown on the slide: Franchise, Operating Environment, Systems
Factors listed, in slide order:
» Market position and sustainability
» Market size and geographic diversification
» Asset concentration and earnings diversification
» Macroeconomic stability
» Regulatory strength
» Legal system and corruption
» Audit process
» Board independence and governance
» Financial reporting and transparency
» Strength of credit scoring and risk management
» Access to alternative funding sources