REGRESSION ANALYSIS
Shailendra Tomar
Index
• What is Regression Analysis?
• Simple Regression Theory
• Example 1: House Price Model
• Run Simple Regression Using SAS
• Steps & Assumptions of Regression
• Multiple Regression Analysis
• Significance Testing
• Coefficient of Determination
• Example 2: Credit Card Model
• Model selection
• Verify Regression Assumptions
• Regression Diagnostics
• Run Multiple Regression Using SAS
WHAT IS REGRESSION ANALYSIS?
• When two or more variables are related to each other and we want to quantify the relationship between them, regression analysis is the right technique
• It goes beyond correlation by fitting a mathematical equation that can be used to estimate or predict values within the range covered by the data
• Regression requires one dependent variable and one or more independent variables
• The dependent variable (also known as the outcome or response variable) is modelled as a function of the independent variables (also called explanatory or predictor variables)
• Regression analysis measures the associative relationships between these variables
• It is commonly used in forecasting, time series modelling, financial analysis, and market research to study how variables are related; the relationship it estimates is associative and does not by itself prove cause and effect
Scatter Diagram
SIMPLE REGRESSION THEORY
• Let’s begin with simple linear regression, which is the easiest case to understand
• Recall the linear equation ‘y=mx+c’ from high school, which describes a straight line; simple regression fits such a line to the data
• In simple regression, this equation becomes ‘y=β0 + β1x + ε’, where y is the dependent variable and x is the independent variable
• β0, like the y-intercept c, is the value of y when x is zero; β1, like the slope m, is the change in the average value of y for a one-unit change in x; and ε is the error term
• The error term is needed because the observed values do not fall exactly on the line: y is also influenced by other factors and by random variation. In addition, the model is estimated from a sample rather than the whole population, so the estimates b0 and b1 only approximate the population parameters β0 and β1
• The Ordinary Least-Squares (OLS) procedure estimates the model parameters by choosing the b0 and b1 that minimize the sum of the squared differences between y and ŷ, which gives the best-fitting line
• The objective is always to minimize the error, which is the difference between the observed values and the predicted values generated by the fitted model ‘ŷ=b0 + b1x’
EXAMPLE 1: HOUSE PRICE MODEL
• A real estate company wants to examine the relationship between the selling price of a home (in $1000s) and its size (in square feet) for a specific region.
• It selects a random sample of 10 houses
• The scatterplot of the data points shows a positive linear relationship
• Larger houses tend to sell for higher prices
STATISTICS: HOUSE PRICE MODEL
Dependent variable (Y): House Price (in $1000s)
Independent variable (X): Size (in square feet)

R-Square     0.5808     Dependent Mean   286.5
Adj R-Sq     0.5284     Coeff Var        14.42594
Root MSE     41.33032   Observations     10
Parameters   2

Analysis of Variance (ANOVA)
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1        18935           18935       11.08    0.0104
Error              8        13666      1708.19565
Corrected Total    9        32601

Parameter Estimates
Variable    Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept    1        98.24833           58.03348         1.69     0.1289
X           Size         1         0.10977            0.03297         3.33     0.0104
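A small illustrative sketch (Python, not part of the SAS workflow) that recomputes the ANOVA quantities in the table above from the raw house data. It shows how the total sum of squares splits into model and error parts, and how R-Square, Adj R-Sq, Root MSE, Coeff Var and the F value follow from those sums. Up to rounding, the results should match the SAS output.

```python
import math

size = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
p = 2  # estimated parameters: intercept + slope
x_bar, y_bar = sum(size) / n, sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in size)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, price)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in size]

sst = sum((y - y_bar) ** 2 for y in price)               # Corrected Total
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))   # Error
ssm = sst - sse                                          # Model

df_model, df_error = p - 1, n - p
mse = sse / df_error
f_value = (ssm / df_model) / mse
r_square = ssm / sst
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_error
root_mse = math.sqrt(mse)
coeff_var = 100 * root_mse / y_bar   # Root MSE as a percentage of the dependent mean

print(f"SSM = {ssm:.1f}, SSE = {sse:.1f}, SST = {sst:.1f}")
print(f"F = {f_value:.2f}, R-Square = {r_square:.4f}, Adj R-Sq = {adj_r_square:.4f}")
print(f"Root MSE = {root_mse:.5f}, Coeff Var = {coeff_var:.5f}")
```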
INTERPRETATION: HOUSE PRICE MODEL
• First, look at the ANOVA results: the p-value (Pr > F = 0.0104) is less than 0.05, so the null hypothesis that the slope is zero is rejected
• Second, the R-Square value is 0.58082, which means that 58.08% of the variation in house prices is explained by house size
• The regression model makes sense only when it fits the data better than the baseline model (which predicts the mean price for every house), meaning the slope of the regression line is significantly different from zero
• From the parameter estimates, the House Price Model is ŷ = 98.24833 + 0.10977x
• Since prices are in thousands of dollars, each additional square foot increases the average house price by 0.10977 ($1000) = $109.77
• For example, the expected price of a 2000 square-foot house is 98.24833 + 0.10977 × 2000 = 317.78833 ($1000), or about $317,788
• Estimation and prediction should be done only within the range of the data used for the regression analysis (here, 1,100 to 2,450 square feet); otherwise the results are unreliable
• The remaining statistics will be discussed in Multiple Regression Analysis
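A minimal sketch of using the fitted equation for prediction. The helper `predict_price` and its range guard are hypothetical, added only to make the "stay within the data range" rule explicit; the limits of 1,100 and 2,450 square feet are the smallest and largest house sizes in the sample.

```python
B0, B1 = 98.24833, 0.10977   # parameter estimates from the SAS output
X_MIN, X_MAX = 1100, 2450    # smallest and largest house size in the sample


def predict_price(size_sqft):
    """Predicted price in $1000s; refuses to extrapolate beyond the sampled sizes."""
    if not X_MIN <= size_sqft <= X_MAX:
        raise ValueError(f"{size_sqft} sq ft is outside the data range {X_MIN}-{X_MAX}")
    return B0 + B1 * size_sqft


estimate = predict_price(2000)
print(f"{estimate:.5f} thousand dollars = ${estimate * 1000:,.2f}")
```

Calling `predict_price(3000)` raises an error instead of returning an extrapolated value.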
RUN SIMPLE REGRESSION USING SAS
• Copy and paste the code below into a SAS program and run it
DATA House;
input Y X;
label Y = 'House Price in $1000s';
label X = 'Size in Square Feet';
datalines;
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
;
ods graphics on;
title1 'Simple Regression Analysis';
title2 'House Price Model';
/* Fit price on size and draw only the fitted-line plot */
proc reg data=House plots(only)=fitplot;
model Y = X;
run;
ods graphics off;
title;
STEPS & ASSUMPTIONS OF REGRESSION
Step 1 Formulate the problem
Step 2 Define dependent & independent variables
Step 3 Build the general model
Step 4 Plot the scatter diagram
Step 5 Estimate the parameters
Step 6 Estimate the standardized regression coefficients
Step 7 Test for significance
Step 8 Find the strength of the association
Step 9 Check the prediction accuracy
Step 10 Examine the residuals
Step 11 Cross-validate the model
• Linearity of the phenomenon measured: the mean of the dependent variable is linearly related to the independent variable(s)
• The errors are normally distributed with a mean of zero
• The errors have equal variances; in other words, the variance of the error term is constant (homoscedasticity)
• The errors are independent, meaning uncorrelated with one another
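The assumptions above are checked on the residuals (observed minus predicted values). A hedged sketch on the house data: it computes the residuals, confirms they sum to zero (OLS with an intercept guarantees this, so it is a sanity check rather than a test of the assumption), and computes the Durbin-Watson statistic, one common check of the independence assumption. Durbin-Watson is only meaningful when the observations have a natural order (for example, time); the house sample has none, so the value here only illustrates the calculation. Values near 2 suggest no first-order autocorrelation.

```python
size = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def ols_residuals(x, y):
    """Residuals y - (b0 + b1*x) from a simple OLS fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares (always 0 to 4)."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


res = ols_residuals(size, price)
dw = durbin_watson(res)
print(f"sum of residuals = {sum(res):.2e}")
print(f"Durbin-Watson = {dw:.3f}")
```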
MULTIPLE REGRESSION ANALYSIS
• Multiple regression involves a single dependent variable and two or more independent variables, which makes it more powerful than simple regression
• The dependent variable should be interval-scaled (metric), and the independent variables should be metric or appropriately transformed
• It analyzes the impact of a set of independent variables on the dependent variable
• The equation for multiple regression is ‘y=β0 + β1x1 + β2x2 +…+ βnxn + ε’, where y is the dependent variable and x1, x2, …, xn are the independent variables
• The predicted values are generated by the model ‘ŷ=b0 + b1x1 + b2x2 +…+ bnxn’, where b0, b1, b2, …, bn are the estimators of β0, β1, β2, …, βn
• The model parameters are estimated using the Ordinary Least-Squares (OLS) procedure, which minimizes the sum of the squared differences between y and ŷ and determines the best-fitting plane (a hyperplane when there are more than two independent variables)
• Before performing multiple regression, it is always recommended to check the correlations among the independent variables to detect multicollinearity
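A minimal sketch of multiple-regression OLS in plain Python, assuming the credit card data from Example 2 below (Y, X1 = family size, X2 = family income). It solves the normal equations (X'X)b = X'y by Gaussian elimination; `solve` and `fit_ols` are hypothetical helper names. Solving the normal equations directly is fine for a tiny example like this, but statistical software generally uses more numerically stable decompositions (such as QR).

```python
def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                          # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(columns, y):
    """OLS for y = b0 + b1*x1 + ... via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]  # 1.0 = intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


y = [4, 6, 6, 7, 8, 7, 8, 10]       # number of credit cards
x1 = [2, 2, 4, 4, 5, 5, 6, 6]       # family size
x2 = [14, 16, 14, 17, 18, 21, 17, 25]  # family income ($1000s)

b0, b1, b2 = fit_ols([x1, x2], y)
print(f"y_hat = {b0:.5f} + {b1:.5f}*X1 + {b2:.5f}*X2")
```

The estimates agree with the parameter estimates reported for the Credit Card Model (0.48169, 0.63224, 0.21585).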
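SIGNIFICANCE TESTING
• Significance testing provides the justification for accepting or rejecting a given hypothesis
• In ANOVA, the null hypothesis is that all population means are equal and the alternative hypothesis is that not all of the population means are equal. It is assumed that the populations are normal and that they have equal variances.
• In regression, the overall F test has the null hypothesis that all slope coefficients are zero (β1 = β2 = … = βn = 0), i.e. the model is no better than the baseline model; the alternative is that at least one slope is not zero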
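• To test the hypothesis, the F ratio (the model mean square divided by the error mean square) is calculated and compared with the critical value of the F distribution, whose degrees of freedom depend on the number of independent variables and the sample size; an F ratio above the critical value shows that the model fits the data better than the baseline model
• Equivalently, the p-value (Pr > F) should be lower than 0.05: the null hypothesis is then rejected at the 5% significance level, which is evidence that a relationship exists between the dependent and independent variables
• The significance of the individual model parameters is tested in a similar manner, but using t statistics (each parameter estimate divided by its standard error)
• In regression, there are three types of sums of squares: the variation explained by the model (SSM), the unexplained variation or error (SSE), and the total variation (SST), where SST = SSM + SSE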
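A sketch of the overall F test, assuming only the sums of squares and degrees of freedom from the ANOVA tables in this deck. Python's standard library has no F distribution, so the p-value uses the identity P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1·f), where I is the regularized incomplete beta function, evaluated here with the standard continued-fraction (modified Lentz) method. The helper names are hypothetical; in practice the p-value comes straight from SAS (or a library such as SciPy).

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),           # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_test(ssm, sse, df_model, df_error):
    """Return (F, p-value) for the overall regression F test."""
    f_value = (ssm / df_model) / (sse / df_error)
    x = df_error / (df_error + df_model * f_value)
    return f_value, betai(df_error / 2.0, df_model / 2.0, x)  # upper-tail probability


house = f_test(18935, 13666, 1, 8)          # House Price Model ANOVA
credit = f_test(18.95027, 3.04973, 2, 5)    # Credit Card Model ANOVA
print(f"House model:  F = {house[0]:.2f}, p = {house[1]:.4f}")
print(f"Credit model: F = {credit[0]:.2f}, p = {credit[1]:.4f}")
```

Both p-values should agree with the Pr > F column of the corresponding SAS output to about four decimal places.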
COEFFICIENT OF DETERMINATION
• The coefficient of determination (R2) measures the strength of association
• R2 = SSM / SST
• It measures the percentage of the variation in the dependent variable that is explained by the independent variable(s)
• An R2 value close to 1 means the regression line fits the data well (R2 = 1 is a perfect fit), whereas a value close to 0 means it does not fit the data well
• R2 never decreases when more independent variables are added to the model, so relying on it alone can be misleading
• After the first few variables, additional independent variables usually contribute little
• Adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables, penalizes each added variable, so it increases only when a new variable improves the model more than would be expected by chance
• For example, in the R2 values below, adding more than 3 variables does not add any value to the model
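A minimal sketch of the two formulas above, assuming the sums of squares reported in the ANOVA tables of this deck (house model: n = 10, k = 1; credit card model: n = 8, k = 2). The last loop holds R2 fixed and shows how the adjustment grows as k increases.

```python
def r_squared(ssm, sst):
    """Share of the total variation explained by the model."""
    return ssm / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R2 for n observations and k independent variables (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


house_r2 = r_squared(18935, 32601)
credit_r2 = r_squared(18.95027, 22)
house_adj = adjusted_r_squared(house_r2, n=10, k=1)
credit_adj = adjusted_r_squared(credit_r2, n=8, k=2)
print(f"House:  R2 = {house_r2:.4f}, adjusted R2 = {house_adj:.4f}")
print(f"Credit: R2 = {credit_r2:.4f}, adjusted R2 = {credit_adj:.4f}")

# Same R2, more variables: adjusted R2 falls as k grows
for k in (1, 2, 3, 4):
    print(f"R2 = 0.8614 with k = {k}: adjusted R2 = {adjusted_r_squared(0.8614, 8, k):.4f}")
```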
EXAMPLE 2: CREDIT CARD MODEL
• A bank wants to predict the number of credit cards that a family uses (Y) based on the following data: Family Number (ID), Family Size (X1), Family Income in thousands of dollars (X2), and Number of Automobiles Owned (X3)
• A sample of 8 families is used in the analysis
• The objective is to find predictions with the smallest possible sum of squared prediction errors. As a baseline, the mean number of cards (ȳ = 7) is used as the prediction for every family
Family   Actual No. of       Baseline Prediction   Prediction Error   Prediction Error
ID       Credit Cards (Y)    (ŷ = ȳ)               (y − ȳ)            Squared (y − ȳ)²
1         4                  7                     -3                  9
2         6                  7                     -1                  1
3         6                  7                     -1                  1
4         7                  7                      0                  0
5         8                  7                      1                  1
6         7                  7                      0                  0
7         8                  7                      1                  1
8        10                  7                      3                  9
Total    56                  7 (ȳ = 56/8)           0                 22
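The baseline column can be checked with a few lines of Python (illustrative only): predicting the mean for every family gives errors that sum to zero and a sum of squared errors of 22, the figure any regression model must beat.

```python
cards = [4, 6, 6, 7, 8, 7, 8, 10]      # actual number of credit cards (Y)
baseline = sum(cards) / len(cards)     # y_bar = 56 / 8 = 7.0, predicted for every family
errors = [y - baseline for y in cards]
sse_baseline = sum(e ** 2 for e in errors)

print("errors:", errors)
print("sum of errors:", sum(errors))
print("sum of squared errors:", sse_baseline)
```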
STATISTICS: CREDIT CARD MODEL
Dependent variable (Y): No. of Credit Cards
Independent variables (X1 & X2): Family Size & Family Income

R-Square     0.8614    Dependent Mean   7.0
Adj R-Sq     0.8059    Coeff Var        11.157
Root MSE     0.78099   Observations     8
Parameters   3

Analysis of Variance (ANOVA)
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2      18.95027         9.47514      15.53    0.0072
Error              5       3.04973         0.60995
Corrected Total    7      22

Parameter Estimates
Variable    Label           DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept        1        0.48169             1.46141          0.33     0.7551
X1          Family Size      1        0.63224             0.25231          2.51     0.0541
X2          Family Income    1        0.21585             0.10801          2.00     0.1021
INTERPRETATION: CREDIT CARD MODEL
• The ANOVA results show that the p-value (Pr > F = 0.0072) is less than 0.05, so the null hypothesis is rejected and a relationship exists between Y and X1 & X2
• In this model, the unexplained variation (error sum of squares) is 3.04973, much lower than in the baseline model, where the sum of squared prediction errors is 22; the model therefore explains 22 − 3.04973 = 18.95027
• The R-Square value is 0.8614, which means that 86.14% of the variation in the number of credit cards is explained by this model
• When X3 was included, the adjusted R-Square decreased. Hence, X3 was not included in this model, as it was statistically insignificant.
• From the parameter estimates, ŷ = 0.482 + 0.632*X1 + 0.216*X2
• Assuming a family size (X1) of 4 and a family income (X2) of 17.5 ($1000s), the predicted number of credit cards is 0.482 + 0.632 × 4 + 0.216 × 17.5 = 6.79 (using the above equation). Rounding this to 7 credit cards introduces an error of about 0.21
• Estimation and prediction should be done only within the range of the data used for the regression analysis; otherwise the results are unreliable
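A minimal prediction sketch using the unrounded SAS estimates, which give about 6.788 cards. The helper `predict_cards` and its range check are hypothetical; the ranges (family size 2 to 6, income 14 to 25) are the minimum and maximum values in the sample.

```python
B0, B1, B2 = 0.48169, 0.63224, 0.21585   # parameter estimates from the SAS output
SIZE_RANGE = (2, 6)                      # family sizes observed in the sample
INCOME_RANGE = (14, 25)                  # family incomes ($1000s) observed in the sample


def predict_cards(family_size, income_thousands):
    """Predicted number of credit cards; refuses inputs outside the sampled ranges."""
    if not (SIZE_RANGE[0] <= family_size <= SIZE_RANGE[1]
            and INCOME_RANGE[0] <= income_thousands <= INCOME_RANGE[1]):
        raise ValueError("inputs are outside the range of the data used to fit the model")
    return B0 + B1 * family_size + B2 * income_thousands


y_hat = predict_cards(4, 17.5)
cards = round(y_hat)
print(f"predicted = {y_hat:.3f}, rounded = {cards}, rounding error = {cards - y_hat:.3f}")
```

Checking each variable's range separately is a simplification: a combination of in-range values can still lie outside the region actually covered by the data.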
MODEL SELECTION
• For effective modeling, one should always choose the best model, validate the regression assumptions, detect influential observations, and check for collinearity.
• Let’s look at model selection. Whether the regression is run manually or with stepwise selection, the objective is always a better model that explains more of the variation (an R-square value closer to 1 is expected)
• Stepwise regression is often used when there are many variables, because it selects a combination of variables automatically, adding or removing variables based on their p-values; it does not, however, guarantee the best possible combination
• The summary statistics below show how each variable entering the model changed the R-square and adjusted R-square
• When X3 entered the model, the adjusted R-square decreased, suggesting that the variable should be dropped from the model
Variables entered in model   R-Square   Adjusted R-Square   F Value   Pr > F
X1                           0.7506     0.7091              18.06     0.0054
X1 + X2                      0.8614     0.8059              15.53     0.0072
X1 + X2 + X3                 0.8720     0.7761               9.09     0.0294
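As a cross-check, the sketch below fits every subset of X1, X2 and X3 by OLS and ranks the subsets by adjusted R-square. Note that this is all-subsets comparison, a different technique from the stepwise procedure described above; with only three candidate variables it is cheap. The helpers `solve` and `sse_of_fit` are hypothetical names, and the Gaussian-elimination solver is for illustration only.

```python
from itertools import combinations


def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                          # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse_of_fit(columns, y):
    """Fit y on the given columns (plus intercept) by OLS and return the error SS."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))


y = [4, 6, 6, 7, 8, 7, 8, 10]
predictors = {
    "X1": [2, 2, 4, 4, 5, 5, 6, 6],
    "X2": [14, 16, 14, 17, 18, 21, 17, 25],
    "X3": [1, 2, 2, 1, 3, 2, 1, 2],
}
n = len(y)
y_bar = sum(y) / n
sst = sum((v - y_bar) ** 2 for v in y)   # total variation (22 for this data)

results = []  # (adjusted R2, R2, label)
for k in range(1, len(predictors) + 1):
    for names in combinations(predictors, k):
        r2 = 1 - sse_of_fit([predictors[nm] for nm in names], y) / sst
        adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
        results.append((adj, r2, " + ".join(names)))

for adj, r2, label in sorted(results, reverse=True):
    print(f"{label:<13} R2 = {r2:.4f}  adjusted R2 = {adj:.4f}")

best = max(results)
```

Ranked this way, X1 + X2 has the highest adjusted R-square, which agrees with dropping X3.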
VERIFY REGRESSION ASSUMPTIONS
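• To check the normality of the error term, examine the histogram of the residuals and the distribution curves
• In the residual plot (residuals against predicted values), the equal-variance and independence assumptions are supported if the residuals are scattered randomly around zero, with no pattern or funnel shape
• The same plot is used to check linearity: a curved pattern in the residuals suggests that the relationship is not linear. A non-zero intercept (0.482 in the credit card model) does not by itself verify linearity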
Influential observations
The R-square value can be affected by outliers or influential observations, so it is necessary to look at the RStudent plot. Observations whose RStudent value is greater than 2 in absolute value are usually considered outliers (3 for large sample sizes). Cook’s D, DFFITS and DFBETAS are other useful statistics.
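A hedged sketch of how RStudent and Cook's D are computed, using the house data because in simple regression the leverage has a closed form, h_i = 1/n + (x_i − x̄)²/Sxx. The deleted-observation variance is obtained without refitting, and observations are flagged with the |RStudent| > 2 rule of thumb above. This is an illustration of the formulas, not a replacement for the SAS diagnostics plots.

```python
import math

size = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n, p = len(size), 2                      # p = estimated parameters (intercept + slope)
x_bar, y_bar = sum(size) / n, sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in size)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, price)) / s_xx
b0 = y_bar - b1 * x_bar
resid = [y - (b0 + b1 * x) for x, y in zip(size, price)]
mse = sum(e ** 2 for e in resid) / (n - p)

# Leverage (hat value) for simple regression
lev = [1 / n + (x - x_bar) ** 2 / s_xx for x in size]

rstudent, cooks_d = [], []
for e, h in zip(resid, lev):
    # Error variance with observation i deleted, obtained without refitting
    mse_i = ((n - p) * mse - e ** 2 / (1 - h)) / (n - p - 1)
    rstudent.append(e / math.sqrt(mse_i * (1 - h)))        # externally studentized residual
    cooks_d.append(e ** 2 * h / (p * mse * (1 - h) ** 2))  # Cook's distance

for i, (t, d) in enumerate(zip(rstudent, cooks_d), start=1):
    flag = "  <- |RStudent| > 2" if abs(t) > 2 else ""
    print(f"obs {i:2d}: RStudent = {t:6.2f}, Cook's D = {d:.3f}{flag}")
```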
Multicollinearity
Multicollinearity occurs when two or more independent variables are highly correlated with each other, which makes the estimated regression coefficients unstable. The VIF (Variance Inflation Factor) measures the magnitude of collinearity for each variable in a model; values up to 10 are commonly accepted.
REGRESSION DIAGNOSTICS
Variables   VIF
X1          1.82692
X2          1.93492
X3          1.09976
In the credit card example, there is no collinearity issue, as all VIF values are well below 10.
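A minimal sketch of the VIF calculation: each predictor is regressed on the other predictors, and VIF = 1 / (1 − R²) for that auxiliary regression. With X1, X2 and X3 all in the model, this reproduces the VIF table above. The helpers `solve` and `r_squared` are hypothetical; in SAS, the same values come from the VIF option of the MODEL statement.

```python
def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                          # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R2 of an OLS fit of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    y_bar = sum(y) / len(y)
    sse = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


x = {
    "X1": [2, 2, 4, 4, 5, 5, 6, 6],
    "X2": [14, 16, 14, 17, 18, 21, 17, 25],
    "X3": [1, 2, 2, 1, 3, 2, 1, 2],
}
vif = {}
for name, col in x.items():
    others = [c for other, c in x.items() if other != name]
    vif[name] = 1 / (1 - r_squared(others, col))   # auxiliary regression of one X on the rest
    print(f"{name}: VIF = {vif[name]:.5f}")
```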
RUN MULTIPLE REGRESSION USING SAS
• Copy and paste the code below into a SAS program and run it
DATA CreditCard;
INPUT ID Y X1 X2 X3;
LABEL ID = 'Family Number';
LABEL Y = 'Number of Credit Cards';
LABEL X1 = 'Family Size';
LABEL X2 = 'Family income in $000';
LABEL X3 = 'Number of cars owned';
DATALINES;
1 4 2 14 1
2 6 2 16 2
3 6 4 14 2
4 7 4 17 1
5 8 5 18 3
6 7 5 21 2
7 8 6 17 1
8 10 6 25 2
;
ODS GRAPHICS ON;
TITLE1 'Multiple Regression Analysis';
TITLE2 'Credit Card Model';
/* Request all diagnostic plots in one PLOTS(ONLY)= list */
PROC REG DATA=CreditCard
PLOTS(ONLY)=(RESIDUALHISTOGRAM RESIDUALBYPREDICTED RSTUDENTBYPREDICTED
COOKSD DFFITS DFBETAS DIAGNOSTICS);
/* The VIF option prints variance inflation factors for X1 and X2 */
MODEL Y = X1 X2 / VIF;
RUN;
ODS GRAPHICS OFF;
TITLE;
Thank You

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 

Simple & Multiple Regression Analysis

Index
• What is Regression Analysis?
• Simple Regression Theory
• Example 1: House Price Model
• Run Simple Regression Using SAS
• Steps & Assumptions of Regression
• Multiple Regression Analysis
• Significance Testing
• Coefficient of Determination
• Example 2: Credit Card Model
• Model Selection
• Verify Regression Assumptions
• Regression Diagnostics
• Run Multiple Regression Using SAS

WHAT IS REGRESSION ANALYSIS?
• When two or more variables are related and we want to quantify the relationship between them, regression analysis is the right technique.
• It goes beyond correlation by producing a mathematical equation that estimates or predicts values within the range spanned by the data.
• Regression requires one dependent variable and one or more independent variables.
• The dependent variable (also called the outcome or response variable) is modeled as a function of the independent variables (also called explanatory or predictor variables).
• Regression analysis examines the associative relationships between these variables.
• It is commonly used in forecasting, time-series modeling, financial analysis, and market research to quantify relationships between variables. On its own, a regression shows association; it does not prove cause and effect.
Figure: Scatter Diagram

SIMPLE REGRESSION THEORY
• Let's begin with simple linear regression, which is easier to understand.
• Remember the 'y = mx + c' linear equation from high school, which plots as a straight line. Regression fits such a line to data.
• In simple regression, this equation becomes 'y = β0 + β1x + ε', where y is the dependent variable and x is the independent variable.
• β0, like the y-intercept c, is the expected value of y when x is zero; β1, like the slope m, is the change in the average value of y for a one-unit change in x; and ε is the error term.
• The error term is needed because observed values of y scatter around the line. In addition, the model is fitted to a sample rather than the whole population, so the fitted coefficients are only estimates of β0 and β1.
• The Ordinary Least-Squares (OLS) procedure therefore selects the estimates b0 and b1 that minimize the sum of the squared differences between y and ŷ, which determines the best-fitting line.
• The objective is always to minimize the error, the difference between the observed values and the values predicted by the model 'ŷ = b0 + b1x'.

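For a single predictor, the least-squares problem has a closed-form solution. The block below states the standard OLS estimates; S_xx and S_xy are shorthand introduced here for the centered sums, and the last line plugs in the house data from Example 1 (mean size 1715 sq ft, mean price 286.5).

```latex
\text{SSE}(b_0, b_1) = \sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2 \quad \text{(minimized by OLS)}

S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

b_1 = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}

\text{House data: } b_1 = \frac{172{,}500}{1{,}571{,}500} \approx 0.10977, \qquad
b_0 = 286.5 - b_1 (1715) \approx 98.248
```
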
EXAMPLE 1: HOUSE PRICE MODEL
• A real estate company wants to examine the relationship between the selling price of a home (in $1000s) and its size (in square feet) for a specific region.
• It selects a random sample of 10 houses.
• The scatterplot of the data points shows a positive linear relationship: larger houses tend to sell for higher prices.

STATISTICS: HOUSE PRICE MODEL
Dependent variable (Y): House price (in $1000s)
Independent variable (x): Size (in square feet)

| Fit statistic | Value |
|---|---|
| Observations | 10 |
| Parameters | 2 |
| R-Square | 0.5808 |
| Adj R-Sq | 0.5284 |
| Root MSE | 41.33032 |
| Dependent Mean | 286.5 |
| Coeff Var | 14.42594 |

Analysis of Variance (ANOVA)

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 1 | 18935 | 18935 | 11.08 | 0.0104 |
| Error | 8 | 13666 | 1708.19565 | | |
| Corrected Total | 9 | 32601 | | | |

Parameter Estimates

| Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | Intercept | 1 | 98.24833 | 58.03348 | 1.69 | 0.1289 |
| X | Size | 1 | 0.10977 | 0.03297 | 3.33 | 0.0104 |

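Every fit statistic in the summary table follows from the ANOVA sums of squares, the number of observations n, and the number of parameters p (intercept included). The sketch below is a plain-Python illustration of those standard formulas (the slides themselves use SAS); `fit_stats` is a hypothetical helper name, and the inputs are the rounded sums of squares printed above, so the last digits can differ slightly from the SAS output.

```python
import math


def fit_stats(ssm, sse, n, p, y_mean):
    """Derive PROC REG-style fit statistics from ANOVA sums of squares.

    n is the number of observations and p the number of parameters
    (intercept included), so the error degrees of freedom are n - p.
    """
    sst = ssm + sse                            # Corrected Total sum of squares
    r2 = ssm / sst                             # R-Square
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)  # Adj R-Sq
    mse = sse / (n - p)                        # Error mean square
    root_mse = math.sqrt(mse)                  # Root MSE
    f_value = (ssm / (p - 1)) / mse            # Model mean square / MSE
    coeff_var = 100 * root_mse / y_mean        # Coeff Var, in percent
    return {"R-Square": r2, "Adj R-Sq": adj_r2, "Root MSE": root_mse,
            "F Value": f_value, "Coeff Var": coeff_var}


# House Price Model: sums of squares as printed in the ANOVA table
for name, value in fit_stats(18935, 13666, n=10, p=2, y_mean=286.5).items():
    print(f"{name}: {value:.4f}")
```

The same function reproduces the credit card model's summary later in the slides when called with SSM = 18.95027, SSE = 3.04973, n = 8, and p = 3.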
INTERPRETATION: HOUSE PRICE MODEL
• First, look at the ANOVA results: Pr > F is 0.0104, below 0.05, so the null hypothesis (that the slope is zero) is rejected.
• Second, the R-Square value is 0.5808, which means that 58.08% of the variation in house prices is explained by house size.
• The regression model makes sense only when it fits the data better than the baseline model (predicting the mean price for every house), meaning the slope of the regression line is not equal to zero.
• From the parameter estimates, the House Price Model is ŷ = 98.24833 + 0.10977x.
• Since prices are in thousands of dollars, each additional square foot increases the average house price by 0.10977 × $1000 = $109.77.
• For example, the expected price of a 2,000-square-foot house is 98.24833 + 0.10977 × 2000 = 317.78833 ($1000s), or about $317,788.
• Estimation and prediction should happen only within the range of data used for the regression analysis (here, 1,100 to 2,450 square feet); otherwise the results are doubtful.
• The remaining statistics are discussed in Multiple Regression Analysis.

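The warning about predicting only inside the data range can be built into the prediction itself. This is a minimal sketch, assuming a hypothetical helper `predict_price` that uses the rounded estimates from the table and the smallest and largest sizes in the sample (1,100 and 2,450 sq ft) as hard limits; it refuses to extrapolate instead of returning a doubtful number.

```python
B0, B1 = 98.24833, 0.10977        # estimates from the House Price Model
MIN_SIZE, MAX_SIZE = 1100, 2450   # smallest and largest houses in the sample


def predict_price(size_sqft):
    """Predicted price in $1000s; raises instead of extrapolating."""
    if not MIN_SIZE <= size_sqft <= MAX_SIZE:
        raise ValueError(f"{size_sqft} sq ft is outside the fitted range "
                         f"{MIN_SIZE}-{MAX_SIZE}")
    return B0 + B1 * size_sqft


price = predict_price(2000)
print(f"Predicted price: ${price * 1000:,.2f}")  # prints Predicted price: $317,788.33
```

Calling `predict_price(3000)` raises a `ValueError`, because no house in the sample was that large.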
RUN SIMPLE REGRESSION USING SAS
• Copy and paste the following code into a SAS program and run it:

```sas
DATA House;
  INPUT Y X;
  LABEL Y = 'House Price in $1000s';
  LABEL X = 'Size in Square Feet';
  DATALINES;
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
;
ODS GRAPHICS ON;
TITLE1 'Simple Regression Analysis';
TITLE2 'House Price Model';
/* Fit price on size and show only the fitted-line plot */
PROC REG DATA=House PLOTS(ONLY)=FITPLOT;
  MODEL Y = X;
RUN;
QUIT;
ODS GRAPHICS OFF;
TITLE;
```

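Without SAS at hand, the PROC REG estimates can be cross-checked with the closed-form OLS formulas. This plain-Python sketch uses the same ten houses as the DATA step above; `simple_ols` is a hypothetical helper name, not part of any library.

```python
def simple_ols(x, y):
    """Closed-form OLS for one predictor: returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares (SST)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r2 = (sxy ** 2 / sxx) / syy                # SSM / SST
    return b0, b1, r2


# Same ten houses as the DATA step (price in $1000s, size in sq ft)
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
size = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
b0, b1, r2 = simple_ols(size, price)
print(f"y-hat = {b0:.5f} + {b1:.5f}x, R-Square = {r2:.4f}")
# prints y-hat = 98.24833 + 0.10977x, R-Square = 0.5808
```
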
STEPS & ASSUMPTIONS OF REGRESSION
Step 1: Formulate the problem
Step 2: Define the dependent & independent variables
Step 3: Build the general model
Step 4: Plot the scatter diagram
Step 5: Estimate the parameters
Step 6: Estimate the regression coefficients
Step 7: Test for significance
Step 8: Find the strength of the association
Step 9: Check the prediction accuracy
Step 10: Examine the residuals
Step 11: Cross-validate the model
Assumptions:
• Linearity of the phenomenon measured: the mean of the dependent variable is linearly related to the independent variable(s).
• Errors are normally distributed with a mean of zero.
• Errors have equal variances; in other words, the variance of the error term is constant (homoscedasticity).
• Errors are independent, meaning uncorrelated with each other.

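Steps 10 and 11 can be sketched on the house data. Leave-one-out cross-validation is one common way to cross-validate (an assumption here; the slides do not name a method): each house is predicted from a line fitted without it, and the squared out-of-sample errors are summed into the PRESS statistic. A PRESS much larger than the in-sample SSE warns that the model predicts new data worse than it fits the data it was built on. All function names are hypothetical.

```python
def fit_line(x, y):
    """Closed-form OLS for one predictor; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(x, y):
    """Step 10: observed minus fitted values from the full-sample fit."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def loo_press(x, y):
    """Step 11 (one possible approach): leave-one-out PRESS statistic."""
    press = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict it out of sample
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0 + b1 * x[i])) ** 2
    return press


price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]   # $1000s
size = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
res = residuals(size, price)
sse = sum(e ** 2 for e in res)
print(f"sum of residuals = {sum(res):.6f}")
print(f"SSE = {sse:.2f}, leave-one-out PRESS = {loo_press(size, price):.2f}")
```

With an intercept in the model, OLS residuals always sum to zero, so a nonzero sum would point to a computation error rather than a model problem.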
MULTIPLE REGRESSION ANALYSIS
• Multiple regression is more powerful, as it involves a single dependent variable and two or more independent variables.
• The dependent variable should be interval-scaled, and the independent variables metric or appropriately transformed.
• It analyzes the impact of a set of independent variables on the dependent variable.
• The equation for multiple regression is 'y = β0 + β1x1 + β2x2 + … + βnxn + ε', where y is the dependent variable and x1, x2, …, xn are the independent variables.
• The predicted values generated by the model are 'ŷ = b0 + b1x1 + b2x2 + … + bnxn', where b0, b1, b2, …, bn are the estimators of β0, β1, β2, …, βn.
• The model parameters are estimated using the Ordinary Least-Squares (OLS) procedure, which minimizes the sum of the squared differences between y and ŷ and determines the best-fitting plane (hyperplane, with more than two predictors).
• Before performing multiple regression, it is always recommended to check the correlations among the independent variables to avoid multicollinearity issues.

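With several predictors, the OLS estimates solve the normal equations (X'X)b = X'y, where X holds a column of 1s for the intercept plus one column per predictor. The sketch below solves them with Gauss-Jordan elimination in plain Python (`ols` is a hypothetical helper; real work would use a statistics package). The data are X1 and X2 from the credit card DATA step on the "Run Multiple Regression Using SAS" slide.

```python
def ols(rows, y):
    """OLS coefficients [b0, b1, ..., bk] from the normal equations (X'X)b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # 1s column = intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * c for a, c in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Credit card example: (family size X1, family income X2 in $1000s)
rows = [(2, 14), (2, 16), (4, 14), (4, 17), (5, 18), (5, 21), (6, 17), (6, 25)]
cards = [4, 6, 6, 7, 8, 7, 8, 10]   # number of credit cards Y
b0, b1, b2 = ols(rows, cards)
print(f"y-hat = {b0:.5f} + {b1:.5f}*X1 + {b2:.5f}*X2")
```

The output should match the PROC REG parameter estimates for the credit card model (0.48169, 0.63224, 0.21585).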
SIGNIFICANCE TESTING
• Significance testing provides justification for accepting or rejecting a given hypothesis.
• In ANOVA, the null hypothesis is that all population means are equal, and the alternative hypothesis is that not all of the population means are equal. It is assumed that the populations are normal and that they have equal variances. In regression, the corresponding null hypothesis is that all slope coefficients are zero.
• To test the hypothesis, the F ratio (model mean square divided by error mean square) is calculated and compared with the critical value of the F distribution for the model and error degrees of freedom. An F ratio above the critical value shows that the model fits the data better than the baseline model.
• The test also reports a p-value; a value below 0.05 indicates that the relationship between the dependent and independent variables is statistically significant at the 5% level.
• The significance of the individual model parameters can be tested in a similar manner, using t test statistics.
• In regression, there are three sums of squares: the variation explained by the model (SSM), the unexplained error variation (SSE), and the total variation (SST), with SST = SSM + SSE.

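The three sums of squares and the F ratio can be computed directly. This sketch uses the house data from Example 1 and a hypothetical helper `anova_simple`. The critical value F(0.05; 1, 8) ≈ 5.32 is taken from standard F tables and hard-coded as an assumption (5% significance level, 1 model and 8 error degrees of freedom).

```python
def anova_simple(x, y):
    """Split total variation into SSM + SSE and form the F ratio (one predictor)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssm = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by model
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained error
    f_ratio = (ssm / 1) / (sse / (n - 2))                  # MSM / MSE
    return sst, ssm, sse, f_ratio


F_CRIT = 5.32  # 5% critical value of F with (1, 8) df, from standard F tables

price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
size = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
sst, ssm, sse, f = anova_simple(size, price)
print(f"SST = {sst:.1f} = SSM {ssm:.1f} + SSE {sse:.1f}")
print(f"F = {f:.2f}, reject H0 at 5%: {f > F_CRIT}")
```
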
COEFFICIENT OF DETERMINATION
• The coefficient of determination (R²) measures the strength of association.
• R² = SSM / SST
• It measures the percentage of the variation in the dependent variable that is explained by the independent variable(s).
• An R² close to 1 means the regression line fits the data closely, whereas a value close to 0 means it does not fit the data well.
• R² never decreases when more independent variables are added to the model, so the results can be misleading.
• After the first few variables are added, additional independent variables usually contribute little.
• Adjusted R² penalizes R² for the number of independent variables, so it increases only when a new variable improves the model more than would be expected by chance.
• For example, in the R² values shown on this slide, more than three variables do not add any value to the model.

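The standard adjustment is adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of predictors (intercept excluded). This sketch applies it to the R-square values from the credit card model-selection table (n = 8 families); `adjusted_r2` is a hypothetical helper name. Small differences in the fourth decimal come from the rounded R² inputs.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R-square as X1, then X2, then X3 enter the credit card model (n = 8)
for k, r2 in [(1, 0.7506), (2, 0.8614), (3, 0.8720)]:
    print(f"{k} predictor(s): R2 = {r2:.4f}, adjusted R2 = {adjusted_r2(r2, 8, k):.4f}")
```

R² rises from 0.8614 to 0.8720 when X3 is added, but adjusted R² falls, because the small gain does not pay for the lost error degree of freedom.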
EXAMPLE 2: CREDIT CARD MODEL
• A bank wants to predict the number of credit cards that a family uses (Y) based on the following data: Family Number (ID), Family Size (X1), Family Income in thousand dollars (X2), and Number of Automobiles Owned (X3).
• A sample of 8 families is used in the analysis.
• The objective is to find predictions whose total squared prediction error is lower than that of the baseline prediction (the mean, ȳ).

| Family ID | Actual No. of Credit Cards (Y) | Baseline Prediction (ȳ) | Prediction Error (y − ȳ) | Prediction Error Squared (y − ȳ)² |
|---|---|---|---|---|
| 1 | 4 | 7 | -3 | 9 |
| 2 | 6 | 7 | -1 | 1 |
| 3 | 6 | 7 | -1 | 1 |
| 4 | 7 | 7 | 0 | 0 |
| 5 | 8 | 7 | 1 | 1 |
| 6 | 7 | 7 | 0 | 0 |
| 7 | 8 | 7 | 1 | 1 |
| 8 | 10 | 7 | 3 | 9 |
| Total | 56 | ȳ = 56/8 = 7 | 0 | 22 |

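The baseline columns can be recomputed in a few lines. This sketch uses the Y values from the credit card DATA step; the baseline predicts the mean for every family, and the total squared error it leaves (22) is the corrected total sum of squares that the regression model has to beat.

```python
cards = [4, 6, 6, 7, 8, 7, 8, 10]          # Y for families 1 to 8
baseline = sum(cards) / len(cards)          # y-bar = 56 / 8 = 7.0
errors = [y - baseline for y in cards]      # prediction error y - y-bar
sst = sum(e ** 2 for e in errors)           # total squared prediction error
print(f"baseline = {baseline}, errors = {errors}, total squared error = {sst}")
```
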
STATISTICS: CREDIT CARD MODEL
Dependent variable (Y): No. of credit cards
Independent variables (X1 & X2): Family size & family income

| Fit statistic | Value |
|---|---|
| Observations | 8 |
| Parameters | 3 |
| R-Square | 0.8614 |
| Adj R-Sq | 0.8059 |
| Root MSE | 0.78099 |
| Dependent Mean | 7.0 |
| Coeff Var | 11.157 |

Analysis of Variance (ANOVA)

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 18.95027 | 9.47514 | 15.53 | 0.0072 |
| Error | 5 | 3.04973 | 0.60995 | | |
| Corrected Total | 7 | 22 | | | |

Parameter Estimates

| Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | Intercept | 1 | 0.48169 | 1.46141 | 0.33 | 0.7551 |
| X1 | Family Size | 1 | 0.63224 | 0.25231 | 2.51 | 0.0541 |
| X2 | Family Income | 1 | 0.21585 | 0.10801 | 2.00 | 0.1021 |

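Each t Value in the parameter table is the estimate divided by its standard error, compared with a t distribution with n − p = 8 − 3 = 5 degrees of freedom. This sketch recomputes the ratios; the two-sided 5% critical value 2.571 is taken from standard t tables and is an assumption of the sketch, as is the helper name `t_ratios`.

```python
T_CRIT_5DF = 2.571  # two-sided 5% critical t value, df = n - p = 8 - 3 = 5

ESTIMATES = {  # (parameter estimate, standard error) from the table above
    "Intercept": (0.48169, 1.46141),
    "X1": (0.63224, 0.25231),
    "X2": (0.21585, 0.10801),
}


def t_ratios(estimates):
    """t Value = parameter estimate / standard error."""
    return {name: est / se for name, (est, se) in estimates.items()}


for name, t in t_ratios(ESTIMATES).items():
    verdict = "significant" if abs(t) > T_CRIT_5DF else "not significant"
    print(f"{name}: t = {t:.2f}, {verdict} at the 5% level")
```

None of the individual ratios exceeds 2.571, which agrees with the Pr > |t| column (0.0541 for X1, 0.1021 for X2) even though the overall F test is significant.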
INTERPRETATION: CREDIT CARD MODEL
• The ANOVA results show that Pr > F (0.0072) is below 0.05, so the null hypothesis is rejected: a relationship exists between Y and X1 & X2.
• In this model, the unexplained variation (error sum of squares) is 3.04973, far less than the baseline model's total squared prediction error of 22.
• The R-Square value is 0.8614, which means that 86.14% of the variation in the number of credit cards is explained by this model.
• When X3 was included, the adjusted R-square decreased. Hence, X3 was not included in this model, as it was statistically insignificant.
• From the parameter estimates, ŷ = 0.482 + 0.63X1 + 0.216X2.
• Assuming a family size (X1) of 4 and an annual income (X2) of 17.5 ($1000s), the predicted number of credit cards is 6.782 (using the equation above). Rounding this to 7 credit cards introduces an error of 0.218.
• Estimation and prediction should happen only within the range of data used for the regression analysis (family sizes 2 to 6, incomes 14 to 25); otherwise the results are doubtful.

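The worked arithmetic for the 6.782 prediction, first with the rounded equation and then with the full-precision estimates from the parameter table, shows that coefficient rounding shifts the prediction only in the third decimal:

```latex
\hat{y} = 0.482 + 0.63(4) + 0.216(17.5) = 0.482 + 2.52 + 3.78 = 6.782

\hat{y} = 0.48169 + 0.63224(4) + 0.21585(17.5) = 0.48169 + 2.52896 + 3.77738 \approx 6.788
```
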
MODEL SELECTION
• For effective modeling, always choose the best model, validate the regression assumptions, detect influential observations, and check for collinearity.
• Let's start with model selection. Whether you run the regression manually or use stepwise selection, the objective is always a better model that explains more of the variation (an R-square value closer to 1).
• Stepwise regression is often used when there are many variables, because it chooses a combination of variables automatically, adding and removing variables based on their p-values. It does not guarantee the best possible combination.
• The summary below shows how each variable entered into the model affected R-square and adjusted R-square.
• When X3 entered the model, adjusted R-square decreased, suggesting that the variable should be dropped from the model.

| Variable entered (cumulative model) | R-Square | Adjusted R-Square | F Value | Pr > F |
|---|---|---|---|---|
| X1 (X1) | 0.7506 | 0.7091 | 18.06 | 0.0054 |
| X2 (X1, X2) | 0.8614 | 0.8059 | 15.53 | 0.0072 |
| X3 (X1, X2, X3) | 0.8720 | 0.7761 | 9.09 | 0.0294 |

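SAS's stepwise method adds and removes variables using significance levels. As a simpler, swapped-in criterion, the sketch below performs greedy forward selection on adjusted R²: at each step it adds the candidate that raises adjusted R² the most, and stops when no candidate raises it. All helper names are hypothetical. On the credit card data it should keep X1 and X2 and reject X3, matching the table above.

```python
def ols(rows, y):
    """OLS coefficients [b0, b1, ..., bk] from the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))]
         for a in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * c for a, c in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(rows, y):
    """R-square = 1 - SSE / SST for an OLS fit with intercept."""
    b = ols(rows, y)
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def forward_select(candidates, y):
    """Greedy forward selection on adjusted R-square."""
    chosen, best_adj = [], float("-inf")
    while len(chosen) < len(candidates):
        scores = {}
        for name in candidates:
            if name not in chosen:
                cols = chosen + [name]
                rows = list(zip(*(candidates[c] for c in cols)))
                scores[name] = adjusted_r2(r_squared(rows, y), len(y), len(cols))
        name = max(scores, key=scores.get)
        if scores[name] <= best_adj:  # no candidate improves the model: stop
            break
        chosen.append(name)
        best_adj = scores[name]
    return chosen, best_adj


candidates = {
    "X1": [2, 2, 4, 4, 5, 5, 6, 6],          # family size
    "X2": [14, 16, 14, 17, 18, 21, 17, 25],  # family income ($1000s)
    "X3": [1, 2, 2, 1, 3, 2, 1, 2],          # automobiles owned
}
cards = [4, 6, 6, 7, 8, 7, 8, 10]
print(forward_select(candidates, cards))
```

Unlike true stepwise selection, this greedy version never removes a variable once it has entered.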
VERIFY REGRESSION ASSUMPTIONS
• To confirm the normality of the error term, check the histogram of the residuals and the distribution curves.
• In the residual plot (residuals versus predicted values), the equal-variance and independence assumptions are supported if the errors are scattered randomly, with no funnel shape or trend.
• Linearity is checked with the same residual plots: a curved pattern in the residuals suggests the relationship is not linear. A nonzero intercept (0.482 in the credit card model) does not, by itself, verify linearity.

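The residual plot is built from pairs of predicted value and residual. This sketch computes those pairs for the credit card model (Y on X1 and X2) so they can be inspected or plotted; `ols` is the same hypothetical normal-equations helper used in other sketches and is repeated so the block runs on its own.

```python
def ols(rows, y):
    """OLS coefficients [b0, b1, ..., bk] from the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))]
         for a in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * c for a, c in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


rows = [(2, 14), (2, 16), (4, 14), (4, 17), (5, 18), (5, 21), (6, 17), (6, 25)]
cards = [4, 6, 6, 7, 8, 7, 8, 10]
b = ols(rows, cards)
predicted = [b[0] + b[1] * x1 + b[2] * x2 for x1, x2 in rows]
resid = [y - p for y, p in zip(cards, predicted)]
for p, e in zip(predicted, resid):
    print(f"predicted {p:6.3f}   residual {e:+.3f}")
print(f"mean residual = {sum(resid) / len(resid):.6f}")
```

Look for structure in the printed pairs: residuals that fan out as the predicted value grows, or that curve, point to unequal variance or nonlinearity. With only eight families, any such judgment is tentative.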
REGRESSION DIAGNOSTICS
Influential observations
• The R-square value can be affected by outliers or influential observations, so it is necessary to look at the RStudent plot. Usually, observations with an absolute RStudent value greater than 2 are considered outliers (3 for large samples). Cook's D, DFFITS, and DFBETAS are other useful statistics.
Multicollinearity
• Multicollinearity occurs when two or more independent variables are highly correlated with each other, which makes the regression model unstable.
• The VIF (Variance Inflation Factor) measures the magnitude of collinearity in a model; values up to 10 are generally accepted.

| Variable | VIF |
|---|---|
| X1 | 1.82692 |
| X2 | 1.93492 |
| X3 | 1.09976 |

• In the credit card example, there is no collinearity issue, as all VIF values are below 2, well under the threshold of 10.

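A VIF is computed by regressing each predictor on all the other predictors: VIF_j = 1 / (1 − R²_j). This plain-Python sketch reproduces the table above for the three-predictor credit card model (X1, X2, X3); the helper names are hypothetical, and `ols` is the same normal-equations solver used in other sketches.

```python
def ols(rows, y):
    """OLS coefficients [b0, b1, ..., bk] from the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))]
         for a in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * c for a, c in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(rows, y):
    """R-square = 1 - SSE / SST for an OLS fit with intercept."""
    b = ols(rows, y)
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def vif(predictors):
    """VIF_j = 1 / (1 - R2_j), with R2_j from regressing predictor j on the others."""
    result = {}
    for name, values in predictors.items():
        others = [predictors[c] for c in predictors if c != name]
        result[name] = 1 / (1 - r_squared(list(zip(*others)), values))
    return result


predictors = {
    "X1": [2, 2, 4, 4, 5, 5, 6, 6],          # family size
    "X2": [14, 16, 14, 17, 18, 21, 17, 25],  # family income ($1000s)
    "X3": [1, 2, 2, 1, 3, 2, 1, 2],          # automobiles owned
}
for name, value in vif(predictors).items():
    print(f"{name}: VIF = {value:.5f}")
```
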
RUN MULTIPLE REGRESSION USING SAS
• Copy and paste the following code into a SAS program and run it:

```sas
/* Labels use straight quotes; curly quotes cause syntax errors in SAS */
DATA CreditCard;
  INPUT ID Y X1 X2 X3;
  LABEL ID = 'Family Number';
  LABEL Y  = 'Number of Credit Cards';
  LABEL X1 = 'Family Size';
  LABEL X2 = 'Family income in $000';
  LABEL X3 = 'Number of cars owned';
  DATALINES;
1 4 2 14 1
2 6 2 16 2
3 6 4 14 2
4 7 4 17 1
5 8 5 18 3
6 7 5 21 2
7 8 6 17 1
8 10 6 25 2
;
ODS GRAPHICS ON;
TITLE1 'Multiple Regression Analysis';
TITLE2 'Credit Card Model';
/* A single PLOTS(ONLY)= option lists every requested diagnostic plot */
PROC REG DATA=CreditCard
    PLOTS(ONLY)=(DIAGNOSTICSPANEL RESIDUALHISTOGRAM RESIDUALBYPREDICTED
                 RSTUDENTBYPREDICTED COOKSD DFFITS DFBETAS);
  MODEL Y = X1 X2;
RUN;
QUIT;
ODS GRAPHICS OFF;
TITLE;
```