IE 609, Chapter 11
Simple Linear Regression and Correlation

The Relation between Two Sets of Measures




The Relation between Two Sets of Measures
• Construct a scatter diagram for the following data: [data table not reproduced]
• Plot the results: [scatter diagram not reproduced]




The Relation between Two Sets of Measures
• You might have reversed the axes so that the vertical dimension represented the midterm grade and the horizontal dimension the final grade.
• When one measure may be used to predict another, it is customary to represent the predictor on the horizontal dimension (the x-axis).
• Linear or straight-line relationship [plot not reproduced]








The Relation between Two Sets of Measures
• Other relationships [plots not reproduced]
• Which of the diagrams represents the stronger relationship?




The Relation between Two Sets of Measures
• Which of the diagrams represents the stronger relationship? [plots not reproduced]

Simple Linear Regression

    y = α + βx
    yi = a + bxi + εi




Simple Linear Regression
• Minitab data entry: enter the data from Table 11.1 (p. 393). [screenshot not reproduced]
• Calc > Column Statistics [screenshot not reproduced]








Simple Linear Regression
• Calc > Calculator (create the formula and store the result in a Residual column) [screenshot not reproduced]




Simple Linear Regression
• Graph > Probability Plot [plot not reproduced]
• The residuals appear normally distributed.




Linear Regression: Simple Structure

    yi = ŷ + εi

• ŷ → sample mean = 34.0606 (from Minitab; mean of Demand, y (%) = 34.0606)
• εi = yi − ŷ (Minitab "Residual")
• Sample variance of ŷ = (10.7)²

Question: Is the sample mean of Demand the correct value to use for ŷ?
• Although it might seem to be a trivial question, you might ask why the sample mean (ȳ) was the correct value to use for ŷ.
• Since the purpose of the model is to accurately describe the yi, we expect the model to deliver small errors (that is, small εi). But how should we go about making the errors small?
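Since the Demand column itself is not reproduced in this transcript, the following minimal Python sketch (not part of the original slides; the numbers are made up) illustrates the simple-structure calculation, with ŷ taken as the sample mean and εi = yi − ŷ:

```python
import numpy as np

# Illustrative stand-in for the Demand data (the slide's column is not shown here).
demand = np.array([25.0, 31.0, 38.0, 29.0, 45.0, 36.0])

yhat = demand.mean()                         # candidate value for yhat
resid = demand - yhat                        # eps_i = y_i - yhat (Minitab "Residual")
s2 = np.sum(resid**2) / (demand.size - 1)    # sample variance of the errors
print(f"yhat = {yhat:.4f}, s^2 = {s2:.4f}")
```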








Linear Regression: Simple Structure
Question: Is the sample mean of Demand the correct value to use for ŷ?

    yi = ŷ + εi

• A logical choice is to pick ŷ, which might be different from the sample mean, so that the error variance s², calculated with εi = yi − ŷ, is minimized:

    s² = Σεi² / (n − 1),  with the sum over i = 1, …, n

• The least squares choice of ŷ minimizes the error sum of squares

    Σεi² = Σ(yi − ŷ)²

  and the calculus operation that delivers this solution is d(Σεi²)/dŷ = 0.
• This is called the method of least squares because the method minimizes the error sum of squares.




Linear Regression: Simple Structure
• Now consider the scatter diagram below: y appears to increase linearly with respect to x. [scatter diagram not reproduced]
• There might be an underlying causal relationship between x and y of the form:

    y = α + βx

• The parameters α and β are the y-axis intercept and slope, respectively.
• Since we typically have sample data and not the complete population of (x, y) observations, we cannot expect to determine α and β exactly; they will have to be estimated from the sample data. Our model is of the form

    yi = a + bxi + εi




Linear Regression: Simple Structure
• Then for any choice of a and b the εi may be determined from

    εi = yi − ŷi = yi − (a + bxi)

• These errors or discrepancies εi are also called the model residuals.
• Although this equation allows us to calculate the εi for a given (xi, yi) data set once a and b are specified, there are still an infinite number of a and b values that could be used in the model. Clearly the choice of a and b that provides the best fit to the data should make the εi, or some function of them, small. Although many conditions can be stated to define best-fit lines by minimizing the εi, by far the most frequently used condition to define the best-fit line is the one that minimizes Σεi².








Linear Regression: Simple Structure
• That is, the best-fit line for the (x, y) data is called the linear least squares regression line; it corresponds to the choice of a and b that minimizes Σεi².
• The calculus solution to this problem is given by the simultaneous solution of the two equations:

    ∂(Σεi²)/∂a = 0        ∂(Σεi²)/∂b = 0

• The method of fitting a line to (xi, yi) data using this solution is called linear regression.
• The error variance for linear least squares regression is given by

    sε² = Σεi² / (n − 2)

  where n is the number of (xi, yi) observations and sε is called the standard error of the model.
• The equation has n − 2 in the denominator because two degrees of freedom are consumed by the calculation of the regression coefficients a and b from the experimental data.




Linear Regression: Simple Structure
• Think of the error variance sε² in the regression problem in the same way as you think of the sample variance s² used to quantify the amount of variation in simple measurement data.
• Whereas the sample variance characterizes the scatter of observations about a single value ŷ = ȳ, the error variance in the regression problem characterizes the distribution of values about the line ŷi = a + bxi.
• sε² and s² are close cousins: they are both measures of the errors associated with different models for different kinds of data.

REGRESSION COEFFICIENTS
• With the condition that determines the a and b values that provide the best-fit line for the (xi, yi) data, namely the minimization of Σεi², we proceed to determine a and b in a more rigorous manner.




REGRESSION COEFFICIENTS
Determining the unique values of a and b
• The calculus method that determines the unique values of a and b that minimize Σεi² requires that we solve the simultaneous equations:

    ∂(Σεi²)/∂a = 0        ∂(Σεi²)/∂b = 0








REGRESSION COEFFICIENTS
• From these equations the resulting values of a and b are best expressed in terms of sums of squares:

    b = SSxy / SSx        a = ȳ − b·x̄

  where

    SSxy = Σ(xi − x̄)(yi − ȳ)
    SSx  = Σ(xi − x̄)²
    SSy  = Σ(yi − ȳ)²

  with all sums running over i = 1, …, n.
• SSx and SSy are just the sums of squares required to determine the variances of the x and y values.
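These formulas are easy to verify numerically. Below is a minimal Python sketch (not from the slides; the data are made up) that computes b = SSxy/SSx and a = ȳ − b·x̄ directly:

```python
import numpy as np

# Hypothetical (x, y) data; any paired measurements would do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

xbar, ybar = x.mean(), y.mean()
SSx  = np.sum((x - xbar) ** 2)           # SSx  = sum (xi - xbar)^2
SSy  = np.sum((y - ybar) ** 2)           # SSy  = sum (yi - ybar)^2
SSxy = np.sum((x - xbar) * (y - ybar))   # SSxy = sum (xi - xbar)(yi - ybar)

b = SSxy / SSx        # slope
a = ybar - b * xbar   # intercept; note (xbar, ybar) lies on the fitted line

print(f"b = {b:.4f}, a = {a:.4f}")
```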




REGRESSION COEFFICIENTS
• Similarly, using the sum of squares notation, we can write the error sum of squares for the regression as

    SSε = Σεi² = SSy − b·SSxy

  and the standard error as

    sε = √(SSε / (n − 2))

• Another important implication of the equations

    b = SSxy / SSx        a = ȳ − b·x̄

  is that the point (x̄, ȳ) falls on the best-fit line. This is just a consequence of the way the sums of squares are calculated.
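A short continuation of the previous sketch (same illustrative data, again not from the slides) computes the error sum of squares and the standard error via the identity SSε = SSy − b·SSxy:

```python
import numpy as np

# Same illustrative data as the previous sketch.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = x.size

SSx  = np.sum((x - x.mean()) ** 2)
SSy  = np.sum((y - y.mean()) ** 2)
SSxy = np.sum((x - x.mean()) * (y - y.mean()))
b = SSxy / SSx

SSe = SSy - b * SSxy            # error sum of squares
se  = np.sqrt(SSe / (n - 2))    # standard error of the model
print(f"SSe = {SSe:.4f}, se = {se:.4f}")
```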




REGRESSION COEFFICIENTS
• [Minitab worked example not reproduced; the annotations identify sε² = SSε/(n − 2), a = ȳ − b·x̄, and the fitted line ŷ = a + bx.]

LINEAR REGRESSION ASSUMPTIONS
• Stat > Regression > Fitted Line Plot [output not reproduced]








LINEAR REGRESSION ASSUMPTIONS
• A valid linear regression model requires that five conditions are satisfied (a diagnostic sketch follows this list):
  1. The values of x are determined without error.
  2. The εi are normally distributed with mean με = 0 for all values of x.
  3. The distribution of the εi has constant variance σε² for all values of x within the range of experimentation (that is, homoscedasticity).
  4. The εi are independent of each other.
  5. The linear model provides a good fit to the data.
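The slides check these conditions graphically in Minitab; as a rough programmatic analogue, here is a hedged Python sketch (illustrative data) that probes conditions 2 and 3 using scipy's Shapiro-Wilk test and a simple spread comparison:

```python
import numpy as np
from scipy import stats

# Illustrative fit (made-up data); a normal probability plot of the
# residuals is the graphical version of the normality check below.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, x.size)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)

# Condition 2 (normality): a large p value is consistent with normal residuals.
W, p = stats.shapiro(resid)
print(f"Shapiro-Wilk: W = {W:.3f}, p = {p:.3f}")

# Condition 3 (constant variance): compare residual spread across the x range.
low, high = resid[x < np.median(x)], resid[x >= np.median(x)]
print(f"std low-x = {low.std(ddof=1):.3f}, std high-x = {high.std(ddof=1):.3f}")
```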




HYPOTHESIS TESTS FOR REGRESSION COEFFICIENTS
• The values of the intercept and slope, a and b, found with the equations

    b = SSxy / SSx        a = ȳ − b·x̄

  are actually estimates of the true parameters α and β.
• [Figure not reproduced: hypothetical distributions for α and β, centered at α0 = 0 and β0 = 0.]




HYPOTHESIS TESTS FOR REGRESSION COEFFICIENTS
• [Figure not reproduced: hypothetical distributions for α and β.] Both of these distributions follow Student's t distribution with degrees of freedom equal to the error degrees of freedom.
• Although linear regression analysis will always return a and b values, it is possible that one or both of these values could be statistically insignificant. We require a formal method of testing α and β to see if they are different from zero. The hypotheses for these tests are:

    H0: α = 0  versus  H1: α ≠ 0
    H0: β = 0  versus  H1: β ≠ 0

• To perform these tests we need some idea of the amount of variability present in the estimates of α and β.








HYPOTHESIS TESTS FOR REGRESSION COEFFICIENTS
• Estimates of the variances of a and b are given by:

    sa² = sε² (1/n + x̄²/SSx)
    sb² = sε² / SSx

• The hypothesis tests can be performed using one-sample t tests with dfε = n − 2 degrees of freedom and the t statistics:

    ta = a / sa        tb = b / sb
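A minimal Python sketch of these tests follows (illustrative data, not from the slides); it computes the two t statistics and their two-sided p values with dfε = n − 2:

```python
import numpy as np
from scipy import stats

# Illustrative data; in practice x and y come from the experiment.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.8, 6.1, 7.7, 10.2, 11.9])
n = x.size

SSx  = np.sum((x - x.mean()) ** 2)
SSxy = np.sum((x - x.mean()) * (y - y.mean()))
b = SSxy / SSx
a = y.mean() - b * x.mean()

resid = y - (a + b * x)
se2 = np.sum(resid ** 2) / (n - 2)                 # error variance s_eps^2
sa = np.sqrt(se2 * (1 / n + x.mean() ** 2 / SSx))  # std dev of a
sb = np.sqrt(se2 / SSx)                            # std dev of b

ta, tb = a / sa, b / sb
pa = 2 * stats.t.sf(abs(ta), df=n - 2)             # two-sided p for intercept
pb = 2 * stats.t.sf(abs(tb), df=n - 2)             # two-sided p for slope
print(f"a: t = {ta:.2f}, p = {pa:.4f}   b: t = {tb:.2f}, p = {pb:.4f}")
```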




HYPOTHESIS TESTS FOR REGRESSION COEFFICIENTS
• The (1 − α)100% confidence intervals for α and β are determined from

    P(a − tα/2·sa < α < a + tα/2·sa) = 1 − α
    P(b − tα/2·sb < β < b + tα/2·sb) = 1 − α

  with n − 2 degrees of freedom.
• It is very important to realize that the variances of a and b as given are proportional to the standard error of the fit, sε. This means that if there are any uncontrolled variables in the experiment that cause the standard error to increase, there will be a corresponding increase in the standard deviations of the regression coefficients. This could make the regression coefficients disappear into the noise.
• Always keep in mind that the model's ability to predict the regression coefficients depends on the size of the standard error. Take care to remove, control, or account for extraneous variation so that you get the best predictions from your models with the least effort.
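As a sketch of the interval computation (the helper name and example numbers are hypothetical), the critical value tα/2 with n − 2 degrees of freedom can be obtained from scipy:

```python
import numpy as np
from scipy import stats

def coef_ci(est, se, n, conf=0.95):
    """(1 - alpha)100% confidence interval for a coefficient, df = n - 2."""
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
    return est - t_crit * se, est + t_crit * se

# Illustrative values for (estimate, standard deviation, sample size):
print(coef_ci(est=2.1, se=0.4, n=12))   # CI for the intercept a
print(coef_ci(est=1.5, se=0.1, n=12))   # CI for the slope b
```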




 CONFIDENCE LIMITS FOR THE                                                 CONFIDENCE LIMITS FOR THE
     REGRESSION LINE                                                           REGRESSION LINE
                                                                          Stat > Regression > Fitted Line Plot menu. 
• The true slope and intercept of a regression line are                  You will have to select Display Confidence Bands in the Options menu to add the 
  not exactly known.                                                     confidence limits to the fitted line plot. 


• The (l – α) 100% confidence interval for the
  regression line is given by:
         i li i i          b




                                                                    47                                                                                          48








PREDICTION LIMITS FOR THE OBSERVED VALUES
• The prediction interval provides prediction bounds for individual observations. The width of the prediction interval combines the uncertainty in the position of the true line, as described by the confidence interval, with the scatter of points about the line, as measured by the standard error:

    ŷ ± tα/2 · sε · √(1 + 1/n + (x − x̄)²/SSx)

  where tα/2 has dfε = n − 2 degrees of freedom.
• In Minitab, use the Stat > Regression > Fitted Line Plot menu. You will have to select Display Prediction Bands in the Options menu.
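The two bands differ only by the extra "1" under the square root. Here is a minimal Python sketch (illustrative data, not from the slides) computing both half-widths over a grid of x values:

```python
import numpy as np
from scipy import stats

# Illustrative data; the bands are evaluated over a grid of x values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.8, 6.1, 7.7, 10.2, 11.9])
n = x.size

SSx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / SSx
a = y.mean() - b * x.mean()
se = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))   # standard error

xg = np.linspace(x.min(), x.max(), 50)
yhat = a + b * xg
t_crit = stats.t.ppf(0.975, df=n - 2)                    # 95% two-sided

half_conf = t_crit * se * np.sqrt(1 / n + (xg - x.mean()) ** 2 / SSx)
half_pred = t_crit * se * np.sqrt(1 + 1 / n + (xg - x.mean()) ** 2 / SSx)

# Confidence band: yhat +/- half_conf; prediction band: yhat +/- half_pred.
print(half_conf[:3], half_pred[:3])
```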




CORRELATION
Coefficient of Determination (r²) and Correlation Coefficient (r)
• A comprehensive statistic is required to measure the fraction of the total variation in the response y that is explained by the regression model.
• The total variation in y taken relative to ȳ is given by SSy = Σ(yi − ȳ)², but SSy is partitioned into two terms: one that accounts for the amount of variation explained by the straight-line model, given by SSregression, and another that accounts for the unexplained error variation, given by

    SSε = Σεi²,  with the sum over i = 1, …, n




CORRELATION
Coefficient of Determination (r²)
• The three quantities are related by:

    SSy = SSregression + SSε

• Consequently, the fraction of SSy explained by the model is:

    r² = SSregression / SSy

  where r² is called the coefficient of determination.

CORRELATION COEFFICIENT (r)
• The correlation coefficient r is given by the square root of the coefficient of determination r², with an appropriate plus or minus sign.
• If two measures have a linear relationship, it is possible to describe how strong the relationship is by means of a statistic called the correlation coefficient.
• The symbol for the correlation coefficient is r; the symbol for the corresponding population parameter is ρ (the Greek letter "rho").
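The partition is easy to demonstrate numerically. This minimal Python sketch (illustrative data, not from the slides) computes SSy, SSε, SSregression, and r²:

```python
import numpy as np

# Illustrative data for the SSy partition and r^2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.8, 6.1, 7.7, 10.2, 11.9])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

SSy = np.sum((y - y.mean()) ** 2)        # total variation
SSe = np.sum((y - (a + b * x)) ** 2)     # unexplained (error) variation
SSreg = SSy - SSe                        # explained variation
r2 = SSreg / SSy
print(f"r^2 = {r2:.4f}")                 # fraction explained by the line
```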








CORRELATION COEFFICIENT (r)
• r is the Pearson product-moment correlation coefficient.
• The basic formula for the correlation coefficient, in sum of squares notation, is

    r = SSxy / √(SSx · SSy)




PEARSON'S PRODUCT-MOMENT CORRELATION COEFFICIENT (r)
• Given the following data set, find r (Example 11.10, p. 435):

    x       y        |  x       y
    0.414   29186    |  0.548   67095
    0.383   29266    |  0.581   85156
    0.399   26215    |  0.557   69571
    0.402   30162    |  0.550   84160
    0.442   38867    |  0.531   73466
    0.422   37831    |  0.550   78610
    0.466   44576    |  0.556   67657
    0.500   46097    |  0.523   74017
    0.514   59698    |  0.602   87291
    0.530   67705    |  0.569   86836
    0.569   66088    |  0.544   82540
    0.558   78486    |  0.557   81699
    0.577   89869    |  0.530   82096
    0.572   77369    |  0.547   75657
    0.548   67095    |  0.585   80490

CORRELATION
The Coefficient of Determination (r²)
• The coefficient of determination finds numerous applications in regression and multiple regression problems.
• Since SSregression is bounded by 0 ≤ SSregression ≤ SSy, there are corresponding bounds on the coefficient of determination given by 0 ≤ r² ≤ 1.
• When r² = 0 the regression model has little value because very little of the variation in y is attributable to its dependence on x. When r² = 1 the regression model almost completely explains all of the variation in the response; that is, ŷ almost perfectly predicts y.
• We're usually hoping for r² = 1, but this rarely happens.
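The formula r = SSxy/√(SSx·SSy) can be applied directly to the Example 11.10 data above. In the Python sketch below, the values are transcribed from the slide and the row pairing was reconstructed from the flattened two-column layout, so treat them as approximate:

```python
import numpy as np

# Example 11.10 data as transcribed above (left column first, then right).
x = np.array([0.414, 0.383, 0.399, 0.402, 0.442, 0.422, 0.466, 0.500,
              0.514, 0.530, 0.569, 0.558, 0.577, 0.572, 0.548,
              0.548, 0.581, 0.557, 0.550, 0.531, 0.550, 0.556, 0.523,
              0.602, 0.569, 0.544, 0.557, 0.530, 0.547, 0.585])
y = np.array([29186, 29266, 26215, 30162, 38867, 37831, 44576, 46097,
              59698, 67705, 66088, 78486, 89869, 77369, 67095,
              67095, 85156, 69571, 84160, 73466, 78610, 67657, 74017,
              87291, 86836, 82540, 81699, 82096, 75657, 80490], dtype=float)

SSx  = np.sum((x - x.mean()) ** 2)
SSy  = np.sum((y - y.mean()) ** 2)
SSxy = np.sum((x - x.mean()) * (y - y.mean()))

r = SSxy / np.sqrt(SSx * SSy)   # r = SSxy / sqrt(SSx * SSy)
print(f"r = {r:.4f}")           # agrees with np.corrcoef(x, y)[0, 1]
```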




Confidence Interval for the Coefficient of Determination (r²)
• The coefficient of determination r² is a statistic that represents the proportion of the total variation in the values of the variable Y that can be accounted for, or explained by, a linear relationship with the random variable X.
• A different data set of (x, y) values will give a different value of r². The quantity that such r² values estimate is the true population coefficient of determination ρ², which is a parameter.

Confidence Interval for the Correlation Coefficient (r)
• When the distribution of the regression model residuals is normal with constant variance, the distribution of r is complicated, but the distribution of

    Z = ½ ln((1 + r)/(1 − r))

  is approximately normal with mean

    μZ = ½ ln((1 + ρ)/(1 − ρ))

  and standard deviation

    σZ = 1/√(n − 3)

• The transformation of r into Z is called Fisher's Z transformation.








Confidence Interval for the Correlation Coefficient (r)
• This information can be used to construct a confidence interval for the unknown parameter μZ from the statistic r and the sample size n. The confidence interval is:

    P(Z − zα/2·σZ < μZ < Z + zα/2·σZ) = 1 − α,  with σZ = 1/√(n − 3)

  The limits are then transformed back to the r scale with r = (e^(2Z) − 1)/(e^(2Z) + 1).

LINEAR REGRESSION WITH MINITAB
• MINITAB provides two basic functions for performing linear regression.
1. The Stat > Regression > Fitted Line Plot menu is the best place to start to evaluate the quality of the fitted function. It includes a scatter plot of the (x, y) data with the superimposed fitted line, a full ANOVA table, and an abbreviated table of regression coefficients.
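Here is a minimal Python sketch of the Fisher Z interval (the function name and example values are illustrative); note that ½ln((1+r)/(1−r)) is arctanh(r), so tanh transforms the limits back to the r scale:

```python
import numpy as np
from scipy import stats

def r_confidence_interval(r, n, conf=0.95):
    """Approximate (1 - alpha)100% CI for rho via Fisher's Z transformation."""
    z = 0.5 * np.log((1 + r) / (1 - r))       # Fisher's Z = arctanh(r)
    sz = 1.0 / np.sqrt(n - 3)                 # standard deviation of Z
    zc = stats.norm.ppf(1 - (1 - conf) / 2)   # standard normal critical value
    lo, hi = z - zc * sz, z + zc * sz
    return np.tanh(lo), np.tanh(hi)           # back-transform to the r scale

# Illustrative call: r = 0.9 observed in a sample of n = 30 pairs.
print(r_confidence_interval(r=0.9, n=30))
```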




LINEAR REGRESSION WITH MINITAB
2. Stat > Regression > Regression menu. [output not reproduced]
   The first part of the output is a table of the regression coefficients and the corresponding standard deviations, t values, and p values. The second part is the ANOVA table, which summarizes the statistics required to determine the regression coefficients and the summary statistics like r, r², r²adj, and sε.
   There is a p value reported for the slope of the regression line in the table of regression coefficients and another p value reported in the ANOVA table for the ANOVA F test. These two p values are numerically identical, and not just by coincidence: there is a special relationship between the t and F distributions when the F distribution has one numerator degree of freedom (t² with ν degrees of freedom is distributed as F with 1 and ν degrees of freedom).




POLYNOMIAL MODELS
• The general form of a polynomial model is:

    ŷ = a + b1x + b2x² + … + bpx^p

  where the polynomial is said to be of order p. The regression coefficients a, b1, …, bp are determined using the same algorithm that was used for the simple linear model; the error sum of squares is simultaneously minimized with respect to the regression coefficients. The family of equations that must be solved to determine the regression coefficients is nightmarish, but most of the good statistical software packages have this capability.
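As one such software example (illustrative data, not the slides' data set), numpy's polyfit performs exactly this least squares minimization for a polynomial of a chosen order:

```python
import numpy as np

# Illustrative data with a quadratic trend plus noise.
rng = np.random.default_rng(1)
x = np.linspace(-2.0, 2.0, 25)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(0.0, 0.3, x.size)

coeffs = np.polyfit(x, y, deg=2)   # order p = 2: quadratic least squares fit
yhat = np.polyval(coeffs, x)       # fitted values
resid = y - yhat                   # residuals for diagnostic plots

print("coefficients (highest power first):", coeffs)
```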








POLYNOMIAL MODELS
• Although high-order polynomial models can fit the (x, y) data very well, they should be of the lowest order possible that accurately represents the relationship between y and x. There are no clear guidelines on what order might be necessary, but watch the significance (that is, the p values) of the various regression coefficients to confirm that all of the terms are contributing to the model. Polynomial models must also be hierarchical; that is, a model of order p must contain all possible lower-order terms.
• Because of their complexity, it's important to summarize the performance of polynomial models using r²adjusted instead of r². In some cases, when there are relatively few error degrees of freedom after fitting a large polynomial model, the r² value could be misleadingly large, whereas r²adjusted will be much lower but more representative of the true performance of the model.




POLYNOMIAL MODELS
• Fit the following data with an appropriate model, and use scatter plots and residuals diagnostic plots to check for lack of fit. [data table not reproduced]
• Solution: scatter plots and residuals diagnostic plots follow.




POLYNOMIAL MODELS
• Solution: scatter plots [not reproduced]








POLYNOMIAL MODELS
• Solution: residuals diagnostic plots [not reproduced]




POLYNOMIAL MODELS
• Solution: quadratic model. Create an x² column in the worksheet, then fit the quadratic model. [Minitab screenshots not reproduced]








POLYNOMIAL MODELS
• Solution: quadratic model via Stat > Regression > Fitted Line Plot (x, y) with the Quadratic option. [output not reproduced]

Multiple Regression




Multiple Regression
• When a response has n quantitative predictors, such as y(x1, x2, …, xn), the model for y must be created by multiple regression. In multiple regression each predictive term in the model has its own regression coefficient. The simplest multiple regression model contains a linear term for each predictor:

    ŷ = a + b1x1 + b2x2 + … + bnxn

• This equation has the same basic structure as the polynomial model and, in fact, the two models are fitted and analyzed in much the same way. Whereas the worksheet to fit the polynomial model requires n columns, one for each power of x, the worksheet to fit the multiple regression model requires n columns to account for each of the n predictors. The same regression methods are used to analyze both problems.




Multiple Regression
• Frequently, the simple linear model does not fit the data and a more complex model is required. The terms that must be added to the model to achieve a good fit might involve interactions, quadratic terms, or terms of even higher order. Such models have the basic form (reconstructed here; the slide's equation was not captured):

    ŷ = a + Σ bixi + Σ bijxixj + Σ biixi² + …

• PROBLEM: A real-estate executive would like to be able to predict the cost of a house in a housing development on the basis of the number of bedrooms and bathrooms in the house. [selling price table, in thousands of dollars, not reproduced]








Multiple Regression
• The following first-order model is assumed to connect the selling price of the home with the number of bedrooms and the number of baths. The dependent variable is represented by y, and the independent variables are x1, the number of bedrooms, and x2, the number of baths:

    y = a + b1x1 + b2x2 + ε

• MINITAB SOLUTION: Stat > Regression > Regression. [output not reproduced]
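Since the slide's selling price table is not reproduced, here is a minimal Python sketch of the same first-order fit using made-up bedroom/bath/price numbers; the least squares solution comes from a design matrix with an intercept column:

```python
import numpy as np

# Hypothetical stand-ins for the slide's table (prices in thousands of dollars).
bedrooms = np.array([2, 3, 3, 4, 4, 5], dtype=float)               # x1
baths    = np.array([1, 1, 2, 2, 3, 3], dtype=float)               # x2
price    = np.array([180, 210, 240, 268, 300, 330], dtype=float)   # y

# Design matrix with an intercept column: y = a + b1*x1 + b2*x2
X = np.column_stack([np.ones_like(bedrooms), bedrooms, baths])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
a, b1, b2 = coef
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```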




Multiple Regression
• MINITAB output: Stat > Regression > Regression. [output not reproduced]




Multiple Regression
• PROBLEM: The following table contains data from a blood pressure study on fifty middle-aged men. Systolic is the systolic blood pressure; Age is the age of the individual; Weight is the weight in pounds; Parents indicates whether the individual's parents had high blood pressure (0 means neither parent, 1 means one parent, and 2 means both mother and father have high blood pressure); Med is the number of hours per month that the individual meditates; and TypeA is a measure of the degree to which the individual exhibits Type A personality behavior, as determined from a form that the person fills out. Systolic is the dependent variable and the other five variables are the independent variables. [data table not reproduced]








Multiple Regression
• Model:
    y  = Systolic
    x1 = Age
    x2 = Weight
    x3 = Parents
    x4 = Med
    x5 = TypeA
• MINITAB SOLUTION: Stat > Regression > Regression. [output not reproduced]




Multiple Regression
• MINITAB SOLUTION: Stat > Regression > Regression. The five hypothesis tests suggest that Weight and TypeA should be kept and the other three variables thrown out.

Checking the Overall Utility of a Model
• Purpose: check whether the model is useful while controlling your α value. Rather than conduct a large group of t tests on the betas, which increases the probability of making a Type I error, make one test and know that α = 0.05. The F test is such a test. It is contained in the analysis of variance associated with the analysis. For the blood pressure model, the F test tests the hypothesis:

    H0: β1 = β2 = β3 = β4 = β5 = 0
    H1: at least one βi ≠ 0
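A minimal Python sketch of this overall F test follows (the sums of squares here are illustrative placeholders, not the study's actual values); the statistic divides the mean square for regression by the mean square error:

```python
import numpy as np
from scipy import stats

def overall_f_test(SSreg, SSe, n, k):
    """Overall F test for a regression with k predictors and n observations.
    H0: beta_1 = ... = beta_k = 0; F = (SSreg/k) / (SSe/(n - k - 1))."""
    dfn, dfd = k, n - k - 1
    F = (SSreg / dfn) / (SSe / dfd)
    p = stats.f.sf(F, dfn, dfd)   # upper-tail p value
    return F, p

# Illustrative numbers: 50 subjects, 5 predictors, made-up sums of squares.
F, p = overall_f_test(SSreg=5000.0, SSe=2400.0, n=50, k=5)
print(f"F = {F:.2f}, p = {p:.4f}")
```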




Multiple Regression
• MINITAB SOLUTION: Stat > Regression > Regression. [ANOVA output not reproduced]
• Interpretation: F = 18.50 with a p value of 0.000, so the null hypothesis should be rejected; the conclusion is that at least one βi ≠ 0. This F test says that the model is useful in predicting systolic blood pressure.

END




                                                                                                                                                        16

Chapt 11 & 12 linear & multiple regression minitab

  • 1. 3/22/2010 IE 609 Chapter 11 The Relation between Simple Linear Regression and Two Sets of Measures Correlation 1 2 The Relation between The Relation between Two Sets of Measures Two Sets of Measures • Construct a scatter diagram for the following • Plot Results data: 3 4 The Relation between The Relation between Two Sets of Measures Two Sets of Measures • You might have reversed the axes so that the • Linear or Straight Line Relationship vertical dimension represented the midterm grade and the horizontal dimension, the final grade. d • When one measure may be used to predict another, it is customary to represent the predictor on the horizontal dimension (the x- axis). 5 6 1
  • 2. 3/22/2010 The Relation between The Relation between Two Sets of Measures Two Sets of Measures • Other relationships • Which of the diagrams represents the stronger relationship? 7 8 The Relation between Two Sets of Measures • Which of the diagrams represents the stronger relationship? Simple Linear Regression y = α + βx y = a + bxi + εi 9 10 Simple Linear Regression Simple Linear Regression Minitab Data Entry Calc ‐> Column Statistics Table 11.1 Pg 393 11 12 2
  • 3. 3/22/2010 Simple Linear Regression Simple Linear Regression Calc ‐> Calculator (Create Formula, Store Variable: Residual 13 14 Simple Linear Regression Simple Linear Regression Graph ‐> Probability Plot Residuals appear Normally Distributed 15 16 Linear Regression Linear Regression and Correlation Simple Structure Simple Structure Question…….. Is the sample mean of Demand the correct value to use for ŷ? yi = ŷ + εi y i = ŷ + εi ŷ → Sample mean = 34.0606 (Minitab*) – Although it might seem to be a trivial question, εi = yi - ŷ (Minitab “Residual”) you might ask why the sample mean (y-bar) was i h k h h l ( b ) the correct value to use for ŷ ? Sample Variance of ŷ = (10.7)2 – Since the purpose of the is to accurately describe the yi then we would expect the model is to * Mean of Demand , y (%) = 34.0606 deliver small errors (that is, εi) but how should we go about making small errors? 17 18 3
  • 4. 3/22/2010 Linear Regression Linear Regression Simple Structure Simple Structure Question…….. Is the sample mean of Demand the correct value to use for ŷ? Σεi2  = Σ(ŷ – y)2 y i = ŷ + εi The calculus operation that delivers this solution is – A logical choice is to pick ŷ, which might be different from the sample mean, so that the error variance s2 calculated with εi = yi - ŷ is minimized. n  i 2 This is called the method of least squares because the method  s2  i 1 minimizes the error sum of squares. n 1 19 20 Linear Regression Linear Regression Simple Structure Simple Structure Now consider the scatter diagram below.  y appears to increase linearly with respect to x y = α + βx • The parameters α and β are the y axis intercept and slope, respectively. • Since we typically have sample data and not the complete population of (x, y) observations, we cannot expect to determine α and β, exactly- they will have to be estimated from the sample data. Our model is of the form There might be an underlying causal relationship between x and y of the form: y = a + bxi + εi y = α + βx 21 22 Linear Regression Linear Regression Simple Structure Simple Structure εi = (yi – ŷ) = yi – (a + bxi) • Then for any choice of a and b the εi may be • Although this equation allows us to calculate the ε, for a determined from given (x, yi) data set once a and b, are specified, there are still an infinite number of a and b, values that could εi = (yi – ŷ) = yi – (a + bxi ) be used in the model. Clearly the choice of a and b, that provides the best fit to the data should make the εi or some function of them small. Although many conditions • These errors or discrepancies εi , are also can be stated to define best fit lines by minimizing the εi , called the model residuals. by far the most frequently used condition to define the best fit line is the one that minimizes Σεi2. 23 24 4
  • 5. 3/22/2010 Linear Regression and Correlation Linear Regression Simple Structure Simple Structure • That is, the best fit line for the (x, y) data, is called the • The error variance for linear least squares regression is linear least squares regression line, which corresponds to given by the choice of a and b, that minimizes Σεi2. • The calculus sol tion to this problem is given by the calc l s solution gi en b simultaneous solution to the two equations: where n is the number of (xi, yi) observations and sε is  n 2  n 2 called the standard error of the model. i  0 a i 1 i  0 b i 1 • The Equation has n- 2 in the denominator because two degrees of freedom are consumed by the calculation of the • The method of fitting a line to (xi, yi) data using the regression coefficients a and b from the experimental data. solution is called linear regression. 25 26 Linear Regression REGRESSION COEFFICIENTS Simple Structure • Think of the error variance sε2 in the regression • With the condition to determine the a and b, problem in the same way as you think of the sample variance s2 used to quantify the amount of variation in values that provide the best fit line for the simple measurement data. (xi, yi) data, namely the minimization of Σεi2, • Whereas the sample variance characterizes the scatter  we proceed to determine a and b in a more of observations about a single value y  y the error variance in the regression problem characterizes the rigorous manner. distribution of values about the line ŷi = a + bxi • Sε2 and s2 are close cousins, they are both measures of the errors associated with different models for different kinds of data. 27 28 REGRESSION COEFFICIENTS • The calculus method that determines the REGRESSION unique values of a and b, that minimize Σεi2 COEFFICIENTS requires that we solve the simultaneous equations: i Determining the unique values of a and b  n 2  n 2 i  0 a i 1 i  0 b i 1 29 30 5
  • 6. 3/22/2010 REGRESSION COEFFICIENTS REGRESSION COEFFICIENTS • From these equations the resulting values of a • SSX, and SSY are just the sums of squares required to determine the variances of the x and b, are best expressed in terms of sums of and y values q squares: SS xy b a  y  bx SS x n SS y   ( yi  y ) 2 n SS x   ( xi  x ) 2 n n SS xy   ( xi  x ) ( yi  y ) SS y   ( yi  y ) 2 SS xy i 1 i 1 b  i 1 n i 1 SS x SS x   ( xi  x ) 2 i 1 31 32 REGRESSION COEFFICIENTS REGRESSION COEFFICIENTS • Similarly, using the sum of squares notation, • Another important implication of Equations we can write the error sum of squares for the regression as SS xy a  y  bx b SS x • and the standard error as: • that the point ( x, y ) fall on the best-fit line. This is just a consequence of the way the sums of squares are calculated 33 34 LINEAR REGRESSION REGRESSION COEFFICIENTS ASSUMPTIONS Stats > Regression > Fitted Line Plot s2=SSE/(n‐2) a  y  bx ŷ= 35 36 6
  • 7. 3/22/2010 LINEAR REGRESSION LINEAR REGRESSION ASSUMPTIONS ASSUMPTIONS • A valid linear regression model requires that five conditions are satisfied: l. The values of x are determined without error. 2. The εi, are normally distributed with mean με= 0 for all values of x. 3 . The distribution of the εi, has constant variance σε2 for all values of x within the range of experimentation (that is, homoscedasticity) 4. The εi are independent of each other. 5. The linear model provides a good fit to the data 37 38 HYPOTHESIS TESTS FOR REGRESSION COEFFICIENTS HYPOTHESIS TESTS FOR • The values of the intercept and slope a and b REGRESSION COEFFICIENTS found with Equations SS xy a  y  bx b SS x are actually estimates for the true parameters β α and β α α β 0 0 Hypothetical distributions for α and β 39 40 HYPOTHESIS TESTS FOR HYPOTHESIS TESTS FOR REGRESSION COEFFICIENTS REGRESSION COEFFICIENTS • Although linear regression analysis will always return a and b values. it's possible that one or both of these values could be statistically insignificant. We require a formal method of testing α and β to see if they are different from zero. Hypotheses for these tests are: H th f th t t β β α α0 0 H0: α0 = 0 H1: α0 ≠ 0 Hypothetical distributions for α and β H0: β 0 = 0 Both of these distributions follow Student's t distribution with H1: β 0 ≠ 0 degrees of freedom equal to the error degrees of freedom. To perform these tests we need some idea of the amount of variability present in the estimates of α and β 41 42 7
  • 8. 3/22/2010 HYPOTHESIS TESTS FOR HYPOTHESIS TESTS FOR REGRESSION COEFFICIENTS REGRESSION COEFFICIENTS • The hypothesis tests can be performed using • Estimates of the variances σα0 and σβ0 are given one-sample t tests with dfε = n -2 degrees of by: freedom with the t statistics. sα2 =  t  s and sβ2 =  t  s Microsoft Equation 43 3.0 44 HYPOTHESIS TESTS FOR HYPOTHESIS TESTS FOR REGRESSION COEFFICIENTS REGRESSION COEFFICIENTS • The (1 -α) 100% confidence intervals for α and • It is very important to realize that the variances of a and b as given are proportional to the standard error of the fit Sε. This means that if β are determined from there are any uncontrolled variables in the experiment that cause the standard error to increase. there will be a corresponding increase in the standard deviations of the regression coefficients. This could P(a - tα/2sa < α < a + tα/2sa ) = 1- α /2 /2 make the regression coefficients disappear into the noise. k th i ffi i t di i t th i • Always keep in mind that the model's ability to predict the regression coefficients is dependent on the size of the standard error. P(b - tα/2sb < β < b+ tα/2sb ) = 1- α Take care to remove or control or account for extraneous variation so that you get the best predictions from your models with the least effort. with n -2 degrees of freedom. Microsoft Equation 3.0 45 46 CONFIDENCE LIMITS FOR THE CONFIDENCE LIMITS FOR THE REGRESSION LINE REGRESSION LINE Stat > Regression > Fitted Line Plot menu.  • The true slope and intercept of a regression line are You will have to select Display Confidence Bands in the Options menu to add the  not exactly known. confidence limits to the fitted line plot.  • The (l – α) 100% confidence interval for the regression line is given by: i li i i b 47 48 8
  • 9. 3/22/2010 PREDICTION LIMITS FOR THE PREDICTION LIMITS FOR THE OBSERVED VALUES OBSERVED VALUES Stat>Regression> Fitted Line Plot menu.  • The prediction interval provides prediction bounds You will have to select Display Prediction Bands in the Options menu for individual observations. The width of the prediction interval combines the uncertainty of the position of the true line as described by the confidence interval with the scatter of points about the line as measured by the standard error. where tα/2 has dfε= n - 2 degrees of freedom 49 50 CORRELATION Coefficient of Determination r2 • A comprehensive statistic is required to measure the fraction of the total variation in the response y that is explained by the regression model. CORRELATION • The total variation in y taken relative to y-bar is given by SSy = Σ(yi – y ) but SSy is partitioned into two terms: one that COEFFICIENT OF DETERMINATION (r2) accounts for the amount of variation explained by the straight line model given by SSregression and another that accounts for CORRELATION COEFFICIENT (r) the unexplained error variation given by . . n SS   ( yi  y ) 2   i 2 i 1 51 52 CORRELATION Coefficient of Determination r2 CORRELATION COEFFICIENT (r) • The three quantities are related by: • The correlation coefficient r is given by the square root of the coefficient of determination r2 with an appropriate plus or minus sign. • Consequently the fraction of SSy explained by the Consequently, • If two measures have a linear relationship, it is possible model is: to describe how strong the relationship is by means of a statistic called a correlation coefficient r. • The symbol for the correlation coefficient is r. • The symbol for the corresponding population parameter is ρ (the Greek letter "rho"). where r2 is called the coefficient of determination. 53 54 9
  • 10. 3/22/2010 CORRELATION COEFFICIENT (r) CORRELATION COEFFICIENT (r) • Pearson product-moment correlation • The basic formulas for the correlation coefficient are 55 56 PEARSONS PRODUCT-MOMENT CORRELATION CORRELATION COEFFICIENT (r) The Coefficient of Determination r2 • Given a set of data. (Example 11.10, pg 435 ) • The coefficient of determination finds numerous applications in regression and multiple regression problems. Find r x y x y 0.414 29186 0.548 67095 • Since SSregression is bounded by 0≤ SSregression ≤SSy there are 0.383 0.399 29266 26215 0.581 0.557 85156 69571 corresponding bounds on the coefficient of determination 0.402 30162 0.55 84160 given by 0 ≤ r2 ≤ 1.1 0.442 38867 0.531 73466 0.422 37831 0.55 78610 • When r2 = 0 the regression model has little value because 0.466 0.5 44576 46097 0.556 0.523 67657 74017 very little of the variation in y is attributable to its 0.514 59698 0.602 87291 dependence on r. When r2 = 1 the regression model almost 0.53 67705 0.569 86836 0.569 66088 0.544 82540 completely explains all of the variation in the response, that 0.558 0.577 78486 89869 0.557 0.53 81699 82096 is, r almost perfectly predicts y. 0.572 0.548 77369 67095 0.547 0.585 75657 80490 • We're usually hoping for r2 = l, but this rarely happens. 57 58 Confidence Interval for the Confidence Interval for the Coefficient of Determination r2 Correlation Coefficient (r) • When the distribution of the regression model residuals • The coefficient of determination r2 is a statistic that is normal with constant variance, the distribution of r is represents the proportion of the total variation in the complicated, but the distribution of: values of the variable Y that can be accounted for or explained by a linear relationship with the random l i d b li l i hi i h h d is appro imatel normal with mean: approximately ith variable X. and standard deviation: • A different data set of (x, y) values will give a different value of r2. The quantity that such r2 values • The transformation of r into Z is called Fisher's Z estimate is the true population coefficient of transformation. determination p2, which is a parameter. 59 60 10
  • 11. 3/22/2010 Confidence Interval for the LINEAR REGRESSION Correlation Coefficient (r) WITH MINITAB • This information can be used to construct a • MINITAB provides two basic functions for confidence interval for the unknown parameter performing linear regression µz from the statistic r and the sample size n. 1. Stat Regression> Fitted Line Plot menu is The confidence interval is: the best place to start to evaluate the q p quality y of the fitted function. Includes a scatter plot of the (x, y,) data with the superimposed fitted line, a full ANOVA table and an abbreviated table of regression coefficients. 61 62 LINEAR REGRESSION LINEAR REGRESSION WITH MINITAB WITH MINITAB Stat>Regression> Regression menu   2. Stat>Regression> Regression menu The first part is a table of the regression coefficients and the corresponding standard deviations, t values, and p values. The second part is the ANOVA table, which summarizes the statistics required to determine the regression coefficients and the summary statistics like r, r2, radj. and sε. t ti ti lik d There is a p-value reported for the slope of the regression line in the table of regression coefficients and another p value reported in the ANOVA table for the ANOVA F test. These two p values are numerically identical and not just by coincidence. There is a special relationship that exists between the t and F distributions when the F distribution has one numerator degree of freedom. 63 64 POLYNOMIAL MODELS • The general form of a polynomial model is: ŷ = a + b1 x + b2x2 + …+bpxp POLYNOMIAL MODELS where the polynomial is said to be of order p. The p y p regression coefficients a, b1, . . . ,bp are determined using ŷ = a + b1 x + b2x2 + …+bpxp the same algorithm that was used for the simple linear model; the error sum of squares is simultaneously minimized with respect to the regression coefficients. The family of equations that must be solved to determine the regression coefficients is nightmarish, but most of the good statistical software packages have this capability. 65 66 11
  • 12. 3/22/2010 POLYNOMIAL MODELS POLYNOMIAL MODELS • Although high-order polynomial models can fit • Because of their complexity, it's important to the (x, y) data very well, they should be of the summarize the performance of polynomial lowest order possible that accurately represents the relationship between y and x. There are no models using r2adjusted instead of r2. In some clear guidelines on what order might be l id li h t d i ht b cases when there are relatively few error necessary, but watch the significance (that is, the degrees of freedom after fitting a large p values) of the various regression coefficients to polynomial model, the r2 value could be confirm that all of the terms are contributing to misleadingly large whereas r2adjusted will be the model. Polynomial models must also be much lower but more representative of the true hierarchical, that is, a model of order p must contain all possible lower-order terms. performance of the model. 67 68 POLYNOMIAL MODELS POLYNOMIAL MODELS • Fit the following data with an appropriate • Solution: model and use scatter plots and residuals scatter plots and  diagnostic plots to check for lack of fit. residuals diagnostic plots 69 70 POLYNOMIAL MODELS POLYNOMIAL MODELS • Solution: SCATTER PLOT • Solution: SCATTER PLOT 71 72 12
  • 13. 3/22/2010 POLYNOMIAL MODELS POLYNOMIAL MODELS • Solution: Residuals diagnostic plots • Solution: Residuals diagnostic plots 73 74 POLYNOMIAL MODELS POLYNOMIAL MODELS • Solution: Residuals diagnostic plots • Solution: Residuals diagnostic plots 75 76 POLYNOMIAL MODELS POLYNOMIAL MODELS • Solution: Quadratic Create x^2  Column • Solution: Quadratic Model 77 78 13
  • 14. 3/22/2010 POLYNOMIAL MODELS • Solution: Quadratic Model Stat > Regression > Fitted Line Plot (x,y) – Quadratic Multiple Regression 79 80 Multiple Regression Multiple Regression • When a response has n quantitative predictors • This equation has the same basic structure as such as y (x1 x2, .. . , xn), the model for y must the polynomial model and, in fact, the two be created by multiple regression. In multiple models are fitted and analyzed in much the regression each predictive term in the model same way. Where the work-sheet to fit the polynomial model requires n columns, one for has its own regression coefficient. The each power of x, the worksheet to fit the simplest multiple regression model contains a multiple regression model requires n columns linear term for each predictor: to account for each of the n predictors. The same regression methods are used to analyze both problems. 81 82 Multiple Regression Multiple Regression • Frequently, the simple linear model in does not • PROBLEM • Selling Price Table (in thousands of dollars) fit the data and a more complex model is A real‐estate executive  would like to be able to  required. The terms that must be added to the predict the cost of a house  model to achieve a good fit might involve in a housing development  in a housing development interactions, quadratic terms, or terms of even on the basis of the number  of bedrooms and bath‐ higher order. Such models have the basic form: rooms in the house.  83 84 14
  • 15. 3/22/2010 Multiple Regression Multiple Regression • The following first-order model is assumed to connect the • MINITAB SOLUTION selling price of the home with the number of bedrooms and the • Stat > Regression > Regression. number of baths. The dependent variable is represented by y and the independent variables are x1,the number of bedrooms, and x2, the number of baths. 85 86 Multiple Regression Multiple Regression • MINITAB Output – Stat > Regression > Regression. 87 88 Multiple Regression Multiple Regression • Problem Blood pressure  The following table contains data from a blood pressure study. The study on fifty  data were collected on a group of middle aged men. Systolic is the systolic blood pressure, Age is the age of the individual, Weight is middle‐aged men. the weight in pounds, Parents indicates whether the individual's parents had high blood pressure: 0 means neither parent has high blood pressure, 1 means one parent has high blood pressure, and 2 means both mother and father have high blood pressure, Med is the number of hours per month that the individual meditates, and TypeA is a measure of the degree to which the individual exhibits type A personality behavior, as determined from a form that the person fills out. Systolic is the dependent variable and the other five variables are the independent variables 89 90 15
  • 16. 3/22/2010 Multiple Regression Multiple Regression • MINITAB SOLUTION : Stat > Regression > Regression • Model Y = systolic, xl = age, x2 = weight, x3 = parents, x4 = med, and x5 = Type A 91 92 Multiple Regression Multiple Regression Checking the Overall Utitity of a Model • MINITAB SOLUTION : Stat > Regression > Regression • Purpose: Check whether the model is useful and to control your α value Rather than conduct a large group of t-tests on the betas and increase the probability of making a type 1 error make one test and know that α= 0.05. The F-test is such a test. It is contained in the analysis of variance associated with the analysis. The F-test tests the following The five hypothesis tests suggest Weight and Type A should be kept and  hypothesis associated with the blood pressure model the other three variables thrown out. 93 94 Multiple Regression • MINITAB SOLUTION : Stat > Regression > Regression Interpretation As is seen F= 18.50 with a  p‐value of 0.000 and the null  END hypothesis should be rejected;  the conclusion is that at least  one βi ≠ 0. This F‐test says that  the model is useful in predicting  systolic blood pressure. 95 96 16