Tyler Anton
Spring 2014
Problem Set #3
Hypothesis Testing
1. University of Maryland University College is concerned that out-of-state students may be
receiving lower grades than Maryland students. Two independent random samples have been
selected: 165 observations from population 1 (out-of-state students) and 177 from population 2
(Maryland students). The sample means obtained are X1(bar)=86 and X2(bar)=87. It is known
from previous studies that the population variances are 8.1 and 7.3 respectively. Using a level of
significance of .01, is there evidence that the out-of-state students may be receiving lower
grades? Fully explain your answer.
H0: 1 > 2
H1: 1 < 2 [Rejection Region in lower (left) tail]
Level of Significance = 0.01 @ one-tailed test (Appendix B.5)
*Critical Value (infinite df) = (-) 2.326; less than = (-) Critical Value via one-tail; Rejection
Region in lower (left) tail
Thus, reject H0 if z < - 2.326
Population Variance = 1^2
Z = (86-87) / SQRT [(8.1/165) + (7.3/177)]
Z = (-1/0.3005558965)
Z = -3.327168129
Explanation
The Z test statistic (-3.327) is lower than the critical value (-2.326) and the one-tail rejection
region is pointing towards the left (lower tail). This implies that we reject H0, and accept H1.
Thus, there is evidence that out-of-state students receive lower grades than Maryland students.
Reject H0 if P-value < level of significance (0.01)
*P-value = [0.5 - 0.4996] = 0.0004; thus, reject H0; a sample difference this large would be very
unlikely if H0 were true
*0.4996 derived from Appendix 3.B; the area under the curve between 0 and z = 3.33 is 0.4996
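
The z test above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name two_sample_z is a hypothetical helper, not part of the original solution, and the inputs are the sample figures given in the problem.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """z statistic for two independent samples with known population variances."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error

alpha = 0.01
# Population 1 = out-of-state students, population 2 = Maryland students
z = two_sample_z(86, 87, 8.1, 7.3, 165, 177)
critical_value = NormalDist().inv_cdf(alpha)  # lower-tail cutoff for a one-tailed test
p_value = NormalDist().cdf(z)                 # area in the lower tail below z

print(f"z = {z:.3f}, critical value = {critical_value:.3f}, p-value = {p_value:.5f}")
print("Reject H0" if z < critical_value else "Fail to reject H0")
```

The exact normal tail area may differ slightly from a value read off a printed table, since the table rounds z to two decimals.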
Simple Regression
2. A CEO of a large pharmaceutical company would like to determine whether the company should
allot more money in next year's budget for television advertising of a new drug
marketed for controlling diabetes. He wonders whether there is a strong relationship between the
amount of money spent on television advertising for this new drug, called DIB, and the number of
orders received. The manufacturing process of this drug is very difficult and requires stability, so
the CEO would prefer to generate a stable number of orders. The cost of advertising is always an
important consideration in the phase I roll-out of a new drug. Data collected over
the past 20 months show the amount of money spent on television advertising and the number
of orders received.
The use of linear regression is a critical tool for a manager's decision-making ability.
Please carefully read the example below and try to answer the questions in terms of the problem
context. The results are as follows:
NOTE: If you do not have the Data Analysis option under Tools you must install it. You need
to go to Tools select Add-ins and then choose the 2 data toolpak options. It should take about a
minute.
Month   Advertising Cost   Number of Orders
1       $74,430            2,856,000
2       $62,620            1,800,000
3       $67,580            1,299,000
4       $53,680            1,510,000
5       $69,180            1,367,000
6       $73,140            2,611,000
7       $85,370            3,788,000
8       $76,880            2,935,000
9       $66,990            1,955,000
10      $77,230            3,634,000
11      $61,380            1,598,000
12      $62,750            1,867,000
13      $63,270            1,899,000
14      $86,190            3,245,000
15      $60,030            1,934,000
16      $79,210            2,761,000
17      $67,770            1,625,000
18      $84,530            3,778,000
19      $79,760            2,979,000
20      $84,640            3,814,000
a. Set up a scatter diagram and calculate the associated correlation
coefficient. Discuss how strong you think the relationship is between the
amount of money spent on television advertising and the number of orders
received.
Please use the Correlation procedures within Excel under Tools > Data Analysis.
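Implication: The number of orders received is related to the advertising costs/budget.
In business terms, orders respond to advertising. However, because parts b and d ask us to predict
advertising costs from the number of orders, the chart and the regression use:
Dependent Variable (y) = [Advertising Costs]
Independent Variable (x) = [Number of Orders]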
[Figure: scatter plot "Advertising Cost & Orders Received Comparison". x-axis: Orders Received (x),
1,000,000 to 4,000,000; y-axis: Advertising Costs (y), $0 to $100,000. Fitted trend line:
y = 0.0097x + 47895, R² = 0.776]
Correlation Coefficient (r) 0.880931435
The scatter plot and correlation coefficient (r) of 0.8809 indicate a strong positive
correlation. A value of (r) near 1 indicates a direct or positive linear relationship between the two
variables, advertising costs and number of orders: months with higher advertising costs tend to
have more orders. Correlation alone does not show that advertising causes the orders, but so far
the evidence supports the CEO considering an increase in the advertising budget. There is a strong,
direct relationship between the amount of money spent on television advertising for this new drug,
called DIB, and the number of orders received.
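
The correlation coefficient can be reproduced outside Excel. This is a minimal sketch in plain Python using the 20 monthly observations from the problem; pearson_r is a hypothetical helper name introduced here for illustration.

```python
from math import sqrt

# Monthly data from the problem statement (months 1 to 20)
advertising = [74430, 62620, 67580, 53680, 69180, 73140, 85370, 76880, 66990, 77230,
               61380, 62750, 63270, 86190, 60030, 79210, 67770, 84530, 79760, 84640]
orders = [2856000, 1800000, 1299000, 1510000, 1367000, 2611000, 3788000, 2935000,
          1955000, 3634000, 1598000, 1867000, 1899000, 3245000, 1934000, 2761000,
          1625000, 3778000, 2979000, 3814000]

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(orders, advertising)
print(f"r = {r:.4f}")
```

Because r is symmetric, swapping the two arguments gives the same value; only the regression depends on which variable is treated as y.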
b. Assuming there is a statistically significant relationship, use the least squares method to
find the regression equation to predict the advertising costs based on the number of orders
received. Please use the regression procedure within Excel under Tools > Data Analysis to
construct this equation.
Least Squares Regression Equation: y = 0.00971950x + 47895
R² = 0.776
c. Interpret the meaning of the slope, b1, in the regression equation.
The coefficient for the ‘Number of Orders Received’ (x) is 0.00971950. For each additional order
received, predicted ‘Advertising Costs’ increase by $0.00971950 (just under 1 cent); equivalently,
each additional 1,000 orders is associated with about $9.72 more in advertising costs.
B. Regression
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.880931435
R Square            0.776040194
Adjusted R Square   0.763597982
Standard Error      4704.512237
Observations        20

ANOVA
             df   SS           MS            F            Significance F
Regression   1    1380434618   1380434618    62.3715644   2.943E-07
Residual     18   398383837    22132435.39
Total        19   1778818455

               Coefficients   Standard Error   t Stat        P-value      Lower 95%    Upper 95%    Lower 95.0%   Upper 95.0%
Intercept      47894.77763    3208.26531       14.92855891   1.3962E-11   41154.4623   54635.0929   41154.4623    54635.0929
X Variable 1   0.00971951     0.0012307        7.897566989   2.943E-07    0.00713391   0.01230511   0.00713391    0.01230511

Note that R Squared here is the same (.776) as we got on the chart.
Also the equation coefficients are identical (47895 and .00971)
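
The slope and intercept reported by Excel can be reproduced with the textbook least squares formulas. This is a sketch under the assumption that advertising cost is y and number of orders is x, as in the chart; least_squares is a hypothetical helper name.

```python
# Monthly data from the problem statement (months 1 to 20)
advertising = [74430, 62620, 67580, 53680, 69180, 73140, 85370, 76880, 66990, 77230,
               61380, 62750, 63270, 86190, 60030, 79210, 67770, 84530, 79760, 84640]
orders = [2856000, 1800000, 1299000, 1510000, 1367000, 2611000, 3788000, 2935000,
          1955000, 3634000, 1598000, 1867000, 1899000, 3245000, 1934000, 2761000,
          1625000, 3778000, 2979000, 3814000]

def least_squares(xs, ys):
    """Return (slope, intercept) of the line that minimizes squared errors in y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept

slope, intercept = least_squares(orders, advertising)
print(f"y = {slope:.8f}x + {intercept:.2f}")
```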
d. Predict the monthly advertising cost when the number of orders is 2,300,000. (Hint: Be very
careful with assigning the dependent variable for this problem)
y = dependent variable being estimated. In part d, Advertising Costs are forecasted; hence,
Advertising Costs are the dependent variable.
y = 0.00971950x + 47895
y (Advertising Costs) = 0.00971950(2300000) + 47895
Monthly Advertising Cost (When x = 2,300,000 orders): $70,250
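
The prediction is a single evaluation of the fitted line. A small sketch, using the coefficients from the Excel summary output; the function name predict_advertising_cost is introduced here only for illustration.

```python
# Coefficients taken from the regression summary output
SLOPE = 0.00971951        # dollars of advertising cost per order
INTERCEPT = 47894.77763   # dollars

def predict_advertising_cost(number_of_orders):
    """Predicted monthly advertising cost (y) for a given number of orders (x)."""
    return INTERCEPT + SLOPE * number_of_orders

cost = predict_advertising_cost(2_300_000)
print(f"Predicted monthly advertising cost: ${cost:,.0f}")
```

The value 2,300,000 lies inside the observed range of orders (about 1.3 to 3.8 million), so this is an interpolation rather than an extrapolation.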
e. Compute the coefficient of determination, r², and interpret its meaning.
R² = 0.776 = SSR / SS Total, the share of total variation (SS Total) explained by the regression
equation (SSR)
77.6% of the total variation in Advertising Costs (y) is explained by the number of orders
received (x). The data are still scattered around the least squares regression line, so there
will be error in the predictions (actual vs. predicted (y)’s).
The remaining 22.4% of the total variation in the dependent variable is error/residual (unexplained)
variation: the dispersion of actual (y)’s around the predicted (y)’s on the linear
regression line.
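
The coefficient of determination follows directly from the sums of squares in the ANOVA section of the regression output. A brief sketch, with variable names chosen here for illustration:

```python
# Sums of squares from the ANOVA section of the regression output
ss_regression = 1_380_434_618   # explained variation (SSR)
ss_residual = 398_383_837       # unexplained variation (SSE)
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total
unexplained_share = ss_residual / ss_total

print(f"R^2 = {r_squared:.4f}, unexplained = {unexplained_share:.4f}")
```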
f. Compute the standard error of estimate, and interpret its meaning.
Sy.x = standard error of estimate for y (advertising costs, the dependent variable) for a given
value of x (number of orders).
Sy.x OR STEYX = 4704.51, i.e. about $4,705 (roughly 4.7 thousand dollars)
Actual monthly advertising costs typically deviate from the costs predicted by the regression
line by about $4,705. This is the typical size of the error in our forecasted monthly advertising
costs.
The predicted value of the dependent variable lies on the regression line; however, an actual
data point may be above or below that line.
Standard error of estimate (SEE): A measure of how inaccurate an estimate might be. It is
essentially the standard deviation of the errors (or residuals), where an error or residual is the
difference between the actual (y) and the predicted (y). It measures the scatter/dispersion of the
observed values around the line of regression for a given value of (x), and so how well the
regression line represents the scattered data.
The greater the dispersion, the larger the SEE. A larger sample would not necessarily reduce
the SEE, because the SEE estimates the underlying scatter around the line; more data would instead
make the estimated slope and intercept more precise.
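
The standard error of estimate can be recomputed from the residual sum of squares in the regression output. A minimal sketch, assuming the usual simple-regression formula with n - 2 degrees of freedom:

```python
from math import sqrt

ss_residual = 398_383_837   # residual (error) sum of squares from the ANOVA output
n = 20                      # number of monthly observations

# Two parameters (slope and intercept) are estimated, leaving n - 2 degrees of freedom
standard_error_of_estimate = sqrt(ss_residual / (n - 2))

print(f"SEE = {standard_error_of_estimate:.2f} dollars")
```

The result is in the units of y, so it is read as dollars of advertising cost.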
g. Do you think that the company should use these results from the regression to base any
corporate decisions on?….explain fully.
Yes, with some caution.
SEE & r² are the main measures used to evaluate the predictive ability of the regression equation.
The scatter plot and correlation coefficient (r) of 0.8809 indicate a strong positive
correlation. A value of (r) near 1 indicates a direct or positive linear relationship between the two
variables, advertising costs and number of orders. This (r) points to a strong predictive model.
As for r², 77.6% of the variation in Advertising Costs (y) is explained by the number of orders
received (x). However, 22.4% of the total variation in the dependent variable is error/residual
(unexplained) variation: the dispersion of actual (y)’s around the predicted (y)’s
on the linear regression line.
The standard error of estimate is about $4,705. This is the typical error in our forecasted monthly
advertising costs, which is modest compared with monthly costs ranging from about $54,000 to
$86,000, considering the following:
The correlation coefficient is large (0.8809) since the scattered points tend to be close to the
linear regression line. The correlation coefficient and SEE are inversely related. Thus, as
the strength of the linear relationship between the 2 variables increases, the SEE decreases.
Due to the high correlation between the independent and dependent variables, there is less
erratic scatter/dispersion, indicating the regression equation is adequate and accounts for
over 2/3rds of total variation. A larger sample, such as 3 or 4 years of data, would not
necessarily reduce the SEE, but it would make the estimated coefficients more precise.
The relationship is highly statistically significant (Significance F = 2.943E-07), so this
regression model can be used to predict values within the observed range of the data with
reasonable confidence. It does not by itself show that a larger advertising budget will cause
more orders.
Hypothesis Testing on Multiple Populations
3. Dr. Michaella Evans, a statistics professor at the University of Maryland University College,
drives from her home to the school every weekday. She has three options to drive there. She can
take the Beltway, or she can take a main highway with some traffic lights, or she can take the
back road, which has no traffic lights but is a longer distance. Being as data-oriented as she is,
she is interested to know if there is a difference in the time it takes to drive each route.
As an experiment she randomly selected the route on 21 different days and wrote down the time
it took her for the round trip, getting to work in the morning and back home in the evening.
At the .01 significance level, can she conclude that there is a difference between the driving
times using the different routes?
Time (in minutes) it took to get to work and back using:
Beltway   Main highway   Back road
88        79             86
94        86             78
91        75             79
88        83             96
98        74             97
84        72             73
90                       68
77
You can check your critical value with the following table:
http://www.statsoft.com/textbook/distribution-tables
Pg 391 & 751
H0: 1=2=3
H1: The mean scores are not equal
Level of Significance = 0.01
Test Statistic = F distribution
df in numerator = (k-1) or 3-1 = 2
df in denominator = (n-k) or 21-3 = 18
Appendix B.6 @ 0.01 F dist = 6.013 (intersection value); Reject H0 if computed F>6.013
Reject Ho if P-value < Level of significance (0.01)
Reject Ho if F > 6.0129
According to the Anova data analysis below, F (3.068) < 6.013 and P-value (0.071) > level of
significance (0.01). Thus, we fail to reject H0: at the .01 level there is not enough evidence to
conclude that the driving times differ between the routes. H0 could only be rejected at a
significance level of about 0.071 or higher, well above the 0.01 allowed for a type 1 error.
Since 3.0683 < 6.0129, the null hypothesis H0 should not be rejected. There is not enough
evidence to conclude that there is a difference in the driving times between the three
routes.
Anova: Single Factor (Single Driver, not multiple like in Two-Factor W/O Replication on pg 402)

SUMMARY
Groups         Count   Sum   Average    Variance
Beltway        8       710   88.75      40.21429
Main highway   6       469   78.16667   30.16667
Back road      7       577   82.42857   122.9524

ANOVA
Source of Variation   SS            df   MS         F          P-value       F crit
Between Groups        398.9047619   2    199.4524   3.068373   0.071341785   6.012905
Within Groups         1170.047619   18   65.00265
Total                 1568.952381   20
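
The F statistic in the ANOVA table can be recomputed from the raw driving times. This is a minimal sketch in plain Python; one_way_anova is a hypothetical helper, and the critical value is taken from the Excel output above rather than computed.

```python
# Round-trip driving times (minutes) from the problem statement
routes = {
    "Beltway": [88, 94, 91, 88, 98, 84, 90, 77],
    "Main highway": [79, 86, 75, 83, 74, 72],
    "Back road": [86, 78, 79, 96, 97, 73, 68],
}

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

F_CRITICAL = 6.0129  # F crit at alpha = 0.01 with (2, 18) df, from the Excel output

f_stat, df_between, df_within = one_way_anova(list(routes.values()))
print(f"F = {f_stat:.4f} with ({df_between}, {df_within}) df")
print("Reject H0" if f_stat > F_CRITICAL else "Fail to reject H0")
```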

Stat_AMBA_600_Problem Set3

  • 1. Tyler Anton 1 Spring 2014 Problem Set #3 Hypothesis Testing 1. University of Maryland University College is concerned that out of state students may be receiving lower grades than Maryland students. Two independent random samples have been selected: 165 observations from population 1 (Out of state students) and 177 from population 2 (Maryland students). The sample means obtained are X1(bar)=86 and X2(bar)=87. It is known from previous studies that the population variances are 8.1 and 7.3 respectively. Using a level of significance of .01, is there evidence that the out of state students may be receiving lower grades? Fully explain your answer. H0: 1 > 2 H1: 1 < 2 [Rejection Region in lower (left) tail] Level of Significance = 0.01 @ one-tailed test (Appendix B.5) *Critical Value (infinite df) = (-) 2.326; less than = (-) Critical Value via one-tail; Rejection Region in lower (left) tail Thus, reject H0 if z < - 2.326 Population Variance = 1^2 Z = (86-87) / SQRT [(8.1/165) + (7.3/177)] Z = (-1/0.3005558965) Z = -3.327168129 Explanation The Z test statistic (-3.327) is lower than the critical value (-2.326) and the one-tail rejection region is pointing towards the left (lower tail). This implies that we reject H0, and accept H1. Thus, there is evidence that out-of-state students receive lower grades than Maryland students. Reject Ho if P-value < Level of significance (0.01) *P-value = [0.5 – 0.4990] = 0.0010; Thus, reject H0; small likelihood Ho is true *0.4990 derived from Appendix 3.B; Area under the curve corresponding to 3.327 is 0.4990
  • 2. Tyler Anton 2 Simple Regression 2. A CEO of a large pharmaceutical company would like to determine if the company should be placing more money allotted in the budget next year for television advertising of a new drug marketed for controlling diabetes. He wonders whether there is a strong relationship between the amount of money spent on television advertising for this new drug called DIB and the number of orders received. The manufacturing process of this drug is very difficult and requires stability so the CEO would prefer to generate a stable number of orders. The cost of advertising is always an important consideration in the phase I roll-out of a new drug. Data that have been collected over the past 20 months indicate the amount of money spent of television advertising and the number of orders received. The use of linear regression is a critical tool for a manager's decision-making ability. Please carefully read the example below and try to answer the questions in terms of the problem context. The results are as follows: NOTE: If you do not have the Data Analysis option under Tools you must install it. You need to go to Tools select Add-ins and then choose the 2 data toolpak options. It should take about a minute. Month Advertising Cost Number of Orders 1 $74,430.00 2,856,000 2 62,620 1,800,000 3 67,580 1,299,000 4 53,680 1,510,000 5 69,180 1,367,000 6 73,140 2,611,000 7 85,370 3,788,000 8 76,880 2,935,000 9 66,990 1,955,000 10 77,230 3,634,000 11 61,380 1,598,000 12 62,750 1,867,000 13 63,270 1,899,000 14 86,190 3,245,000
  • 3. Tyler Anton 3 15 60,030 1,934,000 16 79,210 2,761,000 17 67,770 1,625,000 18 84,530 3,778,000 19 79,760 2,979,000 20 84,640 3,814,000 a. Set up a scatter diagram and calculate the associated correlation coefficient. Discuss how strong you think the relationship is between the amount of money spent on television advertising and the number of orders received. Please use the Correlation procedures within Excel under Tools > Data Analysis. Implication: The number of orders received is related to the advertising costs/budget. Dependent Variable = [Number of Orders] Independent Variable = [Advertising Costs] y = 0.0097x + 47895 R² = 0.776 $0 $20,000 $40,000 $60,000 $80,000 $100,000 1,000,000 1,500,000 2,000,000 2,500,000 3,000,000 3,500,000 4,000,000 AdvertisingCosts(y) Orders Received (x) Advertising Cost & Orders Received Comparison
  • 4. Tyler Anton 4 Correlation Coefficient (r) 0.880931435 The scatter plot and correlation coefficient (r) of 0.8809 indicates that there is a strong positive correlation. A value of (r) near 1 indicates a direct or positive linear relationship between the two variables – advertising costs and number of orders. As advertising costs increase, the number of orders received will follow. A positive correlation exists. So far, the CEO should consider increasing the advertising budget. There is a relatively direct or strong relationship between the amount of money spent on television advertising for this new drug, called DIB, and the number of orders received. b. Assuming there is a statistically significant relationship, use the least squares method to find the regression equation to predict the advertising costs based on the number of orders received. Please use the regression procedure within Excel under Tools > Data Analysis to construct this equation. Least Squares Regression Equation: y = 0.00971950x + 47895 R2 = 0.776 c. Interpret the meaning of the slope, b1, in the regression equation. The coefficient for the ‘Number of Orders Received’ (x) is 0.00971950. For every increase in the firm’s ‘Number of Orders Received’, there is an anticipated 0.00971950 increase in ‘Advertising Costs’ respectively - (Just under 1 cent) B. 
Regression SUMMARY OUTPUT Regression Statistics Multiple R 0.880931435 R Square 0.776040194 Adjusted R Square0.763597982 Standard Error4704.512237 Observations 20 ANOVA df SS MS F Significance F Regression 1 1380434618 1380434618 62.3715644 2.943E-07 Residual 18 398383837 22132435.39 Total 19 1778818455 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0% Intercept 47894.77763 3208.26531 14.92855891 1.3962E-11 41154.4623 54635.0929 41154.4623 54635.0929 X Variable 1 0.00971951 0.0012307 7.897566989 2.943E-07 0.00713391 0.01230511 0.00713391 0.01230511 Note that R Squared here is the same (.776) as we got on the chart. Also the equation coefficients are identical (47895 and .00971)
  • 5. Tyler Anton 5 d. Predict the monthly advertising cost when the number of orders is 2,300,000. (Hint: Be very careful with assigning the dependent variable for this problem) y = dependent variable being estimated. In part d, Advertising Costs are forecasted; hence, Advertising Costs are the dependent variable. y = 0.00971950x + 47895 y (Advertising Costs) = 0.00971950(2300000) + 47895 Monthly Advertising Cost (When x = 2,300,000 orders): $70,250 e. Compute the coefficient of determination, r2 , and interpret its meaning. R2 = 0.776 = % of Total variation (SS Total) explained by the regression equation (SSR) 77.6% of the total variation in Advertising Costs (y) is explained by the number of orders received (x). Thus, the data is scattered around the best least squares regression line and there will be error in the predictions – actual vs. predicted (y)’s. 22.4% of the total variation in the dependent variable is error/residual (Unexplained) variation - standard deviation or dispersion of actual (y)’s from the predicted (y)’s on the linear regression line. f. Compute the standard error of estimate, and interpret its meaning. Sy.x = standard error for y (advertising costs – depend.) for a given value of x (number of orders). Sy.x OR STEYX = 4704.51; or [4704.51/1000] = 4.70451 {Simplified} The standard error of a predicted y-value for each x in the regression is 4.70451 (simplified). This implies the standard error for our forecasted monthly advertising costs is 4.70451. The predicted dependent variable is located at an x-value corresponding to the regression line; however, an actual data point may be above or below that line. Standard error of estimate (SEE): A measure of how inaccurate an estimate might be. It is essentially the standard deviation or dispersion of actual (y)’s from the predicted (y)’s on the linear regression line. This is a measure of how well regression line represents the scattered data. 
The SEE is the standard deviation of the errors (or residuals), where the error or residual is the difference between the actual (y) and the predicted (y). The greater the dispersion of the actual values around the line, the larger the SEE. A larger sample size would not necessarily reduce the SEE itself, but it would give a more reliable estimate of it and of the regression coefficients.
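The prediction in part d and the size of the SEE relative to that prediction can be reproduced with a short calculation. This is a minimal sketch in Python using the unrounded coefficients from the summary output. The function names and the "actual" cost in the residual example are hypothetical, chosen only for illustration.

```python
# Coefficients and standard error copied from the regression summary output.
SLOPE = 0.00971951        # "X Variable 1": cost change per additional order
INTERCEPT = 47894.77763   # predicted cost when orders = 0
SEE = 4704.512237         # standard error of estimate (same units as cost)


def predict_advertising_cost(orders: float) -> float:
    """Predicted monthly advertising cost for a given number of orders."""
    return INTERCEPT + SLOPE * orders


def residual(actual_cost: float, orders: float) -> float:
    """Error of the prediction: actual minus predicted cost."""
    return actual_cost - predict_advertising_cost(orders)


predicted = predict_advertising_cost(2_300_000)
print(f"Predicted monthly advertising cost: {predicted:,.2f}")
print(f"SEE as a share of the prediction:   {SEE / predicted:.1%}")

# Hypothetical actual cost, only to show how a residual is computed.
example_residual = residual(75_000, 2_300_000)
print(f"Residual for a hypothetical actual cost of 75,000: {example_residual:,.2f}")
```

The unrounded coefficients give a prediction within a few cents of the rounded hand calculation, so both round to $70,250.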
  • 6. The SEE measures the scatter/dispersion of the observed values around the line of regression for a given value of (x).
    g. Do you think that the company should use these results from the regression to base any corporate decisions on? Explain fully.
    Yes. SEE and r2 are the key measures for evaluating the predictive ability of the regression equation.
    The scatter plot and the correlation coefficient (r) of 0.8809 indicate a strong positive correlation. A value of (r) near 1 indicates a direct, positive linear relationship between the two variables, advertising costs and number of orders, and supports using the model for prediction.
    As for r2, 77.6% of the variation in Advertising Costs (y) is explained by the number of orders received (x). However, 22.4% of the total variation in the dependent variable is error/residual (unexplained) variation: the dispersion of the actual y's from the predicted y's on the linear regression line.
    The standard error of estimate is 4704.51 (4.70451 in thousands), which is about 7% of the forecasted monthly advertising cost of $70,250. This is fairly small, consistent with the large correlation coefficient (0.8809): the scattered points tend to lie close to the linear regression line.
    The correlation coefficient and the SEE are inversely related. As the strength of the linear relationship between the two variables increases, the SEE decreases. Because of the high correlation between the independent and dependent variables, there is less erratic scatter/dispersion, and the regression equation accounts for over three-quarters of the total variation.
    A larger sample, such as 3 or 4 years of data, could be used to make the estimates more reliable.
    This regression model can be used to predict future values with reasonable confidence, and the relationship is highly statistically significant (Significance F = 2.943E-07). Predictions will still carry error of roughly the size of the SEE.
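The inverse relationship between the correlation coefficient and the SEE can be made concrete. For a simple regression, SEE = sqrt((1 - r^2) x SS Total / (n - 2)), so with SS Total and n held fixed, a larger |r| always gives a smaller SEE. The sketch below (Python, with an illustrative function name that is an assumption, not something from the original work) evaluates this for several values of r, including the r from the regression output.

```python
import math

SS_TOTAL = 1778818455.0  # total sum of squares from the ANOVA table
N = 20                   # number of observations


def see_from_r(r: float, ss_total: float, n: int) -> float:
    """Standard error of estimate implied by correlation r in simple regression."""
    ss_residual = (1 - r ** 2) * ss_total  # unexplained variation
    return math.sqrt(ss_residual / (n - 2))


# Only the middle value is from the data; the others are hypothetical
# correlations used to show the trend.
r_values = [0.50, 0.70, 0.880931435, 0.95, 0.99]
see_values = [see_from_r(r, SS_TOTAL, N) for r in r_values]

for r, see in zip(r_values, see_values):
    print(f"r = {r:.4f}  ->  SEE = {see:,.2f}")
```

At r = 0.8809 the formula returns the 4704.51 reported in the regression output, and the SEE shrinks steadily as r approaches 1.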
  • 7. Hypothesis Testing on Multiple Populations
    3. Dr. Michaella Evans, a statistics professor at the University of Maryland University College, drives from her home to the school every weekday. She has three options to drive there. She can take the Beltway, or she can take a main highway with some traffic lights, or she can take the back road, which has no traffic lights but is a longer distance. Being as data-oriented as she is, she is interested to know if there is a difference in the time it takes to drive each route. As an experiment she randomly selected the route on 21 different days and wrote down the time it took her for the round trip, getting to work in the morning and back home in the evening. At the .01 significance level, can she conclude that there is a difference between the driving times using the different routes?
    Time (in minutes) it took to get to work and back using:

    Beltway    Main highway    Back road
    88         79              86
    94         86              78
    91         75              79
    88         83              96
    98         74              97
    84         72              73
    90                         68
    77

    (Column assignment reconstructed from the group counts and sums in the ANOVA summary: 8, 6 and 7 observations summing to 710, 469 and 577.)
    You can check your critical value with the following table: http://www.statsoft.com/textbook/distribution-tables (Pg 391 & 751)
    H0: μ1 = μ2 = μ3
    H1: The mean driving times are not all equal
    Level of Significance = 0.01
    Test Statistic = F distribution
    df in numerator = (k-1) or 3-1 = 2
    df in denominator = (n-k) or 21-3 = 18
    Appendix B.6 @ 0.01: F = 6.013 (intersection value)
    Decision rule: Reject H0 if computed F > 6.013 (6.0129), or equivalently if P-value < Level of significance (0.01).
    According to the Anova data analysis below, F < 6.013 and P-value (0.071) > Level of significance (0.01). Thus, we fail to reject H0: there is not sufficient evidence at the .01 level to conclude that there is a difference between the driving times using the different routes. The P-value means that, if H0 were true, an F statistic at least this large would occur about 7.1% of the time, which is more than the 1% risk of a Type I error we are willing to accept.
  • 8. Since 3.0683 < 6.0129, the null hypothesis H0 should not be rejected. There is not enough evidence to conclude that there is a difference in the driving times between the three routes.

    Anova: Single Factor (Single Driver, not multiple like in Two-Factor W/O Replication on pg 402)

    SUMMARY
    Groups          Count   Sum   Average    Variance
    Beltway         8       710   88.75      40.21429
    Main highway    6       469   78.16667   30.16667
    Back road       7       577   82.42857   122.9524

    ANOVA
    Source of Variation   SS            df   MS         F          P-value       F crit
    Between Groups        398.9047619   2    199.4524   3.068373   0.071341785   6.012905
    Within Groups         1170.047619   18   65.00265
    Total                 1568.952381   20
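The ANOVA table can be reproduced directly from the driving times. The sketch below is a minimal pure-Python version of a single-factor ANOVA, not the Excel procedure used above. Two assumptions are worth flagging: the assignment of times to routes is reconstructed from the group counts and sums in the SUMMARY table, and the P-value and critical value use a closed-form shortcut that holds only when the numerator has exactly 2 degrees of freedom, as it does here.

```python
from statistics import mean

# Assumed grouping, reconstructed from the SUMMARY counts and sums.
times = {
    "Beltway": [88, 94, 91, 88, 98, 84, 90, 77],
    "Main highway": [79, 86, 75, 83, 74, 72],
    "Back road": [86, 78, 79, 96, 97, 73, 68],
}
ALPHA = 0.01


def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(list(times.values()))

# Shortcut valid only for df_between == 2:
# P(F > f) = (1 + 2f/df_w) ** (-df_w/2), inverted for the critical value.
assert df_b == 2
p_value = (1 + 2 * f_stat / df_w) ** (-df_w / 2)
f_crit = (df_w / 2) * (ALPHA ** (-2 / df_w) - 1)

print(f"SS between = {ss_b:.4f}, SS within = {ss_w:.4f}")
print(f"F = {f_stat:.6f}, P-value = {p_value:.6f}, F crit = {f_crit:.6f}")
print("Reject H0" if f_stat > f_crit else "Fail to reject H0")
```

With these inputs the computed F stays below the critical value, so the sketch reaches the same decision as the table: fail to reject H0.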