APPROPRIATE
STATISTICAL TOOLS IN
SOCIAL RESEARCH
Advanced Statistics
The course deals with parametric and non-parametric
statistics. It covers tests of association such as
Spearman rho, the phi coefficient, the contingency coefficient, and the biserial correlation;
tests of hypotheses about two independent groups, such as the
two-independent-samples t-test, Mann-Whitney U, and Wilcoxon W;
tests of hypotheses about three or more independent groups,
such as one-way ANOVA, Kruskal-Wallis, and the Jonckheere-Terpstra
test; tests of hypotheses about repeated measures, such as the
paired t-test and the sign test; the chi-square test of association; and
statistical power analysis. It includes applications and data
analysis with computations carried out using SPSS.
Objectives of the Course
 Familiarize the students with fundamental topics in Statistics such as
descriptive statistics, inferential statistics, parametric statistics, and non-
parametric statistics
 For the students to be able to analyze and interpret sets of
measurements / data by applying any of the topics included in the
course outline.
COURSE OUTLINE
1. Measurements
2. Sampling
3. Summation Notation
4. Frequency Distribution Table
5. Measure of Central Location
6. One-Sample Tests
7. Two-Sample Tests
8. More than Two-Sample Tests
9. Regression and Correlation
10. Chi-Square Test
Statistics
Statistics (as a discipline) is the scientific method of collecting,
organizing, summarizing, presenting and analyzing data for the
purpose of drawing (valid) conclusion(s) and making (reasonable)
recommendations
Statistics is both a SCIENCE (it involves systematic procedures) and an ART
(it requires judgment in how those procedures are used, e.g. in research)
statistics (lowercase) refers to a mass of data (wherever there are data, Statistics is there)
There are three kinds of
lies ….
1. LIES!
2. DAMNED LIES!!
and
3. STATISTICS!!!
…. Benjamin Disraeli
Data Gathering
 Objective method. Data are gathered by measurement or
direct observation (e.g. measuring the weight of 1000 heads
of cabbage). These data are classified as primary data
 Subjective method. Data are provided by respondents (e.g.
data on the amount of rice harvest provided by all farmers in
Nueva Vizcaya). These data are also classified as primary data
 Use of existing records. Data are gathered from information
previously collected by other persons or institutions (e.g.
rice yield records in Nueva Vizcaya for the past 20 years
obtained from the Bureau of Agricultural Statistics). These
data are classified as secondary data
Two Phases of Statistics
1. Descriptive statistics. Deals with the methods of collecting,
organizing, summarizing, and presenting data, and with their interpretation.
2. Inferential statistics. Concerned with making generalizations about
a larger set of data (the population) when only a part of it (the sample) is examined
o Estimation. The objective of estimation is to come up with a
value or a range of values, computed from the sample, that is
taken as an estimate of a characteristic of the population from
which the sample was drawn
o Hypothesis Testing. A statistical procedure for deciding whether to
reject or not reject a hypothesis (about population characteristics)
on the basis of a sample
Levels of Measurements
Measurement is the process of assigning numbers to observations in
such a way that the numbers are amenable to analysis by manipulation
or operation according to certain rules. There are four levels of
measurements: nominal, ordinal, interval, and ratio and each level
determines the appropriate statistical tool or procedure that can be
applied to the set of data in that particular level of measurement (Table
1).
 Nominal level. Values are simple labels or categories or
names without implied ordering or hierarchy in the
labels (e.g. Tax Identification Number or TIN, gender,
civil status, race or color)
 Ordinal level. Values are simply labels with an implied
ordering or hierarchy in the labels. The distance
between two labels, however, is unknown (e.g. sizes of
shirts, job hierarchy, income levels)
 Interval level. Values can be ordered or arranged
according to magnitude or hierarchy; distance between
two values is known; can add / subtract but cannot
multiply / divide; the zero point is arbitrary (e.g.
Intelligence Quotient or IQ, Temperature in °F or °C)
 Ratio level. Values have all the properties of the interval
level. In addition, values can be multiplied or divided and
the zero point is fixed (e.g. age, height, area, mass,
length)
Table 1. Four levels of measurements and the statistical tools appropriate for each level
Nominal
• Defining relations: equivalence
• Examples of appropriate statistics: mode; frequency; contingency coefficient
• Appropriate statistical test: non-parametric
Ordinal
• Defining relations: equivalence; hierarchy / order
• Examples of appropriate statistics: median; percentile; Spearman ρ; Kendall τ; Kendall W
• Appropriate statistical test: non-parametric
Interval
• Defining relations: equivalence; hierarchy / order; known ratio of two intervals
• Examples of appropriate statistics: mean; standard deviation; Pearson r; multiple R
• Appropriate statistical test: non-parametric and parametric
Ratio
• Defining relations: equivalence; hierarchy / order; known ratio of two intervals; known ratio of two scale values
• Examples of appropriate statistics: mean; standard deviation; Pearson r; multiple R; geometric mean; coefficient of variation
• Appropriate statistical test: non-parametric and parametric
Universe, Population, Sample and Variables
A researcher would like to study the characteristics of poor
households in the province of Nueva Vizcaya as of December 31, 2022.
Included in the study are the following:
 measurements on annual household income
 household head's highest educational attainment
 employment status (employed, unemployed) of the
household head, and
 household size
Because of time constraints to complete the study, the
researcher obtained measurements on 50 randomly selected poor
households in the province.
Universe  All poor households in the province of Nueva
Vizcaya as of December 31, 2022
Variables  Annual households' income
 Highest educational attainment of household heads
 Employment status of household heads
 Household size
Population  The population for each variable is as follows:
• Annual households' income - poor households
• Highest educational attainment - poor household
heads
• Employment status - poor household heads
• Household size - poor households
Sample  The 50 randomly selected poor households in the
province of Nueva Vizcaya
Definition / specification of terms
Sampling  Inferential statistics involves the process of drawing
out inferences or generalization from the sample
which is the basis in the formulation of conclusions
about the population. The accuracy of these inferences
/ conclusions depends to a large extent upon the
representativeness of the sample. A representative
sample exhibits most, if not all the properties of the
population or, in other words, a representative
sample is a miniature of the population. When
drawing a sample from a population, the two basic
questions that must be addressed are:
• What is the size of the sample?, and
• How is each member of the sample selected?
Sample size  A sample, rather than the whole population, is used in a research
undertaking for the following reasons: (a) to save resources (the 4Ms: manpower,
materials, machinery, and money, as well as time and effort); (b) to keep the
volume of data small, which makes analysis and interpretation easier; and (c) to
overcome the problem of dealing with members of the population that are
inaccessible.
 In determining the sample size, the following must be considered:
• The bigger the size of the population, the bigger the size of the
sample
• Margin of error (sampling error) is the percentage of error incurred
in selecting a sample that is not representative of the population. In
random sampling, whether the sample is representative is left to
chance, and the margin of error is the allowed probability of NOT
selecting a representative sample. The smaller the margin of error
allowed, the more members of the population should be selected, and
the larger the sample size. (In fact, if no margin of error is allowed,
the whole population should be used.)
The sample size is determined using Slovin's formula,
\[ n = \frac{N}{1 + N e^{2}} \]
where n is the sample size, N is the population size, and e is
the desired margin of error (expressed as a decimal)
Sample Size
Example 1. With a margin of error of 5%, what is the sample
size, n, for a population size of N = 5000?
\[ n = \frac{N}{1 + N e^{2}} = \frac{5000}{1 + 5000(0.05)^{2}} \approx 370 \]
If e = 1%, then
\[ n = \frac{N}{1 + N e^{2}} = \frac{5000}{1 + 5000(0.01)^{2}} \approx 3333 \]
Example 2. During the second semester of SY 2021-2022, the
distribution of enrolment of the College of
Engineering is as follows:
COURSE    GENDER              TOTAL
          MALE     FEMALE
BSAE      94       54         148
BSCE      79       27         106
TOTAL     173      81         254
Using a 5% margin of error, draw out a sample size employing
Proportional Stratified Random Sampling
Solution:
1. If the grand total (population size) and subtotals are not
given, compute each.
2. Compute the sample size, n with the desired margin of
error.
\[ n = \frac{N}{1 + N e^{2}} = \frac{254}{1 + 254(0.05)^{2}} \approx 155 \]
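3. Compute the number for each subgroup. The sample is allocated in
proportion to subgroup size: N / n = N_subgroup / n_subgroup, so
n_subgroup = (n / N)(N_subgroup).
(a) Male-BSAE
\[ \frac{N}{n} = \frac{94}{n_{M\text{-}AE}}; \quad n_{M\text{-}AE} = \frac{n}{N}(94) = \frac{155}{254}(94) \approx 57 \]
(b) Female-BSAE
\[ \frac{N}{n} = \frac{54}{n_{F\text{-}AE}}; \quad n_{F\text{-}AE} = \frac{n}{N}(54) = \frac{155}{254}(54) \approx 33 \]
(c) Male-BSCE
\[ \frac{N}{n} = \frac{79}{n_{M\text{-}CE}}; \quad n_{M\text{-}CE} = \frac{n}{N}(79) = \frac{155}{254}(79) \approx 48 \]
(d) Female-BSCE
\[ \frac{N}{n} = \frac{27}{n_{F\text{-}CE}}; \quad n_{F\text{-}CE} = \frac{n}{N}(27) = \frac{155}{254}(27) \approx 17 \]
(The last quotient is 16.48; it is taken as 17 so that the four subgroup
sizes, 57 + 33 + 48 + 17, add up to n = 155.)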
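The two steps above (Slovin's formula, then proportional allocation) can be sketched in Python. This is only an illustration: the function names are hypothetical, and the largest-remainder rounding rule is an assumption chosen here because it makes the subgroup sizes add up to n; the slides do not state a rounding rule.

```python
from math import floor


def slovin(N, e):
    """Slovin's formula n = N / (1 + N e^2), returned unrounded."""
    return N / (1 + N * e ** 2)


def proportional_allocation(strata, n):
    """Split a total sample of size n across strata in proportion to size.

    Largest-remainder rounding (an assumption, not from the slides) is
    used so that the allocated sizes sum exactly to n.
    """
    N = sum(strata.values())
    shares = {name: n * size / N for name, size in strata.items()}
    alloc = {name: floor(share) for name, share in shares.items()}
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractions.
    by_fraction = sorted(shares, key=lambda s: shares[s] - alloc[s], reverse=True)
    for name in by_fraction[:leftover]:
        alloc[name] += 1
    return alloc


# Enrolment figures from Example 2
enrolment = {"Male-BSAE": 94, "Female-BSAE": 54, "Male-BSCE": 79, "Female-BSCE": 27}
n = round(slovin(sum(enrolment.values()), 0.05))
allocation = proportional_allocation(enrolment, n)
print(n, allocation)
```

Rounding each share independently would give 16 for Female-BSCE and a total of 154, which is why a rule that preserves the total is used in this sketch.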
Cluster Sampling. This method of sampling is convenient to use
when the population is spread over a wide geographic area. In
cluster sampling, groups, not individuals are randomly selected.
Example 3
 The population of all fifth year Agricultural Engineering students
in the country is 600
 For a margin of error of 5%, the desired sample size is 240
 A logical cluster is the Agricultural Engineering institution.
Suppose there are 30 such institutions in the Philippines,
with an average population of 20 fifth year agricultural
engineering students each.
 The number of clusters (Agricultural Engineering institutions)
needed is 12 (240/20)
 Therefore, 12 Agricultural Engineering institutions will be
randomly selected from the 30 nationwide.
 All the fifth year students in these 12 institutions will be included
in the sample.
Summation Notation
\[ \sum_{k=1}^{n} a_k = a_1 + a_2 + a_3 + \dots + a_n \]
Terminology
\[ \sum_{k=1}^{n} a_k = a_1 + a_2 + a_3 + \dots + a_n \]
 The Greek letter Σ (capital sigma) indicates a sum and is referred to as the
summation operator.
 k is referred to as the index of summation (or summation
variable).
 a_k is referred to as the k-th term of the sum
 The numbers 1 and n are the lower and upper limits of the
summation, respectively
Example - Evaluate
\[ \sum_{k=1}^{4} k^{2}(k-3) \]
Here \( a_k = k^{2}(k-3) \)
The lower and upper limits are 1 and 4
\[ \sum_{k=1}^{4} k^{2}(k-3) = 1^{2}(1-3) + 2^{2}(2-3) + 3^{2}(3-3) + 4^{2}(4-3) = -2 - 4 + 0 + 16 = 10 \]
Basic Ideas
 As with functions, the letter used to denote
the index of summation is immaterial:
\[ \sum_{k=1}^{4} k^{2}(k-3) = \sum_{j=1}^{4} j^{2}(j-3) = \sum_{i=1}^{4} i^{2}(i-3) = 10 \]
 The index of summation need not start at 1; for example, a sum may
run from i = 3 to 20, or from j = 0 to 5,000, et cetera
Why Use Summation Notation
Summation notation allows us to write
mathematical expressions compactly.
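The compact notation maps directly onto a loop over the index. The Python sketch below is illustrative only: the helper name `total`, the sequences `a` and `b`, and the values of n and c are arbitrary choices. It spot-checks several standard summation identities (constant factor, sum of sums, sum of a constant, and the closed forms for Σk and Σk²) for one value of n; a check for a single n is not a proof.

```python
def total(term, lower, upper):
    """Return term(lower) + term(lower + 1) + ... + term(upper)."""
    return sum(term(k) for k in range(lower, upper + 1))


n, c = 50, 7                 # arbitrary illustrative values


def a(k):
    return 2 * k + 1         # arbitrary illustrative sequence


def b(k):
    return k * k             # arbitrary illustrative sequence


# A constant factor can be moved outside the sum.
constant_factor = total(lambda k: c * a(k), 1, n) == c * total(a, 1, n)
# The sum of a sum is the sum of the sums.
sum_of_sums = total(lambda k: a(k) + b(k), 1, n) == total(a, 1, n) + total(b, 1, n)
# Summing a constant n times gives c * n.
constant_sum = total(lambda k: c, 1, n) == c * n
# Closed forms for the first n integers and the first n squares.
sum_k = total(lambda k: k, 1, n)
sum_k2 = total(b, 1, n)

print(constant_factor, sum_of_sums, constant_sum)
print(sum_k == n * (n + 1) // 2, sum_k2 == n * (n + 1) * (2 * n + 1) // 6)
```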
Properties for Summation
1. \( \sum_{k=1}^{n} c\,a_k = c \sum_{k=1}^{n} a_k \)
2. \( \sum_{k=1}^{n} (a_k + b_k) = \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k \)
3. \( \sum_{k=1}^{n} (a_k - b_k) = \sum_{k=1}^{n} a_k - \sum_{k=1}^{n} b_k \)
Properties (cont’d)
4. \( \sum_{k=1}^{n} c = cn \)
5. \( \sum_{k=1}^{n} k = \frac{n(n+1)}{2} \)
6. \( \sum_{k=1}^{n} k^{2} = \frac{n(n+1)(2n+1)}{6} \)
Exercises
Calculate the sums indicated below:
1. to 3. [Three summation expressions, indexed by i, m, and j respectively; the expressions are illegible in the source.]
More Exercises
Write the sum using summation notation
4. \( 1 + 2 + 3 + \dots + 100 \)
5. \( \frac{10}{15} + \frac{11}{17} + \frac{12}{19} + \dots + \frac{10+n}{15+2n} \)
Still More Exercises
Find the sum
6. [A sum over k from 1 to 1,000; the summand is illegible in the source.]
7. [A sum over i from 1 to 99; the summand is illegible in the source.]
The Last of the Exercises
8. Re-index the sum in Exercise 2 to run
from 0 to 2
9. Re-index the sum in Exercise 3 to run
from 1 to 5
SAMPLE PROBLEM
One-Sample Test: z-test
A random sample of 50 rice crop plots was found out to
have a mean projected yield of 106.2 cav/ha. Would this
mean that the mean yield is significantly higher than the
observed average yield of 100 cav/ha with a standard
deviation of 11 cav/ha?
ONE-SAMPLE z-test
Solution to problem
1. Ho: µ = 100 cav/ha
Ha: µ > 100 cav/ha
2. Use z-test (n ≥ 30; population standard deviation, σ, is known)
3. Use α = 5%
4. Value of the test criterion
\[ z_c = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{106.2 - 100}{11 / \sqrt{50}} \approx 3.98 \]
where σ = population standard deviation; x̄ =
sample mean; µ = population mean; and n
= no. of subjects in the sample
5. ztab = z0.05 = 1.645
6. Since zc > ztab, reject Ho
∴ The sample mean yield is significantly higher than the population
mean yield of 100 cav/ha
How to determine the z tabular value (z-tab)
 For α = 0.05 (level of significance: the probability of rejecting a true
null hypothesis)
 For one-tailed test:
[Figure: normal curve showing the acceptance region (area 0.95), the rejection region (area 0.05), the critical point ztab = z0.05 = 1.645, and the computed zc = 3.980]
 The cumulative area to the left of the critical point (the confidence level) is 1 − α = 1 − 0.05 = 0.95
 Locate the area 0.95 in the body of the z-table and read off the corresponding z value (the critical point)
z            Area under the normal curve
1.64 (Y1)    0.9495 (X1)
z0.05 (Y)    0.9500 (X)
1.65 (Y2)    0.9505 (X2)
 Compute for Y (by interpolation)
\[ Y = Y_1 + (Y_2 - Y_1)\,\frac{X - X_1}{X_2 - X_1} \]
\[ Y = 1.64 + (1.65 - 1.64)\,\frac{0.9500 - 0.9495}{0.9505 - 0.9495} \]
\[ Y = z_{0.05} = 1.645 \]
 For two-tailed test
 The cumulative area is 1 − α/2 = 1 − 0.05/2 = 0.9750
 Locate the area 0.9750 in the z-table
z            Area under the normal curve
1.96         0.9750 (exact, no need for interpolation)
Alternatively, the following steps (the p-value approach) can be used in order to arrive at a
decision:
 Inspect the z-table and locate the tabled area for zc = 3.98
 The highest z in the z-table is 3.09, with a corresponding table
value (cumulative area) of 0.9990
 The tail probability beyond z = 3.09 is therefore (1 – 0.9990) = 0.001.
Take note that as you move downwards and to the right of the z-
table, the table value increases and approaches 1.00; hence
the tail probability (p-value) for zc = 3.98 is less than 0.001
 Since α = 0.05 is greater than the computed p-value,
reject Ho
 Conclusion: The sample mean yield is significantly higher than the
population mean yield
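The critical-value and p-value routes can both be checked with Python's standard library. This is a minimal sketch using the rice-yield figures from the sample problem; the function name `one_sample_z` is hypothetical, and `statistics.NormalDist` stands in for the printed z-table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """z statistic for a sample mean when the population sigma is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


std_normal = NormalDist()          # mean 0, standard deviation 1
alpha = 0.05

z_c = one_sample_z(106.2, 100, 11, 50)
z_tab_one_tailed = std_normal.inv_cdf(1 - alpha)       # area 0.95
z_tab_two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # area 0.975
p_value = 1 - std_normal.cdf(z_c)                      # upper-tail probability

reject_h0 = z_c > z_tab_one_tailed
print(round(z_c, 4), round(z_tab_one_tailed, 3), round(z_tab_two_tailed, 2))
print(p_value < alpha, reject_h0)
```

Here `inv_cdf` replaces the table lookup and interpolation, and the p-value is available even though the printed table stops at z = 3.09.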
SAMPLE PROBLEMS
One-Sample Test: z-test
 The average rating of 140 BSEd graduates in the 2015
Licensure Examination for Teachers of University X is 65.40%.
During the same period, the national average rating is 68.56%
with a standard deviation of 12.66%. Is the claim of the
President justified that his University is performing poorly as
compared to other HEIs offering the same program?
 The mean weight of the baggage carried into an airplane by
individual passengers at Tuguegarao City Airport is 19.8 kg.
An airport authority representative takes a random sample of
110 passengers and obtained a mean weight of 18.5 kg with a
standard deviation of 8.5 kg. Test the claim at 1% level of
significance.
SAMPLE PROBLEM
t-test, one-sample test
The average length of time for students to register for summer
classes at a certain college has been 50 minutes with a standard
deviation of 10 minutes. A new registration procedure using modern
computing machines is being tried. If a random sample of 12 students
had an average registration time of 42 minutes with a standard deviation
of 11.9 minutes under the new system, test the hypothesis that the
population mean is now less than 50 minutes, using a level of
significance of (a) 0.05, and (b) 0.01. Assume the population of times to
be normal.
Reminders on using t-test …
When the standard deviation of the sample is substituted
for the standard deviation of the population, the statistic does not
have a normal distribution; it has what is called the t-distribution.
Because there is a different t-distribution for each sample size, it
is not practical to list a separate area-of-the-curve table for each
one. Instead, critical t-values for common alpha levels (0.10,
0.05, 0.01, and so forth) are usually given in a single table for a
range of sample sizes. For very large samples, the t-distribution
approximates the standard normal (z) distribution. In
practice, it is best to use t-distributions any time the
population standard deviation is not known.
Values in the t-table are not actually listed by sample size
but by degrees of freedom (df). The number of degrees of freedom
for a problem involving the t-distribution for sample size n is
simply n – 1 for a one-sample mean problem.
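ONE-SAMPLE t-test
 Solution to problem
1. Ho: µ = 50 minutes
Ha: µ < 50 minutes
2. Use t-test (n < 30; σ under the new procedure is unknown)
3. Use α = 5%
4. Value of test criterion, tc
\[ t_c = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{42 - 50}{11.9 / \sqrt{12}} = -2.33 \]
where s = sample standard deviation; n = sample
size; other terms are as defined earlier
5. Critical region (one-tailed, df = n – 1 = 11)
ttab = t (5%, 11) = 1.796
6. Since |tc| = 2.33 > ttab = 1.796 (tc falls in the left-tail rejection region), reject Ho
∴ At α = 0.05, the true (population) mean is less than 50 minutes
(b) At α = 0.01, ttab = t (1%, 11) = 2.718. Since |tc| = 2.33 < 2.718, do not reject Ho at the 1% level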
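The test statistic and both decisions, (a) α = 0.05 and (b) α = 0.01, can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, and the one-sided critical values for df = 11 are hard-coded from the Student's t table (one-sided columns 0.0500 and 0.0100) rather than computed.

```python
from math import sqrt


def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for a sample mean when only the sample s is available."""
    return (sample_mean - mu0) / (s / sqrt(n))


t_c = one_sample_t(42, 50, 11.9, 12)

# One-sided critical values for df = 11, read from the t-table.
T_CRIT_DF11 = {0.05: 1.796, 0.01: 2.718}

# Left-tailed test (Ha: mu < 50): reject Ho when t_c < -t_crit.
decisions = {alpha: t_c < -t_crit for alpha, t_crit in T_CRIT_DF11.items()}

print(round(t_c, 2))
print(decisions)
```

With these inputs Ho is rejected at the 5% level but not at the 1% level.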
Student’s t-Distribution
(first column: degrees of freedom, df; remaining columns: critical t for the stated α)
One-sided α: 0.2500 0.2000 0.1500 0.1000 0.0500 0.0250 0.0100 0.0050 0.0025 0.0010 0.0005
Two-sided α: 0.5000 0.4000 0.3000 0.2000 0.1000 0.0500 0.0200 0.0100 0.0050 0.0020 0.0010
1 1.000 1.376 1.963 3.078 6.314 12.71 31.82 63.66 127.3 318.3 636.6
2 0.816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 14.09 22.33 31.60
3 0.765 0.978 1.250 1.638 2.353 3.182 4.541 5.841 7.453 10.21 12.92
4 0.741 0.941 1.190 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.610
5 0.727 0.920 1.156 1.476 2.015 2.571 3.365 4.032 4.773 5.893 6.869
6 0.718 0.906 1.134 1.440 1.943 2.447 3.143 3.707 4.317 5.208 5.959
7 0.711 0.896 1.119 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.408
8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.041
9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.781
10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587
11 0.697 0.876 1.088 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437
12 0.695 0.873 1.083 1.356 1.782 2.179 2.681 3.055 3.428 3.930 4.318
13 0.694 0.870 1.079 1.350 1.771 2.160 2.650 3.012 3.372 3.852 4.221
14 0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.140
15 0.691 0.866 1.074 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.073
16 0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 3.252 3.686 4.015
17 0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.222 3.646 3.965
18 0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.197 3.610 3.922
19 0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.883
20 0.687 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.850
21 0.686 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.135 3.527 3.819
22 0.686 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.792
23 0.685 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.104 3.485 3.767
24 0.685 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.091 3.467 3.745
25 0.684 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.078 3.450 3.725
26 0.684 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.707
27 0.684 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.690
28 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.674
29 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.659
30 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.030 3.385 3.646
40 0.681 0.851 1.050 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3.551
50 0.679 0.849 1.047 1.299 1.676 2.009 2.403 2.678 2.937 3.261 3.496
60 0.679 0.848 1.045 1.296 1.671 2.000 2.390 2.660 2.915 3.232 3.460
80 0.678 0.846 1.043 1.292 1.664 1.990 2.374 2.639 2.887 3.195 3.416
100 0.677 0.845 1.042 1.290 1.660 1.984 2.364 2.626 2.871 3.174 3.390
120 0.677 0.845 1.041 1.289 1.658 1.980 2.358 2.617 2.860 3.160 3.373
∞ 0.674 0.842 1.036 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.291
Other Sample Problems …
1. A Little League baseball coach wants to know if his team is
representative of other teams in scoring runs. Nationally, the average
number of runs scored by a Little League Team in a game is 5.70. He
chooses five games at random in which his team scored 5, 9, 4, 11, and 8
runs. Is it likely that his team’s scores could have come from the
national distribution? Assume an alpha level of 0.05. What is the 95%
confidence interval for runs scored per team per game?
2. A professor wants to know if her introductory statistics class has a good
grasp of basic math. Six students are chosen at random from the class
and given a math proficiency test. The professor wants the class to be
able to score above 70 on the test. The six students get scores of 62, 92,
75, 68, 83, and 95. Can the professor have 90 percent confidence that
the mean score for the class on the test would be above 70? (Note:
Included in PS No. 5)
SAMPLE PROBLEM
z-test, two-sample test
In the recently released results of the Licensure Examination for
Teachers (LET), University A with 125 examinees posted a mean
passing percentage of 80.75 while University B with 110 examinees
posted a mean of 86.45. A check with the Professional Regulation
Commission (PRC) revealed that for that year, the standard deviation
for all takers was 11.65. Is the President of University B 99% confident
that his university Education graduates are significantly better than the
Education graduates of University A?
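2-SAMPLE z-test
 Solution to problem
1. Ho: µA = µB
Ha: µA ≠ µB
2. Use z-test (the population standard deviation, σ, is known)
3. Use α = 1%
4. Value of test criterion
\[ z_{comp} = \frac{\bar{x}_A - \bar{x}_B}{\sigma \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}} = \frac{80.75 - 86.45}{11.65 \sqrt{\frac{1}{125} + \frac{1}{110}}} = -3.74 \]
5. Critical region (two-tailed)
\( z_{tab} = z_{0.01/2} = 2.58 \)
6. Since |zcomp| = 3.74 > 2.58, reject Ho. ∴ Because the difference is
negative (University B has the higher mean), the President of University B is
correct in his claim that his Education graduates are significantly better
than those of University A.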
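The computation above can be verified with a short Python sketch. The function name `two_sample_z` is hypothetical; the figures are those of the LET problem, and the two-tailed critical value is taken from `statistics.NormalDist` instead of the printed table.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_a, mean_b, sigma, n_a, n_b):
    """z statistic for two means sharing a known population sigma."""
    return (mean_a - mean_b) / (sigma * sqrt(1 / n_a + 1 / n_b))


alpha = 0.01
z_comp = two_sample_z(80.75, 86.45, 11.65, 125, 110)
z_tab = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value

reject_h0 = abs(z_comp) > z_tab
print(round(z_comp, 2), round(z_tab, 2), reject_h0)
```

The sign of `z_comp` shows the direction of the difference: it is negative because University A's mean is the lower of the two.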
A researcher wants to determine whether or not a given drug has any
effect on the scores of human subjects performing a task of ESP
sensitivity. He randomly assigns his subjects to one of two groups.
Nine hundred subjects in group 1 (the experimental group) receive an
oral administration of the drug prior to testing. In contrast, 1000
subjects in group 2 (control group) receive a placebo. The results of the
ESP sensitivity tests are as follows:
 Mean ESP for group 1 is 9.78 with a standard deviation of 4.05
 Mean ESP for group 2 is 15.10 with a standard deviation of 4.28
Is the drug effective? (Note: lower ESP means less sensitive)
SAMPLE PROBLEM
t-test, two-sample test
An English Professor wishes to see if a literature course changes
regional attitudes. The literature class deals with regional problems. A
regional attitude test is given to 12 students at the beginning (score 1)
and end (score 2) of the semester. The scale is from 20 to 100 with high
score indicating a high degree of regional bias. The results are as
follows:
Score 1 67 78 91 53 48 56 62 47 28 37 46 52
Score 2 58 69 80 54 32 49 64 40 27 34 39 47
Is there a significant difference between the two sample means?
TWO-SAMPLE t-test
\[ t_c = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
where:
\( s_p^{2} \) = pooled variance
\( = \frac{(n_1 - 1)s_1^{2} + (n_2 - 1)s_2^{2}}{n_1 + n_2 - 2} \)
\( s_1^{2}, s_2^{2} \) = variance of samples 1 & 2, respectively
\( n_1, n_2 \) = no. of subjects in samples 1 & 2, respectively
 Test of Hypothesis
1. Ho: µ1 = µ2
Ha: µ1 ≠ µ2
2. Test criterion; t-test
3. Level of significance, 5%
4. Computed value of the test criterion
5. Critical region
ttab = t (α/2, n1 + n2 – 2)
6. Conclude
 Solution to Problem on t-test (two-sample t-test)
n1 = 12, n2 = 12
x̄1 = 55.42, x̄2 = 49.42; s1² = 297.90, s2² = 261.17
 Test of Hypothesis
1. Ho: µ1 = µ2
Ha: µ1 ≠ µ2
2. Use t-test
3. Use α = 5%
4. Computed value of the test criterion
\[ s_p^{2} = \frac{(12 - 1)(297.90) + (12 - 1)(261.17)}{12 + 12 - 2} = 279.54 \]
\[ t_c = \frac{55.42 - 49.42}{\sqrt{279.54\left(\frac{1}{12} + \frac{1}{12}\right)}} = 0.88 \]
5. Critical region
ttab = t (α/2, 22) = 2.074
6. Since tc < ttab, do not reject Ho
∴ The intervention does not have a significant impact
on the regional bias of the literature students
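The pooled variance and the t statistic can be recomputed from the raw scores. The sketch below follows the slides in treating the two sets of scores as independent samples; the function name `pooled_t` is hypothetical, and the critical value 2.074 is hard-coded from the t-table (two-sided 0.05, df = 22).

```python
from math import sqrt
from statistics import mean, variance

score1 = [67, 78, 91, 53, 48, 56, 62, 47, 28, 37, 46, 52]
score2 = [58, 69, 80, 54, 32, 49, 64, 40, 27, 34, 39, 47]


def pooled_t(x1, x2):
    """Two-sample t statistic with a pooled (equal-variance) estimate."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance is the sample variance (divisor n - 1).
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2


t_c, sp2 = pooled_t(score1, score2)
t_tab = 2.074                      # t(0.05/2, 22) from the t-table
reject_h0 = abs(t_c) > t_tab
print(round(sp2, 2), round(t_c, 2), reject_h0)
```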
SAMPLE PROBLEM
F-test
The data below represent the number of hours
of pain relief provided by five different brands of
headache tablets administered to 25 subjects. The
25 subjects were randomly divided into five groups
and each group was treated with a different brand.
Brand of Tablet
A    B    C    D    E
5    9    3    2    7
4    7    5    3    6
8    8    2    4    9
6    6    3    1    4
3    9    7    4    7
Perform the analysis of variance and test the
hypothesis at the 0.05 level of significance that the
mean number of hours of relief provided by the
tablets is the same for all five brands.
Three or more sample test
 F-test
Source of Variation    Degrees of Freedom    Sum of Squares    Mean Squares    Fc        Ftab (5%)    Ftab (1%)
Column                 4                     79.44             19.86           6.90**    2.87         4.43
Error                  20                    57.60             2.88
TOTAL                  24                    137.04
** significant at 1% level
Working Equations
Cdf = p – 1 ; p = no. of columns (groups)
= 5 – 1 = 4
Tdf = pr – 1 ; r = no. of subjects per group
= (5)(5) – 1
= 24
Edf = Tdf – Cdf
= 24 – 4
= 20
Correction Factor, CF = (ΣX)²/(pr) = (26 + 39 + 20 + 14 + 33)²/25 = (132)²/25
= 696.96
TSS = (5² + 4² + … + 7²) – CF = 834 – 696.96
= 137.04
CSS = (26² + 39² + 20² + 14² + 33²)/5 – CF
= 776.40 – 696.96
= 79.44
ESS = TSS – CSS
= 137.04 – 79.44
= 57.60
CMS = CSS/Cdf = 79.44/4 = 19.86
EMS = ESS/Edf = 57.60/20 = 2.88
Fc = CMS/EMS = 19.86/2.88
= 6.90
F (5%, 4, 20) = 2.87 = F0.05
F (1%, 4, 20) = 4.43 = F0.01
Fc > F0.01
∴ The 5 brands are significantly different at the
1% level of significance
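The working equations translate almost line by line into Python. In this sketch the function and key names are hypothetical; the data are the 25 pain-relief observations grouped by brand (column totals 26, 39, 20, 14, 33), and the tabular F values are hard-coded from the slides.

```python
groups = {
    "A": [5, 4, 8, 6, 3],
    "B": [9, 7, 8, 6, 9],
    "C": [3, 5, 2, 3, 7],
    "D": [2, 3, 4, 1, 4],
    "E": [7, 6, 9, 4, 7],
}


def one_way_anova(groups):
    """One-way ANOVA via the correction-factor working equations."""
    values = [x for g in groups.values() for x in g]
    n_total = len(values)
    cf = sum(values) ** 2 / n_total                       # correction factor
    tss = sum(x * x for x in values) - cf                 # total SS
    css = sum(sum(g) ** 2 / len(g) for g in groups.values()) - cf  # column SS
    ess = tss - css                                       # error SS
    c_df = len(groups) - 1
    e_df = n_total - len(groups)
    cms, ems = css / c_df, ess / e_df
    return {"CF": cf, "TSS": tss, "CSS": css, "ESS": ess,
            "Cdf": c_df, "Edf": e_df, "CMS": cms, "EMS": ems, "F": cms / ems}


result = one_way_anova(groups)
F_TAB = {0.05: 2.87, 0.01: 4.43}       # F(alpha, 4, 20) from the slides
significant = {alpha: result["F"] > f for alpha, f in F_TAB.items()}
print({k: round(v, 2) for k, v in result.items()})
print(significant)
```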
Another Sample Problem: F-test
Very Clean advertises that its detergent will remove all stains, except
oil-based paint, in any kind of water. Consumer Action is evaluating
this claim. Batches of washing were run in randomly chosen homes
having a particular type of water: hard, moderate, or soft. Each
batch contains an assortment of rags and cloth scraps stained with
food products, grease, and dirt over a 150 square inch area. After
washing, the number of square inches that were still stained was
determined and the following results were obtained:
              Type of Water
Observation   Hard    Moderate    Soft
1             6       5           5
2             4       6           0
3             3       9           2
4             9       4           4
5             7       3           3
At 5% level, should Consumer Action conclude that the type of
water affects the effectiveness of the detergent?
SAMPLE PROBLEM
Pearson r, Association
Consider the following data taken from three sample barangays in
Iligan City, Lanao del Norte during the NCSO and BAECon
Integrated Survey of Households in the 3rd quarter of 1977 (X - highest
grade completed by household head in years; Y - total family income
for the quarter in pesos).
Household Number | Highest Grade, X | Income, Y | Household Number | Highest Grade, X | Income, Y
1 12 1444 12 8 1440
2 13 1650 13 14 2140
3 13 1200 14 15 3330
4 18 2880 15 8 750
5 8 360 16 10 108
6 10 1965 17 4 150
7 6 744 18 10 240
8 8 2784 19 14 3000
9 10 1940 20 6 400
10 6 2450 21 13 2250
11 6 1290 22 6 100
Is there a significant relationship between the family income and the
highest grade completed by the household head?
PEARSON CORRELATION COEFFICIENT (r)
\[ r = \frac{CP_{xy}}{\sqrt{(SS_x)(SS_y)}} \]
where
\( CP_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{n} \) = cross product of x & y
\( SS_x = \sum X^{2} - \frac{(\sum X)^{2}}{n} \) = sum of squares of x
\( SS_y = \sum Y^{2} - \frac{(\sum Y)^{2}}{n} \) = sum of squares of y
Value of r Interpretation
0.00 – 0.20 Slight correlation, negligible relationship
0.21 – 0.40 Low correlation, definite but small relationship
0.41 – 0.70 Moderate correlation, substantial relationship
0.71 – 0.90 High correlation, marked relationship
0.91 – 1.00 Very high correlation, very dependable relationship
Sample Problem: Given below are the scores of six
students in Math and Physics
Student    Math Score    Physics Score
1          3             6
2          2             4
3          4             4
4          6             7
5          5             5
6          1             3
Required:
 Correlation coefficient and its interpretation
 Are the two scores significantly related?
Steps on Test of Hypothesis on Correlation Coefficient, r
1. Ho: ρ = 0; (X & Y are statistically independent)
Ha: ρ ≠ 0; (X & Y are statistically dependent)
2. Define the level of significance, α
3. Select the test criterion (t-test)
4. Compute the value of the test criterion
\( t_c = \frac{r}{\sqrt{\frac{1 - r^{2}}{n - 2}}} \)
5. Define critical region
ttab = t (α/2, n – 2)
6. Conclude
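Solution to the problem on
Pearson r (Association)
n = 22
\( \sum X = 218 \), \( \sum X^{2} = 2444 \), \( \bar{X} = 9.91 \)
\( \sum Y = 32615 \), \( \sum Y^{2} = 70896117 \), \( \bar{Y} = 1482.50 \)
\( \sum XY = 373084 \)
\[ CP_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 373084 - \frac{(218)(32615)}{22} = 49899 \]
\[ SS_x = \sum X^{2} - \frac{(\sum X)^{2}}{n} = 2444 - \frac{(218)^{2}}{22} = 283.82 \]
\[ SS_y = \sum Y^{2} - \frac{(\sum Y)^{2}}{n} = 70896117 - \frac{(32615)^{2}}{22} = 22544379.50 \]
\[ r = \frac{CP_{xy}}{\sqrt{(SS_x)(SS_y)}} = \frac{49899}{\sqrt{(283.82)(22544379.50)}} = 0.62 \]
Test of Hypothesis
1. Ho: ρ = 0
Ha: ρ ≠ 0
2. Use t-test
3. Use α = 0.05
4. \( t_c = \frac{r}{\sqrt{\frac{1 - r^{2}}{n - 2}}} = \frac{0.62}{\sqrt{\frac{1 - 0.62^{2}}{22 - 2}}} = 3.53 \)
5. \( t_{tab} = t(0.05/2,\ 22 - 2) = 2.086 \)
6. Since tc > ttab, reject Ho
∴ The quarterly family income and the number of years of schooling
of family heads are significantly related (associated)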
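The sums, the correlation coefficient, and the test statistic can be reproduced from the 22 household records. In this Python sketch the function name is hypothetical and the critical value 2.086 is hard-coded from the t-table (two-sided 0.05, df = 20). Because r is carried at full precision here, the computed t comes out slightly larger than the value obtained from r rounded to 0.62.

```python
from math import sqrt

# Highest grade completed (X) and quarterly family income in pesos (Y)
X = [12, 13, 13, 18, 8, 10, 6, 8, 10, 6, 6,
     8, 14, 15, 8, 10, 4, 10, 14, 6, 13, 6]
Y = [1444, 1650, 1200, 2880, 360, 1965, 744, 2784, 1940, 2450, 1290,
     1440, 2140, 3330, 750, 108, 150, 240, 3000, 400, 2250, 100]


def pearson_r(x, y):
    """Pearson r from the cross product and the sums of squares."""
    n = len(x)
    cp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    return cp_xy / sqrt(ss_x * ss_y), cp_xy, ss_x, ss_y


r, cp_xy, ss_x, ss_y = pearson_r(X, Y)
n = len(X)
t_c = r / sqrt((1 - r ** 2) / (n - 2))
t_tab = 2.086                      # t(0.05/2, 20) from the t-table
reject_h0 = abs(t_c) > t_tab
print(round(cp_xy), round(ss_x, 2), round(ss_y, 2))
print(round(r, 4), round(t_c, 2), reject_h0)
```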
Simple Linear Regression
• Simple regression analysis applies to two variables whose relationship
is linear. Its purpose is to determine the functional relationship
between them so that one variable can be predicted from the other.
• The functional (linear) relationship is of the form:
$$\hat{Y} = a + bX$$
where $\hat{Y}$ is the predicted value of Y for a given value of X, a is
the intercept, and b is the slope of the regression line.
• X is the independent variable and serves as the “predictor”. Y, the
variable whose value is to be “predicted”, is the dependent variable
(also called the predictand or criterion variable).
• The intercept a can be calculated using the expression:
$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N\sum X^2 - (\sum X)^2}$$
• The slope of the regression line, b, can be calculated using the
expression:
$$b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$$
• These formulas require only five quantities: $N$, $\sum X$, $\sum Y$,
$\sum XY$, and $\sum X^2$.
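The two expressions above can be sketched as a small function that takes the five sums and returns the intercept and slope. This is a minimal illustration, not the course's SPSS procedure; the function name `fit_line_from_sums` and the toy data are hypothetical.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) of the least-squares line Y-hat = a + b*X from the five sums."""
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b


# Hypothetical toy data that lie exactly on Y = 1 + 2X
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

a, b = fit_line_from_sums(
    len(xs),
    sum(xs),
    sum(ys),
    sum(x * y for x, y in zip(xs, ys)),
    sum(x * x for x in xs),
)
print(f"intercept a = {a}, slope b = {b}")
```

Both expressions share the same denominator, so it is computed once; it is zero only when every X value is identical, in which case no line can be fitted.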
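From the given problem ($N = 22$, $\sum X = 218$, $\sum Y = 32615$, $\sum XY = 373084$, $\sum X^2 = 2444$):
$$a = \frac{(32615)(2444) - (218)(373084)}{22(2444) - (218)^2} = \frac{-1621252}{6244} = -259.6496$$
$$b = \frac{22(373084) - (218)(32615)}{22(2444) - (218)^2} = \frac{1097778}{6244} = 175.8133$$
or
$$b = \frac{CP_{xy}}{SS_x} = \frac{49899}{283.82} = 175.81$$
$$a = \bar{Y} - b\bar{X} = 1482.50 - 175.81(9.91) = -259.78$$
The two values of a differ slightly because b and $\bar{X}$ were rounded to two decimal places before a was computed.
Therefore, the equation of the regression line is given by:
$$\hat{Y} = -259.78 + 175.81X$$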
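The slides use $CP_{xy}$ and $SS_x$ without defining them. Assuming they denote the corrected sum of cross-products and the corrected sum of squares of X (the usual meaning of these symbols, and the one that reproduces 49899 and 283.82 from the sums of this problem), the shortcut follows from the raw-sum formula for b by dividing its numerator and denominator by N:

```latex
\begin{align*}
CP_{xy} &= \sum XY - \frac{(\sum X)(\sum Y)}{N}
         = 373084 - \frac{(218)(32615)}{22} = 49899 \\
SS_x    &= \sum X^2 - \frac{(\sum X)^2}{N}
         = 2444 - \frac{(218)^2}{22} \approx 283.82 \\
b       &= \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}
         = \frac{CP_{xy}}{SS_x}
\end{align*}
```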
Predicted income and residual for each household, where $\hat{Y}_i = -259.6496 + 175.8133(X_i)$ and $\epsilon_i = Y_i - \hat{Y}_i$:
Household   Highest    Income, Y   Predicted      Residual,
Number, i   Grade, X   (Pesos)     income, Ŷi     εi
1           12         1444        1850.1100      -406.1100
2           13         1650        2025.9233      -375.9233
3           13         1200        2025.9233      -825.9233
4           18         2880        2904.9898      -24.9898
5           8          360         1146.8568      -786.8568
6           10         1965        1498.4834      466.5166
7           6          744         795.2302       -51.2302
8           8          2784        1146.8568      1637.1432
9           10         1940        1498.4834      441.5166
10          6          2450        795.2302       1654.7698
11          6          1290        795.2302       494.7698
12          8          1440        1146.8568      293.1432
13          14         2140        2201.7366      -61.7366
14          15         3330        2377.5499      952.4501
15          8          750         1146.8568      -396.8568
16          10         108         1498.4834      -1390.4834
17          4          150         443.6036       -293.6036
18          10         240         1498.4834      -1258.4834
19          14         3000        2201.7366      798.2634
20          6          400         795.2302       -395.2302
21          13         2250        2025.9233      224.0767
22          6          100         795.2302       -695.2302
$\sum \epsilon_i = -0.0082 \approx 0.00$
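The table can be reproduced with a short script. This is a sketch under the assumption that the raw-sum formulas for a and b are applied to the 22 households listed above; the variable names are illustrative. With unrounded coefficients the residuals sum to zero up to floating-point error, which suggests the slide's -0.0082 comes from rounding a and b to four decimal places.

```python
# Highest grade completed (X) and income in pesos (Y) for the 22 households
grades = [12, 13, 13, 18, 8, 10, 6, 8, 10, 6, 6,
          8, 14, 15, 8, 10, 4, 10, 14, 6, 13, 6]
incomes = [1444, 1650, 1200, 2880, 360, 1965, 744, 2784, 1940, 2450, 1290,
           1440, 2140, 3330, 750, 108, 150, 240, 3000, 400, 2250, 100]

n = len(grades)
sum_x = sum(grades)
sum_y = sum(incomes)
sum_xy = sum(x * y for x, y in zip(grades, incomes))
sum_x2 = sum(x * x for x in grades)

# Intercept and slope from the raw-sum formulas
denom = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
b = (n * sum_xy - sum_x * sum_y) / denom

# Predicted income and residual for every household
predicted = [a + b * x for x in grades]
residuals = [y - y_hat for y, y_hat in zip(incomes, predicted)]

print(f"a = {a:.4f}, b = {b:.4f}")
for i, (x, y, y_hat, e) in enumerate(zip(grades, incomes, predicted, residuals), start=1):
    print(f"{i:>2} {x:>3} {y:>5} {y_hat:>10.4f} {e:>11.4f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```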
Sample Problem: Spearman ρ
An English professor wishes to see if a literature course
changes regional attitudes. The literature class deals with
regional problems. A regional attitude test is given to 12
students at the beginning (Score 1) and end (Score 2) of the
semester. The scale is from 20 to 100, with a high score
indicating a high degree of regional bias. The results are
tabulated below:
Score 1 67 78 91 53 48 56 62 47 28 37 46 52
Score 2 58 69 80 54 32 49 64 40 27 34 39 47
Determine if there is a significant association between the two
ranked scores.
STUDENT   SCORE 1   RANK     SCORE 2   RANK
NO.       (X1i)     (Rx1i)   (X2i)     (Rx2i)   di    di²
1         67        10       58        9        1     1
2         78        11       69        11       0     0
3         91        12       80        12       0     0
4         53        7        54        8        -1    1
5         48        5        32        2        3     9
6         56        8        49        7        1     1
7         62        9        64        10       -1    1
8         47        4        40        5        -1    1
9         28        1        27        1        0     0
10        37        2        34        3        -1    1
11        46        3        39        4        -1    1
12        52        6        47        6        0     0
di = Rx1i - Rx2i
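The ranking table and the resulting coefficient can be checked with a few lines of code. This sketch assumes the standard no-ties formula $r_s = 1 - 6\sum d_i^2 / (n(n^2 - 1))$, which the slides do not show but which reproduces the 0.94 used in the solution; the helper `rank_ascending` is hypothetical and deliberately rejects tied scores.

```python
score1 = [67, 78, 91, 53, 48, 56, 62, 47, 28, 37, 46, 52]
score2 = [58, 69, 80, 54, 32, 49, 64, 40, 27, 34, 39, 47]


def rank_ascending(values):
    """Rank values from 1 (smallest) to n (largest); this sketch assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks, which this sketch omits")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank1 = rank_ascending(score1)
rank2 = rank_ascending(score2)

# Rank differences and their squares, as in the last two table columns
d = [r1 - r2 for r1, r2 in zip(rank1, rank2)]
sum_d2 = sum(di ** 2 for di in d)

n = len(score1)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("ranks of score 1:", rank1)
print("ranks of score 2:", rank2)
print(f"sum of d^2 = {sum_d2}, rs = {rs:.2f}")
```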
SPEARMAN RHO (rs)
Solution to Problem on Spearman Rho
1. Ho: ρ = 0
   Ha: ρ ≠ 0
2. Use t-test (n > 10)
3. Use α = 0.05
4. Value of the test criterion, tc:
$$t_c = r_s\sqrt{\frac{n-2}{1-r_s^2}}$$
where rs = Spearman rho correlation coefficient (here rs = 0.94)
$$t_c = 0.94\sqrt{\frac{12-2}{1-0.94^2}} = 8.71$$
Spearman …
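5. Critical region
ttab = t(5%/2, 10) = 2.228 (two-tailed)
6. Since tc = 8.71 > ttab = 2.228, reject Ho
• There is a significant positive association between the two ranked
scores: students who ranked high in regional bias at the start of the
semester also tended to rank high at the end. Rank correlation measures
the consistency of the students' relative standing, so this result alone
does not establish that the course reduced regional bias. That claim
would need a test on the paired score differences, such as the paired
t-test or the sign test.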
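Steps 4 to 6 can be sketched in code as follows. The function name `spearman_t` is hypothetical, and the critical value is copied from the Student's t-table in this deck (two-sided 0.05, 10 degrees of freedom) rather than computed.

```python
import math


def spearman_t(rs, n):
    """t statistic for Ho: rho = 0, referred to t with n - 2 degrees of freedom."""
    return rs * math.sqrt((n - 2) / (1 - rs ** 2))


# Two-sided 5% critical value taken from the t-table, keyed by degrees of freedom
T_CRIT_TWO_SIDED_05 = {10: 2.228}

rs, n = 0.94, 12
t_c = spearman_t(rs, n)
t_tab = T_CRIT_TWO_SIDED_05[n - 2]

# Two-tailed decision rule: reject Ho when |tc| exceeds the tabular value
reject_h0 = abs(t_c) > t_tab

print(f"tc = {t_c:.2f}, ttab = {t_tab}, reject Ho: {reject_h0}")
```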
Adv.-Statistics-2.pptx

  • 2. Advanced Statistics The course deals with parametric and non-parametric statistics. It covers the topics on test of association such as Spearman rho, Phi coefficient, contingency coefficient, biserial; testing of hypotheses about two independent groups such as Two independent samples-test Mann-Whitney U, Wilcoxon W; testing of hypotheses about three or more independent groups such as one-way ANOVA, Kruskal-Wallis, Jonkheere-Terpstra test, and testing of hypotheses about repeated measures like Paired T-test, Sign Test and Chi-square Test of association and the statistical power analysis. It includes applications and data analysis with computations carried out using SPSS.
  • 3. Objectives of the Course  Familiarize the students with fundamental topics in Statistics such as descriptive statistics, inferential statistics, parametric statistics, and non- parametric statistics  For the students to be able to analyze and interpret sets of measurements / data by applying any of the topics included in the course outline.
  • 4. COURSE OUTLINE 1. Measurements 2. Sampling 3. Summation Notation 4. Frequency Distribution Table 5. Measure of Central Location 6. One-Sample Tests 7. Two-Sample Tests 8. More than Two-Sample Tests 9. Regression and Correlation 10. Chi-Square Test
  • 5. Statistics Statistics (as a discipline) is the scientific method of collecting, organizing, summarizing, presenting and analyzing data for the purpose of drawing (valid) conclusion(s) and making (reasonable) recommendations Statistics is SCIENCE (systematic involving procedures) and ARTS (how to use e.g. researches) statistics -- mass of data (as long as data is there, Statistics is there)
  • 6. There are three kinds of lies …. 1. LIES! 2. DAMNED LIES!! and 3. STATISTICS!!! …. Benjamin Disraeli
  • 7. Data Gathering  Objective method. Data are gathered by measurement or direct observation (e.g. measuring the weight of 1000 heads of cabbage. These data are classified as primary data  Subjective method. Data are provided by respondents (e.g. data on the amount of rice harvest provided by all farmers in Nueva Vizcaya. These data are classified as secondary data  Use of existing records. Data are gathered from previously collected information by some persons or institutions (e.g. rice yield records in Nueva Vizcaya for the past 20 years obtained from the Bureau of Agricultural Statistics). These data are classified as secondary data
  • 8. Two Phases of Statistics 1. Descriptive statistics. Deals with the methods of collecting, organizing, summarizing, and presenting and their interpretation. 2. Inferential statistics. Concerned with making generalizations about a larger set of data where only a part of it (sample) is examined o Estimation. The objective of estimation is to come up with a value or a range of values, computed from the sample, and will be inferred to as the characteristics of the population where the sample is taken o Hypothesis Testing. A statistical procedure for testing whether to accept or reject a hypothesis (about population characteristics) on the basis of a sample
  • 9. Levels of Measurements Measurement is the process of assigning numbers to observations in such a way that the numbers are amenable to analysis by manipulation or operation according to certain rules. There are four levels of measurements: nominal, ordinal, interval, and ratio and each level determines the appropriate statistical tool or procedure that can be applied to the set of data in that particular level of measurement (Table 1).
  • 10.  Nominal level. Values are simple labels or categories or names without implied ordering or hierarchy in the labels (e.g. Tax Identification Number or TIN, gender, civil status, race or color)  Ordinal level. Values are simply labels with an implied ordering or hierarchy in the labels. The distance between two labels, however, is unknown (e.g. sizes of shirts, job hierarchy, income levels)
  • 11.  Interval level. Values can be ordered or arranged according to magnitude or hierarchy; distance between two values is known; can add / subtract but cannot multiply / divide; the zero point is arbitrary (e.g. Intelligence Quotient or IQ, Temperature in 0F or 0C)  Ratio level. Values have all the properties of the interval level. In addition, values can be multiplied or divided and the zero point is fixed (e.g. age, height, area, mass, length)
  • 12. Table 1. Four levels of measurements and the statistical tools appropriate for each level LEVEL DEFINING RELATIONS EXAMPLE OF APPROPRIATE STATISTICS APPROPRIATE STATISTICAL TEST Nominal • Equivalence • Mode • Frequency • Contingency coefficient • Non-parametric Ordinal • Equivalence • Hierarchy / order • Median • Percentile • Spearman  • Kendall  • Kendall  • Non-parametric Interval • Equivalence • Hierarchy / order • Known ratio of two intervals • Mean • Standard deviation • Pearson r • Multiple R • Non-parametric • Parametric Ratio • Equivalence • Hierarchy / order • Known ratio of two intervals • Known ratio of two scale values • Mean • Standard deviation • Pearson r • Multiple R • Geometric mean • Coefficient of variation • Non-parametric • Parametric
  • 13. Universe, Population, Sample and Variables A researcher would like to study the characteristics of poor households in the province of Nueva Vizcaya as of December 31, 2022. Included in the study are the following:  measurements on annual households income  household head's highest educational attainment  employment status (employed, unemployed) of the household head, and  household size Because of time constraints to complete the study, the researcher obtained measurements on 50 randomly selected poor households in the province.
  • 14. Universe  All poor households in the province of Nueva Vizcaya as of December 31, 2022 Variables  Annual households' income  Highest educational attainment of household heads  Employment status of household heads  Household size Population  The population for each variable is as follows: • Annual households' income - poor households • Highest educational attainment - poor household heads • Employment status - poor household heads • Household size - poor households Sample  The 50 randomly selected poor households in the province of Nueva Vizcaya Definition / specification of terms
  • 15. Sampling  Inferential statistics involves the process of drawing out inferences or generalization from the sample which is the basis in the formulation of conclusions about the population. The accuracy of these inferences / conclusions depends to a large extent upon the representativeness of the sample. A representative sample exhibits most, if not all the properties of the population or, in other words, a representative sample is a miniature of the population. When drawing a sample from a population, the two basic questions that must be addressed are: • What is the size of the sample?, and • How is each member of the sample selected?
  • 16. Sample size  The reasons why a sample is considered in any research undertaking are the following: (a) to save resources (4Ms – man, materials, machineries and money; time; and effort), (b) smaller volume of data to deal with, thus making analysis and interpretation easier, and (c) to overcome the problem of dealing with members of the population which are inaccessible.  In determining the sample size, the following must be considered: • The bigger the size of the population is, the bigger is the size of the sample • Margin of error (sampling error), is the percentage of error incurred in selecting a sample that is not representative of the population. In random sampling, choosing a representative sample is attributed to chance probability. The probability of NOT selecting a representative sample is known as the margin of error. The lesser the margin of error is allowed, the more members of the population should be selected, and the larger the sample size. (In fact, if one does not allow a margin of error, the whole population should be used).
  • 17. The sample size is determined using the Slovin’s formula, n = N 1+ Ne2 where n is the sample size, N is the population size, and e is the desired margin of error (decimal) Sample Size
  • 18. n = N = 5000 = 370 1+ Ne2 1+ 5000(0.05)2 n = N = 5000 = 3333 1+ Ne2 1+ 5000(0.01)2 Example 1. With a margin of error of 5%, what is the sample size, n for a population size of N=5000? If e = 1%, then
  • 19. Example 2. During the second semester of SY 2021-2022, the distribution of enrolment of the College of Engineering is as follows: COURSE GENDER TOTAL MALE FEMALE BSAE 94 54 148 BSCE 79 27 106 TOTAL 173 81 254 Using a 5% margin of error, draw out a sample size employing Proportional Stratified Random Sampling
  • 20. Solution: 1. If the grand total (population size) and subtotals are not given, compute each. 2. Compute the sample size, n with the desired margin of error. n = N = 254 = 155 1+ Ne2 1+ 254(0.05)2 3. Compute the number for each subgroup. (a) Male-BSAE N = 94 ; nM-AE = n (94) = 155 (94) = 57 n nM-AE N 254
  • 21. (b) Female-BSAE N = 54 ; nF-AE = n (54) = 155 (54) = 33 n nF-AE N 254 (c) Male-BSCE N = 79 ; nM-CE = n (79) = 155 (79) = 48 n nM-CE N 254 (d) Female-BSCE N = 27 ; nF-CE = n (27) = 155 (27) = 17 n nF-CE N 254
  • 22. Cluster Sampling. This method of sampling is convenient to use when the population is spread over a wide geographic area. In cluster sampling, groups, not individuals are randomly selected. Example 3  The population of all fifth year Agricultural Engineering students in the country is 600  For a margin of error of 5%, the desired sample size is 240  A logical cluster Agricultural Engineering Institutions in the country. Suppose there are 30 such institutions in the Philippines with an average population of 20 fifth year agricultural engineering students.  The number of clusters (Agricultural Engineering institutions) needed is 12 (240/20)  Therefore, 12 Agricultural Engineering institutions will be randomly selected from the 30 nationwide.  All the fifth year students in these 12 institutions will be included in the sample.
  • 23. Summation Notation 1 2 3 1 ... n k n k a a a a a       
  • 24. Terminology 1 2 3 1 ... n k n k a a a a a         The Greek letter, , indicates a sum and is referred to as a summation operation.  k is referred to as the index of summation (or summation variable).  ak is referred to as the k-th term of the sum  The numbers 1 and n are the lower and upper limits of the summation, respectively
  • 25. Example - Evaluate Here The upper and lower limits are 1 and 4      4 1 2 3 k k k   3 2   k k ak           10 16 0 4 2 3 4 4 3 3 3 3 2 2 3 1 1 3 4 1 2 2 2 2 2                  k k k
  • 26. Basic Ideas  As with functions, the letter used to denote the index of summation is immaterial  The index of summation need not start at 1       10 3 3 3 4 1 2 4 1 2 4 1 2             i j k i i j j k k   cetera et , 5 ln or 000 , 5 0 20 3      j i j i
  • 27. Why Use Summation Notation Summation notation allows us to write mathematical expressions compactly.
  • 28. Properties for Summation 1 1 n n k k k k ca c a      1. 2. 3. 1 1 1 ( ) n n n k k k k k k k a b a b          1 1 1 ( ) n n n k k k k k k k a b a b         
  • 29. Properties (cont’d) 4. 5. 6. 1 n k c cn    1 ( 1) 2 n k n n k       2 1 ( 1) 2 1 6 n k n n n k     
  • 30. Exercises Calculate the sums indicated below: 1. . 2. . 3. .         8 1 1 1 i i      8 1 2 2 m m m     4 0 1 2 1 2 j j j
  • 31. More Exercises Write the sum using summation notation 4. . 5. . 100 3 2 1      15 2 10 19 12 17 11 15 10       n n 
  • 32. Still More Exercises Find the sum 6. . 7. .            000 , 1 1 2 1 1 1 k k k       99 1 1 i i i
  • 33. The Last of the Exercises 8. Re-index the sum in Exercise 2 to run from 0 to 2 9. Re-index the sum in Exercise 3 to run from 1 to 5
  • 34. SAMPLE PROBLEM One-Sample Test: z-test A random sample of 50 rice crop plots was found out to have a mean projected yield of 106.2 cav/ha. Would this mean that the mean yield is significantly higher than the observed average yield of 100 cav/ha with a standard deviation of 11 cav/ha?
  • 35. ONE-SAMPLE z-test Solution to problem 1. Ho: µ = 100 cav/ha Ha: µ > 100 cav/ha 2. Use z-test (n ≥ 30, population standard deviation,  is known) 3. Use  = 5% 4. Value of the test criterion where  = population standard deviation, 𝒙 = sample mean; 𝝁 = population mean; and 𝒏 = no. of subjects in the sample = 3.98 5. ztab = 1.645 6. Since zc > ztab , reject Ho  The sample yield is significantly higher than the population yield
  • 36. How to determine the z tabular value (z-tab)  For α=0.05 (level of significance: probability of rejecting a true null hypothesis)  For one-tailed test: 0.95 (Acceptance Region) 0.05 (Rejection Region) Critical Point zc=3.980 ztab=z0.05=1.645  The power of the test (confidence level), β = 1-0.05 = 0.95  Locate 𝛽 = 0.95 corresponding to the value of 𝑧𝑡𝑎𝑏 (critical point) from the z-table
  • 37. z Area under the normal curve, β 1.64 (Y1) 0.9495 (X1) z0.05 (Y) 0.9500 (X) 1.65 (Y2) 0.9505 (X2)  Compute for Y (by interpolation) 𝒀 = 𝒀𝟏 + 𝒀𝟐 − 𝒀𝟏 𝑿−𝑿𝟏 𝑿𝟐−𝑿𝟏 𝒀 = 𝟏. 𝟔𝟒 + (𝟏. 𝟔𝟓 − 𝟏. 𝟔𝟒) 𝟎.𝟗𝟓𝟎𝟎−𝟎.𝟗𝟒𝟗𝟓 𝟎.𝟗𝟓𝟎𝟓−𝟎.𝟗𝟒𝟗𝟓 𝒀 = 𝒛𝟎.𝟎𝟓 = 𝟏. 𝟔𝟒𝟓
  • 38.  For two-tailed test  The power of the test, β = 1-0.05/2 = 0.9750  Locate β=0.9750 from the z-table z Area under the normal curve, β 1.96 0.9750 (Exact, no need for interpolation)
  • 39.
  • 40. Alternatively, the following steps can be used in order to arrive at a decision:  Inspect the z-table and locate the tabled value for zc ≤ 3.98  The highest zc in the z-table is zc ≤ 3.09 with a corresponding table value (𝛽 value or “power of the test”) of 0.9990  The conditional probability, therefore, is (1–0.9990) = 0.001. Take note that as you move downwards and to the right of the z- table, the z-table value is increasing and approaching 1.00, hence the corresponding conditional probability is less than 0.001 and decreasing  Since p=0.05 (critical value or alpha, α) is greater than the computed conditional probability (actual α), therefore, reject Ho  Conclusion: The sample yield is significantly higher that the population yield
  • 41. SAMPLE PROBLEMS One-Sample Test: z-test  The average rating of 140 BSEd graduates in the 2015 Licensure Examination for Teachers of University X is 65.40%. During the same period, the national average rating is 68.56% with a standard deviation of 12.66%. Is the claim of the President justified that his University is performing poorly as compared to other HEIs offering the same program?  The mean weight of the baggage carried into an airplane by individual passengers at Tuguegarao City Airport is 19.8 kg. An airport authority representative takes a random sample of 110 passengers and obtained a mean weight of 18.5 kg with a standard deviation of 8.5 kg. Test the claim at 1% level of significance.
  • 42. SAMPLE PROBLEM t-test, one-sample test The average length of time for students to register for summer classes at a certain college has been 50 minutes with a standard deviation of 10 minutes. A new registration procedure using modern computing machines is being tried. If a random sample of 12 students had an average registration time of 42 minutes with a standard deviation of 11.9 minutes under the new system, test the hypothesis that the population mean is now less than 50 minutes, using a level of significance of (a) 0.05, and (b) 0.01. Assume the population of times to be normal.
  • 43. When the standard deviation of the sample is substituted for the standard deviation of the population, the statistic does not have a normal distribution; it has what is called the t‐distribution. Because there is a different t‐distribution for each sample size, it is not practical to list a separate area‐of ‐the‐curve table for each one. Instead, critical t‐values for common alpha levels (0.10, 0.05, 0.01, and so forth) are usually given in a single table for a range of sample sizes. For very large samples, the t‐distribution approximates the standard normal (z) distribution. In practice, it is best to use t‐distributions any time the population standard deviation is not known. Values in the t‐table are not actually listed by sample size but by degrees of freedom (df). The number of degrees of freedom for a problem involving the t‐distribution for sample size n is simply n – 1 for a one‐sample mean problem. Reminders on using t-test …
  • 44. ONE-SAMPLE t-test  Solution to problem 1. Ho: µ = 50 minutes Ha: µ < 50 minutes 2. Use t-test (n < 30 ;  is unknown) 3. Use  = 5% 4. Value of test criterion, tc where s = sample standard deviation n = sample size, other terms are as defined earlier = 2.33 5. Critical region t (5%, 11) = 2.201 (one tailed) 6. tc > ttab , reject Ho The true (population mean) mean is less than 50 minutes
  • 45. Student’s t-Distribution
    One Sided  0.2500  0.2000  0.1500  0.1000  0.0500  0.0250  0.0100  0.0050  0.0025  0.0010  0.0005
    Two Sided  0.5000  0.4000  0.3000  0.2000  0.1000  0.0500  0.0200  0.0100  0.0050  0.0020  0.0010
    df
    1          1.000   1.376   1.963   3.078   6.314   12.71   31.82   63.66   127.3   318.3   636.6
    2          0.816   1.061   1.386   1.886   2.920   4.303   6.965   9.925   14.09   22.33   31.60
    3          0.765   0.978   1.250   1.638   2.353   3.182   4.541   5.841   7.453   10.21   12.92
    4          0.741   0.941   1.190   1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
    5          0.727   0.920   1.156   1.476   2.015   2.571   3.365   4.032   4.773   5.893   6.869
    6          0.718   0.906   1.134   1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
    7          0.711   0.896   1.119   1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
    8          0.706   0.889   1.108   1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
    9          0.703   0.883   1.100   1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
    10         0.700   0.879   1.093   1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
    11         0.697   0.876   1.088   1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437
    12         0.695   0.873   1.083   1.356   1.782   2.179   2.681   3.055   3.428   3.930   4.318
    13         0.694   0.870   1.079   1.350   1.771   2.160   2.650   3.012   3.372   3.852   4.221
    14         0.692   0.868   1.076   1.345   1.761   2.145   2.624   2.977   3.326   3.787   4.140
    15         0.691   0.866   1.074   1.341   1.753   2.131   2.602   2.947   3.286   3.733   4.073
    16         0.690   0.865   1.071   1.337   1.746   2.120   2.583   2.921   3.252   3.686   4.015
    17         0.689   0.863   1.069   1.333   1.740   2.110   2.567   2.898   3.222   3.646   3.965
    18         0.688   0.862   1.067   1.330   1.734   2.101   2.552   2.878   3.197   3.610   3.922
    19         0.688   0.861   1.066   1.328   1.729   2.093   2.539   2.861   3.174   3.579   3.883
    20         0.687   0.860   1.064   1.325   1.725   2.086   2.528   2.845   3.153   3.552   3.850
    21         0.686   0.859   1.063   1.323   1.721   2.080   2.518   2.831   3.135   3.527   3.819
    22         0.686   0.858   1.061   1.321   1.717   2.074   2.508   2.819   3.119   3.505   3.792
    23         0.685   0.858   1.060   1.319   1.714   2.069   2.500   2.807   3.104   3.485   3.767
  • 46. Student’s t-Distribution (Cont’d)
    One Sided  0.2500  0.2000  0.1500  0.1000  0.0500  0.0250  0.0100  0.0050  0.0025  0.0010  0.0005
    Two Sided  0.5000  0.4000  0.3000  0.2000  0.1000  0.0500  0.0200  0.0100  0.0050  0.0020  0.0010
    df
    24         0.685   0.857   1.059   1.318   1.711   2.064   2.492   2.797   3.091   3.467   3.745
    25         0.684   0.856   1.058   1.316   1.708   2.060   2.485   2.787   3.078   3.450   3.725
    26         0.684   0.856   1.058   1.315   1.706   2.056   2.479   2.779   3.067   3.435   3.707
    27         0.684   0.855   1.057   1.314   1.703   2.052   2.473   2.771   3.057   3.421   3.690
    28         0.683   0.855   1.056   1.313   1.701   2.048   2.467   2.763   3.047   3.408   3.674
    29         0.683   0.854   1.055   1.311   1.699   2.045   2.462   2.756   3.038   3.396   3.659
    30         0.683   0.854   1.055   1.310   1.697   2.042   2.457   2.750   3.030   3.385   3.646
    40         0.681   0.851   1.050   1.303   1.684   2.021   2.423   2.704   2.971   3.307   3.551
    50         0.679   0.849   1.047   1.299   1.676   2.009   2.403   2.678   2.937   3.261   3.496
    60         0.679   0.848   1.045   1.296   1.671   2.000   2.390   2.660   2.915   3.232   3.460
    80         0.678   0.846   1.043   1.292   1.664   1.990   2.374   2.639   2.887   3.195   3.416
    100        0.677   0.845   1.042   1.290   1.660   1.984   2.364   2.626   2.871   3.174   3.390
    120        0.677   0.845   1.041   1.289   1.658   1.980   2.358   2.617   2.860   3.160   3.373
    ∞          0.674   0.842   1.036   1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.291
  • 47. Other Sample Problems … 1. A Little League baseball coach wants to know if his team is representative of other teams in scoring runs. Nationally, the average number of runs scored by a Little League Team in a game is 5.70. He chooses five games at random in which his team scored 5, 9, 4, 11, and 8 runs. Is it likely that his team’s scores could have come from the national distribution? Assume an alpha level of 0.05. What is the 95% confidence interval for runs scored per team per game? 2. A professor wants to know if her introductory statistics class has a good grasp of basic math. Six students are chosen at random from the class and given a math proficiency test. The professor wants the class to be able to score above 70 on the test. The six students get scores of 62, 92, 75, 68, 83, and 95. Can the professor have 90 percent confidence that the mean score for the class on the test would be above 70? (Note: Included in PS No. 5)
  • 48. SAMPLE PROBLEM z-test, two-sample test In the recently released results of the Licensure Examination for Teachers (LET), University A with 125 examinees posted a mean passing percentage of 80.75 while University B with 110 examinees posted a mean of 86.45. A check with the Professional Regulation Commission (PRC) revealed that for that year, the standard deviation for all takers was 11.65. Can the President of University B claim, with 99% confidence, that his university's Education graduates performed significantly better than the Education graduates of University A?
  • 49. TWO-SAMPLE z-test: Solution to the problem
    1. Ho: µA = µB; Ha: µA ≠ µB
    2. Use the z-test
    3. Use α = 1%
    4. Value of the test criterion: $z_{comp} = \dfrac{\bar{x}_A - \bar{x}_B}{\sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}} = \dfrac{80.75 - 86.45}{11.65\sqrt{\frac{1}{125} + \frac{1}{110}}} = -3.74$
    5. Critical region: $z_{0.01} = 2.58$ (two-tailed)
    6. Since $|z_{comp}| = 3.74 > z_{0.01} = 2.58$, reject Ho. ∴ The President of University B is correct in his claim that his Education graduates are significantly better than those of University A.
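The computation in step 4 can be reproduced with the sketch below. It assumes, as the slide does, a single known population standard deviation shared by both universities and a two-tailed critical value of 2.58; `two_sample_z` is an illustrative name.

```python
import math


def two_sample_z(mean_a, mean_b, sigma, n_a, n_b):
    """z statistic for two independent means with a common known sigma."""
    standard_error = sigma * math.sqrt(1 / n_a + 1 / n_b)
    return (mean_a - mean_b) / standard_error


z_comp = two_sample_z(80.75, 86.45, 11.65, 125, 110)
z_crit = 2.58  # two-tailed critical value at alpha = 1%

reject = abs(z_comp) > z_crit
print(f"z = {z_comp:.2f}, |z| > {z_crit}: {reject}")
```

The statistic is negative because University A's mean is the smaller one; the decision rule compares its absolute value with the critical value.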
  • 50. A researcher wants to determine whether or not a given drug has any effect on the scores of human subjects performing a task of ESP sensitivity. He randomly assigns his subjects to one of two groups. Nine hundred subjects in group 1 (the experimental group) receive an oral administration of the drug prior to testing. In contrast, 1000 subjects in group 2 (the control group) receive a placebo. The results of the ESP sensitivity tests are as follows:
    - Mean ESP for group 1 is 9.78 with a standard deviation of 4.05
    - Mean ESP for group 2 is 15.10 with a standard deviation of 4.28
    Is the drug effective? (Note: lower ESP means less sensitive)
  • 51. SAMPLE PROBLEM t-test, two-sample test An English professor wishes to see if a literature course changes regional attitudes. The literature class deals with regional problems. A regional attitude test is given to 12 students at the beginning (score 1) and end (score 2) of the semester. The scale is from 20 to 100, with a high score indicating a high degree of regional bias. The results are as follows:
    Score 1   67  78  91  53  48  56  62  47  28  37  46  52
    Score 2   58  69  80  54  32  49  64  40  27  34  39  47
    Is there any significant difference between the two sample means?
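  • 52. TWO-SAMPLE t-test
    $t_c = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
    where:
    $s_p^2$ = pooled variance = $\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
    $s_1^2, s_2^2$ = variances of samples 1 & 2, respectively
    $n_1, n_2$ = no. of subjects in samples 1 & 2, respectively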
  • 53. Test of Hypothesis
    1. Ho: µ1 = µ2; Ha: µ1 ≠ µ2
    2. Test criterion: t-test
    3. Level of significance: 5%
    4. Computed value of the test criterion
    5. Critical region: ttab = t(α/2, n1 + n2 – 2)
    6. Conclude
  • 54. Solution to Problem on t-test (two-sample test): n1 = 12, n2 = 12
    Test of Hypothesis
    1. Ho: µ1 = µ2; Ha: µ1 ≠ µ2
    2. Use the t-test
    3. Use α = 5%
    4. Computed value of the test criterion:
       $s_p^2 = \dfrac{(12 - 1)(297.90) + (12 - 1)(261.17)}{12 + 12 - 2} = 279.54$
       $t_c = \dfrac{55.42 - 49.42}{\sqrt{279.54\left(\frac{1}{12} + \frac{1}{12}\right)}} = 0.88$
  • 55. 5. Critical region: ttab = t(α/2, 22) = 2.074
    6. Since tc < ttab, do not reject Ho. The intervention does not have a significant impact on the regional bias of the literature students.
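The pooled-variance computation can be verified with the following sketch, which treats the two score columns as independent samples exactly as the slides do. Because both scores come from the same 12 students, a paired t-test could also be considered; that alternative is not shown here. The function name `pooled_t` is an illustrative assumption.

```python
import math
from statistics import mean, variance

score_1 = [67, 78, 91, 53, 48, 56, 62, 47, 28, 37, 46, 52]
score_2 = [58, 69, 80, 54, 32, 49, 64, 40, 27, 34, 39, 47]


def pooled_t(x1, x2):
    """Return (t statistic, pooled variance, df) for two independent samples."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)  # sample variances (n - 1 divisor)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2)) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, sp_sq, n1 + n2 - 2


t_c, sp_sq, df = pooled_t(score_1, score_2)
t_tab = 2.074  # two-tailed 5% critical value for df = 22, from the t table

print(f"pooled variance = {sp_sq:.2f}, t = {t_c:.2f}, df = {df}")
print("reject Ho" if abs(t_c) > t_tab else "do not reject Ho")
```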
  • 56. SAMPLE PROBLEM F-test The data below represent the number of hours of pain relief provided by five different brands of headache tablets administered to 25 subjects. The 25 subjects were randomly divided into five groups and each group was treated with a different brand.
    Brand of Tablet
    A   B   C   D   E
    5   9   3   2   7
    4   7   5   3   6
    8   8   2   4   9
    6   6   3   1   4
    3   9   7   4   7
  • 57. Three or more sample test: F-test Perform the analysis of variance and test the hypothesis at the 0.05 level of significance that the mean number of hours of relief provided by the tablets is the same for all five brands.
    Source of Variation   Degree of Freedom   Sum of Squares   Mean Squares   Fc       Ftab 5%   Ftab 1%
    Column                4                   79.44            19.86          6.90**   2.87      4.43
    Error                 20                  57.60            2.88
    TOTAL                 24                  137.04
    ** significant at 1% level
  • 58. Working Equations
    Cdf = p – 1 = 5 – 1 = 4; p = no. of columns (groups)
    Tdf = pr – 1 = (5)(5) – 1 = 24; r = no. of subjects per group
    Edf = Tdf – Cdf = 24 – 4 = 20
  • 59. Correction Factor: $CF = \dfrac{(\sum X)^2}{N} = \dfrac{(132)^2}{25} = 696.96$
    $TSS = (5^2 + 4^2 + \dots + 7^2) - CF = 834 - 696.96 = 137.04$
    $CSS = (26^2 + 39^2 + 20^2 + 14^2 + 33^2)/5 - CF = 776.40 - 696.96 = 79.44$
    $ESS = TSS - CSS = 137.04 - 79.44 = 57.60$
  • 60. CMS = CSS/Cdf = 79.44/4 = 19.86
    EMS = ESS/Edf = 57.60/20 = 2.88
    Fc = CMS/EMS = 19.86/2.88 = 6.90
    F(5%, 4, 20) = 2.87 = F0.05
    F(1%, 4, 20) = 4.43 = F0.01
    Fc > F0.01 ⇒ The 5 brands are significantly different at the 1% level of significance
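The working equations for the headache-tablet problem can be sketched as follows. The assignment of values to brands is inferred from the column totals (26, 39, 20, 14, 33) used in the CSS computation, and the function name and dictionary keys are illustrative assumptions.

```python
brands = {
    "A": [5, 4, 8, 6, 3],
    "B": [9, 7, 8, 6, 9],
    "C": [3, 5, 2, 3, 7],
    "D": [2, 3, 4, 1, 4],
    "E": [7, 6, 9, 4, 7],
}


def one_way_anova(groups):
    """One-way ANOVA via the correction-factor method used on the slides."""
    values = [v for g in groups for v in g]
    n_total = len(values)
    cf = sum(values) ** 2 / n_total                       # correction factor
    tss = sum(v * v for v in values) - cf                 # total sum of squares
    css = sum(sum(g) ** 2 / len(g) for g in groups) - cf  # column (between) SS
    ess = tss - css                                       # error (within) SS
    c_df = len(groups) - 1
    e_df = n_total - len(groups)
    cms, ems = css / c_df, ess / e_df
    return {"CF": cf, "TSS": tss, "CSS": css, "ESS": ess,
            "Cdf": c_df, "Edf": e_df, "CMS": cms, "EMS": ems, "F": cms / ems}


result = one_way_anova(list(brands.values()))
for name, value in result.items():
    print(f"{name} = {value:.2f}")

# Critical values F(5%, 4, 20) = 2.87 and F(1%, 4, 20) = 4.43, as on the slide
print("significant at 1%" if result["F"] > 4.43 else "not significant at 1%")
```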
  • 61. Another Sample Problem: F-test Very Clean advertises that its detergent will remove all stains, except oil-based paint, in any kind of water. Consumer Action is evaluating this claim. Batches of washing were run in randomly chosen homes having a particular type of water: hard, moderate, or soft. Each batch contains an assortment of rags and cloth scraps stained with food products, grease, and dirt over a 150 square inch area. After washing, the number of square inches that were still stained was determined and the following results were obtained: Observation Type of Water Hard Moderate Soft 1 2 3 4 5 6 4 3 9 7 5 6 9 4 3 5 0 2 4 3 At the 5% level, should Consumer Action conclude that the type of water affects the effectiveness of the detergent?
  • 63. SAMPLE PROBLEM Pearson r, Association Consider the following data taken from three sample barangays in Iligan City, Lanao del Norte during the NCSO and BAECon Integrated Survey of Households in the 3rd quarter of 1977 (X - highest grade completed by household head in years; Y - total family income for the quarter in pesos).
  • 64.
    Household Number   Highest Grade, X   Income, Y      Household Number   Highest Grade, X   Income, Y
    1                  12                 1444           12                 8                  1440
    2                  13                 1650           13                 14                 2140
    3                  13                 1200           14                 15                 3330
    4                  18                 2880           15                 8                  750
    5                  8                  360            16                 10                 108
    6                  10                 1965           17                 4                  150
    7                  6                  744            18                 10                 240
    8                  8                  2784           19                 14                 3000
    9                  10                 1940           20                 6                  400
    10                 6                  2450           21                 13                 2250
    11                 6                  1290           22                 6                  100
    Is there a significant relationship between the family income and the highest grade completed by the household head?
  • 65. PEARSON CORRELATION COEFFICIENT (r)
    $r = \dfrac{CP_{xy}}{\sqrt{(SS_x)(SS_y)}}$
    $CP_{xy}$ = cross product of x & y = $\sum XY - \dfrac{(\sum X)(\sum Y)}{n}$
    $SS_x$ = sum of squares of x = $\sum X^2 - \dfrac{(\sum X)^2}{n}$
    $SS_y$ = sum of squares of y = $\sum Y^2 - \dfrac{(\sum Y)^2}{n}$
  • 66.
    Value of r     Interpretation
    0.00 – 0.20    Slight correlation, negligible relationship
    0.21 – 0.40    Low correlation, definite but small relationship
    0.41 – 0.70    Moderate correlation, substantial relationship
    0.71 – 0.90    High correlation, marked relationship
    0.91 – 1.00    Very high correlation, very dependable relationship
  • 67. Sample Problem: Given below are the scores of six students in Math and Physics
    Student   Math Score   Physics Score
    1         3            6
    2         2            4
    3         4            4
    4         6            7
    5         5            5
    6         1            3
    Required:
    - Correlation coefficient and its interpretation
    - Are the two scores significantly related?
  • 68. Steps on Test of Hypothesis on Correlation Coefficient, r
    1. Ho: ρ = 0 (X & Y are statistically independent); Ha: ρ ≠ 0 (X & Y are statistically dependent)
    2. Define the level of significance, α
    3. Select the test criterion (t-test)
    4. Compute the value of the test criterion
    5. Define the critical region: ttab = t(α/2, n – 2)
    6. Conclude
  • 69. Solution to the problem on Pearson r (Association)
    n = 22
    $\sum X = 218$, $\sum X^2 = 2444$, $\bar{X} = 9.91$
    $\sum Y = 32615$, $\sum Y^2 = 70896117$, $\bar{Y} = 1482.50$
    $\sum XY = 373084$
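  • 70. $CP_{xy} = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 373084 - \dfrac{(218)(32615)}{22} = 49899$
    $SS_x = \sum X^2 - \dfrac{(\sum X)^2}{n} = 2444 - \dfrac{(218)^2}{22} = 283.82$
    $SS_y = \sum Y^2 - \dfrac{(\sum Y)^2}{n} = 70896117 - \dfrac{(32615)^2}{22} = 22544379.50$
    $r = \dfrac{CP_{xy}}{\sqrt{(SS_x)(SS_y)}} = \dfrac{49899}{\sqrt{(283.82)(22544379.50)}} = 0.62$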
  • 71. Test of Hypothesis
    1. Ho: ρ = 0; Ha: ρ ≠ 0
    2. Use the t-test
    3. Use α = 0.05
    4. $t_c = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{0.62}{\sqrt{\dfrac{1 - 0.62^2}{22 - 2}}} = 3.53$
    5. $t_{tab} = t(0.05/2,\ 22 - 2) = 2.086$
    6. Since tc > ttab, reject Ho. The quarterly family income and the number of years of schooling of family heads are significantly related (associated).
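The sums, the correlation coefficient, and the test statistic for the household data can be reproduced with the sketch below. It follows the cross-product and sum-of-squares formulas from the slides; the function name `pearson_r` is an illustrative assumption. Because the code carries r unrounded, its t value comes out slightly larger than the hand computation that uses r = 0.62.

```python
import math

# Highest grade completed (X) and quarterly family income in pesos (Y), households 1-22
X = [12, 13, 13, 18, 8, 10, 6, 8, 10, 6, 6, 8, 14, 15, 8, 10, 4, 10, 14, 6, 13, 6]
Y = [1444, 1650, 1200, 2880, 360, 1965, 744, 2784, 1940, 2450, 1290,
     1440, 2140, 3330, 750, 108, 150, 240, 3000, 400, 2250, 100]


def pearson_r(x, y):
    """Return (r, CPxy, SSx, SSy) using the computational formulas."""
    n = len(x)
    cp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    return cp_xy / math.sqrt(ss_x * ss_y), cp_xy, ss_x, ss_y


r, cp_xy, ss_x, ss_y = pearson_r(X, Y)
n = len(X)
t_c = r / math.sqrt((1 - r ** 2) / (n - 2))
t_tab = 2.086  # two-tailed 5% critical value for df = 20, from the t table

print(f"CPxy = {cp_xy:.0f}, SSx = {ss_x:.2f}, SSy = {ss_y:.2f}")
print(f"r = {r:.2f}, t = {t_c:.2f}")
print("reject Ho" if abs(t_c) > t_tab else "do not reject Ho")
```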
  • 74. Simple Linear Regression
    - When the relationship between two variables is linear, simple regression analysis can be used to determine their functional relationship so that one variable can be predicted on the basis of the other.
    - The functional (linear) relationship is of the form $\hat{Y} = a + bX$, where $\hat{Y}$ is the predicted value of Y given the value of X, a is the intercept, and b is the slope of the regression line.
    - X is the independent variable and is used as the “predictor”. Y, the variable whose value is to be “predicted”, is called the dependent variable (also called the predictand or criterion variable).
  • 75. - The intercept a can be calculated using the expression:
      $a = \dfrac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N(\sum X^2) - (\sum X)^2}$
    - The slope of the regression line, b, can be calculated using the expression:
      $b = \dfrac{N(\sum XY) - (\sum X)(\sum Y)}{N(\sum X^2) - (\sum X)^2}$
    - In the above formulas, all we need to know are $\sum Y$, N, $\sum X$, $\sum XY$, and $\sum X^2$.
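  • 76. From the given problem:
    $a = \dfrac{(32615)(2444) - (218)(373084)}{22(2444) - (218)^2} = \dfrac{-1621252}{6244} = -259.6496$
    $b = \dfrac{22(373084) - (218)(32615)}{22(2444) - (218)^2} = \dfrac{1097778}{6244} = 175.8133$
    or
    $b = \dfrac{CP_{xy}}{SS_x} = \dfrac{49899}{283.82} = 175.81$
    $a = \bar{Y} - b\bar{X} = 1482.50 - 175.81(9.91) = -259.78$
    The two values of a differ slightly only because the second computation uses rounded means and a rounded slope.
    Therefore, the equation of the regression line is given by: $\hat{Y} = -259.78 + 175.81X$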
  • 77. $\hat{Y} = -259.6496 + 175.8133(X)$; $\epsilon_i = Y_i - \hat{Y}_i$
    Household Number, i   Highest Grade, X   Income, Y (Pesos)   Predicted, Ŷ   Residual, ε
    1                     12                 1444                1850.1100      -406.1100
    2                     13                 1650                2025.9233      -375.9233
    3                     13                 1200                2025.9233      -825.9233
    4                     18                 2880                2904.9898      -24.9898
    5                     8                  360                 1146.8568      -786.8568
    6                     10                 1965                1498.4834      466.5166
    7                     6                  744                 795.2302       -51.2302
    8                     8                  2784                1146.8568      1637.1432
    9                     10                 1940                1498.4834      441.5166
    10                    6                  2450                795.2302       1654.7698
    11                    6                  1290                795.2302       494.7698
    12                    8                  1440                1146.8568      293.1432
    13                    14                 2140                2201.7366      -61.7366
    14                    15                 3330                2377.5499      952.4501
    15                    8                  750                 1146.8568      -396.8568
    16                    10                 108                 1498.4834      -1390.4834
    17                    4                  150                 443.6036       -293.6036
    18                    10                 240                 1498.4834      -1258.4834
    19                    14                 3000                2201.7366      798.2634
    20                    6                  400                 795.2302       -395.2302
    21                    13                 2250                2025.9233      224.0767
    22                    6                  100                 795.2302       -695.2302
    $\sum \epsilon_i = -0.0082 \approx 0.00$
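The intercept, slope, and residuals in the table above can be recomputed with a short sketch that uses the raw-sum formulas for a and b. The function name `linear_fit` is an illustrative assumption. With unrounded coefficients the residuals sum to zero up to floating-point error, which is why the table's total is only approximately zero.

```python
# Highest grade completed (X) and quarterly family income in pesos (Y), households 1-22
X = [12, 13, 13, 18, 8, 10, 6, 8, 10, 6, 6, 8, 14, 15, 8, 10, 4, 10, 14, 6, 13, 6]
Y = [1444, 1650, 1200, 2880, 360, 1965, 744, 2784, 1940, 2450, 1290,
     1440, 2140, 3330, 750, 108, 150, 240, 3000, 400, 2250, 100]


def linear_fit(x, y):
    """Least-squares intercept a and slope b from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


a, b = linear_fit(X, Y)
predicted = [a + b * x for x in X]
residuals = [y - y_hat for y, y_hat in zip(Y, predicted)]

print(f"Y-hat = {a:.4f} + {b:.4f} X")
print(f"first prediction = {predicted[0]:.2f}, first residual = {residuals[0]:.2f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```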
  • 78. Sample Problem: Spearman
    An English professor wishes to see if a literature course changes regional attitudes. The literature class deals with regional problems. A regional attitude test is given to 12 students at the beginning (score 1) and end (score 2) of the semester. The scale is from 20 to 100, with a high score indicating a high degree of regional bias. The results are as tabulated below:
    Score 1   67  78  91  53  48  56  62  47  28  37  46  52
    Score 2   58  69  80  54  32  49  64  40  27  34  39  47
    Determine if there is a significant association between the two ranked scores.
  • 79. Solution to Problem on Spearman Rho
    STUDENT NO.   SCORE 1 (X1i)   RANK (Rx1i)   SCORE 2 (X2i)   RANK (Rx2i)   di    di²
    1             67              10            58              9             1     1
    2             78              11            69              11            0     0
    3             91              12            80              12            0     0
    4             53              7             54              8             -1    1
    5             48              5             32              2             3     9
    6             56              8             49              7             1     1
    7             62              9             64              10            -1    1
    8             47              4             40              5             -1    1
    9             28              1             27              1             0     0
    10            37              2             34              3             -1    1
    11            46              3             39              4             -1    1
    12            52              6             47              6             0     0
    di = Rx1i - Rx2i; Σdi² = 16
    SPEARMAN RHO (rs): $r_s = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \dfrac{6(16)}{12(12^2 - 1)} = 0.94$
  • 80. 1. Ho: ρ = 0; Ha: ρ ≠ 0
    2. Use the t-test (n > 10)
    3. Use α = 0.05
    4. $t_c = r_s\sqrt{\dfrac{n - 2}{1 - r_s^2}} = 0.94\sqrt{\dfrac{12 - 2}{1 - 0.94^2}} = 8.71$, where $r_s$ = Spearman rho correlation coefficient
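  • 81. Spearman … 5. Critical region: ttab = t(5%/2, 10) = 2.228 (two-tailed) 6. Since tc > ttab, reject Ho. There is a significant association between the two ranked scores: students' relative standing on the regional attitude test at the beginning of the semester is strongly related to their standing at the end. This association by itself does not show that the intervention reduced regional bias; that question concerns the difference between the two sets of scores.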
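The ranking, the rank differences, and the test statistic for the Spearman problem can be sketched as below. The helper `ranks` assumes there are no tied scores, which holds for this data set, and the formula $r_s = 1 - 6\sum d_i^2 / (n(n^2 - 1))$ is the standard no-ties form. Since the code keeps r_s unrounded, its t value is somewhat larger than the 8.71 obtained by hand with r_s = 0.94.

```python
import math

score_1 = [67, 78, 91, 53, 48, 56, 62, 47, 28, 37, 46, 52]
score_2 = [58, 69, 80, 54, 32, 49, 64, 40, 27, 34, 39, 47]


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_1, rank_2 = ranks(score_1), ranks(score_2)
d = [r1 - r2 for r1, r2 in zip(rank_1, rank_2)]
sum_d2 = sum(di * di for di in d)

n = len(score_1)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
t_c = r_s * math.sqrt((n - 2) / (1 - r_s ** 2))
t_tab = 2.228  # two-tailed 5% critical value for df = 10, from the t table

print(f"sum of d^2 = {sum_d2}, r_s = {r_s:.2f}, t = {t_c:.2f}")
print("reject Ho" if t_c > t_tab else "do not reject Ho")
```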