2. World of Statistics
Descriptive statistics
Provide simple summaries about the sample and the measures in
a) Tables
b) Graphs
. Charts
. Circle graphs
. Dot plots
. Box-and-whisker plots
. Scatter plots
. Survival plots
. Bland-Altman plots
Inferential Statistics
The T-Test
Dummy Variables ["proxy" variables in inferential statistics]
General Linear Model [most common use of inferential statistics]
Post test-Only Analysis [simple t-test or one-way ANOVA]
Factorial Design Analysis [ANOVA]
Randomized Block Analysis
Analysis of Covariance [ANOVA & ANCOVA]
Non-equivalent Groups Analysis
Regression-Discontinuity Analysis
Regression Point Displacement Analysis
3. Talk to beginner in next 4 pages
Descriptive statistics
Describes properties of the population
Everything is visible in tables and graphs
Everybody can understand with minimum effort
Uses measures of central tendency (the centre of the bell curve) and
measures of spread
Measured in parameters (mean, standard
deviation and variance)
But you often do not have access to the whole population
you are interested in investigating
Inferential Statistics
The behaviour of the sample is used to draw inferences about the
population under study, beyond what is directly visible
Estimates population parameter(s) from the sample
Tests statistical hypotheses [how far a result is not due to
chance]
Critically analyses variability using statistical
models and advanced software, under the
guidance of an expert statistician
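The descriptive measures named on this slide (mean, variance, standard deviation) can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module; the marks are hypothetical, and because they are treated as a sample, the variance divides by n - 1 (for a whole population, `statistics.pvariance` would be used instead).

```python
import statistics

# Hypothetical exam marks (out of 100) for a small sample of students.
marks = [52, 61, 67, 70, 70, 74, 81, 85]

mean_mark = statistics.mean(marks)          # central tendency
variance_mark = statistics.variance(marks)  # sample variance (divides by n - 1)
sd_mark = statistics.stdev(marks)           # sample standard deviation

print(f"mean = {mean_mark}")
print(f"variance = {variance_mark:.2f}")
print(f"standard deviation = {sd_mark:.2f}")
```

The standard deviation is the square root of the variance, so `sd_mark ** 2` reproduces `variance_mark` up to rounding.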
4. Variables
Independent
Also called Experimental or predictor variable
Its manipulation influences the result
(the dependent variable)
Example: the amount of revision or the level of
intelligence, which influences the mark secured
out of 100
Dependent Variables
The mark secured in the examination is the dependent variable,
affected by manipulation of the independent variable
5. Categorical and Continuous
Variables
Categorical
Also called discrete or qualitative variables
Three types
Nominal: two or more named categories with no
intrinsic order
Ordinal: two or more categories that can be
arranged in order
Dichotomous: only two
categories
https://statistics.laerd.com/statisticalguides/types-of-variable.php
Continuous
Also called quantitative variables
Two types
Interval: measured along a continuum and
they have a numerical value (for example,
temperature measured in degrees Celsius
or Fahrenheit)
Ratio: The name "ratio" reflects the fact
that you can use the ratio of
measurements. For example, a distance of ten
metres is twice the distance of five metres.
Ratio variables are interval variables, but
with the added condition that 0 (zero) of the
measurement indicates that there is none of
that variable
6. Sample
Sampling is the process of selecting units (e.g., people) from a population of
interest so that by studying the sample we may fairly generalize our results
back to the population from which they were chosen
The listing of the accessible
population from which you'll
draw your sample is called the
sampling frame
8. A "bell-shaped" curve that describes the group's
distribution of a single variable
Think of the bell curve as a smoothed
histogram or bar graph describing the
frequency of each possible measurement
response. This curve is the heart of statistics
9. Difference between 2 means (µ) that is not due to
chance
In the figure, we show distributions for both the
treatment and control group. The mean values for
each group are indicated with dashed lines. The
difference between the means is simply the
horizontal difference between where the control and
treatment group means hit the horizontal axis.
10. Hypothesis Testing
A statistical hypothesis is an assumption about a population parameter
Hypothesis testing refers to the formal procedures used by statisticians to
accept or reject statistical hypotheses
Typically examine a random sample from the population
If sample data are not consistent with the statistical hypothesis, the
hypothesis is rejected
11. 2 types of statistical hypotheses
Null hypothesis (Ho): sample observations result purely from chance
Alternative hypothesis (Ha): sample observations are influenced by some nonrandom cause
Decision Errors
Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability
of committing a Type I error is called the significance level. This probability is also called alpha( α).
Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis that is false. The
probability of committing a Type II error is called Beta( β). The probability of not committing a Type II error is
called the Power of the test.
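To make alpha, beta and power concrete, the sketch below computes them exactly for a hypothetical coin-flip test. Everything specific here is an assumption made for illustration: 50 flips, a rejection region of "17 or fewer heads, or 33 or more heads", and a true probability of heads of 0.8 when calculating power.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def rejection_probability(n, p, lower, upper):
    """Probability that heads <= lower or heads >= upper when P(heads) = p."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if k <= lower or k >= upper)

n_flips = 50
lower, upper = 17, 33  # hypothetical rejection region for Ho: P = 0.5

# Type I error rate: chance of rejecting Ho when the coin really is fair.
alpha = rejection_probability(n_flips, 0.5, lower, upper)

# Power: chance of rejecting Ho when the coin is biased (assumed true P = 0.8).
power = rejection_probability(n_flips, 0.8, lower, upper)
beta = 1 - power  # Type II error rate under the same assumed bias

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

Widening the rejection region raises power but also raises alpha, which is the trade-off between the two error types.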
12. Example
Suppose we wanted to determine whether a coin was fair and balanced. A
null hypothesis might be that half the flips would result in Heads and
half, in Tails. The alternative hypothesis might be that the number of Heads
and Tails would be very different. Symbolically, these hypotheses would be
expressed as
H0: P = 0.5
Ha: P ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails.
Given this result, we would be inclined to reject the null hypothesis. We would
conclude, based on the evidence, that the coin was probably not fair and
balanced.
http://stattrek.com/hypothesis-test/hypothesis-testing.aspx?Tutorial=AP
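The coin example can be checked numerically. The sketch below computes an exact two-sided binomial p-value for 40 Heads in 50 flips under H0: P = 0.5, using only the standard library. The function name and the alpha level of 0.05 are assumptions for illustration, not part of the original example.

```python
from math import comb

def binomial_two_sided_p(heads, flips, p_null=0.5):
    """Exact two-sided p-value: total probability, under H0, of every
    outcome that is no more likely than the observed one."""
    def pmf(k):
        return comb(flips, k) * p_null**k * (1 - p_null)**(flips - k)

    observed = pmf(heads)
    # The small relative tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(flips + 1) if pmf(k) <= observed * (1 + 1e-9))

p_value = binomial_two_sided_p(heads=40, flips=50)
alpha = 0.05  # assumed significance level

print(f"p-value = {p_value:.2e}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The p-value comes out far below 0.05, which matches the conclusion that the coin was probably not fair.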
13. The region of acceptance
http://www.six-sigma-material.com/Hypothesis-Testing.html
If the p-value is less than the alpha risk, reject Ho and accept Ha
If the p-value is greater than the alpha risk, fail to reject the null hypothesis (Ho)
14. The prototype inferential
statistics: t-test
To compare the average performance of two groups
Use a single measure to see if there is a difference
Example:
Whether eighth-grade boys and girls differ in math test scores or
whether a program group differs on the outcome measure from a
control group
15. General Linear Model includes
t-test
Analysis of Variance (ANOVA)
Analysis of Covariance (ANCOVA)
Regression analysis
Multivariate methods:
1. Factor analysis
2. Multidimensional scaling
3. Cluster analysis
4. Discriminant function analysis
General Linear Model is the simplest straight-line model that opens the door for more complex
inferential statistics
16. Experimental Analysis: some
inferential statistics
The simple two-group posttest-only randomized experiment : t-test or
one-way ANOVA
The factorial experimental designs : Analysis of Variance (ANOVA)
Model
Randomized Block Designs: ANOVA blocking model
The Analysis of Covariance Experimental Design uses the Analysis of
Covariance statistical model[ANCOVA]
17. The T-Test: tests the difference between
2 means (µ) in posttest analysis
Judge the difference between their means relative to the spread or variability
of their scores
18. Difference between means equal but variability is different
In which of the three cases would it be easiest to conclude
that the means of the two groups are different? If you
answered the low variability case, you are correct! Why is it
easiest to conclude that the groups differ in that case?
Because that is the situation with the least amount of
overlap between the bell-shaped curves for the two groups.
If you look at the high variability case, you should see that
there are quite a few control group cases that score in the range
of the treatment group and vice versa. Why is this so
important? Because, if you want to see if two groups are
"different" it's not good enough just to subtract one mean
from the other -- you have to take into account the variability
around the means! A small difference between means will
be hard to detect if there is lots of variability or noise. A large
difference between means will be easily detectable if
variability is low.
20. standard error of the difference
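standard error of the difference = sqrt(varT/nt + varC/nc)
Therefore, finally:
t = (mean of treatment group - mean of control group) / sqrt(varT/nt + varC/nc)
varT = variance of treatment group
varC = variance of control group
nt = number of cases in treatment group
nc = number of cases in control group
Var = variance = square of standard deviation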
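The quantities defined on this slide combine into the t-value as the difference between the group means divided by the standard error of the difference, sqrt(varT/nt + varC/nc). A minimal sketch, with hypothetical scores and function names chosen for illustration:

```python
from math import sqrt
from statistics import mean, variance

def standard_error_of_difference(treatment, control):
    """sqrt(varT/nt + varC/nc), using sample variances."""
    return sqrt(variance(treatment) / len(treatment)
                + variance(control) / len(control))

def t_value(treatment, control):
    """Difference between the means relative to the variability of the scores."""
    return (mean(treatment) - mean(control)) / standard_error_of_difference(treatment, control)

# Hypothetical posttest scores for the two groups.
treatment_scores = [54, 57, 59, 60, 62, 63, 65, 68]
control_scores = [48, 50, 53, 55, 56, 58, 60, 60]

t = t_value(treatment_scores, control_scores)
print(f"t = {t:.3f}")
```

Swapping the two groups flips the sign of t but not its size.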
21. Interpretation of t
Positive t-value shows first mean > second
Negative if first mean is smaller
Then match t value in table of significance to test whether this value is large enough to say that the difference between
the groups is not likely to have been a chance finding
To test the significance, you need to set a risk level (called the alpha level)
In most social research, the "rule of thumb" is to set the alpha level at .05
This means that five times out of a hundred you would find a statistically significant
difference between the means even if there was none (i.e., by "chance")
Determine the degrees of freedom (df) for the test.
In the t-test, the degrees of freedom is the sum of the persons in both groups minus 2
Given the alpha level, the df, and the t-value, you can look the t-value up in a standard table of significance to see whether
this observation is a mere chance or real association.
The t-test, one-way Analysis of Variance (ANOVA) and a form of regression analysis are mathematically equivalent
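The lookup steps above (set alpha, compute df, compare t with the table) can be sketched as follows. The dictionary holds only a few rows of a standard two-tailed t table at alpha = .05, so it is an abbreviated illustration; the function names and the example t-values are hypothetical.

```python
# Two-tailed critical t-values at alpha = .05, copied from a standard
# t table (only a few rows are reproduced here).
CRITICAL_T_05 = {10: 2.228, 14: 2.145, 18: 2.101, 20: 2.086, 30: 2.042}

def degrees_of_freedom(n_treatment, n_control):
    """df for the two-group t-test: persons in both groups minus 2."""
    return n_treatment + n_control - 2

def is_significant(t, n_treatment, n_control):
    """True when |t| exceeds the tabled critical value for this df."""
    df = degrees_of_freedom(n_treatment, n_control)
    if df not in CRITICAL_T_05:
        raise KeyError(f"df={df} is not in this abbreviated table")
    return abs(t) > CRITICAL_T_05[df]

# Hypothetical results: t = 2.69 with 8 persons per group, t = -1.5 with 10 per group.
print(is_significant(2.69, 8, 8))
print(is_significant(-1.5, 10, 10))
```

The absolute value is used because the sign of t only tells which mean is larger.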
22. Dummy Variables
Is a numerical variable
Used in regression analysis
Distinguishes different treatment groups
A dummy variable of 0 indicates the placebo group
A dummy variable of 1 indicates the treatment group
Enables us to use a single regression equation to represent multiple groups
Acts like a 'switch' that turns various parameters on and off in an equation
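The 'switch' behaviour can be seen in a small sketch. With a single 0/1 dummy variable Z, the least-squares fit of y = b0 + b1*Z reduces to the two group means: b0 is the placebo mean and b1 is the treatment mean minus the placebo mean. The data and function names below are hypothetical.

```python
from statistics import mean

def fit_dummy_regression(placebo, treatment):
    """Least-squares estimates for y = b0 + b1*Z, where Z = 0 for the
    placebo group and Z = 1 for the treatment group."""
    b0 = mean(placebo)                    # predicted score when Z = 0
    b1 = mean(treatment) - mean(placebo)  # amount added when Z = 1
    return b0, b1

def predict(b0, b1, z):
    """One equation for both groups; Z switches b1 on or off."""
    return b0 + b1 * z

# Hypothetical outcome scores.
placebo_scores = [48, 50, 53, 55, 56, 58, 60, 60]
treatment_scores = [54, 57, 59, 60, 62, 63, 65, 68]

b0, b1 = fit_dummy_regression(placebo_scores, treatment_scores)
print(predict(b0, b1, 0))  # placebo group mean
print(predict(b0, b1, 1))  # treatment group mean
```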
23. General Linear Model[GLM ]
Most important statistical tool that allows us to summarize a wide
variety of research outcomes
It is the foundation for
1. t-test
2. Analysis of Variance (ANOVA)
3. Analysis of Covariance (ANCOVA)
4.Regression analysis
5.Multivariate methods including factor analysis, cluster analysis,
multidimensional scaling, discriminant function analysis, canonical correlation
24. y = b0 + bx + e: The straight-line
model
y = a set of outcome variables
x = a set of pre-test variables or covariates
b0 = the set of intercepts (value of each y when each x=0)
b = a set of coefficients, one each for each x
e=vertical distance from the straight line to each point
Z: regression analysis utilizes a dummy variable for treatment
Keywords
General: in general [G]
Model: an equation [M]
Linear: an equation represented as a line on a bivariate or multivariate plot [L]
Regression: how closely the pre-test and post-test results (variables) follow the line of the
equation
28. e: vertical distance from the straight line to each
point. This term is called "error" because it is the
degree to which the line is in error in describing
each point
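A short sketch of the error term: fit the straight line by least squares, then take e as the vertical distance from each point to the line. The pre-test and post-test scores are hypothetical, and `fit_line` is an illustrative helper, not a library function.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept b0 and slope b for y = b0 + b*x + e."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    b0 = y_bar - b * x_bar
    return b0, b

# Hypothetical pre-test (x) and post-test (y) scores.
pretest = [40, 45, 50, 55, 60]
posttest = [47, 49, 56, 58, 65]

b0, b = fit_line(pretest, posttest)

# e: vertical distance from the line to each observed point.
residuals = [yi - (b0 + b * xi) for xi, yi in zip(pretest, posttest)]

print(f"b0 = {b0:.2f}, b = {b:.2f}")
print([round(e, 2) for e in residuals])
```

For a least-squares line the residuals sum to zero, so the points above the line balance the points below it.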
30. 3 ways to estimate Posttest-Only
Analysis
T-test
ANOVA [one-way Analysis of Variance]
ANCOVA [regression analysis]: most general
All three give the same result
32. Posttest-Only Analysis result by
regression
In the statistical model (yi = β0 + β1Zi + ei), yi is the same as y in the
straight-line formula (y = mx + b), β0 is the same as b, β1 is the
same as m, and Zi is the same as x. In other words, in
the statistical formula, β0 is the intercept and β1 is the
slope.
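The claim that the t-test and the regression give the same result can be checked with a sketch. It regresses the posttest scores on a 0/1 dummy variable Z and compares the t-value of the slope with the two-group t-value. The data are hypothetical and the groups are assumed to be the same size; with unequal sizes the regression corresponds to the pooled-variance form of the t-test.

```python
from math import sqrt
from statistics import mean, variance

def t_test(treatment, control):
    """Two-group t-value: mean difference over sqrt(varT/nt + varC/nc)."""
    se = sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    return (mean(treatment) - mean(control)) / se

def regression_t(treatment, control):
    """t-value of the slope in y = b0 + b1*Z + e (Z = 1 treatment, 0 control)."""
    y = list(control) + list(treatment)
    z = [0] * len(control) + [1] * len(treatment)
    z_bar, y_bar = mean(z), mean(y)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    b1 = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / szz
    b0 = y_bar - b1 * z_bar
    sse = sum((yi - (b0 + b1 * zi)) ** 2 for zi, yi in zip(z, y))
    mse = sse / (len(y) - 2)  # residual variance with N - 2 degrees of freedom
    return b1 / sqrt(mse / szz)

# Hypothetical posttest scores, equal group sizes.
treatment_scores = [54, 57, 59, 60, 62, 63, 65, 68]
control_scores = [48, 50, 53, 55, 56, 58, 60, 60]

print(f"t-test:     {t_test(treatment_scores, control_scores):.4f}")
print(f"regression: {regression_t(treatment_scores, control_scores):.4f}")
```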
34. Factorial Design Analysis:2x2
It is a regression analysis
Ingredients from 2x2 factorial table
A dummy variable (represented by a Z) for each factor
Two main effects and one interaction
Main effects are the statistics associated with the beta
values that are adjacent to the Z-variables
The interaction effect is the statistic associated with b3
(i.e., the t-value for this coefficient)
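For a 2x2 design, the regression y = b0 + b1*Z1 + b2*Z2 + b3*Z1*Z2 has one coefficient per cell, so its least-squares estimates can be read directly off the four cell means. The sketch below does this for hypothetical data. Note the assumption of 0/1 dummy coding: under it, b1 and b2 describe each factor's effect when the other factor is at 0, and b3 is the interaction.

```python
from statistics import mean

# Hypothetical outcome scores for each cell, keyed by (Z1, Z2).
cells = {
    (0, 0): [50, 52, 54],
    (1, 0): [55, 57, 59],
    (0, 1): [53, 55, 57],
    (1, 1): [64, 66, 68],
}

def fit_2x2(cells):
    """Coefficients of y = b0 + b1*Z1 + b2*Z2 + b3*Z1*Z2 from the cell means."""
    m = {key: mean(scores) for key, scores in cells.items()}
    b0 = m[(0, 0)]
    b1 = m[(1, 0)] - m[(0, 0)]
    b2 = m[(0, 1)] - m[(0, 0)]
    b3 = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]  # interaction
    return b0, b1, b2, b3

def predict(coefs, z1, z2):
    b0, b1, b2, b3 = coefs
    return b0 + b1 * z1 + b2 * z2 + b3 * z1 * z2

coefs = fit_2x2(cells)
print(coefs)
print(predict(coefs, 1, 1))  # reproduces the mean of the (1, 1) cell
```

A non-zero b3 means the effect of one factor depends on the level of the other.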
39. Regression Point Displacement
Analysis
Requires
A posttest score
A pretest score
A variable to represent the treatment group (where 0=comparison
&1=treatment)
Identical to the requirements for the ANCOVA except RPD design has
a single treated group score
The model we'll use is the Analysis of Covariance (ANCOVA) model
40. Regression Point Displacement
Analysis
Goal is to estimate the size of the vertical
displacement of the treated unit from the
regression line of all of the control units,
indicated on the graph by the dashed arrow.
The figure shows a bivariate (pre-post) distribution for
a hypothetical RPD design of a community-based
AIDS education program. The new AIDS education
program is piloted in one particular county in a state,
with the remaining counties acting as controls. The
state routinely publishes annual HIV-positive rates by
county for the entire state. The x-values show the
HIV-positive rates per 1000 people for the year preceding
the program while the y-values show the rates for the
year following it.
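The displacement described above can be sketched in two steps: fit the regression line to the control counties only, then measure how far the treated county's posttest rate falls from that line. All rates below are hypothetical, and the sketch estimates only the size of the displacement, not its statistical significance.

```python
from statistics import mean

def rpd_displacement(control_pre, control_post, treated_pre, treated_post):
    """Vertical distance of the treated unit from the control regression line."""
    x_bar, y_bar = mean(control_pre), mean(control_post)
    sxx = sum((x - x_bar) ** 2 for x in control_pre)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(control_pre, control_post))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    predicted = intercept + slope * treated_pre  # expected rate without the program
    return treated_post - predicted

# Hypothetical HIV-positive rates per 1000 people for the control counties.
control_pre = [2, 4, 6, 8, 10]
control_post = [3, 4, 7, 8, 11]

# Hypothetical rates for the single treated county.
displacement = rpd_displacement(control_pre, control_post, treated_pre=7, treated_post=5)
print(f"displacement = {displacement:.2f}")
```

A negative displacement means the treated county's rate after the program is lower than the control counties' trend would predict.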
42. The cost of this ppt.
YOU CAN IMPROVE BY
No plastic use
Going through the links given in this ppt, and consulting teachers,
friends and workshops interested in statistics