2. RECAP: Descriptive Statistics
■ Descriptive statistics provide a concise summary of data.
■ Measures of Central Tendency: Mean, Median
■ Measures of Dispersion: SD, IQR/95% CI
■ Scale of Measurement and typical summary:
– Ratio/Interval (e.g. BSL = 150 mg/dl): Mean Age 23.4 years (SD = 4.23)
– Ordinal (e.g. Mild, Moderate, Severe rise): Median GCS Score 6 (IQR: 3-8)
– Nominal (e.g. controlled, uncontrolled): 90% (9/10) of patients were short
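A minimal Python sketch of how each summary on this slide can be computed with the standard-library `statistics` module. The ages, scores and control status are hypothetical values, not data from the session.

```python
import statistics

# Hypothetical data (not from the slides)
ages = [19, 21, 22, 23, 24, 25, 27, 29]          # ratio scale
gcs_scores = [3, 4, 5, 6, 6, 7, 8, 9]            # treated as ordinal
status = ["controlled"] * 9 + ["uncontrolled"]   # nominal

# Ratio/interval: mean and sample standard deviation
mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)

# Ordinal: median and interquartile range (Q1 to Q3)
median_gcs = statistics.median(gcs_scores)
q1, _, q3 = statistics.quantiles(gcs_scores, n=4)  # default "exclusive" method

# Nominal: percentage falling in one category
n_controlled = status.count("controlled")
pct_controlled = 100 * n_controlled / len(status)

print(f"Mean Age {mean_age:.1f} years (SD={sd_age:.2f})")
print(f"Median GCS Score {median_gcs} (IQR: {q1}-{q3})")
print(f"{pct_controlled:.0f}% ({n_controlled}/{len(status)}) controlled")
```

Quartile definitions differ between packages, so the IQR reported by SPSS or Excel may differ slightly from this one.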
3. What will we cover?
By the end of the session we must know: | Approx. time
What are Inferential Statistics & Why do we need them? | 5 min.
What are Tests of Significance? | 5 min.
Parametric & Non-parametric Tests; Tests for Normality | 10 min.
Selecting the Correct Test of Significance | 15 min.
Activity | 10 min.
BREAK | 10 min.
ANOVA Demo, Post hoc Tests, Interpreting Outputs (Activity) | 10 min.
Statistical Fallacy (What NOT to do) | 5 min.
Planning Statistical Analysis with a Sample Plan | 10 min.
Take Home Message & Participant Questions | 10 min.
Total | 90 min.
4. Inferential Statistics
■ Inference (Latin: inferre, meaning "to bring in"):
– A conclusion derived on the basis of evidence and reasoning.
■ Inferential statistics use a sample of data taken from a population to
make inferences about the population.
■ Inferential statistics are valuable when examination of each member of an
entire population is not convenient or possible.
■ You can use the information from the sample to make generalizations about the population.
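To make the sample-to-population step concrete, here is a minimal Python sketch that estimates a population mean from a sample and attaches an approximate 95% confidence interval. It assumes a reasonably large sample so that the normal (z) approximation applies; for small samples a t-based interval is the usual choice. The function name `mean_ci` and the blood sugar values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (normal approximation)."""
    m = mean(sample)
    se = stdev(sample) / math.sqrt(len(sample))      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical fasting blood sugar values (mg/dl) from a sample of patients
bsl = [148, 152, 150, 149, 151, 155, 145, 150]
low, high = mean_ci(bsl)
print(f"Sample mean {mean(bsl):.1f} mg/dl, 95% CI {low:.1f} to {high:.1f}")
```

The interval is a statement about the unknown population mean, built only from the sample.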
5. Test of Significance
■ A test of significance is a statistical procedure for comparing observed data with a hypothesis, to judge whether the data are consistent with it.
■ Steps in a Standard Test of Significance
1. Determine the appropriate test to be used
2. State the null and alternative hypotheses
3. Calculate the test statistic
4. Compare it with the critical (table) value and/or obtain the p value
5. Decision rule: reject or fail to reject the null hypothesis
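The five steps can be traced in a minimal Python sketch of a one-sample z-test. It assumes the population standard deviation is known (the usual z-test assumption) and a two-sided alternative; the function name and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    # Step 1: test chosen = one-sample z-test (population SD `sigma` assumed known)
    # Step 2: H0: population mean == mu0; H1: population mean != mu0 (two-sided)
    # Step 3: calculate the test statistic
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 4: compare with the table (critical) value and get the p value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: decision rule
    decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
    return z, p, decision

# Hypothetical: 64 patients, mean SBP 156 mmHg vs reference value 150 mmHg, SD 20 mmHg
z, p, decision = one_sample_z_test(156, 150, 20, 64)
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```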
6. P value
■ As researchers, we want to be sure that our results are REAL & NOT due to CHANCE.
You can never eliminate CHANCE (random error); you can only minimize it.
■ The p value is the probability of obtaining results at least as extreme as those observed by chance alone, i.e. if the null hypothesis were true. The smaller the p value, the stronger the evidence against the null hypothesis.
■ In the biological sciences the convention is to accept at most a 5% risk of wrongly declaring an association between variables REAL (often phrased as being "95% confident").
■ Therefore, we want the probability of obtaining our finding by chance alone to be less than 5%, i.e. p < 0.05.
■ If this condition is met, we usually declare the result statistically significant and reject the null hypothesis.
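One way to see what "by chance" means is an exact permutation test: if group labels did not matter (the null hypothesis), every reshuffling of the labels would be equally likely, and the p value is the share of reshufflings giving a difference at least as large as the one observed. The Python sketch below is a hypothetical illustration of that idea, practical only for small groups because it enumerates every possible split.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = group_a + group_b
    extreme = total = 0
    # Every way of choosing which pooled values are labelled "group A"
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

print(permutation_p_value([1, 2, 3, 4], [5, 6, 7, 8]))
```

For these hypothetical groups only 2 of the 70 possible splits are as extreme as the observed one, so p = 2/70 ≈ 0.029.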
7. Limitations
■ Statistical significance DOES NOT imply Clinical significance.
– For example, in a dataset there may be a statistically significant association (p < 0.05) between patients whose name begins with the letter ‘A’ and death due to myocardial infarction, but it has no clinical significance.
■ The p value is widely misused in research.
■ A test is only as good as its assumptions. While interpreting p values we must consider:
• Study design
• Condition under study
• Quality of data
• Validity of assumptions
• Appropriateness of the test
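Why statistical and clinical significance can part ways: with a large enough sample, even a trivial difference yields a small p value. The minimal Python sketch below uses a large-sample two-sample z approximation with equal group sizes and equal SDs (assumptions of this illustration); the function name, the 0.5 mmHg difference and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z_p(mean_diff, sd, n_per_group):
    """Two-sided p value for a difference in means (large-sample z approximation,
    equal SD and equal size in both groups)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same clinically trivial 0.5 mmHg difference in systolic BP, SD 15 mmHg
p_small = two_sample_z_p(0.5, 15, 100)     # 100 patients per group
p_large = two_sample_z_p(0.5, 15, 20000)   # 20,000 patients per group
print(f"n=100: p={p_small:.3f}; n=20000: p={p_large:.5f}")
```

The difference is identical in both cases; only the sample size changes whether it is "significant".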
8. Parametric tests
■ Make certain assumptions about the population from which the samples are drawn.
– e.g. the assumption may be that populations are normally distributed, have the same variance, etc.
■ MORE POWERFUL (when their assumptions are met), but CAN'T BE APPLIED TO ALL TYPES OF DATA
The most commonly used tests of significance are:
■ Z-test
■ t-test
– Paired t-test
– Unpaired t-test
■ One-way ANOVA
■ Repeated measures ANOVA
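The two t-tests differ mainly in how the standard error is formed. A minimal Python sketch computing the t statistic and degrees of freedom for each, assuming equal variances for the unpaired (Student's) version; the function names and data are hypothetical. The resulting |t| is then compared with the t-table value for those degrees of freedom.

```python
import math
from statistics import mean, stdev, variance

def unpaired_t(x, y):
    """Student's unpaired t statistic with pooled variance (assumes equal variances)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2   # (t, degrees of freedom)

def paired_t(before, after):
    """Paired t statistic computed on the within-subject differences."""
    diffs = [b - a for b, a in zip(before, after)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1

# Hypothetical values
t_u, df_u = unpaired_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
t_p, df_p = paired_t([140, 150, 160, 145, 155], [135, 146, 153, 142, 149])
print(f"unpaired: t={t_u:.2f}, df={df_u}; paired: t={t_p:.2f}, df={df_p}")
```

In the unpaired example |t| = 2.0 with 8 df is below the two-sided 5% table value for 8 df (2.306), so that difference would not be declared significant.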
9. Non-parametric (Distribution-free) Tests
■ Do not make assumptions about population parameters or assume a particular distribution (e.g. normal) for the population
■ LESS POWERFUL (when the assumptions of parametric tests hold), but CAN BE APPLIED TO NON-NORMAL DATA
■ Chi-square test
■ Wilcoxon signed-rank test
■ Mann-Whitney U test
■ Kruskal-Wallis test
■ Median test
■ Friedman test (Friedman ANOVA)
■ Fisher's exact test
■ McNemar test
■ Spearman rank correlation
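As an example from this list, a minimal Python sketch of Pearson's chi-square test for a 2x2 table (without Yates' continuity correction). With 1 degree of freedom the chi-square value is the square of a standard normal variable, so the p value can be taken from the normal distribution. The function name and counts are hypothetical; when expected counts are small (commonly < 5), Fisher's exact test is preferred.

```python
import math
from statistics import NormalDist

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # df = 1: P(chi2 > x) = P(|Z| > sqrt(x))
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p

# Hypothetical: disease present/absent (columns) in males/females (rows)
chi2, p = chi_square_2x2([[20, 10], [10, 20]])
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```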
10. Tests for Normality of Data
■ Histogram
■ Q-Q plots
■ Shapiro-Wilk Test
■ Kolmogorov-Smirnov Test
Tests of Normality (Height)
Test | Statistic | df | Sig.
Kolmogorov-Smirnov | 0.106 | 40 | 0.200
Shapiro-Wilk | 0.960 | 40 | 0.167
Both Sig. values exceed 0.05, so neither test rejects normality of Height.
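The Kolmogorov-Smirnov statistic measures the largest vertical gap between the data's empirical cumulative distribution and a normal curve. A minimal Python sketch of that statistic, assuming the normal curve uses the sample's own mean and SD; because those are estimated from the same data, the matching p value needs Lilliefors-corrected tables, which this sketch does not compute. The function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(data):
    """Kolmogorov-Smirnov D against a normal with the sample mean and SD."""
    xs = sorted(data)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # Empirical CDF jumps from (i-1)/n to i/n at x; check the gap on both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

print(round(ks_statistic_normal([-1, 0, 1]), 4))
```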
11. Selecting a Test by Goal and Type of Data
Goal | Continuous | Discrete | Binomial | Survival Time
Compare 1 group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square / Binomial test | -
Compare 2 unpaired groups | Unpaired t test | Mann-Whitney U test | Chi-square / Fisher's exact test (small sample) | Log-rank test / Mantel-Haenszel
Compare 2 paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression
Compare 3 or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazards regression
Compare 3 or more matched groups | Repeated measures ANOVA | Friedman test | Cochran's Q | Conditional proportional hazards regression
Find strength of association | Pearson's correlation | Spearman's correlation | Contingency coefficients | -
Predict value from a single other variable | Simple linear/non-linear regression | Non-parametric regression | Logistic regression | Cox proportional hazards regression
Predict value from multiple variables | Multiple linear/non-linear regression | - | Multiple logistic regression | Cox proportional hazards regression
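The chart is essentially a lookup from (goal, type of data) to a test. A hypothetical Python encoding of its comparison rows, for example for a checklist script; the dictionary keys are labels invented for this sketch, and the entries are copied from the chart.

```python
# Hypothetical encoding of the comparison rows: (goal, data type) -> test
TEST_CHART = {
    ("1 group vs hypothetical value", "continuous"): "One-sample t test",
    ("1 group vs hypothetical value", "discrete"): "Wilcoxon test",
    ("1 group vs hypothetical value", "binomial"): "Chi-square / Binomial test",
    ("2 unpaired groups", "continuous"): "Unpaired t test",
    ("2 unpaired groups", "discrete"): "Mann-Whitney U test",
    ("2 unpaired groups", "binomial"): "Chi-square / Fisher's exact test (small sample)",
    ("2 unpaired groups", "survival"): "Log-rank test / Mantel-Haenszel",
    ("2 paired groups", "continuous"): "Paired t test",
    ("2 paired groups", "discrete"): "Wilcoxon test",
    ("2 paired groups", "binomial"): "McNemar's test",
    ("2 paired groups", "survival"): "Conditional proportional hazards regression",
    ("3+ unmatched groups", "continuous"): "One-way ANOVA",
    ("3+ unmatched groups", "discrete"): "Kruskal-Wallis test",
    ("3+ unmatched groups", "binomial"): "Chi-square test",
    ("3+ unmatched groups", "survival"): "Cox proportional hazards regression",
    ("3+ matched groups", "continuous"): "Repeated measures ANOVA",
    ("3+ matched groups", "discrete"): "Friedman test",
    ("3+ matched groups", "binomial"): "Cochran's Q",
    ("3+ matched groups", "survival"): "Conditional proportional hazards regression",
}

def choose_test(goal, data_type):
    """Look up the chart entry; combinations without an entry fall back to advice."""
    return TEST_CHART.get((goal, data_type), "No entry in the chart: consult a statistician")

print(choose_test("2 unpaired groups", "binomial"))
```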
12. Match the Following (Activity)
Problem
1. Compare mean systolic blood pressure before & after the procedure
2. Compare mean blood pressure of your study with national guidelines
3. Compare mean systolic blood pressure in males & females
4. Compare mean systolic BP at baseline, 1 month, 3 months & 6 months
5. Find the strength of association between drop in BP & weight loss
6. Find predictors of drop in BP
7. Compare mean systolic BP among patients receiving drug therapy, patients receiving physiotherapy & patients receiving both
Test
a. One-sample t test
b. Unpaired t test
c. Paired t test
d. One-way ANOVA
e. Repeated measures ANOVA
f. Pearson's correlation
g. Multiple linear regression
13. Match the Following (Activity)
Problem
1. In a large sample, find the association between gender & presence of disease
2. In a small sample, find the association between gender & presence of disease
3. Compare willingness to undergo the procedure before & after counseling
4. Find the strength of association between pain score & grade of disease
5. Compare pain score (VAS) in female & male patients
6. Compare pain scores among 3 groups of patients
7. Compare pain scores before & after treatment
8. Compare pain scores at baseline, 1 month, 3 months & 6 months
9. Find predictors of high levels of pain in the study sample
10. Compare the proportion of satisfied patients at baseline, 1 month, 3 months & 6 months
Test
a. Mann-Whitney U test
b. Wilcoxon test
c. Kruskal-Wallis test
d. Friedman test
e. Spearman's correlation
f. Chi-square test
g. Fisher's exact test
h. McNemar's test
i. Cochran's Q
j. Logistic regression
17. Parametric Tests
■ Mean BSL in males & females: unpaired t test
■ Compare mean BSL across follow-up: repeated measures ANOVA (rANOVA)
■ Correlation of age with BSL: Pearson's correlation
■ Predictors of change in pain score: regression analysis
Non-parametric Tests
■ Age & sex comparison: Fisher's exact test
■ Compare willingness across follow-up: Cochran's Q
■ Compare pain score across follow-up: Friedman test
18. ANOVA (Analysis of Variance)
■ In its simplest form, ANOVA tests whether the means of three or more groups are equal.
■ Assumptions: Independence of Observations, Normality, Homogeneity of Variance (Homoscedasticity)
DEMO
19. Post Hoc Tests
Equal Variances Assumed
■ Tukey HSD
■ Bonferroni
■ Dunnett
Equal Variances Not Assumed
■ Games-Howell
• Post hoc tests are an integral part of ANOVA.
• A significant ANOVA result indicates that not all group means are equal.
• It does not tell which of the means differ; post hoc tests help with this.
• They also control the overall (family-wise) error rate across the multiple comparisons.
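Of the post hoc methods listed, Bonferroni is the simplest to show: with m pairwise comparisons, each p value is multiplied by m (capped at 1), which keeps the overall chance of any false positive at or below the chosen level. A minimal Python sketch; the raw p values are hypothetical, and Tukey HSD, Dunnett and Games-Howell rely on other distributions that this sketch does not implement.

```python
from itertools import combinations

def bonferroni(p_values):
    """Bonferroni-adjusted p values for a dict {comparison: raw p value}."""
    m = len(p_values)  # number of comparisons being made
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}

groups = ["Mild", "Moderate", "Severe"]
pairs = list(combinations(groups, 2))        # 3 groups -> 3 pairwise comparisons
raw = dict(zip(pairs, [0.01, 0.02, 0.5]))    # hypothetical unadjusted p values
adjusted = bonferroni(raw)
for pair, p in adjusted.items():
    print(pair, round(p, 3))
```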
20. We compared LA volume across grades of MR in a dataset
Descriptives
Grade | N | Mean | SD | Std. Error
Mild | 36 | 36.1 | 15.15 | 2.52
Moderate | 9 | 45.6 | 22.08 | 7.36
Severe | 2 | 70.0 | 31.11 | 22.0
Total | 47 | 39.37 | 18.37 | 2.67
Test of Homogeneity of Variances
Levene Statistic | df1 | df2 | Sig.
2.901 | 2 | 44 | 0.066
(Sig. > 0.05: equal variances can be assumed.)
ANOVA
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 2615.195 | 2 | 1307.597 | 4.457 | 0.017
Within Groups | 12908.89 | 44 | 293.384 | |
Total | 15524.08 | 46 | | |
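Reading the ANOVA output is mostly arithmetic: Mean Square = Sum of Squares / df, and F = MS between / MS within. A minimal Python check using the sums of squares and df reported in the output; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom as reported in the ANOVA output
ss_between, df_between = 2615.195, 2
ss_within, df_within = 12908.89, 44

ms_between = ss_between / df_between   # Mean Square = SS / df
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}, F = {f_ratio:.3f}")
```

F ≈ 4.457 with (2, 44) df exceeds the 5% F-table value of roughly 3.2, consistent with the reported Sig. of 0.017.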
21. Post Hoc Test
Multiple Comparisons (Tukey HSD)
(I) | (J) | Mean Difference (I-J) | Std. Error | Sig.
Mild | Moderate | -9.55 | 6.383 | 0.303
Mild | Severe | -33.89* | 12.44 | 0.025
Moderate | Mild | 9.55 | 6.383 | 0.303
Moderate | Severe | -24.34 | 13.38 | 0.176
Severe | Mild | 33.89* | 12.44 | 0.025
Severe | Moderate | 24.34 | 13.38 | 0.176
*. The mean difference is significant at the 0.05 level.
Only Mild vs Severe differs significantly (p = 0.025).
23. Bias
(David Sackett: Bias in Analytic Research)
■ Selection bias
– Definition: those who enter the study differ systematically from those who do not.
– Examples: volunteers; selecting only those who survive.
– Prevention: clear definition of the population; scientific methods of sampling.
■ Classification bias
– Definition: when the study involves two groups (case-control study; clinical trial), the method by which the two groups are identified is ambiguous. Also called "contamination".
– Prevention: standard criteria of diagnosis/classification; avoiding deviation from protocol.
■ Confounding bias
– Definition: the relationship between exposure and outcome is affected by a third factor called a confounder.
– Example: coffee drinking ----> carcinoma of the pancreas, confounded by smoking.
– Control: identifying potential confounders; matching; multivariate analysis.
24. Statistical fallacy
■ Incorrect presentation/interpretation of statistics
Examples:
■ Association interpreted as cause-effect
■ Means interpreted without range and SD
■ Statistical significance interpreted as clinical significance
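Why a mean alone can mislead: two sets of readings can share a mean while differing widely in spread. A minimal Python sketch with hypothetical systolic BP readings.

```python
from statistics import mean, stdev

# Two hypothetical sets of systolic BP readings (mmHg) with the same mean
tight = [118, 119, 120, 121, 122]
spread = [90, 100, 120, 140, 150]
for name, data in (("tight", tight), ("spread", spread)):
    print(f"{name}: mean={mean(data)}, SD={stdev(data):.1f}, range={min(data)}-{max(data)}")
```

Reporting "mean 120 mmHg" for both would hide that one group ranges from 90 to 150 mmHg.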
26. Sample Plan
Variable | Scale | Descriptive Statistics
Age | Ratio | Percentage, Mean, SD
Sex | Nominal | Percentage
MR Grade (Mild, Moderate, Severe) | Ordinal | Percentage
Comorbidities | Nominal | Percentage
Symptoms (Present/Absent) | Nominal | Percentage
Classification of MR | Nominal | Percentage
LVEF | Ratio | Mean, SD
• Descriptive statistical analysis will be done using percentage, mean and standard deviation.
• Appropriate graphical representation of data will be done.
• Comparison of discrete (categorical) variables will be done using Fisher's exact test or the chi-square test, as appropriate.
• The unpaired t test will be used to compare LVEF in symptomatic & asymptomatic patients.
• ANOVA will be used to compare LVEF across grades of MR.
• Pearson's correlation coefficient will be used to calculate the strength of association.
• Data analysis will be done using SPSS version 17.0 (SPSS Inc., Chicago, IL).
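Pearson's correlation coefficient in the plan is the covariance of two variables scaled by both standard deviations. A minimal Python sketch with hypothetical paired values; on Python 3.10+ `statistics.correlation` gives the same result.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson's correlation coefficient for paired observations x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))
```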
27. Sample Dummy Table
MR Grade | LVEF Mean | LVEF Standard Deviation | ANOVA
Mild (n= ) | | | F =
Moderate (n= ) | | | df =
Severe (n= ) | | | P =
Post Hoc Tests:
28. Software
■ MS Excel
■ GraphPad
■ SPSS
■ STATA
■ SAS
■ RStudio
■ Tableau
■ NumPy
■ https://www.socscistatistics.com/
■ OpenEpi
29. TAKE HOME MESSAGE
■ Consult a statistician while preparing the PROTOCOL, not at the END of data collection.
■ Prepare dummy tables & plan the statistical analysis BEFORE data collection (this helps in getting the right data and saves time later).
■ CODING the master chart appropriately saves time & trouble later on.
■ Analysis must be in line with the OBJECTIVES.
Don't apply tests just because you can.
■ SIMPLE tests are often the most useful.
"All models are wrong, but some are useful" (George Box)
■ Free software & online data analysis tools are easily available nowadays.
Statistical Significance DOES NOT imply Clinical Significance