INFERENTIAL STATISTICS.pdf

INFERENTIAL
STATISTICS
DR. MANDAR BAVISKAR
M.D.(PSM) GSMC & KEMH
ASSOCIATE PROFESSOR
DEPT. OF COMMUNITY MEDICINE
Dr. Balasaheb Vikhe Patil RMC,PIMS(DU), LONI
p < 0.05

RECAP: Descriptive Statistics
■ Descriptive statistics provide a concise summary of data.
■ Measures of Central Tendency: Mean, Median
■ Measures of Dispersion: SD, IQR/95% CI
■ Scale of Measurement
– Ratio/Interval (BSL = 150 mg/dl): Mean Age 23.4 years (SD = 4.23)
– Ordinal (Mild, Moderate, Severe Rise): Median GCS Score 6 (IQR: 3-8)
– Nominal (controlled, uncontrolled): 90% (9/10) patients were short

What will we cover?
By the end of the session we must know | Approx. time
What are Inferential Statistics & Why do we need them? | 5 min.
What are Tests of Significance? | 5 min.
Parametric & Non Parametric Tests. Tests for Normality | 10 min.
Selecting the Correct Test of Significance | 15 min.
Activity | 10 min.
BREAK | 10 min.
ANOVA Demo, Post hoc tests, Interpreting outputs (Activity) | 10 min.
Statistical Fallacy (What NOT to do) | 5 min.
Planning Statistical Analysis with Sample Plan | 10 min.
Take Home Message & Participant Questions | 10 min.
Total | 90 min.

Inferential Statistics
■ Inference (Latin: inferent-, meaning "bringing in")
– A conclusion derived on the basis of evidence and reasoning.
■ Inferential statistics use a sample of data taken from a population to
make inferences about the population.
■ Inferential statistics are valuable when examination of each member of an
entire population is not convenient or possible.
■ You can use the information from the sample to make generalizations.

Test of Significance
– A test of significance is a statistical procedure that compares observed
data with what a hypothesis predicts, to judge whether the data are
consistent with that hypothesis.
■ Steps in a Standard Test of Significance
1. Determine the appropriate test to be used
2. State the Null and Alternative hypotheses
3. Calculate the test statistic
4. Compare it with the table (critical) value & get the p value
5. Decision Rule: Reject or fail to reject the null hypothesis
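The five steps can be traced in a short sketch. This is only an illustration: the numbers are hypothetical (they do not come from these slides), and a one-sample Z test is assumed because the sample is large and the population SD is treated as known.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs (not from the slides): 100 patients with a mean
# systolic BP of 124 mmHg, compared with a reference mean of 120 mmHg,
# assuming a known population SD of 15 mmHg.
sample_mean, reference_mean, population_sd, n = 124.0, 120.0, 15.0, 100
alpha = 0.05

# Step 1: large sample with known SD, so a one-sample Z test is chosen.
# Step 2: H0: population mean = 120; H1: population mean != 120 (two-sided).

# Step 3: calculate the test statistic.
z = (sample_mean - reference_mean) / (population_sd / sqrt(n))

# Step 4: compare with the table value and get the p value.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: decision rule.
reject_null = abs(z) > z_critical
print(f"z = {z:.2f}, critical value = {z_critical:.2f}, "
      f"p = {p_value:.4f}, reject H0: {reject_null}")
```

Comparing |z| with the critical value and comparing p with alpha are two views of the same decision, so they always agree for a given test.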

P value
■ As researchers, we want to be reasonably sure that whatever results we got are
REAL & NOT due to CHANCE.
You can never eliminate CHANCE (Random Error); you can only minimize it.
■ The ‘p value’ is the probability of getting a result at least as extreme as the one
observed if the null hypothesis were true. A very small p value means that chance
alone is an unlikely explanation for the finding.
■ In Biological Sciences, the conventional cut-off (significance level) is 5%, often
described as wanting to be 95% confident that associations between the variables
are REAL.
■ Therefore, we want the p value to be less than 5%, i.e. p < 0.05.
■ If this condition is met, we reject the null hypothesis and call the result
statistically significant.

Limitations
■ Statistical significance DOES NOT imply Clinical significance.
– For example, a dataset may show a statistically significant
association (p<0.05) between having a name that begins with the
letter ‘A’ and death due to myocardial infarction, but this has no clinical
significance.
■ P value is widely misused in research.
We must remember that the test is only as good as its assumptions.
While interpreting p values we must consider,
• Study design,
• Condition under study,
• Quality of data,
• Validity of assumptions,
• Appropriateness of the test

Parametric tests
■ Make certain assumptions about the
population from which the samples are
drawn.
– e.g.: assumption may be that
populations are normally distributed,
have the same variance etc.
MORE POWERFUL, CAN’T BE APPLIED TO
ALL TYPES OF DATA
The most commonly used parametric tests
of significance are
■ Z test
■ t test
– Paired t test
– Unpaired t test
■ One way ANOVA
■ Repeated measures ANOVA

Non-parametric or Distribution free Tests
■ Do not assume that the population
follows a particular distribution
(e.g. the normal distribution)
LESS POWERFUL, CAN BE APPLIED
TO NON-NORMAL DATA
■ Chi-square test
■ Wilcoxon signed rank test
■ Mann-Whitney U test
■ Kruskal-Wallis test
■ Median test
■ Friedman test
■ Fisher's exact test
■ McNemar test
■ Spearman rank correlation

Tests for Normality of Data
■ Histogram
■ Q-Q plots
■ Shapiro-Wilk Test
■ Kolmogorov-Smirnov Test
Tests of Normality
Variable | Kolmogorov-Smirnov Stat. | df | Sig. | Shapiro-Wilk Stat. | df | Sig.
Height | 0.106 | 40 | 0.200 | 0.960 | 40 | 0.167

Selecting the test by goal and type of data
Goal | Continuous | Discrete | Binomial | Survival Time
Compare 1 group to a hypothetical value | One Sample t test | Wilcoxon Test | Chi-Square / Binomial test | -
Compare 2 unpaired groups | Unpaired t test | Mann-Whitney U test | Chi-Square / Fisher's (small sample) | Log Rank Test / Mantel-Haenszel
Compare 2 paired groups | Paired t test | Wilcoxon Test | McNemar's Test | Conditional Proportional Hazards Regression
Compare 3 or more unmatched groups | One way ANOVA | Kruskal-Wallis Test | Chi-Square Test | Cox Proportional Hazard Regression
Compare 3 or more matched groups | Repeated measures ANOVA | Friedman Test | Cochran's Q | Conditional Proportional Hazards Regression
Find strength of association | Pearson's Correlation | Spearman's Correlation | Contingency Coefficients | -
Predict value from single other variable | Simple linear/non-linear regression | Non parametric Regression | Logistic Regression | Cox Proportional Hazard Regression
Predict value from multiple variables | Multiple linear/non-linear regression | - | Multiple Logistic Regression | Cox Proportional Hazard Regression
Match the Following (Activity)
Problem
1. Compare Mean Systolic Blood pressure before & after the procedure
2. Compare Mean Blood pressure of your study to National Guidelines
3. Compare Mean Systolic Blood pressure in Males & Females
4. Compare Mean Systolic BP at Baseline, 1 month, 3 months & 6 months
5. Find strength of association between drop in BP & weight loss
6. Find predictors of drop in BP
7. Compare Mean Systolic BP among patients receiving drug therapy, patients receiving physiotherapy & patients receiving both
Test
a. One Sample t test
b. Unpaired t test
c. Paired t test
d. One way ANOVA
e. Repeated measures ANOVA
f. Pearson’s Correlation
g. Multiple linear regression

Match the Following (Activity)
Problem
1. In a large sample find association between Gender & presence of disease
2. In a small sample find association between Gender & presence of disease
3. Compare willingness to undergo procedure before & after counseling
4. Find strength of association between Pain score & Grade of Disease
5. Compare Pain Score (VAS) in Female & Male patients
6. Compare Pain Scores among 3 groups of patients
7. Compare Pain Scores before & after treatment
8. Compare Pain Scores at baseline, 1 month, 3 months & 6 months
9. Find Predictors of High Levels of Pain among study sample
10. Compare proportion of Satisfied patients at baseline, 1 month, 3 months & 6 months
Test
a. Mann-Whitney U test
b. Wilcoxon Test
c. Kruskal-Wallis Test
d. Friedman Test
e. Spearman’s Correlation
f. Chi-Square test
g. Fisher’s exact test
h. McNemar’s Test
i. Cochran's Q
j. Logistic Regression

Example: Interpret the output
Age Age_cat Sex PainScore0 PainScore1 PainScore2 BSL0 BSL1 BSL2 Willing0 Willing1 Willing2 PainChange
34 2 1 7 6 5 167 158 155 0 1 1 2
48 3 1 8 6 4 197 187 154 0 0 1 4
56 4 1 7 6 5 174 171 167 1 1 1 2
24 1 1 8 6 4 138 134 124 1 1 1 4
37 2 1 7 6 5 156 149 144 0 0 1 2
42 3 1 8 7 4 175 166 156 0 1 1 4
56 4 1 6 6 5 176 177 156 0 0 1 1
64 5 2 9 7 5 150 145 120 1 1 1 4
34 2 2 10 8 4 141 136 122 1 1 1 6
29 1 2 7 6 4 183 184 133 0 1 1 3
55 4 2 7 6 4 188 167 151 1 1 1 3
61 5 2 7 6 4 145 144 154 0 0 0 3
44 3 2 8 7 4 145 132 128 0 0 1 4
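As a rough check on how such a data sheet is summarised, the sketch below recomputes a few descriptive statistics from columns of the table above (Age, PainScore0 and Willing0, copied row by row). It uses only the Python standard library. The quartile method is the library default, which is an assumption and may differ from what other packages use.

```python
from statistics import mean, median, quantiles, stdev

# Columns copied from the 13-row example data set.
age = [34, 48, 56, 24, 37, 42, 56, 64, 34, 29, 55, 61, 44]
pain_score0 = [7, 8, 7, 8, 7, 8, 6, 9, 10, 7, 7, 7, 8]
willing0 = [0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0]

# Ratio scale: mean and sample standard deviation.
age_mean = mean(age)
age_sd = stdev(age)

# Ordinal scale: median and interquartile range (default quartile method).
pain_median = median(pain_score0)
q1, _, q3 = quantiles(pain_score0, n=4)

# Nominal scale: proportion willing at baseline.
willing_pct = 100 * sum(willing0) / len(willing0)

print(f"Age: mean {age_mean:.2f}, SD {age_sd:.3f}")
print(f"PainScore0: median {pain_median} (IQR {q1:g}-{q3:g})")
print(f"Willing at baseline: {willing_pct:.1f}% ({sum(willing0)}/{len(willing0)})")
```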

Descriptive Statistics & Normality Testing
Variable Mean SD
Age 44.92 12.848
BSL0 164.23 19.549
BSL1 157.69 19.089
BSL2 143.38 15.861
PainChange 3.23 1.300
Tests of Normality (Shapiro-Wilk)
Variable Statistic df Sig.
Age 0.953 13 0.639
Sex 0.646 13 0.000
PainScore0 0.876 13 0.003
PainScore1 0.650 13 0.000
PainScore2 0.628 13 0.000
BSL0 0.933 13 0.375
BSL1 0.935 13 0.392
BSL2 0.884 13 0.081
Willing0 0.628 13 0.000
Willing1 0.628 13 0.000
Willing2 0.311 13 0.000
PainChange 0.924 13 0.285
Variable | PainScore0 | PainScore1 | PainScore2
Median (IQR) | 7.0 (7-8) | 6.0 (6-7) | 4.0 (4-5)
Sex | Frequency | Percent
Male | 7 | 53.8
Female | 6 | 46.2
Willing @ Baseline | Willing @ 1 | Willing @ 2
38.5% (5/13) | 61.5% (8/13) | 92.3% (12/13)
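The Shapiro-Wilk output is usually read with a simple rule: a Sig. value above 0.05 gives no evidence against normality, while a value at or below 0.05 suggests a non-normal distribution. The sketch below applies that rule to the Sig. column reported above. The 0.05 cut-off is the conventional choice, assumed here.

```python
# Shapiro-Wilk Sig. values copied from the output above.
shapiro_sig = {
    "Age": 0.639, "Sex": 0.000, "PainScore0": 0.003, "PainScore1": 0.000,
    "PainScore2": 0.000, "BSL0": 0.375, "BSL1": 0.392, "BSL2": 0.081,
    "Willing0": 0.000, "Willing1": 0.000, "Willing2": 0.000,
    "PainChange": 0.285,
}
alpha = 0.05  # conventional cut-off (assumption)

# Sig. > alpha: no evidence against normality, parametric tests may apply.
consistent_with_normal = [v for v, p in shapiro_sig.items() if p > alpha]
# Sig. <= alpha: treat as non-normal, prefer non-parametric tests.
# Note: Sex and Willing are categorical, so a normality test is not
# meaningful for them; they appear here only because the output lists them.
not_normal = [v for v, p in shapiro_sig.items() if p <= alpha]

print("Consistent with normality:", consistent_with_normal)
print("Not normal:", not_normal)
```

With only 13 observations the test has limited power, so a histogram or Q-Q plot is still worth a look before choosing a test.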

Parametric Tests
■ Mean BSL in Males & Females: Unpaired t test
■ Compare mean BSL across follow up: Repeated measures ANOVA (rANOVA)
■ Correlation of Age with BSL: Pearson’s Correlation
■ Predictors of Change in Pain Score: Regression Analysis
Non Parametric Tests
■ Age & Sex Comparison: Fisher’s exact test
■ Compare Willingness across follow up: Cochran's Q
■ Compare Pain Score across follow up: Friedman’s Test
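To show what the first of these choices involves, here is a minimal sketch of the unpaired (pooled-variance) t statistic for baseline BSL in males versus females, using the BSL0 column of the example data set. It assumes that Sex code 1 means male (7 rows) and 2 means female (6 rows), which matches the reported frequencies but is not stated on the slides. The decision uses the two-sided 5% table value for 11 degrees of freedom (2.201) rather than an exact p value.

```python
from math import sqrt
from statistics import mean, variance

# BSL0 split by the Sex column (assumption: 1 = male, 2 = female).
bsl0_male = [167, 197, 174, 138, 156, 175, 176]
bsl0_female = [150, 141, 183, 188, 145, 145]

n1, n2 = len(bsl0_male), len(bsl0_female)
df = n1 + n2 - 2

# Pooled variance assumes the two groups share a common variance.
pooled_var = ((n1 - 1) * variance(bsl0_male)
              + (n2 - 1) * variance(bsl0_female)) / df
std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))

t = (mean(bsl0_male) - mean(bsl0_female)) / std_error

t_critical = 2.201  # two-sided 5% table value for df = 11
significant = abs(t) > t_critical

print(f"t = {t:.3f}, df = {df}, significant at 5%: {significant}")
```

A statistics package would report the exact p value and also offer a version of the test that does not assume equal variances.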

ANOVA (Analysis of Variance)
■ In its simplest form, it tests whether the means of three or more groups are equal.
■ Assumptions: Independence of Observations, Normality, Homogeneity of Variance (Homoscedasticity)
DEMO

Post Hoc Tests
Equal Variances Assumed
■ Tukey HSD
■ Bonferroni
■ Dunnett
Equal Variances Not Assumed
■ Games-Howell
• Integral part of ANOVA.
• A significant ANOVA result indicates that not all group means are
equal.
• It does not tell which of the means differ. Post hoc tests help with this.
• They also control the overall (family-wise) error rate across the multiple comparisons.
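The need to limit the overall error rate can be seen with a little arithmetic. The sketch below assumes three groups and treats the pairwise comparisons as independent, which is a simplification. It shows how the chance of at least one false positive grows when each comparison is tested at 0.05, and the per-comparison threshold that a Bonferroni correction would use instead.

```python
alpha = 0.05
groups = 3

# Number of pairwise comparisons among the groups: k(k-1)/2.
comparisons = groups * (groups - 1) // 2

# Chance of at least one false positive if every comparison is tested
# at alpha (assuming independent comparisons, a simplification).
family_wise_error = 1 - (1 - alpha) ** comparisons

# Bonferroni: divide alpha by the number of comparisons.
bonferroni_alpha = alpha / comparisons

print(f"{comparisons} comparisons, family-wise error = {family_wise_error:.4f}")
print(f"Bonferroni threshold per comparison = {bonferroni_alpha:.4f}")
```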

We compared LA vol. & Grades of MR in a data set
Descriptives
Grade | N | Mean | SD | Std. Error
Mild | 36 | 36.1 | 15.15 | 2.52
Moderate | 9 | 45.6 | 22.08 | 7.36
Severe | 2 | 70.0 | 31.11 | 22.0
Total | 47 | 39.37 | 18.37 | 2.67
Test of Homogeneity of Variances
Levene Statistic | df1 | df2 | Sig.
2.901 | 2 | 44 | 0.066
ANOVA
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 2615.195 | 2 | 1307.597 | 4.457 | 0.017
Within Groups | 12908.89 | 44 | 293.384 | |
Total | 15524.08 | 46 | |
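To read the ANOVA table, it helps to see how its columns relate: each mean square is the sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A minimal sketch using the sums of squares and df reported above:

```python
# Values copied from the ANOVA table above.
ss_between, df_between = 2615.195, 2
ss_within, df_within = 12908.89, 44

# Mean Square = Sum of Squares / df.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F = between-group mean square / within-group mean square.
f_ratio = ms_between / ms_within

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.3f}")
```

The Sig. column (0.017) comes from the F distribution with these degrees of freedom, which the software looks up.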

Post Hoc Test
Multiple Comparisons (Tukey HSD)
(I) | (J) | Mean Difference (I-J) | Std. Error | Sig.
Mild | Moderate | -9.55 | 6.383 | 0.303
Mild | Severe | -33.89* | 12.44 | 0.025
Moderate | Mild | 9.55 | 6.383 | 0.303
Moderate | Severe | -24.34 | 13.38 | 0.176
Severe | Mild | 33.89* | 12.44 | 0.025
Severe | Moderate | 24.34 | 13.38 | 0.176
*. The mean difference is significant at the 0.05 level.

Incorrect inference
■ Small sample
■ Sampling not random
■ Bias
■ Statistical fallacies

Bias
(David Sackett: Biases in Analytical Research)
Selection bias
Those who enter the study differ systematically from those who do not.
Examples: Volunteers; only those who survive are selected
How to minimize: Clear definition of population; Scientific methods of sampling
Classification bias
When the study involves two groups (case-control study; clinical trial), the method by
which the two groups are identified is ambiguous.
Also called “contamination”
How to minimize: Standard criteria of diagnosis/classification; Avoiding deviation from protocol
Confounding bias
The relationship between Exposure and Outcome is affected by a third factor called a confounder.
Example: Coffee drinking ----> Ca pancreas, with Smoking as the confounder
How to minimize: Identifying potential confounders; Matching; Multivariate analysis

Statistical fallacy
■ Incorrect presentation/interpretation of statistics
Examples:
■ Association interpreted as cause-effect
■ Means interpreted without range and SD
■ Statistical significance interpreted as clinical significance

Plan
■ Simple Tables: Descriptive Analysis (Univariate)
■ Appropriate Graphical Representation
■ Tests for Normality: Comment on Data distribution
■ Cross tables: Bivariate Analysis, Find Significant Associations
■ Correlation: Strength of Association
■ Regression Modelling: Multivariate Analysis, Predictors of Outcome variable
© Mandar, 2021

Sample Plan
Variable | Scale | Descriptive Statistics
Age | Ratio | Percentage, Mean, SD
Sex | Nominal | Percentage
MR Grade (Mild, Moderate, Severe) | Ordinal | Percentage
Comorbidities | Nominal | Percentage
Symptoms (Present/Absent) | Nominal | Percentage
Classification of MR | Nominal | Percentage
LVEF | Ratio | Mean, SD
• Descriptive statistical analysis will be done using
percentage, mean and standard deviation.
• Appropriate graphical representation of data will
be done.
• Comparison of discrete variables will be done
using Fisher’s exact test / chi-square test, as
appropriate.
• Unpaired t test will be used to compare LVEF in
Symptomatic & Asymptomatic patients.
• ANOVA will be used to compare LVEF across
grades of MR.
• Pearson’s Correlation coefficient will be used to
calculate strength of association.
• Data analysis will be done using SPSS version
17.0 (SPSS Inc., Chicago, IL).

Sample Dummy Table
MR GRADES | LVEF MEAN | LVEF Standard Deviation | ANOVA
MILD (n=) | | | F=
MODERATE (n=) | | | df=
SEVERE (n=) | | | P=
Post Hoc Tests

Software
■ MS Excel
■ GraphPad
■ SPSS
■ STATA
■ SAS
■ RStudio
■ Tableau
■ NumPy
■ https://www.socscistatistics.com/
■ OpenEpi

TAKE HOME MESSAGE
■ Consult a Statistician while preparing the PROTOCOL, not at the END of data collection
■ Prepare Dummy Tables & Plan Statistical Analysis BEFORE data collection (helps in
getting the right data, saves time later)
■ CODING the master chart appropriately saves time & trouble later on.
■ Analysis must be in line with OBJECTIVES.
Don’t apply tests just because you can
■ SIMPLE tests are more powerful
All mathematical models are wrong, but some are useful
■ Free Software & Online Data Analysis tools are easily available nowadays
Statistical Significance DOES NOT imply Clinical Significance
