An Exercise Submitted to the Department of Community Medicine
Hawler Medical University
Biostatistics
Prepared by
Jwan Kareem Salh
Academic Year
2020-2021
February 2021
Exercise 1: Types of variables
Part 1: Categorize the following variables as either numerical or categorical.
Age, sex, residency, occupation, years of formal education, cigarette smoking, number of
cigarettes smoked per day, weight, height, educational level of mother (whether primary,
secondary, college), coffee drinking, number of cups of tea drunk/day, blood group, Rh, blood
urea.
Answer Part 1
Numerical                              Categorical
Years of formal education              Sex
Number of cigarettes smoked per day    Residency
Weight                                 Cigarette smoking
Height                                 Blood group
Age                                    Rh
Number of cups of tea drunk/day        Educational level of mother (ordinal)
Blood urea                             Coffee drinking
                                       Occupation
Answer Part 2
Variable Information
Variable                             Position  Label   Measurement Level  Role   Column Width  Alignment  Print Format  Write Format
Code                                 1         <none>  Scale              Input  8             Right      F8.2          F8.2
Age                                  2         <none>  Scale              Input  8             Right      F8.2          F8.2
Sex                                  3         <none>  Nominal            Input  8             Right      F8.2          F8.2
Residency                            4         <none>  Nominal            Input  8             Right      F8.2          F8.2
Occupation                           5         <none>  Nominal            Input  8             Right      F8.2          F8.2
Years_of_formal_education            6         <none>  Scale              Input  8             Right      F8.2          F8.2
Cigarettes_smoking                   7         <none>  Nominal            Input  8             Right      F8.2          F8.2
Number_of_cigarettes_smoked_per_day  8         <none>  Scale              Input  8             Right      F8.2          F8.2
Weight                               9         <none>  Scale              Input  8             Right      F8.2          F8.2
Height                               10        <none>  Scale              Input  8             Right      F8.2          F8.2
Education_level_of_mother            11        <none>  Ordinal            Input  8             Right      F8.2          F8.2
Coffe_drinking                       12        <none>  Nominal            Input  8             Right      F8.2          F8.2
Number_of_cups_of_tea_drunk_per_day  13        <none>  Scale              Input  8             Right      F8.2          F8.2
Blood_group                          14        <none>  Nominal            Input  8             Right      F8.2          F8.2
RH                                   15        <none>  Nominal            Input  8             Right      F8.2          F8.2
Blood_urea                           16        <none>  Scale              Input  8             Right      F8.2          F8.2
Variables in the working file
Variable Values
Variable                     Value   Label
Sex                          1.00    male
                             2.00    female
Residency                    1.00    rural
                             2.00    urban
Occupation                   1.00    house wife
                             2.00    employee
                             3.00    teacher
Cigarettes_smoking           1.00    yes
                             2.00    no
Education_level_of_mother    1.00    primary
                             2.00    secondary
                             3.00    college
Coffe_drinking               1.00    yes
                             2.00    no
Blood_group                  1.00    A
                             2.00    B
                             3.00    O
                             4.00    AB
RH                           1.00    positive
                             2.00    negative
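The value labels above act as a lookup from numeric code to category. A minimal Python sketch of that lookup, assuming the data file stores the codes as numbers such as 1.00; the names `VALUE_LABELS` and `decode` are hypothetical and not part of SPSS:

```python
# Hypothetical lookup mirroring the SPSS value labels listed above
VALUE_LABELS = {
    "Sex": {1: "male", 2: "female"},
    "Residency": {1: "rural", 2: "urban"},
    "Occupation": {1: "house wife", 2: "employee", 3: "teacher"},
    "Cigarettes_smoking": {1: "yes", 2: "no"},
    "Education_level_of_mother": {1: "primary", 2: "secondary", 3: "college"},
    "Coffe_drinking": {1: "yes", 2: "no"},
    "Blood_group": {1: "A", 2: "B", 3: "O", 4: "AB"},
    "RH": {1: "positive", 2: "negative"},
}

def decode(variable, code):
    """Return the label for a numeric code (codes are stored like 1.00)."""
    return VALUE_LABELS[variable][int(code)]

# One hypothetical record, decoded variable by variable
record = {"Sex": 2.0, "Blood_group": 4.0, "RH": 1.0}
print({var: decode(var, code) for var, code in record.items()})
```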
Exercise 2
▪ Use the random digit table to select 50 students out of 250 students.
o Only mention the steps.
▪ What do we call this type of random sampling method?
Answer/
1. Steps for simple random sampling using a random digit (random number) table:
▪ Code the sampling frame: each student gets a unique three-digit code, e.g., student number one
gets code 001, student number two gets 002, and so on up to 250.
▪ Choose the starting point in the table randomly (blindly).
▪ From that point, read three-digit numbers in a fixed direction (e.g., from right to left); each
number between 001 and 250 selects the student with that code.
▪ Skip any number greater than 250 (and 000) and any number that has already been selected.
▪ Continue until 50 of the 250 students have been selected.
2. This is simple random sampling, which is also called unrestricted random sampling.
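The selection steps can be simulated in Python. This is a sketch only: a seeded pseudo-random generator stands in for the printed random digit table and for the blind choice of starting point, and the function name `select_with_random_digits` is hypothetical.

```python
import random

def select_with_random_digits(population=250, sample_size=50, seed=None):
    """Simulate reading three-digit numbers from a random digit table.

    Students are coded 001..population. Numbers above the population size,
    000, and numbers already chosen are skipped until sample_size are chosen.
    """
    rng = random.Random(seed)  # stands in for the table and the blind start
    chosen = []
    while len(chosen) < sample_size:
        number = int("".join(str(rng.randint(0, 9)) for _ in range(3)))
        if 1 <= number <= population and number not in chosen:
            chosen.append(number)
    return chosen

sample = select_with_random_digits(seed=2021)
print(sorted(sample))
```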
Exercise 3
How would you choose a sample of 10 students out of 50 students using systematic sampling? Only
mention the steps.
Answer/
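▪ Calculate the sampling interval by dividing the population size by the required sample size:
k = 50/10 = 5.
▪ Code the population: number the students from 1 to 50.
▪ Choose the starting number randomly using an Excel sheet.
▪ The random start was 20.
▪ Starting from student number 20, we select every 5th student (20, 25, 30, 35, 40, 45, 50).
Because the start is larger than the interval, the list is treated as circular and selection
continues from the beginning (5, 10, 15) until the 10 students are selected. In the usual linear
method, the random start is chosen between 1 and k (here 1 to 5).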
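A minimal Python sketch of the systematic selection, assuming the circular reading of the list (start anywhere from 1 to N, step by k, wrap past N); `systematic_sample` is a hypothetical helper name:

```python
def systematic_sample(population_size, sample_size, start):
    """Circular systematic sampling: start at `start` (1..N), take every k-th
    unit, and wrap round to the beginning of the list after unit N."""
    k = population_size // sample_size              # sampling interval k = N/n
    return [(start - 1 + i * k) % population_size + 1 for i in range(sample_size)]

print(systematic_sample(50, 10, start=20))  # [20, 25, 30, 35, 40, 45, 50, 5, 10, 15]
```

With a start between 1 and 5, the same function gives the usual linear sample; for example, a start of 3 selects 3, 8, ..., 48.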
Exercise 4
One thousand employees (office and manual workers) of an enterprise consisted of 600
males and 400 females. Use a stratified random sampling method to collect a sample of 50 males
and 50 females.
-Only mention the steps.
Answer/
▪ Divide the population into subgroups (strata) based on a mutually exclusive criterion, here sex:
▪ 600 males and 400 females.
▪ Within each stratum, use systematic sampling to choose 50 males out of the 600 males and 50
females out of the 400 females.
▪ Female employees:
Sample interval = 400/50 = 8
▪ Starting number is 280.
▪ Starting from employee number 280, we select every 8th employee (280, 288, ..., 400) and then
continue from the beginning of the list (8, 16, ...) until the 50 females are selected.
▪ Male employees:
Sample interval = 600/50 = 12
▪ Starting number is 350.
▪ Starting from employee number 350, we select every 12th employee (350, 362, ..., 590) and then
continue from the beginning of the list (2, 14, ...) until the 50 males are selected.
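The stratified design can be sketched by running the same circular systematic selection separately inside each stratum. The sketch assumes that employees are numbered 1 to 600 within the male stratum and 1 to 400 within the female stratum, and uses the starting numbers 350 and 280 from the answer; the helper and dictionary names are hypothetical.

```python
def systematic_sample(population_size, sample_size, start):
    """Circular systematic sampling within one stratum."""
    k = population_size // sample_size              # sampling interval
    return [(start - 1 + i * k) % population_size + 1 for i in range(sample_size)]

# Strata from the exercise; units are numbered separately inside each stratum
strata = {
    "male":   {"size": 600, "n": 50, "start": 350},
    "female": {"size": 400, "n": 50, "start": 280},
}

sample = {
    name: systematic_sample(s["size"], s["n"], s["start"])
    for name, s in strata.items()
}
for name, ids in sample.items():
    print(name, len(ids), ids[:5])
```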
Exercise 5
Use the cluster sampling method to choose a representative sample of Iraqi primary school
students. Only mention the steps.
Answer/
▪ First step: Define the population by listing all Iraqi primary schools (the sampling frame).
▪ Second step: Divide the population into clusters according to geographical distribution.
▪ Third step: Decide the sample size, then select clusters at random and include all the students
in the selected clusters (or, in a two-stage design, a random sample of students within them)
until the sample is complete. For example, if 15 clusters are needed, 15 clusters are chosen at
random; unlike stratified sampling, not every cluster contributes to the sample.
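A sketch of one-stage cluster sampling in Python: whole clusters are drawn at random and every unit inside a chosen cluster is kept. The area names and school IDs are made up for illustration, and `cluster_sample` is a hypothetical helper; in a real survey each selected school would then contribute its students.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: pick whole clusters at random and
    keep every unit inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: schools grouped by geographic area (made-up IDs)
schools = {
    "Area A": ["A1", "A2", "A3"],
    "Area B": ["B1", "B2"],
    "Area C": ["C1", "C2", "C3", "C4"],
    "Area D": ["D1", "D2"],
    "Area E": ["E1", "E2", "E3"],
}
selected = cluster_sample(schools, n_clusters=2, seed=1)
print(selected)
```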
Exercise 6
The following are the marks of 50 students. Present the data as an ordered array.
50 82 99 78 59 77 67 79 72 93
55 68 78 99 78 98 56 72 75 94
88 95 78 66 68 92 95 74 74 95
67 71 56 67 89 78 94 59 73 77
45 70 55 78 56 45 56 78 72 83
Answer/ The data are presented as an ordered array, from the smallest to the largest value, using an Excel sheet.
Code Marks
1 45
2 45
3 50
4 55
5 55
6 56
7 56
8 56
9 56
10 59
11 59
12 66
13 67
14 67
15 67
16 68
17 68
18 70
19 71
20 72
21 72
22 72
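The ordered array can also be produced outside Excel. A minimal Python sketch, entering the 50 marks row by row from the table in the question:

```python
# The 50 marks from Exercise 6, entered row by row
marks = [
    50, 82, 99, 78, 59, 77, 67, 79, 72, 93,
    55, 68, 78, 99, 78, 98, 56, 72, 75, 94,
    88, 95, 78, 66, 68, 92, 95, 74, 74, 95,
    67, 71, 56, 67, 89, 78, 94, 59, 73, 77,
    45, 70, 55, 78, 56, 45, 56, 78, 72, 83,
]

ordered = sorted(marks)          # ascending order; tied marks are all kept
for code, mark in enumerate(ordered, start=1):
    print(code, mark)
```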
45 70 55 78 56 45 56 78 72 83
Answer/
▪ First step: Put the data in an ordered array from the smallest to the largest value.
▪ Determine the range: R = Xmax - Xmin = 99 - 45 = 54
▪ Decide the number of classes: NC = 6
▪ Determine the class interval: CI = R/NC = 54/6 = 9. Six classes of width 9 starting at 45 end at
98, so the interval is usually rounded up (e.g., to 10) so that the largest value, 99, is included.
▪ Relative frequency = F/N
Code Marks
1 45
2 45
3 50
4 55
5 55
6 56
7 56
8 56
9 56
10 59
11 59
12 66
13 67
14 67
15 67
16 68
17 68
18 70
19 71
20 72
21 72
22 72
23 73
24 74
25 74
26 75
27 77
28 77
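The steps listed above (range, number of classes, class interval, relative frequency) can be sketched in Python, assuming this exercise uses the same 50 marks as Exercise 6 (the visible data row and the ordered array match). Because six classes of width 9 starting at 45 stop at 98, the sketch rounds the width up to 10 so that the maximum mark (99) is counted; `frequency_table` is a hypothetical helper name.

```python
import math

marks = [
    50, 82, 99, 78, 59, 77, 67, 79, 72, 93,
    55, 68, 78, 99, 78, 98, 56, 72, 75, 94,
    88, 95, 78, 66, 68, 92, 95, 74, 74, 95,
    67, 71, 56, 67, 89, 78, 94, 59, 73, 77,
    45, 70, 55, 78, 56, 45, 56, 78, 72, 83,
]

def frequency_table(data, n_classes):
    """Return the range, class width and (lower, upper, F, F/N) rows."""
    lowest, highest = min(data), max(data)
    r = highest - lowest                      # R = Xmax - Xmin
    width = math.ceil(r / n_classes)          # CI = R / NC, rounded up
    if width * n_classes <= r:                # classes would stop short of Xmax
        width += 1
    rows = []
    for i in range(n_classes):
        lower = lowest + i * width
        upper = lower + width - 1             # marks are whole numbers
        f = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, f, f / len(data)))   # relative frequency F/N
    return r, width, rows

r, width, rows = frequency_table(marks, n_classes=6)
print("R =", r, "CI =", width)
for lower, upper, f, rel in rows:
    print(f"{lower}-{upper}  F={f}  F/N={rel:.2f}")
```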
Exercise 8: Measures of central tendency
For the following data (blood urea in mg/dl), calculate the measures of central tendency.
42 44 30 78 56 30 60 40
41 67 50 35 65 64 49 73
78 56 56 37 72 66 45 49
Answer/
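1- Mean: $\bar{x} = \sum x / n = 1283/24 = 53.46$ mg/dl
2- Median: n = 24 is even, so the median is the average of the (n/2)th and (n/2 + 1)th values of
the ordered data.
n/2 = 24/2 = 12th value (50)
n/2 + 1 = 24/2 + 1 = 13th value (56)
Median = (50 + 56)/2 = 53 mg/dl
3- Mode = 56 mg/dl (it occurs three times)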
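These results can be checked with Python's standard `statistics` module; a short sketch:

```python
import statistics

blood_urea = [42, 44, 30, 78, 56, 30, 60, 40,
              41, 67, 50, 35, 65, 64, 49, 73,
              78, 56, 56, 37, 72, 66, 45, 49]   # mg/dl

print(round(statistics.mean(blood_urea), 2))   # 53.46
print(statistics.median(blood_urea))           # 53.0 (mean of 12th and 13th values)
print(statistics.mode(blood_urea))             # 56
```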
Exercise 9: Measures of dispersion
▪ Suppose the variance = 9; what will be the value of the SD?
▪ What is the effect of sample size on the standard error?
▪ Calculate the coefficient of variation if you know that mean ± SD = 10 ± 2.
▪ In which case is calculating the CV as a measure of dispersion more meaningful
than calculating the SD?
Answer/
1. $SD = \sqrt{\text{variance}} = \sqrt{9} = 3$ (the SD is never negative, so it is 3, not ±3)
2. The standard error, $SE = SD/\sqrt{n}$, is inversely proportional to the square root of the
sample size: the larger the sample, the smaller the standard error (quadrupling n halves the SE).
3. Coefficient of variation = standard deviation / mean × 100
$CV = \frac{2}{10} \times 100 = 20\%$
4. When two distributions have means of very different magnitude (or are measured in different
units), comparing their CVs is much more meaningful than comparing their SDs.
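The three calculations can be sketched in Python; the helper names are hypothetical. The loop illustrates answer 2: with the SD fixed at 2, each fourfold increase in n halves the standard error.

```python
import math

def sd_from_variance(variance):
    return math.sqrt(variance)          # SD is the non-negative square root

def standard_error(sd, n):
    return sd / math.sqrt(n)            # SE = SD / sqrt(n)

def coefficient_of_variation(mean, sd):
    return sd / mean * 100              # CV as a percentage of the mean

print(sd_from_variance(9))              # 3.0
print(coefficient_of_variation(10, 2))
for n in (25, 100, 400):                # each fourfold increase in n halves SE
    print(n, standard_error(2, n))
```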
Exercise 10: The normal distribution
Suppose that the mean (±SD) of the students' marks is 70 ± 2 and their distribution is normal.
Answer/
1. What is the percentage of students whose marks are more than 70? = 50%
2. What is the percentage of students whose marks are less than 70? = 50%
3. What is the percentage of students whose marks are between 68 and 72? = 68.26% (mean ± 1 SD)
4. What is the percentage of students whose marks are between 70 and 72? = 34.13% (half of 68.26%)
5. What is the percentage of students whose marks are more than 72? = 15.87% (50 - 34.13)
6. What is the percentage of students whose marks are less than 72? = 84.13% (50 + 34.13)
7. What is the percentage of students whose marks are more than 74? = 2.27% (above mean + 2 SD)
8. What is the percentage of students whose marks are less than 66? = 2.27% (below mean - 2 SD)
9. What is the percentage of students whose marks are more than 76? = 0.13% (above mean + 3 SD)
10. What is the percentage of students whose marks are less than 64? = 0.13% (below mean - 3 SD)
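These percentages can be checked against the exact normal curve with Python's `statistics.NormalDist`. Exact values may differ in the second decimal from the table-based answers (for example, 68.27% rather than 68.26% within ±1 SD).

```python
from statistics import NormalDist

marks = NormalDist(mu=70, sigma=2)

def pct_above(x):
    return 100 * (1 - marks.cdf(x))

def pct_below(x):
    return 100 * marks.cdf(x)

def pct_between(a, b):
    return 100 * (marks.cdf(b) - marks.cdf(a))

print(round(pct_above(70), 2))        # 50.0
print(round(pct_between(68, 72), 2))  # 68.27
print(round(pct_above(72), 2))        # 15.87
print(round(pct_above(76), 2))        # 0.13
```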
Exercise 11: The confidence interval
Question 1
Calculate the 95% and 99% confidence intervals for the population mean. Use the
mean and SD of Exercise 10. The sample size is 100.
Answer 1/
Solution:
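95% CI = $\bar{X} \pm 1.96 \times \frac{SD}{\sqrt{n}}$
95% CI = $70 \pm 1.96 \times 2/\sqrt{100}$
95% CI = $70 \pm 0.392$
i.e., from 70 - 0.392 = 69.608 to 70 + 0.392 = 70.392
99% CI = $\bar{X} \pm 2.58 \times \frac{SD}{\sqrt{n}}$
99% CI = $70 \pm 2.58 \times 2/\sqrt{100}$
99% CI = $70 \pm 0.516$
i.e., from 70 - 0.516 = 69.484 to 70 + 0.516 = 70.516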
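The same formula as a small Python function (the name `mean_ci` is hypothetical). It uses the normal critical values 1.96 and 2.58, as in the solution, which is reasonable for n = 100:

```python
import math

def mean_ci(mean, sd, n, z):
    """Large-sample CI for a population mean: mean ± z * SD / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

lo95, hi95 = mean_ci(70, 2, 100, z=1.96)   # 95% CI
lo99, hi99 = mean_ci(70, 2, 100, z=2.58)   # 99% CI
print(f"95% CI: {lo95:.3f} to {hi95:.3f}")
print(f"99% CI: {lo99:.3f} to {hi99:.3f}")
```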
Question 2
Interpret the confidence interval of the waist circumference mean in the following SPSS
output:
Descriptive Statistics
                                                 Bootstrap(a)
                                                                      95% Confidence Interval
                                     Statistic   Bias    Std. Error   Lower     Upper
waist_circumference  N               53          0       0            53        53
                     Minimum         72.00
                     Maximum         137.00
                     Mean            94.09       -0.04   1.90         90.45     98.01
                     Std. Deviation  13.36       -0.20   1.70         9.77      16.63
Valid N (listwise)   N               53          0       0            53        53
a. Unless otherwise noted, bootstrap results are based on 1000 bootstrap samples
Answer 2/
In this output, the mean waist circumference is 94.09 cm, the standard deviation is 13.36 cm, and
the number of participants (N) is 53. The bootstrap 95% confidence interval for the mean runs from
90.45 to 98.01 cm: we can be 95% confident that the true population mean waist circumference lies
between 90.45 and 98.01 cm. The bootstrap bias of the mean is small (-0.04).
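The footnote says the interval comes from 1000 bootstrap samples. Assuming the percentile method (SPSS also offers a bias-corrected BCa interval), the idea can be sketched in Python. The 53 raw waist measurements are not part of the output, so the sketch generates hypothetical data, and its interval will not reproduce 90.45 to 98.01; `bootstrap_mean_ci` is a hypothetical helper name.

```python
import random
import statistics

def bootstrap_mean_ci(data, n_boot=1000, level=0.95, seed=None):
    """Percentile bootstrap CI for the mean: resample with replacement,
    take each resample's mean, and keep the middle `level` share of them."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = means[round(tail * n_boot)]
    upper = means[round((1 - tail) * n_boot) - 1]
    return lower, upper

# Hypothetical waist circumferences (cm); the real 53 values are not shown
data_rng = random.Random(7)
waist = [round(data_rng.gauss(94, 13), 1) for _ in range(53)]

lower, upper = bootstrap_mean_ci(waist, seed=1)
print(round(statistics.fmean(waist), 2), round(lower, 2), round(upper, 2))
```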
Exercise 12: The t test
Part 1
Please have a look at the following SPSS output.
- What t test application is this?
- Evaluate the role of chance.
- Will you accept or reject the null hypothesis?
Test Value = 100
                                                              95% Confidence Interval
                                                              of the Difference
           t        df    Sig. (2-tailed)   Mean Difference   Lower      Upper
Glucose1   -2.139   52    0.037             -6.264            -12.140    -0.387
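Answer/ Part 1
1. One-sample t test: it tests whether the mean of a single sample (here, Glucose1) differs from a
specified value (the test value, 100).
2. The result shows t = -2.139 with 52 degrees of freedom, which gives a two-tailed p-value (Sig.)
of 0.037. If the null hypothesis were true, a difference this large would arise by chance only
about 3.7% of the time, so chance is an unlikely explanation at α = 0.05, although the result
would not be significant at α = 0.01. The 95% confidence interval of the difference (-12.140 to
-0.387) does not include 0, which agrees with this.
3. We reject the null hypothesis, which states that there is no difference between the sample mean
and the population mean (100); the mean of Glucose1 is significantly lower than 100 (mean
difference -6.264).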
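The output is internally consistent, which can be checked in Python: the standard error equals the mean difference divided by t, and the 95% interval is the mean difference ± t(0.975, 52) × SE. The critical value 2.0067 is taken from a t table and is an assumption of this sketch, not part of the output.

```python
# Values copied from the SPSS output (test value = 100, df = 52)
t_value = -2.139
mean_difference = -6.264
p_value = 0.037

t_critical = 2.0067  # t(0.975, df=52) from a t table: an assumption, not in the output

standard_error = mean_difference / t_value           # SE = mean difference / t
lower = mean_difference - t_critical * standard_error
upper = mean_difference + t_critical * standard_error
print(f"SE = {standard_error:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")

alpha = 0.05
print("reject H0" if p_value < alpha else "do not reject H0")
```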
Part 2
Please have a look at the following SPSS output.
- What t test application is this?
- State the P-value.
- Is there a significant difference between the studied variables?
                             Paired Differences
                                                                      95% Confidence Interval
                                                                      of the Difference
                             Mean    Std. Deviation  Std. Error Mean  Lower   Upper    t      df   Sig. (2-tailed)
Pair 1  Glucose1 - Glocose2  13.88   28.95           3.97             5.90    21.86    3.49   52   0.001
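Answer 2
1. Paired t test: it is used when we are interested in the difference between two measurements on
the same subjects. Often the two measurements are separated by time.
2. P-value = 0.001.
3. Yes. There is a highly statistically significant difference between glucose 1 and glucose 2
because the P-value (0.001) is less than 0.05; the 95% confidence interval of the mean difference
(5.90 to 21.86) also excludes 0.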
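The paired t test works on the differences between the two glucose measurements, so it reduces to a one-sample t test on those differences. A Python sketch from the summary values in the output; t(0.975, 52) ≈ 2.0067 is a table value assumed here, not part of the output.

```python
import math

# Summary values of the paired differences from the SPSS output
n = 53
mean_diff = 13.88
sd_diff = 28.95
t_critical = 2.0067  # t(0.975, df=52) from a t table: an assumption

se = sd_diff / math.sqrt(n)        # standard error of the mean difference
t = mean_diff / se                 # paired t = mean difference / SE
df = n - 1
ci = (mean_diff - t_critical * se, mean_diff + t_critical * se)
print(f"SE = {se:.2f}, t = {t:.2f}, df = {df}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```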
Part 3
Please have a look at the following SPSS output.
- What do we mean by independent samples t test?
- Would you accept or reject the alternative hypothesis?
- How would you calculate the degrees of freedom for this application?
Group Statistics
            sex   N    Mean      Std. Deviation   Std. Error Mean
Pulse rate  m     10   68.6000   6.67000          2.10924
            f     10   91.2000   9.10189          2.87827
Independent Samples Test
                                          Levene's Test for
                                          Equality of Variances   t-test for Equality of Means
                                                                                   Sig.         Mean         Std. Error   95% Confidence Interval of the Difference
                                          F        Sig.           t        df      (2-tailed)   Difference   Difference   Lower     Upper
Pulse rate   Equal variances assumed      0.532    0.475          -6.33    18      0.000        -22.60       3.56         -30.09    -15.10
             Equal variances not assumed                          -6.33    16.50   0.000        -22.60       3.56         -30.14    -15.05
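Answer 3
1. The independent samples t test compares the means of two independent groups (here, the pulse
rate of males and females) to determine whether there is statistical evidence that the
corresponding population means are significantly different.
2. We accept the alternative hypothesis. Levene's test is not significant (Sig. = 0.475 > 0.05), so
the "Equal variances assumed" row is used; its p-value (shown as 0.000, i.e., p < 0.001) is less
than 0.05, so the mean pulse rate of females (91.2) is significantly higher than that of males (68.6).
3. Degrees of freedom = n1 + n2 - 2
= 10 + 10 - 2 = 18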
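A Python sketch that recomputes both rows of the output from the group statistics: the pooled-variance t with n1 + n2 - 2 degrees of freedom, and the Welch t with Welch-Satterthwaite degrees of freedom (the 16.50 in the second row). `independent_t` is a hypothetical helper name.

```python
import math

def independent_t(n1, mean1, sd1, n2, mean2, sd2):
    """t statistic and df for two independent groups, pooled and Welch versions."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    # Pooled variance (equal variances assumed), df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    pooled = ((mean1 - mean2) / se_pooled, n1 + n2 - 2)
    # Welch (equal variances not assumed), Welch-Satterthwaite df
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    welch = ((mean1 - mean2) / se_welch, df_welch)
    return pooled, welch

# Males vs females, from the Group Statistics table
pooled, welch = independent_t(10, 68.6, 6.67, 10, 91.2, 9.10189)
print("equal variances assumed:", [round(x, 2) for x in pooled])
print("equal variances not assumed:", [round(x, 2) for x in welch])
```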