9 Aug 2021


Biostatistics exercise for master's students


An Exercise Submitted to the Department of Community Medicine, Hawler Medical University
Biostatistics
Prepared by Jwan Kareem Salh
Academic Year 2020-2021
February 2021
Exercise 1: Types of variables

Part 1: Categorize each of the following variables as numerical or categorical: age, sex, residency, occupation, years of formal education, cigarette smoking, number of cigarettes smoked per day, weight, height, educational level of mother (primary, secondary or college), coffee drinking, number of cups of tea drunk per day, blood group, Rh, blood urea.

Answer, Part 1:

| Numerical | Categorical |
|---|---|
| Age | Sex |
| Years of formal education | Residency |
| Number of cigarettes smoked per day | Occupation |
| Weight | Cigarette smoking |
| Height | Educational level of mother |
| Number of cups of tea drunk per day | Coffee drinking |
| Blood urea | Blood group |
| | Rh |

Answer, Part 2 (SPSS variable information):

None of the 16 variables has a label. All of them have role Input, column width 8, right alignment, and print and write format F8.2. Their positions and measurement levels are:

| Position | Variable | Measurement level |
|---|---|---|
| 1 | Code | Scale |
| 2 | Age | Scale |
| 3 | Sex | Nominal |
| 4 | Residency | Nominal |
| 5 | Occupation | Nominal |
| 6 | Years_of_formal_education | Scale |
| 7 | Cigarettes_smoking | Nominal |
| 8 | Number_of_cigarettes_smoked_per_day | Scale |
| 9 | Weight | Scale |
| 10 | Height | Scale |
| 11 | Education_level_of_mother | Ordinal |
| 12 | Coffe_drinking | Nominal |
| 13 | Number_of_cups_of_tea_drunk_per_day | Scale |
| 14 | Blood_group | Nominal |
| 15 | RH | Nominal |
| 16 | Blood_urea | Scale |

Variables in the working file (value labels):

| Variable | Values and labels |
|---|---|
| Sex | 1 = male, 2 = female |
| Residency | 1 = rural, 2 = urban |
| Occupation | 1 = housewife, 2 = employee, 3 = teacher |
| Cigarettes_smoking | 1 = yes, 2 = no |
| Education_level_of_mother | 1 = primary, 2 = secondary, 3 = college |
| Coffe_drinking | 1 = yes, 2 = no |
| Blood_group | 1 = A, 2 = B, 3 = O, 4 = AB |
| RH | 1 = positive, 2 = negative |
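The Python sketch below illustrates how the numeric codes in the value-label table map back to categories. It re-types the SPSS coding scheme above as plain dictionaries. The helper names `decode` and `measurement_level` and the example participant record are hypothetical and are not part of the SPSS output.

```python
# Value labels re-typed from the SPSS output above (code -> category).
VALUE_LABELS = {
    "Sex": {1: "male", 2: "female"},
    "Residency": {1: "rural", 2: "urban"},
    "Occupation": {1: "housewife", 2: "employee", 3: "teacher"},
    "Cigarettes_smoking": {1: "yes", 2: "no"},
    "Education_level_of_mother": {1: "primary", 2: "secondary", 3: "college"},
    "Coffe_drinking": {1: "yes", 2: "no"},
    "Blood_group": {1: "A", 2: "B", 3: "O", 4: "AB"},
    "RH": {1: "positive", 2: "negative"},
}

# The only labelled variable whose categories have a natural order.
ORDINAL = {"Education_level_of_mother"}


def decode(record):
    """Return a copy of `record` with coded values replaced by their labels.

    Variables without value labels (the Scale variables such as Age or
    Blood_urea) are returned unchanged.
    """
    return {
        name: VALUE_LABELS.get(name, {}).get(value, value)
        for name, value in record.items()
    }


def measurement_level(name):
    """Ordinal or Nominal for labelled variables, Scale otherwise."""
    if name in ORDINAL:
        return "Ordinal"
    return "Nominal" if name in VALUE_LABELS else "Scale"


# Hypothetical participant, coded the way the data file stores it.
participant = {"Code": 1, "Age": 34, "Sex": 2, "Blood_group": 4, "RH": 1, "Blood_urea": 42}
print(decode(participant))
# {'Code': 1, 'Age': 34, 'Sex': 'female', 'Blood_group': 'AB', 'RH': 'positive', 'Blood_urea': 42}
```

Keeping the codes numeric and the labels separate mirrors the SPSS approach: the data stay compact, and labels are attached only for display.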
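Exercise 2
▪ Use the random digit table to select 50 students out of 250 students. Only mention the steps.
▪ What do we call this type of random sampling method?

Answer:
▪ Give every student a three-digit code from 001 to 250. Three digits are needed because the population has 250 students.
▪ Choose a starting point in the random digit table blindly, and choose a reading direction (for example, right to left).
▪ Read the table in groups of three digits from that point.
▪ Skip 000, any number larger than 250, and any number that has already been selected.
▪ Continue until 50 different codes have been selected. The students with these codes form the sample.
▪ This method is simple random sampling using a random number table. It is also called unrestricted random sampling.

Exercise 3
How would you choose a sample of 10 students out of 50 students using systematic sampling? Only mention the steps.

Answer:
▪ Code the students from 1 to 50.
▪ Calculate the sampling interval by dividing the population size by the required sample size: k = 50/10 = 5.
▪ Choose a random starting number using an Excel sheet. The random start was 20.
▪ Starting from student number 20, take every 5th student until the 10 students are complete. Because the list ends at 50, counting continues from the beginning of the list (circular systematic sampling): 20, 25, 30, 35, 40, 45, 50, 5, 10, 15. In ordinary (linear) systematic sampling, the random start is chosen between 1 and k (here 1 to 5), so no wrap-around is needed.

Exercise 4
One thousand employees (office and manual workers) of an enterprise consisted of 600 males and 400 females. Use a stratified random sampling method to collect a sample of 50 males and 50 females. Only mention the steps.

Answer:
▪ Divide the population into subgroups (strata) based on mutually exclusive criteria. Here the criterion is sex, giving 600 males and 400 females.
▪ Select the sample separately within each stratum. Here systematic sampling is used to choose 50 of the 600 males and 50 of the 400 females.
▪ Female employees: sampling interval = 400/50 = 8. The starting number is 280. Starting from employee number 280, take every 8th employee, continuing from the beginning of the list after number 400, until the 50 females are complete.
▪ Male employees: sampling interval = 600/50 = 12. The starting number is 350. Starting from employee number 350, take every 12th employee, continuing from the beginning of the list after number 600, until the 50 males are complete.
▪ In ordinary (linear) systematic sampling, the random start lies within the first interval: 1 to 8 for females and 1 to 12 for males (for example, a random start of 5 for males: 5, 17, 29, ...). The larger starting numbers above work only when the list is treated as circular.
▪ Because 50 employees are taken from each stratum, females are sampled at 1 in 8 and males at 1 in 12. This is equal allocation, not proportional allocation.

Exercise 5
Use the cluster sampling method to choose a representative sample of Iraqi primary school students. Only mention the steps.

Answer:
▪ First step: Define the population (all primary school students in Iraq) and list all Iraqi primary schools. Each school is a cluster of students.
▪ Second step: Arrange the clusters according to their geographical distribution.
▪ Third step: Decide the sample size, then select whole clusters (schools) at random. Study all the students in each selected school, or, in multistage sampling, a random sample of students within each selected school. For example, if 15 schools are required, draw 15 schools at random from the list. Selecting some students from every cluster would instead be stratified sampling.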
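The four sampling methods in Exercises 2 to 5 can be sketched with Python's standard `random` module, as below. This is a minimal illustration under stated assumptions, not the procedure the exercises require:
- A pseudo-random generator stands in for the printed random digit table.
- Employee IDs 1-600 for males and 601-1000 for females are hypothetical.
- The school frame of 100 schools with 30 pupils each is hypothetical.
- The stratified helper uses simple random sampling within each stratum, which is the textbook choice. Systematic selection within strata, as in the answer to Exercise 4, would use `systematic_sample` per stratum instead.

```python
import random


def simple_random_sample(population_size, n, rng):
    """Imitate reading a random digit table in groups of three digits.

    000, codes above `population_size`, and repeats are skipped until
    `n` distinct codes between 1 and population_size are collected.
    """
    selected = []
    while len(selected) < n:
        code = rng.randint(0, 999)  # one three-digit group, 000-999
        if 1 <= code <= population_size and code not in selected:
            selected.append(code)
    return selected


def systematic_sample(population_size, n, start):
    """Take every k-th unit from `start`, where k = population_size // n.

    Counting wraps past the end of the list back to 1 (circular
    systematic sampling), so any start from 1 to population_size works.
    """
    k = population_size // n
    return [(start - 1 + i * k) % population_size + 1 for i in range(n)]


def stratified_sample(strata, sizes, rng):
    """Draw a separate simple random sample of the requested size in each stratum."""
    return {name: sorted(rng.sample(units, sizes[name])) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit in them (one-stage)."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


rng = random.Random(2021)  # fixed seed only so the run is reproducible

# Exercise 2: 50 of 250 students
students = simple_random_sample(250, 50, rng)

# Exercise 3: 10 of 50 students, k = 5, random start 20
print(systematic_sample(50, 10, start=20))  # [20, 25, 30, 35, 40, 45, 50, 5, 10, 15]

# Exercise 4: hypothetical IDs, males 1-600 and females 601-1000
employees = {"male": list(range(1, 601)), "female": list(range(601, 1001))}
by_sex = stratified_sample(employees, {"male": 50, "female": 50}, rng)

# Exercise 5: hypothetical frame of 100 schools with 30 pupils each; draw 15 schools
schools = {f"school_{i:03d}": [f"pupil_{i:03d}_{j:02d}" for j in range(30)] for i in range(1, 101)}
sampled_schools = cluster_sample(schools, 15, rng)
```

The circular form of `systematic_sample` matches the starting numbers used in the answers (20, 280, 350). Passing a start between 1 and k gives the ordinary linear form with the same code.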
Exercise 6
The following are the marks of 50 students. Present the data in an ordered array.

50 82 99 78 59 77 67 79 72 93
55 68 78 99 78 98 56 72 75 94
88 95 78 66 68 92 95 74 74 95
67 71 56 67 89 78 94 59 73 77
45 70 55 78 56 45 56 78 72 83

Answer:
The ordered array lists the marks from the smallest to the largest value (sorted with an Excel sheet):

| Ranks | Marks |
|---|---|
| 1-10 | 45 45 50 55 55 56 56 56 56 59 |
| 11-20 | 59 66 67 67 67 68 68 70 71 72 |
| 21-30 | 72 72 73 74 74 75 77 77 78 78 |
| 31-40 | 78 78 78 78 78 79 82 83 88 89 |
| 41-50 | 92 93 94 94 95 95 95 98 99 99 |

Exercise 7: The frequency distribution table
Present the same data (the marks of the 50 students in Exercise 6) in a frequency distribution table.

Answer:
▪ First, put the data in an ordered array from the smallest to the largest value (as in Exercise 6).
▪ Determine the range: X max - X min = 99 - 45 = 54.
▪ Decide the number of classes: 6.
▪ Determine the class interval: CI = R/NC = 54/6 = 9. This is rounded up to 10 to give convenient class limits (40-49, 50-59, ..., 90-99) that cover every mark. Six classes of width exactly 9 starting at 45 would end at 98 and leave the maximum mark, 99, outside the table.
▪ Relative frequency = f/N, where N = 50.

| Class interval | Frequency (f) | Cumulative frequency | Relative frequency (f/N) | Cumulative relative frequency |
|---|---|---|---|---|
| 40-49 | 2 | 2 | 0.04 | 0.04 |
| 50-59 | 9 | 11 | 0.18 | 0.22 |
| 60-69 | 6 | 17 | 0.12 | 0.34 |
| 70-79 | 19 | 36 | 0.38 | 0.72 |
| 80-89 | 4 | 40 | 0.08 | 0.80 |
| 90-99 | 10 | 50 | 0.20 | 1.00 |
| Total | 50 | | 1.00 | |
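The ordered array and the frequency table can be reproduced with a short Python sketch, as below. The class width of 10 and the lower limit of 40 are passed in explicitly, following the answer above. The helper name `frequency_table` is hypothetical.

```python
from itertools import accumulate

MARKS = [
    50, 82, 99, 78, 59, 77, 67, 79, 72, 93,
    55, 68, 78, 99, 78, 98, 56, 72, 75, 94,
    88, 95, 78, 66, 68, 92, 95, 74, 74, 95,
    67, 71, 56, 67, 89, 78, 94, 59, 73, 77,
    45, 70, 55, 78, 56, 45, 56, 78, 72, 83,
]

# Exercise 6: the ordered array (smallest to largest)
ordered = sorted(MARKS)


def frequency_table(data, lower, width, n_classes):
    """Rows of (class, f, cumulative f, f/N, cumulative f/N) for integer data."""
    n = len(data)
    classes = []
    for i in range(n_classes):
        lo = lower + i * width
        hi = lo + width - 1  # integer marks, so classes are 40-49, 50-59, ...
        classes.append((f"{lo}-{hi}", sum(lo <= x <= hi for x in data)))
    cumulative = accumulate(f for _, f in classes)
    return [(label, f, cf, f / n, cf / n) for (label, f), cf in zip(classes, cumulative)]


# Exercise 7: range 99 - 45 = 54, 6 classes, width 54/6 = 9 rounded up to 10
print(max(MARKS) - min(MARKS))  # 54
for label, f, cf, rf, crf in frequency_table(MARKS, lower=40, width=10, n_classes=6):
    print(f"{label:>6} {f:>3} {cf:>3} {rf:.2f} {crf:.2f}")
```

Counting with inclusive integer limits (`lo <= x <= hi`) works here because all marks are whole numbers. Continuous data would need half-open intervals instead.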
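Exercise 8: Measures of central tendency
For the following data (blood urea in mg/dl), calculate the measures of central tendency.

42 44 30 78 56 30 60 40 41 67 50 35 65 64 49 73 78 56 56 37 72 66 45 49

Answer:
1. Mean: x̄ = Σx/n = 1283/24 = 53.46 mg/dl
2. Median: the ordered array is 30 30 35 37 40 41 42 44 45 49 49 50 56 56 56 60 64 65 66 67 72 73 78 78. Since n = 24 is even, the median is the average of the values at positions n/2 = 12 (value 50) and n/2 + 1 = 13 (value 56). Median = (50 + 56)/2 = 53 mg/dl.
3. Mode = 56 mg/dl. It occurs three times, and every other value occurs at most twice.

Exercise 9: Measures of dispersion
▪ Suppose the variance = 9. What will be the value of the SD?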
▪ What is the effect of sample size on the standard error?
▪ Calculate the coefficient of variation if you know that mean ± SD = 10 ± 2.
▪ In which case is calculating the CV as a measure of dispersion more meaningful than calculating the SD?

Answer:
1. SD = √variance = √9 = 3. The SD is the positive square root of the variance, so it is never negative.
2. The standard error is inversely proportional to the square root of the sample size (SE = SD/√n). The larger the sample, the smaller the standard error; for example, quadrupling n halves the SE.
3. Coefficient of variation = (standard deviation/mean) × 100 = (2/10) × 100 = 20%.
4. When two distributions have means of very different magnitude, or are measured in different units, comparing their CVs is much more meaningful than comparing their SDs.

Exercise 10: The normal distribution
Suppose that the mean (± SD) of the students' marks is 70 ± 2 and that their distribution is normal.
1. What percentage of students have marks above 70? 50%
2. What percentage of students have marks below 70? 50%
3. What percentage of students have marks between 68 and 72? 68.26%
4. What percentage of students have marks between 70 and 72? 34.13%
5. What percentage of students have marks above 72? 15.87%
6. What percentage of students have marks below 72? 84.13%
7. What percentage of students have marks above 74? 2.28%
8. What percentage of students have marks below 66? 2.28%
9. What percentage of students have marks above 76? 0.13%
10. What percentage of students have marks below 64? 0.13%

These answers follow from the positions of the marks relative to the mean. The marks 72, 74 and 76 lie 1, 2 and 3 SDs above the mean, and 68, 66 and 64 lie the same distances below it. In a normal distribution, about 34.13%, 47.72% and 49.87% of values lie between the mean and 1, 2 and 3 SDs respectively. For example, 50 - 47.72 = 2.28% of marks are above 74.

Exercise 11: The confidence interval

Question 1: Calculate the 95% and 99% confidence intervals for the population mean, using the mean and SD of Exercise 10 (70 ± 2). The sample size is 100.

Answer 1:

$$95\%\ \mathrm{CI} = \bar{X} \pm 1.96 \times \frac{SD}{\sqrt{n}} = 70 \pm 1.96 \times \frac{2}{\sqrt{100}} = 70 \pm 0.392$$

The 95% CI therefore runs from 69.608 to 70.392.

$$99\%\ \mathrm{CI} = \bar{X} \pm 2.58 \times \frac{SD}{\sqrt{n}} = 70 \pm 2.58 \times \frac{2}{\sqrt{100}} = 70 \pm 0.516$$

The 99% CI therefore runs from 69.484 to 70.516.

Question 2: Interpret the confidence interval of the waist circumference mean in the following SPSS output.

Descriptive Statistics (waist_circumference). Unless otherwise noted, bootstrap results are based on 1000 bootstrap samples.

| | Statistic | Bootstrap bias | Bootstrap std. error | Bootstrap 95% CI, lower | Bootstrap 95% CI, upper |
|---|---|---|---|---|---|
| N | 53 | 0 | 0 | 53 | 53 |
| Minimum | 72.00 | | | | |
| Maximum | 137.00 | | | | |
| Mean | 94.09 | -0.04 | 1.90 | 90.45 | 98.01 |
| Std. Deviation | 13.36 | -0.20 | 1.70 | 9.77 | 16.63 |
| Valid N (listwise) | 53 | 0 | 0 | 53 | 53 |

Answer 2: The 53 participants have a mean waist circumference of 94.09 cm with a standard deviation of 13.36 cm. The bootstrap 95% confidence interval for the mean runs from 90.45 to 98.01 cm. We can therefore be 95% confident that the true population mean waist circumference lies between 90.45 and 98.01 cm. The bootstrap bias of the mean is very small (-0.04). A corresponding bootstrap 95% CI is also given for the SD (9.77 to 16.63 cm).

Exercise 12: The t test

Part 1
Have a look at the following SPSS output.
- What t test application is this?
- Evaluate the role of chance.
- Will you accept or reject the null hypothesis?

Test value = 100

| | t | df | Sig. (2-tailed) | Mean difference | 95% CI of the difference, lower | 95% CI of the difference, upper |
|---|---|---|---|---|---|---|
| Glucose1 | -2.139 | 52 | 0.037 | -6.264 | -12.140 | -0.387 |

Answer, Part 1:
1. This is a one-sample t test. It tests whether a sample of observations could have come from a population with a specified mean, here 100.
2. t = -2.139 with 52 degrees of freedom gives a two-tailed p-value of 0.037. If the population mean really were 100, a difference at least this large would arise by chance in only about 3.7% of samples. The result is significant at α = 0.05 but not at the stricter α = 0.01. Consistent with this, the 95% CI of the difference (-12.140 to -0.387) does not include 0.
3. At α = 0.05 we reject the null hypothesis, which states that there is no difference between the population mean and the test value of 100. The sample mean (100 - 6.264 = 93.736) is significantly lower than 100.

Part 2
Have a look at the following SPSS output.
- What t test application is this?
- What is the p-value?
- Is there a significant difference between the studied variables?
Pair 1: Glucose1 - Glucose2 (paired differences)

| Mean | Std. Deviation | Std. Error Mean | 95% CI of the difference, lower | 95% CI of the difference, upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|
| 13.88 | 28.95 | 3.97 | 5.90 | 21.86 | 3.49 | 52 | 0.001 |

Answer, Part 2:
1. This is a paired t test. It is used when we are interested in the difference between two measurements on the same subjects, often taken at two points in time.
2. The p-value is 0.001.
3. Yes. The difference between Glucose1 and Glucose2 is highly statistically significant, because the p-value (0.001) is well below 0.05. On average, Glucose1 is 13.88 units higher than Glucose2 (95% CI 5.90 to 21.86).

Part 3
Have a look at the following SPSS output.
- What do we mean by an independent samples t test?
- Would you accept or reject the alternative hypothesis?
- How would you calculate the degrees of freedom for this application?

Group Statistics (pulse rate by sex)

| Sex | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| m | 10 | 68.6000 | 6.67000 | 2.10924 |
| f | 10 | 91.2000 | 9.10189 | 2.87827 |

Independent Samples Test (pulse rate)

| | Levene's F | Levene's Sig. | t | df | Sig. (2-tailed) | Mean difference | Std. error difference | 95% CI, lower | 95% CI, upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 0.532 | 0.475 | -6.33 | 18 | 0.000 | -22.60 | 3.56 | -30.09 | -15.10 |
| Equal variances not assumed | | | -6.33 | 16.50 | 0.000 | -22.60 | 3.56 | -30.14 | -15.05 |

Answer, Part 3:
1. The independent samples t test compares the means of two independent groups (here males and females). It determines whether there is statistical evidence that the corresponding population means are significantly different.
2. Levene's test is not significant (p = 0.475), so the "equal variances assumed" row applies. Its p-value is displayed as 0.000, which means p < 0.001. We therefore reject the null hypothesis and accept the alternative hypothesis: mean pulse rate differs between males and females, with females higher by 22.60 on average.
3. Degrees of freedom = n1 + n2 - 2 = 10 + 10 - 2 = 18.
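The hand calculations in Exercises 8 to 11 can be checked with Python's standard `statistics` and `math` modules, as in the sketch below. The z values 1.96 and 2.58 are passed in to match the worked answers. Note that `NormalDist` uses the exact normal CDF, so the 68 to 72 percentage rounds to 68.27 rather than the table-based 68.26 (2 × 34.13). The helper names `standard_error` and `mean_ci` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

# Exercise 8: blood urea (mg/dl) of 24 subjects
UREA = [42, 44, 30, 78, 56, 30, 60, 40, 41, 67, 50, 35,
        65, 64, 49, 73, 78, 56, 56, 37, 72, 66, 45, 49]
mean_urea = statistics.mean(UREA)      # sum 1283 / n 24
median_urea = statistics.median(UREA)  # average of the 12th and 13th ordered values
mode_urea = statistics.mode(UREA)      # most frequent value


# Exercise 9: standard error, SD from variance, coefficient of variation
def standard_error(sd, n):
    """SE of the mean; it shrinks with the square root of n."""
    return sd / math.sqrt(n)


sd_from_variance = math.sqrt(9)  # math.sqrt returns the non-negative root
cv_percent = 100 * 2 / 10        # SD / mean * 100 for mean +/- SD = 10 +/- 2

# Exercise 10: marks follow Normal(mean 70, SD 2)
marks = NormalDist(mu=70, sigma=2)
pct_68_to_72 = 100 * (marks.cdf(72) - marks.cdf(68))
pct_above_72 = 100 * (1 - marks.cdf(72))
pct_above_74 = 100 * (1 - marks.cdf(74))
pct_above_76 = 100 * (1 - marks.cdf(76))


# Exercise 11: z-based confidence interval for the mean
def mean_ci(mean, sd, n, z):
    half_width = z * standard_error(sd, n)
    return mean - half_width, mean + half_width


ci95 = mean_ci(70, 2, 100, z=1.96)  # NormalDist().inv_cdf(0.975) is about 1.96
ci99 = mean_ci(70, 2, 100, z=2.58)  # NormalDist().inv_cdf(0.995) is about 2.58

print(round(mean_urea, 2), median_urea, mode_urea)  # 53.46 53.0 56
print(sd_from_variance, cv_percent)                 # 3.0 20.0
print(f"{pct_68_to_72:.2f} {pct_above_72:.2f} {pct_above_74:.2f} {pct_above_76:.2f}")
# 68.27 15.87 2.28 0.13
print(ci95, ci99)
```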
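The t statistics and degrees of freedom in Exercise 12 can be recomputed from the summary numbers in the SPSS tables, as in the Python sketch below. The sketch stops at t and df, because p-values require the t distribution's CDF, which the Python standard library does not provide. The Part 1 output reports no SD, so `one_sample_t` is shown without a Part 1 check. The function names are hypothetical.

```python
import math


def one_sample_t(sample_mean, test_value, sd, n):
    """t and df for H0: the population mean equals test_value."""
    se = sd / math.sqrt(n)
    return (sample_mean - test_value) / se, n - 1


def paired_t(mean_diff, sd_diff, n_pairs):
    """A paired t test is a one-sample t test of the differences against 0."""
    return one_sample_t(mean_diff, 0, sd_diff, n_pairs)


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent samples t test assuming equal variances (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent samples t test without assuming equal variances (Welch)."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / math.sqrt(v1 + v2), df


# Part 2: Glucose1 - Glucose2 differences for 53 pairs
t_paired, df_paired = paired_t(13.88, 28.95, 53)

# Part 3: pulse rate of males vs females, 10 per group
t_pool, df_pool = pooled_t(68.6, 6.67, 10, 91.2, 9.10189, 10)
t_welch, df_welch = welch_t(68.6, 6.67, 10, 91.2, 9.10189, 10)

print(f"paired: t = {t_paired:.2f}, df = {df_paired}")    # paired: t = 3.49, df = 52
print(f"pooled: t = {t_pool:.2f}, df = {df_pool}")        # pooled: t = -6.33, df = 18
print(f"Welch:  t = {t_welch:.2f}, df = {df_welch:.2f}")  # Welch:  t = -6.33, df = 16.50
```

With equal group sizes, the pooled and Welch standard errors coincide, so only the degrees of freedom (18 vs 16.50) differ. This matches the two rows of the SPSS table.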