
Basic Concepts for Biostatistics


https://userupload.net/j72hszhboqcp

Biostatistics can be defined as the application of the mathematical tools used in statistics to the fields of biological sciences and medicine. Biostatistics is a growing field with applications in many areas of biology including epidemiology, medical sciences, health sciences, educational research and environmental sciences.

Concerns of biostatistics

Biostatistics is concerned with:

  • the collection, organization, summarization and analysis of data;
  • drawing inferences about a body of data when only a part of the data is observed.

Published in: Health and Medicine

Basic Concepts for Biostatistics

  1. 1. BIOSTATISTICS. Check out the PPT download link in the description, or use this download link: https://userupload.net/j72hszhboqcp
  2. 2. “When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.” (Lord Kelvin)
  3. 3. Biostatistics: collecting data, understanding data and numbers. The word is “Statistics”, not “Sadistics”.
  4. 4. Topics covered in this session: what statistics is; uses of statistics; sampling and sample designs; data; presentation of data; measures of central tendency; measures of variability; the normal distribution and curve; probability; tests of significance; correlation and regression.
  6. 6. Statistics: the science of collecting, monitoring, analyzing, summarizing, and interpreting data. This includes design issues as well. Statistics are tools. In the plural sense, "statistics" means figures (numerical data); in the singular sense, it means the body of knowledge. Origin of the word: German Statistik (political state) and Italian statista (statesman).
  7. 7. What is biostatistics? The tools of statistics applied to data derived from the biological sciences. John Graunt (1620-1674) is regarded as the father of health statistics. Statistics is applied to biological (life) problems, including public health, medicine, and ecological and environmental problems. The field involves much more statistics than biology; however, biostatisticians must learn the biology as well.
  8. 8. Statistical analyses. Descriptive statistics: describe the sample; the science of collecting, summarizing and presenting data. Inference: make inferences about the population using what is observed in the sample, primarily in two ways: hypothesis testing and estimation.
  9. 9. What do biostatisticians do? Identify and develop treatments for disease and estimate their effects. Identify risk factors for diseases. Design, monitor, analyze, interpret, and report results of clinical studies. Develop statistical methodologies to address questions arising from medical and public health data. Locate, define and measure the extent of disease. The ultimate objective is to improve the health of the individual and the community.
  11. 11. Use of statistics in dental sciences: assess the state of oral health in the community; indicate the basic factors underlying the state of oral health; determine the success or failure of specific oral health care programmes, or evaluate programme action; promote health legislation and help create administrative standards for oral health.
  12. 12. Populations and parameters. Population: a group of individuals that we would like to know something about. Parameter: a characteristic of the population in which we have a particular interest. Examples: the proportion of the population that would respond to a certain drug; the association between a risk factor and a disease in a population.
  13. 13. Samples and statistics. Sample: a subset of a population (hopefully representative). Statistic: a characteristic of the sample. Examples: the observed proportion of the sample that responds to treatment; the observed association between a risk factor and a disease in this sample.
  14. 14. Populations and samples. Studying an entire population is usually too expensive and time-consuming, and thus impractical. If a sample is representative of the population, then by observing the sample we can learn something about the population. Thus, by looking at the characteristics of the sample (statistics), we may learn something about the characteristics of the population (parameters).
  17. 17. Sample size. Factors to consider: the extent to which the sample represents the general population; type of study (descriptive, experimental, etc.); variability of the population (expressed as S.D.); number of variables; level of precision; sensitivity of measurement tools; sampling method employed; data analysis techniques. A sample is more likely to be representative if all members of the population have an equal chance of being picked.
  19. 19. Simple random sampling (probability sampling). Random: every population unit has a chance of being selected in the sample, and units are selected by chance only. Applicable when the population is small, homogeneous and readily available. To ensure randomness, use the lottery method or a table of random numbers.
  20. 20. Simple random sampling [figure: a simple random sample of 20 cases drawn from a population numbered 1 to 200]
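The lottery method described above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample` and the fixed seed are assumptions made for reproducibility, not part of the slides.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct case numbers from 1..population_size.

    Every case has the same chance of selection (sampling without replacement).
    """
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# 20 cases out of a population numbered 1 to 200, as in the slide's figure
cases = simple_random_sample(200, 20, seed=1)
print(cases)
```

Changing or omitting the seed gives a different, equally valid sample.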
  24. 24. Systematic random sampling. Used where a complete list of the population is available; applied to field studies. The sampling interval is K = total population / desired sample size. Advantages: simple; takes less time and labor; results are fairly accurate.
  25. 25. Systematic random sampling [figure: a systematic random sample of 20 cases drawn from a population numbered 1 to 200]
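A minimal sketch of systematic sampling with the interval K defined above follows. It assumes the population size is an exact multiple of the sample size, and the random starting point within the first interval is a common convention rather than something stated on the slide; `systematic_sample` is an illustrative name.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick every K-th case after a random start, with K = N / n."""
    k = population_size // sample_size   # sampling interval K
    rng = random.Random(seed)
    start = rng.randint(1, k)            # random start in the first interval
    return [start + i * k for i in range(sample_size)]

# N = 200, n = 20, so K = 10
cases = systematic_sample(200, 20, seed=1)
print(cases)
```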
  26. 26. Stratified sampling. The target population is divided into homogeneous groups or classes called strata. Examples of strata: age, sex, classes, geographical area. Gives a more representative sample and greater accuracy, and covers a wide area.
  27. 27. Stratified random sampling
  28. 28. Cluster sampling. A cluster is a randomly selected group; the units of the population fall into natural groups or clusters. A simple method that takes less time and cost, but has a higher error.
  29. 29. Example: Imagine that you wanted to conduct in-person interviews with neighborhood organizations. There are 9 cities scattered around the country with the relevant types of organizations, and 16 organizations within each of the 9 cities (144 organizations in total). You need to interview 12 organizations. A simple random sample would likely require interviews in (and thus travel to) most or all of these 9 distant cities.
  30. 30. If you used multi-stage clustered sampling, you would first randomly select a certain number of cities (here three), and then randomly select four organizations within each of the three cities. This saves travel time, and also makes it easier to assemble a sampling frame (a list of the ultimate sampling elements).
  31. 31. Cluster sampling is used where (1) no sampling frame is directly available, and/or (2) simple random sampling would be expensive, complex, time-consuming and/or logistically difficult. At each level (sampling unit), take a random sample of the larger units, then a random sample within each selected "cluster", and so on. Since this process involves more than one stage or step of sampling, it is often called "multistage cluster sampling".
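The two-stage procedure in the example (three of nine cities, then four of sixteen organizations in each) might be sketched as below. The city and organization labels are made-up placeholders and `two_stage_cluster_sample` is an illustrative name.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical frame: 9 cities with 16 organizations each (144 in total)
cities = {
    f"city_{c}": [f"city_{c}_org_{o}" for o in range(1, 17)]
    for c in range(1, 10)
}

sample = two_stage_cluster_sample(cities, n_clusters=3, n_per_cluster=4, seed=1)
for city, organizations in sample.items():
    print(city, organizations)
```

Only the three selected cities need a full list of organizations, which is why the sampling frame is easier to assemble.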
  32. 32. Errors in sampling. Sampling errors: faulty sample design; small sample size. Non-sampling errors: coverage error; observational error; processing error.
  33. 33. What is data? Pieces of information. According to Fraenkel & Wallen (2000), the term “data” refers to the kinds of information researchers obtain on the subjects of their research. The vast majority of errors in research arise from poor planning (e.g., of data collection). Fancy statistical methods cannot rescue garbage data. Collect exact values whenever possible.
  34. 34. Where do you get your data? A collective recording of observations is data. Main sources: experiments, surveys, records (census, public reports). Demographic data: details of the population. [diagram: Data, classified as Quantitative / Qualitative and Discrete / Continuous]
  35. 35. Level of measurement. Nominal (categorical): gender, race, hypertensive. Ordinal (categories that can be ranked): none, light, moderate, heavy smoker. Interval (continuous): blood pressure, age, days in the hospital. Discrete: fixed values.
  36. 36. Horse race example. Nominal: did this horse come in first place? (0 = no, 1 = yes). Ordinal: in what position did this horse finish? (1 = first, 2 = second, 3 = third, etc.). Interval (scale): how long did it take for this horse to finish? (60 seconds, etc.).
  37. 37. Presentation of data. Data collected and compiled from experimental work, surveys and records are raw data. They need to be sorted and classified to make them simple, concise, meaningful, interesting and helpful. Two methods: tabulation; diagrams / drawings.
  38. 38. Visual data summaries. Quantitative / continuous / measured data: histogram; frequency polygon; frequency curve; line chart / graph; cumulative frequency diagram; scatter / dot diagram. Qualitative / discrete / counted data: bar diagram; pie / sector diagram; pictogram; map diagram / spot diagram.
  39. 39. Tabulation. Tables are devices for the presentation of data, and tabulation is the first step before analysis and interpretation. Rules for a frequency distribution table: each table should have a title and a number (Table 1, Table 2, ...); row and column headings should be clear and concise; the number of class intervals should be between 5 and 25; class intervals should be of equal width; units of measurement should be specified; the source of the data should be mentioned; groups should be tabulated in order.
  40. 40. Table 1: Students in a primary school

      Class (standard) | No. of students
      1st              | 68
      2nd              | 65
      3rd              | 63
      4th              | 62
      5th              | 60

      [Table 2: not recoverable from the transcript]
  41. 41. Bar diagram. Represents only one variable. Represents qualitative data. Compares qualitative data with respect to a single variable.
  42. 42. Proportional bar diagram. Used for comparison of data: populations or groups are compared with respect to a single variable, and only the proportions of subgroups are compared.
  43. 43. Line diagram / graph. The simplest means of representing data. Useful for representing trends over time. The X-axis represents time; the Y-axis represents the value of the variable under study.
  44. 44. Histogram. Depicts quantitative data of the continuous type.
  45. 45. Frequency polygon. Represents frequency distributions and allows comparative analysis. An area diagram developed over a histogram, with a point marked over the mid-point of each class interval.
  46. 46. Cartograms or spot maps. Used to show the geographical distribution of frequency.
  47. 47. Pictogram or picture diagram. Used to impress upon the viewer the frequency of occurrence of health-related events.
  48. 48. Pie diagram / sector diagram. Shows a percentage breakdown. The angle of each sector, and hence its area, denotes the frequency. Angle = (class frequency / total observations) × 360°.
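As a small worked illustration of the sector-angle formula, the sketch below applies it to the student counts from Table 1 of this deck. The function name `pie_angles` is an assumption made for this example.

```python
def pie_angles(frequencies):
    """Angle of each sector in degrees: class frequency / total observations x 360."""
    total = sum(frequencies.values())
    return {label: count / total * 360 for label, count in frequencies.items()}

# Student counts per class, taken from Table 1 (318 students in total)
students = {"1st": 68, "2nd": 65, "3rd": 63, "4th": 62, "5th": 60}
angles = pie_angles(students)

for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.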
  49. 49. Summary measures. Central tendency: mean, median, mode. Variation: range, variance, standard deviation.
  50. 50. Describing central tendency. Central tendency refers to the middle of the distribution: a value or parameter that serves as a single estimate of a series of data. It gives a mental picture of the central value, enables comparison, and is the one central value around which all other observations are dispersed.
  51. 51. Mean (arithmetic mean). The most common measure of central tendency. Affected by extreme values (outliers). [figure: two dot plots, one with mean = 5 and one, including an extreme value, with mean = 6]
  52. 52. Median. A robust measure of central tendency, not affected by extreme values. In an ordered array, the median is the “middle” number: if n or N is odd, the median is the middle number; if n or N is even, the median is the average of the two middle numbers. [figure: the same two dot plots, median = 5 in both]
  53. 53. Mode. The value that occurs most often. Not affected by extreme values. Used for either numerical or categorical data. There may be no mode, and there may be several modes. [figure: two dot plots, one with mode = 9 and one with no mode]
  54. 54. Mean
  55. 55. Median
  56. 56. Mode
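The effect of an extreme value on the three measures can be checked with Python's standard `statistics` module. The two data sets below are hypothetical (the original dot plots are not recoverable from the transcript), chosen so that the means are 5 and 6 while the median stays at 5; the helper `modes` is an illustrative name.

```python
import statistics
from collections import Counter

def modes(values):
    """Return all most-frequent values, or an empty list when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical data: identical except for one extreme value
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]

print(statistics.mean(without_outlier), statistics.median(without_outlier))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
print(modes([1, 2, 2, 3, 3]), modes(without_outlier))
```

The mean moves when the largest value changes, while the median does not, which is what "robust" means on the median slide.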
  57. 57. Example. Dr A = 2, 4, 3, 4, 6, 6, 2, 5; Dr B = 4, 5, 4, 3, 4, 5, 3, 4; Dr C = 3, 3, 8, 3, 3, 3, 4, 5. Mean for Dr A = 32/8 = 4 days; mean for Dr B = 32/8 = 4 days; mean for Dr C = 32/8 = 4 days. The means are equal, but the range of days varies: Dr A = 2-6 days; Dr B = 3-5 days; Dr C = 3-8 days. Such differences in spread are described by measures of dispersion.
  58. 58. Measures of variation: range; interquartile range; variance (population variance, sample variance); standard deviation (population standard deviation, sample standard deviation).
  59. 59. The range. A measure of variation: the difference between the largest and the smallest observations, $\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$. It ignores the way in which the data are distributed. [figure: two dot plots on a 7 to 12 axis with differently distributed data; Range = 12 - 7 = 5 in both]
  60. 60. Variance: shows variation about the mean.
      Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
      Sample variance: $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
      Deviations from the mean, $(x - \bar{x})$: Dr A = -2, 0, -1, 0, 2, 2, -2, 1 (sum 0); Dr B = 0, 1, 0, -1, 0, 1, -1, 0 (sum 0); Dr C = -1, -1, 4, -1, -1, -1, 0, 1 (sum 0).
      Sums of squared deviations, $\sum (x - \bar{x})^2$: Dr A = 18, Dr B = 4, Dr C = 22.
      Thus, dividing by N = 8, the variances are: Dr A = 18/8 = 2.25, Dr B = 4/8 = 0.5, Dr C = 22/8 = 2.75.
  61. 61. Standard deviation. The most important measure of variation; it shows variation about the mean and is the root mean square deviation. It has the same units as the original data. For the example: Dr A = 1.5, Dr B = 0.7, Dr C = 1.66.
      Sample standard deviation: $S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
      Population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
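The figures for the three doctors can be reproduced with the standard `statistics` module. The sketch uses the population formulas (division by N = 8), which is what the worked numbers on the slides correspond to; the dictionary names are illustrative.

```python
import statistics

# Days recorded for each doctor, as listed on the slides
stays = {
    "Dr A": [2, 4, 3, 4, 6, 6, 2, 5],
    "Dr B": [4, 5, 4, 3, 4, 5, 3, 4],
    "Dr C": [3, 3, 8, 3, 3, 3, 4, 5],
}

summary = {}
for doctor, days in stays.items():
    summary[doctor] = {
        "mean": statistics.mean(days),
        "range": max(days) - min(days),
        "variance": statistics.pvariance(days),  # population variance, divides by N
        "sd": statistics.pstdev(days),           # population standard deviation
    }

for doctor, s in summary.items():
    print(doctor, s["mean"], s["range"], round(s["variance"], 2), round(s["sd"], 2))
```

Using `statistics.variance` and `statistics.stdev` instead would divide by n - 1 and give the sample versions of the same measures.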
  62. 62. Comparing standard deviations. [figure: three dot plots on an 11 to 21 axis. Data A: mean = 15.5, s = 3.338. Data B: mean = 15.5, s = 0.9258. Data C: mean = 15.5, s = 4.57.]
  63. 63. Shape of a distribution. Describes how data are distributed. Measures of shape: symmetric or skewed. Left-skewed: Mean < Median < Mode. Symmetric: Mean = Median = Mode. Right-skewed: Mode < Median < Mean.
  64. 64. Frequency distribution: the normal curve. Many statistical methods assume a normal, bell-shaped distribution of scores. A distribution of this nature is called the normal or Gaussian distribution. 50% of scores lie above the mean and 50% below it. Population measurements such as height and weight follow a roughly normal curve. Mean = median = mode. 34.13% of scores lie between the mean and 1 SD above it, and 34.13% between the mean and 1 SD below it. Mean ± 3 SD covers more than 99% of scores.
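The areas quoted for the normal curve can be verified with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper `area_within` is an illustrative name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Proportion of scores lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

one_side = standard_normal.cdf(1) - 0.5  # between the mean and +1 SD

print(f"mean to +1 SD: {one_side:.4f}")
for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

The value for 1.96 SD is the familiar 95% region that reappears later in the deck as the zone of acceptance.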
  65. 65. Skewed distribution. A non-symmetrical distribution in which the mean, median and mode are not the same. Negatively skewed: extreme scores at the lower end; mean < median < mode; most did well, a few poorly. Positively skewed: extreme scores at the higher end; mean > median > mode; most did poorly, a few well. The further apart the mean and median, the more the distribution is skewed.
  66. 66. Examples of normal and skewed distributions. [figures: histogram of days in ICU (mean = 0.9, SD = 3.99, N = 933); histogram of first ER systolic blood pressure (mean = 146.9, SD = 27.74, N = 925)]
  67. 67. Hypothesis tests. Hypothesis testing can be described as a five-step procedure: (1) formulation of the null and the alternative hypotheses; (2) specification of the level of significance; (3) calculation of the test statistic; (4) definition of the region of rejection; (5) selection of the appropriate hypothesis.
  68. 68. The simplest case for a decision is the 'yes-or-no' question. For any parameter to be tested, two hypotheses are made. The null hypothesis, or hypothesis of no difference, asserts that there is no real difference between the sample and the general population; the difference found is accidental and arises out of sampling variation. The alternative hypothesis, of significant difference, states that the sample result is different from the hypothetical value of the population. To minimize errors, the sampling distribution (the area under the normal curve) is divided into two regions or zones. Zone of acceptance: mean ± 1.96 SE (at the 5% level of significance). Zone of rejection: values outside this range.
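  70. 70. Types of error
  71. 71. Degrees of freedom. Defined as the number of independent values in a sample. Example: if (X + Y + Z)/3 = 5, any two of the values can be chosen freely but the third is then fixed, giving 2 degrees of freedom. Likewise, when there are 10 values with a fixed mean, there are 9 free choices, or 9 degrees of freedom.
  72. 72. Standard error. The standard deviation of a statistic such as a mean or a proportion. Different samples from the same population have different means, and it is the variability of these means that is assessed. Standard error of the mean = SD of the means of several samples from the same population. In practice it is estimated from one sample as $SE = \dfrac{\text{SD of observations in the sample}}{\sqrt{\text{number of observations in the sample}}}$. It reflects variation in biological observations.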
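A short sketch of the standard error of the mean, using the Dr A data from earlier in the deck as a convenient sample. It uses the sample standard deviation (division by n - 1), which is the usual choice when estimating from a single sample; the function name `standard_error` is illustrative.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

days = [2, 4, 3, 4, 6, 6, 2, 5]  # Dr A's observations
mean = statistics.mean(days)
se = standard_error(days)

# Range of mean +/- 1.96 SE, the zone of acceptance used elsewhere in the deck
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean}, SE = {se:.3f}, range = ({lower:.2f}, {upper:.2f})")
```

With only 8 observations the normal-based multiplier 1.96 is an approximation; a t-based multiplier would be wider.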
  73. 73. Probability or chance. Defined as the relative frequency with which an event is expected to occur on average. Denoted as a relative frequency or as odds, and expressed as ‘p’. It ranges from zero (0) to one (1): when p = 0 there is no chance of the event happening; when p = 1 the event is 100% certain. p = number of events occurring / total number of trials. q = 1 - p is the probability of the event not occurring.
  74. 74. What does "not significant" really mean? An impossible event has probability 0; an event which must occur has probability 1. P < 0.001: very highly significant. P < 0.01: highly significant. P < 0.05: significant. [figure: probability scale from 0 to 1 with ticks at 0.25, 0.5 and 0.75, labelled "event impossible", "event unlikely to happen", "event equally likely to happen" and "event certain"]
  75. 75. Tests of significance. Whenever two sets of observations are compared, it becomes essential to find out whether the difference observed between the two groups is due to sampling variation or to some other factor. The method used is a test of significance.
  76. 76. How to know what to use. There are many theoretical distributions, both continuous and discrete. We use four of these a lot: z (unit normal), t, chi-square, and F. Z and t are closely related to the sampling distribution of means; chi-square and F / ANOVA are closely related to the sampling distribution of variances.
  77. 77. Objective of using tests of significance. To compare: a sample mean with the population mean; the means of two samples; a sample proportion with the population proportion; the proportions of two samples; the association between two attributes.
  78. 78. One-sided vs. two-sided tests. One-sided tests have one rejection region, i.e. you check whether the parameter of interest is larger (or smaller) than a given value. Two-sided tests are used when we test whether a parameter equals a certain value; deviations from that value in either direction lead to rejection of the null hypothesis.
  79. 79. Z test (large samples). Used for large samples (n > 30). The difference observed between the sample estimate and the population value is expressed in terms of the SE. The ratio of the observed difference to the SE is called ‘Z’: Z = difference in means / SE of the mean.
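The Z ratio can be illustrated with a one-sample test. All numbers below are hypothetical (a sample of 100 with mean 52 and SD 10, compared against a population mean of 50), and the function name `z_test` is an assumption for this sketch.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, population_mean, sd, n):
    """Return Z = (difference in means) / (SE of the mean) and the two-sided p-value."""
    se = sd / math.sqrt(n)
    z = (sample_mean - population_mean) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical figures, for illustration only
z, p = z_test(sample_mean=52, population_mean=50, sd=10, n=100)
print(f"Z = {z:.2f}, two-sided p = {p:.4f}")
print("outside mean +/- 1.96 SE" if abs(z) > 1.96 else "inside mean +/- 1.96 SE")
```

A Z beyond ±1.96 falls in the zone of rejection at the 5% level, matching the two-sided rule described earlier.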
  80. 80. What is a t test? Commonly used definition: comparing two means to see if they are significantly different from each other. Technical definition: any statistical test that uses the t family of distributions.
  81. 81. t-test (small samples). Designed by W. S. Gosset. Used in the case of small samples. It is the ratio of the observed difference between the means of two small samples to the SE of that difference. When each individual gives a pair of observations, the paired t-test is used to test for a difference within the pairs of values.
  82. 82. Student’s t-test. Used to compare the average (mean) in one group with the average in another group. Univariate, unmatched, interval, normal, 2 groups. Example: weight gains of 6 boys on diet A = 4, 3, 5, 2, 3, 1; weight gains of 9 boys on diet B = 6, 3, 8, 9, 5, 3, 4, 2, 5 (n1 = 6, n2 = 9, pooled SD = 2.04). Test the significance of the difference between diets A and B with regard to their effect on increase in weight.
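The diet example can be worked through with a pooled-variance (unpaired) t statistic. This is one standard way to carry out the test; the slides do not show the calculation itself, so the steps and the name `pooled_t` below are a reconstruction under that assumption.

```python
import math
import statistics

def pooled_t(a, b):
    """Unpaired t statistic with a pooled SD; returns (t, degrees of freedom, pooled SD)."""
    n1, n2 = len(a), len(b)
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)  # sum of squared deviations, group A
    ss_b = sum((x - mean_b) ** 2 for x in b)  # sum of squared deviations, group B
    df = n1 + n2 - 2
    pooled_sd = math.sqrt((ss_a + ss_b) / df)
    se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return (mean_b - mean_a) / se_diff, df, pooled_sd

diet_a = [4, 3, 5, 2, 3, 1]
diet_b = [6, 3, 8, 9, 5, 3, 4, 2, 5]

t, df, pooled_sd = pooled_t(diet_a, diet_b)
print(f"t = {t:.2f}, df = {df}, pooled SD = {pooled_sd:.2f}")
```

The pooled SD comes out at the 2.04 quoted on the slide. The resulting t is below 2.160, the tabulated two-sided 5% critical value for 13 degrees of freedom, so on this calculation the difference between the diets would not be judged significant at that level.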
  83. 83. Paired t-test. Used to compare the average for measurements made twice within the same person (before vs. after). For example: did the systolic blood pressure change significantly from the scene of the injury to admission? Univariate, matched, interval, normal, 2 groups.
  85. 85. Chi-square test (χ² test). One of the most commonly used statistical tests. Developed by Karl Pearson. Used for qualitative data, to test whether the difference in the distribution of attributes between groups is due to sampling variation or otherwise. For example, suppose that in a study of 933 patients with a hip fracture, 10% of the men (22/219) develop pneumonia compared with 5% of the women (36/714). What is the probability that this could happen by chance alone?
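The question posed in the hip-fracture example can be explored with the shortcut χ² formula for a 2×2 table. The sketch applies no continuity correction, and it obtains the p-value from the relation between a χ² variable with 1 degree of freedom and the squared standard normal; both are choices made for this illustration, as is the name `chi_square_2x2`.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Chi-square for the 2x2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Men: 22 of 219 with pneumonia; women: 36 of 714 with pneumonia
chi2 = chi_square_2x2(22, 219 - 22, 36, 714 - 36)

# Upper-tail probability for 1 degree of freedom
p = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A p-value below 0.01 would be labelled "highly significant" on the scale given in the probability slide.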
  86. 86. Calculation of the χ² value:
      $\chi^2 = \sum \sum \dfrac{(\text{observed } f - \text{expected } f)^2}{\text{expected } f}$
      Expected f = (row total × column total) / grand total

      Observed numbers, by number of new cavities:
      Group                       | 0-1 | 2-3 | 4-5 | Total
      Received instruction        | 30  | 15  | 5   | 50
      Did not receive instruction | 20  | 15  | 15  | 50
      Total                       | 50  | 30  | 20  | 100
  87. 87. Expected numbers, by number of new cavities:
      Group                       | 0-1                | 2-3                | 4-5                | Total
      Received instruction        | 50 × 50 / 100 = 25 | 30 × 50 / 100 = 15 | 20 × 50 / 100 = 10 | 50
      Did not receive instruction | 50 × 50 / 100 = 25 | 30 × 50 / 100 = 15 | 20 × 50 / 100 = 10 | 50
      Total                       | 50                 | 30                 | 20                 | 100

      χ² = 1 + 0 + 2.5 + 1 + 0 + 2.5 = 7
      df = (2 - 1) × (3 - 1) = 2
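The cavities calculation generalizes to any table of counts. The sketch below recomputes the expected frequencies, χ² and the degrees of freedom for the observed table; the function name `chi_square` is illustrative, and the closed-form p-value used at the end holds only for 2 degrees of freedom.

```python
import math

def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Rows: received / did not receive instruction; columns: 0-1, 2-3, 4-5 new cavities
observed = [[30, 15, 5], [20, 15, 15]]
chi2, df = chi_square(observed)

p = math.exp(-chi2 / 2)  # upper-tail probability, valid for df = 2 only
print(f"chi-square = {chi2:.1f}, df = {df}, p = {p:.4f}")
```

The statistic of 7 exceeds 5.99, the tabulated 5% critical value for 2 degrees of freedom, so the difference between the groups would be called significant at that level.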
  89. 89. Two-sample F-test. To compare two methods, it is often important to know whether the variabilities of both methods are the same. In order to compare two variances, v1 and v2, calculate their ratio. This ratio is called the F-statistic: F = v1 / v2.
  91. 91. Analysis of variance (ANOVA). Compares more than two samples, looking at variation between the classes as well as within the classes. For such comparisons, repeated t or Z tests carry a high chance of error. One-way ANOVA is used to compare three or more means from independent groups: “Is the age different between White, Black and Hispanic patients?” Two-way ANOVA is used to compare 2 or more means by 2 or more factors: “Is the age different between males and females, with and without pneumonia?”
  92. 92. Coefficient of correlation. Measures the strength of the linear relationship between two quantitative variables. Denoted by the letter ‘r’. Ranges between -1 and 1. The closer to -1, the stronger the negative linear relationship; the closer to 1, the stronger the positive linear relationship; the closer to 0, the weaker any linear relationship.
  93. 93. Scatter plots of data with various correlation coefficients. [figure: five scatter plots of Y against X, with r = -1, r = -0.6, r = 0, r = 0.6 and r = 1]
  94. 94. Calculation of the correlation coefficient. Pearson’s correlation coefficient: $r = \dfrac{\sum (X - \bar{x})(Y - \bar{y})}{\sqrt{\sum (X - \bar{x})^2 \sum (Y - \bar{y})^2}}$. A correlation does not prove that one variable alone causes the change in the other.
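Pearson's formula translates directly into code. The data below are hypothetical and chosen so that the expected signs of r are obvious; `pearson_r` is an illustrative name.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of cross-products over the root of the product of sums of squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data
x = [1, 2, 3, 4, 5]
r_perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])  # y rises exactly with x
r_perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])  # y falls exactly with x
r_weak = pearson_r(x, [3, 1, 4, 1, 5])               # no clear pattern

print(r_perfect_positive, r_perfect_negative, round(r_weak, 2))
```

The function is undefined when either variable is constant, because the denominator is then zero.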
  95. 95. Overview of biostatistics

      Research question                       | Continuous                         | Discrete
      1. Describe 1 sample                    | Mean, SD, SE                       | Counts, %, proportion
      2. Compare 2 groups: a. non-paired      | Student’s t-test                   | Chi-square test
                           b. paired          | Paired t-test                      | Confidence interval between 2 proportions
      3. Compare 2 or more groups             | ANOVA F-test                       |
      4. Correlate 2 variables in 1 group     | Pearson correlation r              |
      5. Correlate > 2 variables in 1 group   | Multiple correlation coefficient R |
  96. 96. Conclusion: know thyself. Why does he keep saying this all the time?
  97. 97. “He who accepts statistics indiscriminately will often be duped unnecessarily. But he who distrusts statistics indiscriminately will often be ignorant unnecessarily.”
