SlideShare uma empresa Scribd logo
1 de 80
Descriptive Statistics-I
   Dr Mahmoud Alhussami
Biostatistics
 What is the biostatistics? A branch of
  applied math. that deals with collecting,
  organizing and interpreting data using
  well-defined procedures.
 Types of Biostatistics:
       Descriptive Statistics. It involves organizing,
        summarizing & displaying data to make them
        more understandable.
       Inferential Statistics. It reports the degree of
        confidence of the sample statistic that predicts
        the value of the population parameter.
Data Interpretation
     Consideration:
1.    Accuracy
     1.   critical view of the data
     2.   investigating evidence of the results
     3.   consider other studies’ results
     4.   peripheral data analysis
     5.   conduct power analysis: type I & type II

                                  True         False
                     True
                                    Correct          Type-II

                                   Type -I           Correct
                       False
Types of Errors
If You……          When the Null            Then You
                  Hypothesis is…           Have…….
Reject the null   True    (there really    Made a Type I
hypothesis        are no difference)       Error
Reject the null   False   (there really    ☻
hypothesis        are difference)

Accept the null   False    (there really   Made Type II
hypothesis        are difference)          Error
Accept the null   True   (there really     ☻
hypothesis        are no difference)
 alpha : the level of significance used for
  establishing type-I error
 β : the probability of type-II error
 1 – β : is the probability of obtaining

  significance results ( power)
   Effect size: how much we can say that the
    intervention made a significance difference
2. Meaning of the results
    - translation of the results and make it
     understandable
3. Importance:
     - translation of the significant findings into
     practical findings
4. Generalizability:
      - how can we make the findings useful for all
     the population
5. Implication:
      - what have we learned related to what has
     been used during study
Definition
 Data is any type of information
 Raw data is a data collected as they
  receive.
 Organize data is data organized either in
  ascending, descending or in a grouped
  data.




March 28, 2013                                7
Data Example
Weight in pounds of 57 school children at a
  day-care center:
68       63       42   27   30   36   28   32   79   27
22       23       24   25   44   65   43   23   74   51
36       42       28   31   28   25   45   12   57   51
12       32       49   38   42   27   31   50   38   21
16       24       69   47   23   22   43   27   49   28
23       19       46   30   43   49   12




 March 28, 2013                                      8
Descriptive Statistics
   In this lecture we discuss using descriptive
    statistics, as opposed to inferential statistics.
   Here we are interested only in summarizing the
    data in front of us, without assuming that it
    represents anything more.
   We will look at both numerical and categorical
    data.
   We will look at both quantitative and graphical
    techniques.
   The basic overall idea is to turn data into
    information.
Numerical Data-Single Variable
   Numerical data may be summarized according to
    several characteristics. We will work with each in turn
    and then all of them together.
       Measures of Location
            Measures of central tendency: Mean; Median; Mode
            Measures of noncentral tendency - Quantiles
                Quartiles; Quintiles; Percentiles
       Measures of Dispersion
            Range
            Interquartile range
            Variance
            Standard Deviation
            Coefficient of Variation
       Measures of Shape
       Skewness
   Standardizing Data
Measures of Location
   Measures of location place the data set on the scale of
    real numbers.
   It refers to the location of a typical data value-the data
    value around which other scores to cluster.
   Measures of central tendency (i.e., central location) help
    find the approximate center of the dataset.
   Researchers usually do not use the term average,
    because there are three alternative types of average.
   These include the mean, the median, and the mode. In a
    perfect world, the mean, median & mode would be the
    same.
   However, the world is not perfect & very often, the
    mean, median and mode are not the same
Summation Notation
   Ways to summarize the data other than using
    tables and plots.
   Setup: dataset consists of {8,2,3,5}
         These numbers are denoted as xi’s
            x1=8                x2=2            x3=3             x4=5
                     4
            Sum=   ∑x
                    i =1
                           i   = x1 + x2 + x3 + x4 = 8 + 2 + 3 + 5 = 18

            Where the Greek letter Σ is the summation sign.




March 28, 2013                                                            12
13
Notes on Sample Mean X
                                       n
   Formula
                                      ∑X       i
                                X=     i=1
                                           n
   Summation Sign
         In the formula to find the mean, we use the “summation sign” — ∑
         This is just mathematical shorthand for “add up all of the observations”


                   n

                  ∑X
                  i=1
                        i   = X 1 + X 2 + X 3 + ....... + X n




March 28, 2013                                                                  14
Notes on Sample Mean
   Also called sample average or arithmetic
    mean
   Mean for the sample = X or M, Mean for
    population = mew (μ)
   Sensitive to extreme values
         One data point could make a great change in
          sample mean
   Why is it called the sample mean?
         To distinguish it from population mean


March 28, 2013                                          15
16
 The median is the middle value of the ordered
  data
 To get the median, we must first rearrange the
  data into an ordered array (in ascending or
  descending order). Generally, we order the
  data from the lowest value to the highest
  value.
 Therefore,  the median is the data value such
  that half of the observations are larger and half
  are smaller. It is also the 50th percentile.
 If n is odd, the median is the middle
  observation of the ordered array. If n is even, it
  is midway between the two central
  observations.
                                                       17
Example:                          0   2   3   5   20   99   100


 Note: Data has been ordered from lowest to highest. Since n is
 odd (n=7), the median is the (n+1)/2 ordered observation, or the
 4th observation.
Answer: The median is 5.

The mean and the median are unique for a given set
of data. There will be exactly one mean and one
median.

Unlike the mean, the median is not affected by
extreme values.
 Q: What happens to the median if we change the 100 to 5,000?
 Not a thing, the median will still be 5. Five is still the middle
 value of the data set.

                                                                     18
10   20      30        40        50   60
Example:

 Note: Data has been ordered from lowest to highest.
 Since n is even (n=6), the median is the (n+1)/2
 ordered observation, or the 3.5th observation, i.e., the
 average of observation 3 and observation 4.


Answer: The median is 35.




                                    Descriptive Statistics I             19
20
Mean vs. Median
   Advantage: The median is less affected by extreme
    values.
   Disadvantage:
         The median takes no account of the precise magnitude of
          most of the observations and is therefore less efficient
          than the mean
         If two groups of data are pooled the median of the
          combined group can not be expressed in terms of the
          medians of the two original groups but the sample mean
          can.
                                  n1 x1 + n2 x2
                     x pooled =
                                     n1 + n2


March 28, 2013                                                  21
The  mode is the value of the data that
 occurs with the greatest frequency.
 Unstable index: values of modes tend to
 fluctuate from one sample to another
 drawn from the same population
Example. 1, 1, 1, 2, 3, 4, 5
 Answer. The mode is 1 since it occurs three times.
 The other values each appear only once in the data
 set.
Example. 5, 5, 5, 6, 8, 10, 10, 10.
 Answer. The mode is: 5, 10.
 There are two modes. This is a bi-modal dataset.     22
 The mode is different from the mean and the
 median in that those measures always exist
 and are always unique. For any numeric data
 set there will be one mean and one median.
 The   mode may not exist.
  Data:   1, 2, 3, 4, 5, 6, 7, 8, 9, 0
  Here  you have 10 observations and they are all
   different.
 The   mode may not be unique.
  Data:   0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7
  Mode= 1, 2, 3, 4, 5, and 6. There are six
   modes.
                                                      23
Comparison of the Mode, the Median,
and the Mean
   In a normal distribution, the mode , the median,
    and the mean have the same value.
   The mean is the widely reported index of central
    tendency for variables measured on an interval
    and ratio scale.
   The mean takes each and every score into
    account.
   It also the most stable index of central tendency
    and thus yields the most reliable estimate of the
    central tendency of the population.
Comparison of the Mode, the Median,
and the Mean
   The mean is always pulled in the direction of the
    long tail, that is, in the direction of the extreme
    scores.
   For the variables that positively skewed (like
    income), the mean is higher than the mode or
    the median. For negatively skewed variables (like
    age at death) the mean is lower.
   When there are extreme values in the distribution
    (even if it is approximately normal), researchers
    sometimes report means that have been adjusted
    for outliers.
   To adjust means one must discard a fixed
    percentage (5%) of the extreme values from
    either end of the distribution.
Shapes of Distributions
      Symmetric (Right and left sides are
       mirror images)
            Left tail looks like right tail
            Mean = Median = Mode




                       Mean    Median     Mode

March 28, 2013                                   26
Distribution Characteristics
 Mode: Peak(s)
 Median: Equal areas point
 Mean: Balancing point




                 Mode            Mean
                        Median
March 28, 2013                          27
Shapes of Distributions
   Right skewed (positively skewed)
         Long right tail
         Mean > Median




                 Mode            Mean
                        Median
March 28, 2013                          28
Shapes of Distributions
   Left skewed (negatively skewed)
         Long left tail
         Mean < Median




                    Mean   Median   Mode
March 28, 2013                             29
SPSS Functions
   If you are given the “raw data” (i.e. data
    not summarized in a frequency table) you
    can calculate the measures of location in
    SPSS.
         Descriptive Command
         Option Command
         Selection of mean, median and mode




March 28, 2013                                   30
Measuresof non-central location used to
 summarize a set of data

Examples      of commonly used quantiles:
  Quartiles
  Quintiles
  Deciles
  Percentiles




                            Descriptive Statistics I   31
   Quartiles split a set of ordered data into four parts.
       Imagine cutting a chocolate bar into four equal pieces… How many
        cuts would you make? (yes, 3!)
   Q1 is the First Quartile
       25% of the observations are smaller than Q1 and 75% of the
        observations are larger
   Q2 is the Second Quartile
       50% of the observations are smaller than Q2 and 50% of the
        observations are larger. Same as the Median. It is also the 50th
        percentile.
   Q3 is the Third Quartile
       75% of the observations are smaller than Q3and 25% of the
        observations are larger


                                              Descriptive Statistics I     32
   A quartile, like the median, either takes the value of one of the
    observations, or the value halfway between two observations.
       If n/4 is an integer, the first quartile (Q1) has the value halfway between the
        (n/4)th observation and the next observation.
       If n/4 is not an integer, the first quartile has the value of the observation whose
        position corresponds to the next highest integer.




       The method we are using is an approximation. If you solve this in MS Excel,
        which relies on a formula, you may get an answer that is slightly different.

                                                        Descriptive Statistics I              33
34
 Similar
        to what we just learned about quartiles,
 where 3 quartiles split the data into 4 equal
 parts,
  There  are 9 deciles dividing the distribution into
   10 equal portions (tenths).
  There are four quintiles dividing the population
   into 5 equal portions.
  … and 99 percentiles (next slide)
 Inall these cases, the convention is the same.
 The point, be it a quartile, decile, or percentile,
 takes the value of one of the observations or it
 has a value halfway between two adjacent
 observations. It is never necessary to split the
 difference between two observations more                   35
                                 Descriptive Statistics I
 We use 99 percentiles to divide a data set
 into 100 equal portions.

 Percentilesare used in analyzing the results
 of standardized exams. For instance, a
 score of 40 on a standardized test might
 seem like a terrible grade, but if it is the 99th
 percentile, don’t worry about telling your
 parents. ☺

 Which   percentile is Q1? Q2 (the median)?
  Q3?
 We will always use computer software to
  obtain the percentiles.                            36
Data (n=16):
1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 10
Compute the mean, median, mode, quartiles.

Answer.
1 1 2 2 ┋ 2 2 3 3 ┋ 4 4 5 5 ┋ 6 7 8 10
 Mean = 65/16 = 4.06
 Median = 3.5
 Mode = 2
 Q1 = 2
 Q2 = Median = 3.5
 Q3 = 5.5
                                                  37
Data – number of absences (n=13) :
0, 5, 3, 2, 1, 2, 4, 3, 1, 0, 0, 6, 12
Compute the mean, median, mode,
quartiles.

Answer. First order the data:
0, 0, 0,┋ 1, 1, 2, 2, 3, 3, 4,┋ 5, 6, 12
 Mean = 39/13 = 3.0 absences
 Median = 2 absences
 Mode = 0 absences
 Q1 = .5 absences
 Q3 = 4.5 absences
                           Descriptive Statistics I   38
Data: Reading Levels of 16 eighth graders.
5, 6, 6, 6, 5, 8, 7, 7, 7, 8, 10, 9, 9, 9, 9, 9

Answer. First, order the data:
5 5 6 6 ┋ 6 7 7 7 ┋ 8 8 9 9 ┋ 9 9 9 10
 Sum=120.
 Mean= 120/16 = 7.5 This is the average reading
 level of the 16 students.
 Median = Q2 = 7.5
 Q1 = 6, Q3 = 9
 Mode = 9
                             Descriptive Statistics I   39
5 5 6 6 ┋ 6 7 7 7 ┋ 8 8 9 9 ┋ 9 9 9 10
 Alternate method: The data can also be set up as
 a frequency distribution.
                                 Reading Level -                   Frequency -
                                              Xi                             fi
                                                               5             2
                                                               6             3
 Measures can be                                               7             3
 computed using methods                                        8             2

 for grouped data.                                             9             5
                                                              10             1
                                                                            16

Note that the sum of the frequencies, ∑f i = n = 16
                                   Descriptive Statistics I                       40
Xi   fi (Xi)(fi)
                            5    2      10
                            6    3      18
                            7    3      21
                            8    2      16
                            9    5      45
                           10    1      10
                                16     120




Descriptive Statistics I                      41
 It refers to how spread out the scores are.
 In other words, how similar or different
  participants are from one another on the
  variable. It is either homogeneous or
  heterogeneous sample.
 Why do we need to look at measures of
  dispersion?
 Consider this example:
 A company is about to buy computer chips that
 must have an average life of 10 years. The
 company has a choice of two suppliers. Whose
 chips should they buy? They take a sample of 10
 chips from each of the suppliers and test them.
 See the data on the next slide.
                                                   42
We see that supplier B’s chips have a longer average life.

However, what if the company offers
a 3-year warranty?

Then, computers manufactured
using the chips from supplier A
will have no returns
while using supplier B will result in
4/10 or 40% returns.



                                                             43
Normal Distribution

            Standard Deviation
                                 Standard Deviation




     Mean                        Mean
We will study these five measures of
dispersion
   Range
   Interquartile Range
   Standard Deviation
   Variance
   Coefficient of Variation
   Relative Standing.




                                        45
The Range
   Is the simplest measure of variability, is the
    difference between the highest score and the
    lowest score in the distribution.
   In research, the range is often shown as the
    minimum and maximum value, without the
    abstracted difference score.
   It provides a quick summary of a distribution’s
    variability.
   It also provides useful information about a
    distribution when there are extreme values.
   The range has two values, it is highly unstable.
Range    = Largest Value – Smallest Value
 Example: 1, 2, 3, 4, 5, 8, 9, 21, 25, 30
 Answer: Range = 30 – 1 = 29.
Pros:
    Easy to calculate
Cons:
    Value of range is only determined by two
     values
    The interpretation of the range is difficult.
    One problem with the range is that it is
     influenced by extreme values at either end.
                                                     47
48
Descriptive Statistics I   49
Standard Deviation
   The smaller the standard deviation, the better is
    the mean as the summary of a typical score. E.g.
    10 people weighted 150 pounds, the SD would be
    zero, and the mean of 150 would communicate
    perfectly accurate information about all the
    participants wt. Another example would be a
    heterogeneous sample 5 people 100 pounds and
    another five people 200 pounds. The mean still
    150, but the SD would be 52.7.
   In normal distribution there are 3 SDs above the
    mean and 3 SDs below the mean.
March 28, 2013   51
Xi   Yi
1     0
2     0
3     0
4     5
5    10


          52
53
54
55
Relationship between SD and
frequency distribution




      SD   -2.5 -2 -1   mean    +1 +2 +2.5


                          68%
                         95%
                        99%
Standard Normal Distribution
                                            95% of
                                            probability in
                                            here

      2.5% of probability                             2.5% of probability
      in here                                         in here




                                 Standard Normal
                            Distribution with 95% area
                                      marked



March 28, 2013                                                              57
 The problem with s2 and s is that they are both, like the
  mean, in the “original” units.
 This makes it difficult to compare the variability of two
  data sets that are in different units or where the
  magnitude of the numbers is very different in the two
  sets. For example,
       Suppose you wish to compare two stocks and one is in dollars and the other is in
        yen; if you want to know which one is more volatile, you should use the
        coefficient of variation.
       It is also not appropriate to compare two stocks of vastly different prices even if
        both are in the same units.
       The standard deviation for a stock that sells for around $300 is going to be very
        different from one with a price of around $0.25.
   The coefficient of variation will be a better measure of
    dispersion in these cases than the standard deviation
    (see example on the next slide).
                                                                                              58
Descriptive Statistics I   59
Which stock is more volatile?
Closing prices over the last 8 months:                        Stock A   Stock B
                                                    JAN         $1.00      $180
        $1.62                                       FEB          1.50       175
                                                    MAR          1.90       182
        $1.70
CVA =             x 100% = 95.3%                    APR
                                                    MAY
                                                                  60.
                                                                 3.00
                                                                            186
                                                                            188
                                                    JUN           40.       190
                                                     JUL         5.00       200
         $11.33                                     AUG           20.       210

        $188.88                                    Mean         $1.70   $188.88
                                                      s2         2.61    128.41
CVB =             x 100% = 6.0%                        s        $1.62    $11.33




Answer: The standard deviation of B is higher than
for A, but A is more volatile:
                                   Descriptive Statistics I                       60
61
Relative Standing
 It provides information about the position
  of an individual score value within a
  distribution scores.
 Two types:
       Percentile Ranks.
       Standard Scores.
Percentile Ranks
   It is the percentage of scores in the distribution that fall at
    or below a given value.
   A percentile is a score value above which and below which
    a certain percentage of value in a distribution fall.
   P = Number of scores less than a given score divided by
    total number scores X 100.
    E.g. suppose you received a score of 90 on a test given to a
    class of 50 people. Of your classmates, 40 had scores lower
    than 90. P = 40/50 X 100 = 80. YOU achieved a higher
    score than 80% of the people who took the test, which also
    means that almost 20% who took the test did better than
    you.
   Percentiles are symbolized by the letter P, with a subscript
    indicating the percentage below the score value. Hence, P 60
    refers to the 60th percentile and stands for the score below
    which 60% of values fall.
Percentile Ranks
 The statement P40= 55 means that 40%
  of the values in the distribution fall below
  the score 55.
 The are several interpercentile measures
  of variability. The most common being the
  Interquartile range (IQR).
IQR
 The Interquartile range (IQR) is the score
  at the 75th percentile or 3rd quartile (Q3)
  minus the score at the 25th percentile or
  first quartile (Q1). Are the most used to
  define outliers.
 It is not sensitive to extreme values.
 IQR   = Q3 – Q1
 Example (n = 15):
  0, 0, 2, 3, 4, 7, 9, 12, 17, 18, 20, 22, 45, 56, 98
  Q1 = 3, Q3 = 22
 IQR = 22 – 3 = 19    (Range = 98)
 This is basically the range of the central 50%
  of the observations in the distribution.
 Problem: The Interquartile range does not
  take into account the variability of the total
  data (only the central 50%). We are
  “throwing out” half of the data.



                                                        66
Standard Scores
 There are scores that are expressed in
  terms of their relative distance from the
  mean. It provides information not only
  about rank but also distance between
  scores.
 It often called Z-score.
Z Score
Is a standard score that indicates how
 many SDs from the mean a particular
 values lies.
 Z = Score of value – mean of scores
 divided by standard deviation.
Standard Normal Scores
 How many standard deviations away from
  the mean are you?
                     Observation – mean
Standard Score (Z) =
                     Standard deviation


“Z” is normal with mean 0 and standard
  deviation of 1.



March 28, 2013                            69
March 28, 2013   70
Standard Normal Scores
   A standard score of:
    Z = 1: The observation lies one SD
      above the mean
    Z = 2: The observation is two SD
      above the mean
    Z = -1: The observation lies 1 SD
      below the mean
    Z = -2: The observation lies 2 SD
      below the mean



March 28, 2013                            71
Standard Normal Scores
   Example: Male Blood Pressure,
    mean = 125, s = 14 mmHg
         BP = 167 mmHg


                      167 − 125
                  Z=            = 3.0
         BP = 97 mmHg   14
                     97 −125
                 Z=            = −2.0
                        14


March 28, 2013                          72
What is the Usefulness of a Standard
?Normal Score
      It tells you how many SDs (s) an observation is
       from the mean
      Thus, it is a way of quickly assessing how
       “unusual” an observation is
      Example: Suppose the mean BP is 125 mmHg,
       and standard deviation = 14 mmHg
                Is 167 mmHg an unusually high measure?
                If we know Z = 3.0, does that help us?




March 28, 2013                                            73
Descriptive Statistics II   74
Descriptive Statistics II   75
Original data             Change 7 to 97                Change 23 to 93
 X               Z              X           Z                  X           Z

 65          0.45-              65          0.81-             65       1.40-

          Data: Exam Scores
 73          0.11-              73          0.38-             73       0.79-
 78           0.10              78          0.10-             78       0.40-   
 69          0.28-              69          0.60-             69       1.09-
 78           0.10              78          0.10-             78       0.40-
  7          2.89-    =>        97           0.94             97        1.07
 23          2.21-              23          3.12-   =>        93        0.76
 98           0.94              98           0.99             98        1.14
 99           0.99              99           1.05             99        1.22
 99           0.99              99           1.05             99        1.22
 97           0.90              97           0.94             97        1.07
 99           0.99              99           1.05             99        1.22
 75          0.02-              75          0.27-             75       0.63-
 79           0.14              79          0.05-             79       0.32-
 85           0.40              85           0.28             85        0.14
 63          0.53-              63          0.92-             63       1.56-
 67          0.36-              67          0.70-             67       1.25-
 72          0.15-              72          0.43-             72       0.86-
 73          0.11-              73          0.38-             73       0.79-
 93           0.73              93           0.72             93        0.76
 95           0.82              95           0.83             95        0.91

Mean         75.57             Mean        79.86             Mean      83.19
 s           23.75              s          18.24              .s       12.96




                     Descriptive Statistics II                                     76
 No  matter what you are measuring, a Z-score
  of more than +5 or less than – 5 would indicate
  a very, very unusual score.
 For standardized data, if it is normally
  distributed, 95% of the data will be between
  ±2 standard deviations about the mean.
 If the data follows a normal distribution,
  95% of the data will be between -1.96 and +1.96.
  99.7% of the data will fall between -3 and +3.
  99.99% of the data will fall between -4 and +4.

 Worst
      case scenario: 75% of the data are
 between 2 standard deviations about the mean.

                                                      77
2      3   8        8   9   10     10       12          15    18   22     63


                   Q1           Median
Smallest                                                     Q3
                                                                            Largest




                                Descriptive Statistics II                             78
Major reasons for using Index
((descriptive statistics
   Understanding the data.
   Evaluating outliers. Outliers are often identified in
    relation to the value of a distribution’s IQR. A
    mild outlier is a data value that lies between 1.5
    and 3 times the IQR below Q1 or above Q3.
    Extreme outlier is a data value that is more that
    three times the IQR below Q1 or above Q3.
   Describe the research sample.
   Answering research questions.
Descriptive Statistics-I: Measures of Central Tendency

Mais conteúdo relacionado

Mais procurados

The two sample t-test
The two sample t-testThe two sample t-test
The two sample t-testChristina K J
 
Basic Concept Of Probability
Basic Concept Of ProbabilityBasic Concept Of Probability
Basic Concept Of Probabilityguest45a926
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsAiden Yeh
 
Probability And Probability Distributions
Probability And Probability Distributions Probability And Probability Distributions
Probability And Probability Distributions Sahil Nagpal
 
Review & Hypothesis Testing
Review & Hypothesis TestingReview & Hypothesis Testing
Review & Hypothesis TestingSr Edith Bogue
 
Statistical Methods
Statistical MethodsStatistical Methods
Statistical Methodsguest9fa52
 
Data Display and Summary
Data Display and SummaryData Display and Summary
Data Display and SummaryDrZahid Khan
 
Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Dr Athar Khan
 
Categorical data analysis
Categorical data analysisCategorical data analysis
Categorical data analysisSumit Das
 
Chi-square, Yates, Fisher & McNemar
Chi-square, Yates, Fisher & McNemarChi-square, Yates, Fisher & McNemar
Chi-square, Yates, Fisher & McNemarAzmi Mohd Tamil
 
Inferential statistics.ppt
Inferential statistics.pptInferential statistics.ppt
Inferential statistics.pptNursing Path
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsAnand Thokal
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendencyChie Pegollo
 

Mais procurados (20)

The two sample t-test
The two sample t-testThe two sample t-test
The two sample t-test
 
Basic Concept Of Probability
Basic Concept Of ProbabilityBasic Concept Of Probability
Basic Concept Of Probability
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
z-scores
z-scoresz-scores
z-scores
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Probability And Probability Distributions
Probability And Probability Distributions Probability And Probability Distributions
Probability And Probability Distributions
 
Review & Hypothesis Testing
Review & Hypothesis TestingReview & Hypothesis Testing
Review & Hypothesis Testing
 
Statistical Methods
Statistical MethodsStatistical Methods
Statistical Methods
 
probability
probabilityprobability
probability
 
Data Display and Summary
Data Display and SummaryData Display and Summary
Data Display and Summary
 
Introduction to Descriptive Statistics
Introduction to Descriptive StatisticsIntroduction to Descriptive Statistics
Introduction to Descriptive Statistics
 
Normal Distribution
Normal DistributionNormal Distribution
Normal Distribution
 
Linear regression
Linear regression Linear regression
Linear regression
 
Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Binary OR Binomial logistic regression
Binary OR Binomial logistic regression
 
Categorical data analysis
Categorical data analysisCategorical data analysis
Categorical data analysis
 
Chi-square, Yates, Fisher & McNemar
Chi-square, Yates, Fisher & McNemarChi-square, Yates, Fisher & McNemar
Chi-square, Yates, Fisher & McNemar
 
Inferential statistics.ppt
Inferential statistics.pptInferential statistics.ppt
Inferential statistics.ppt
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
Z test
Z testZ test
Z test
 

Destaque

Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsBurak Mızrak
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statisticsguest290abe
 
Medical Statistics Part-I:Descriptive statistics
Medical Statistics Part-I:Descriptive statisticsMedical Statistics Part-I:Descriptive statistics
Medical Statistics Part-I:Descriptive statisticsRamachandra Barik
 
Percentage Rank
Percentage RankPercentage Rank
Percentage RankEn Em
 
Basics of medical statistics
Basics of medical statisticsBasics of medical statistics
Basics of medical statisticsRamachandra Barik
 
Webquest Descriptive Statistics of the NCAA
Webquest Descriptive Statistics of the NCAAWebquest Descriptive Statistics of the NCAA
Webquest Descriptive Statistics of the NCAAsrthomas
 
Descriptive Statistics, Numerical Description
Descriptive Statistics, Numerical DescriptionDescriptive Statistics, Numerical Description
Descriptive Statistics, Numerical Descriptiongetyourcheaton
 
Torturing numbers - Descriptive Statistics for Growers (2013)
Torturing numbers - Descriptive Statistics for Growers (2013)Torturing numbers - Descriptive Statistics for Growers (2013)
Torturing numbers - Descriptive Statistics for Growers (2013)jasondeveau
 
Medical Statistics Part-II:Inferential statistics
Medical Statistics Part-II:Inferential  statisticsMedical Statistics Part-II:Inferential  statistics
Medical Statistics Part-II:Inferential statisticsRamachandra Barik
 
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with RKazuki Yoshida
 
Das20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsDas20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsRozainita Rosley
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive StatisticsL H
 
Descriptive Statistics In Excel
Descriptive Statistics In ExcelDescriptive Statistics In Excel
Descriptive Statistics In Exceldinesh takyar
 
Malimu descriptive statistics.
Malimu descriptive statistics.Malimu descriptive statistics.
Malimu descriptive statistics.Miharbi Ignasm
 
Pediatric ecg learning with quiz
Pediatric ecg learning with quizPediatric ecg learning with quiz
Pediatric ecg learning with quizkiranvs123
 
Pediatric EKGs
Pediatric EKGsPediatric EKGs
Pediatric EKGsbpesin
 
02 descriptive statistics
02 descriptive statistics02 descriptive statistics
02 descriptive statisticsVasant Kothari
 

Destaque (20)

Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Medical Statistics Part-I:Descriptive statistics
Medical Statistics Part-I:Descriptive statisticsMedical Statistics Part-I:Descriptive statistics
Medical Statistics Part-I:Descriptive statistics
 
Percentage Rank
Percentage RankPercentage Rank
Percentage Rank
 
Basics of medical statistics
Basics of medical statisticsBasics of medical statistics
Basics of medical statistics
 
Webquest Descriptive Statistics of the NCAA
Webquest Descriptive Statistics of the NCAAWebquest Descriptive Statistics of the NCAA
Webquest Descriptive Statistics of the NCAA
 
Descriptive Statistics, Numerical Description
Descriptive Statistics, Numerical DescriptionDescriptive Statistics, Numerical Description
Descriptive Statistics, Numerical Description
 
Presentation1
Presentation1Presentation1
Presentation1
 
Torturing numbers - Descriptive Statistics for Growers (2013)
Torturing numbers - Descriptive Statistics for Growers (2013)Torturing numbers - Descriptive Statistics for Growers (2013)
Torturing numbers - Descriptive Statistics for Growers (2013)
 
Medical Statistics Part-II:Inferential statistics
Medical Statistics Part-II:Inferential  statisticsMedical Statistics Part-II:Inferential  statistics
Medical Statistics Part-II:Inferential statistics
 
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with R
 
Das20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsDas20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statistics
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Descriptive Statistics In Excel
Descriptive Statistics In ExcelDescriptive Statistics In Excel
Descriptive Statistics In Excel
 
Malimu descriptive statistics.
Malimu descriptive statistics.Malimu descriptive statistics.
Malimu descriptive statistics.
 
Pediatric ecg learning with quiz
Pediatric ecg learning with quizPediatric ecg learning with quiz
Pediatric ecg learning with quiz
 
Pediatric EKGs
Pediatric EKGsPediatric EKGs
Pediatric EKGs
 
Medical statistics
Medical statisticsMedical statistics
Medical statistics
 
02 descriptive statistics
02 descriptive statistics02 descriptive statistics
02 descriptive statistics
 
Day 3 descriptive statistics
Day 3  descriptive statisticsDay 3  descriptive statistics
Day 3 descriptive statistics
 

Semelhante a Descriptive Statistics-I: Measures of Central Tendency

Unit 1 - Mean Median Mode - 18MAB303T - PPT - Part 1.pdf
Unit 1 - Mean Median Mode - 18MAB303T - PPT - Part 1.pdfUnit 1 - Mean Median Mode - 18MAB303T - PPT - Part 1.pdf
Unit 1 - Mean Median Mode - 18MAB303T - PPT - Part 1.pdfAravindS199
 
CABT Math 8 measures of central tendency and dispersion
CABT Math 8   measures of central tendency and dispersionCABT Math 8   measures of central tendency and dispersion
CABT Math 8 measures of central tendency and dispersionGilbert Joseph Abueg
 
3 descritive statistics measure of central tendency variatio
3 descritive statistics measure of   central   tendency variatio3 descritive statistics measure of   central   tendency variatio
3 descritive statistics measure of central tendency variatioLama K Banna
 
Unit 5 8614.pptx A_Movie_Review_Pursuit_Of_Happiness
Unit 5 8614.pptx A_Movie_Review_Pursuit_Of_HappinessUnit 5 8614.pptx A_Movie_Review_Pursuit_Of_Happiness
Unit 5 8614.pptx A_Movie_Review_Pursuit_Of_Happinessourbusiness0014
 
Measures of Central Tendency, Measures of Position, Measures of Dispersion, S...
Measures of Central Tendency, Measures of Position, Measures of Dispersion, S...Measures of Central Tendency, Measures of Position, Measures of Dispersion, S...
Measures of Central Tendency, Measures of Position, Measures of Dispersion, S...Sundar B N
 
Topic 2 Measures of Central Tendency.pptx
Topic 2   Measures of Central Tendency.pptxTopic 2   Measures of Central Tendency.pptx
Topic 2 Measures of Central Tendency.pptxCallplanetsDeveloper
 
Measures of disease in epidemiology
Measures of disease in epidemiologyMeasures of disease in epidemiology
Measures of disease in epidemiologyOsmanmohamed38
 
Answer the questions in one paragraph 4-5 sentences. · Why did t.docx
Answer the questions in one paragraph 4-5 sentences. · Why did t.docxAnswer the questions in one paragraph 4-5 sentences. · Why did t.docx
Answer the questions in one paragraph 4-5 sentences. · Why did t.docxboyfieldhouse
 
3. measures of central tendency
3. measures of central tendency3. measures of central tendency
3. measures of central tendencyrenz50
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyPrithwis Mukerjee
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyPrithwis Mukerjee
 
Descriptions of data statistics for research
Descriptions of data   statistics for researchDescriptions of data   statistics for research
Descriptions of data statistics for researchHarve Abella
 
Essentiality of mathematics
Essentiality of mathematicsEssentiality of mathematics
Essentiality of mathematicsAzhar Hussain
 
Measure of Central Tendency
Measure of Central TendencyMeasure of Central Tendency
Measure of Central TendencyBasudev Sharma
 
Statistics digital text book
Statistics digital text bookStatistics digital text book
Statistics digital text bookdeepuplr
 

Semelhante a Descriptive Statistics-I: Measures of Central Tendency (20)

Unit 1 - Mean Median Mode - 18MAB303T - PPT - Part 1.pdf
Unit 1 - Mean Median Mode - 18MAB303T - PPT - Part 1.pdfUnit 1 - Mean Median Mode - 18MAB303T - PPT - Part 1.pdf
Unit 1 - Mean Median Mode - 18MAB303T - PPT - Part 1.pdf
 
Data analysis
Data analysisData analysis
Data analysis
 
CABT Math 8 measures of central tendency and dispersion
CABT Math 8   measures of central tendency and dispersionCABT Math 8   measures of central tendency and dispersion
CABT Math 8 measures of central tendency and dispersion
 
3 descritive statistics measure of central tendency variatio
3 descritive statistics measure of   central   tendency variatio3 descritive statistics measure of   central   tendency variatio
3 descritive statistics measure of central tendency variatio
 
Unit 5 8614.pptx A_Movie_Review_Pursuit_Of_Happiness
Unit 5 8614.pptx A_Movie_Review_Pursuit_Of_HappinessUnit 5 8614.pptx A_Movie_Review_Pursuit_Of_Happiness
Unit 5 8614.pptx A_Movie_Review_Pursuit_Of_Happiness
 
Measures of Central Tendency, Measures of Position, Measures of Dispersion, S...
Measures of Central Tendency, Measures of Position, Measures of Dispersion, S...Measures of Central Tendency, Measures of Position, Measures of Dispersion, S...
Measures of Central Tendency, Measures of Position, Measures of Dispersion, S...
 
Topic 2 Measures of Central Tendency.pptx
Topic 2   Measures of Central Tendency.pptxTopic 2   Measures of Central Tendency.pptx
Topic 2 Measures of Central Tendency.pptx
 
Measures of disease in epidemiology
Measures of disease in epidemiologyMeasures of disease in epidemiology
Measures of disease in epidemiology
 
Answer the questions in one paragraph 4-5 sentences. · Why did t.docx
Answer the questions in one paragraph 4-5 sentences. · Why did t.docxAnswer the questions in one paragraph 4-5 sentences. · Why did t.docx
Answer the questions in one paragraph 4-5 sentences. · Why did t.docx
 
Stat11t chapter3
Stat11t chapter3Stat11t chapter3
Stat11t chapter3
 
3. measures of central tendency
3. measures of central tendency3. measures of central tendency
3. measures of central tendency
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
Q.t
Q.tQ.t
Q.t
 
Chapter 11 Psrm
Chapter 11 PsrmChapter 11 Psrm
Chapter 11 Psrm
 
Descriptions of data statistics for research
Descriptions of data   statistics for researchDescriptions of data   statistics for research
Descriptions of data statistics for research
 
Essentiality of mathematics
Essentiality of mathematicsEssentiality of mathematics
Essentiality of mathematics
 
statistics
statisticsstatistics
statistics
 
Measure of Central Tendency
Measure of Central TendencyMeasure of Central Tendency
Measure of Central Tendency
 
Statistics digital text book
Statistics digital text bookStatistics digital text book
Statistics digital text book
 

Mais de Mohammad Ihmeidan

Anatomy physiology and embryology of urinary tract
Anatomy physiology and embryology of urinary tractAnatomy physiology and embryology of urinary tract
Anatomy physiology and embryology of urinary tractMohammad Ihmeidan
 
Medical History taking form introductory course
Medical History taking form introductory courseMedical History taking form introductory course
Medical History taking form introductory courseMohammad Ihmeidan
 
Male reproductive system - Anatomy
Male reproductive system - AnatomyMale reproductive system - Anatomy
Male reproductive system - AnatomyMohammad Ihmeidan
 
The cystoscope and accessories
The cystoscope  and accessoriesThe cystoscope  and accessories
The cystoscope and accessoriesMohammad Ihmeidan
 
Physicochemistry of renal stones
Physicochemistry of renal stonesPhysicochemistry of renal stones
Physicochemistry of renal stonesMohammad Ihmeidan
 
Hepatic adenoma vs focal nodular hyperplasia
Hepatic adenoma vs focal nodular hyperplasiaHepatic adenoma vs focal nodular hyperplasia
Hepatic adenoma vs focal nodular hyperplasiaMohammad Ihmeidan
 
ABSITE Review: Practice Questions, Second Edition 2nd edition by FIser,
 ABSITE Review: Practice Questions, Second Edition 2nd edition by FIser,  ABSITE Review: Practice Questions, Second Edition 2nd edition by FIser,
ABSITE Review: Practice Questions, Second Edition 2nd edition by FIser, Mohammad Ihmeidan
 
نحن نعيش في عالم
نحن نعيش في عالمنحن نعيش في عالم
نحن نعيش في عالمMohammad Ihmeidan
 
اللغة العربية و المصطلح العلمي
اللغة العربية و المصطلح العلمياللغة العربية و المصطلح العلمي
اللغة العربية و المصطلح العلميMohammad Ihmeidan
 

Mais de Mohammad Ihmeidan (20)

Anatomy physiology and embryology of urinary tract
Anatomy physiology and embryology of urinary tractAnatomy physiology and embryology of urinary tract
Anatomy physiology and embryology of urinary tract
 
Fusion prostatic biopsy
Fusion prostatic biopsyFusion prostatic biopsy
Fusion prostatic biopsy
 
Renal cell carcinoma
Renal cell carcinomaRenal cell carcinoma
Renal cell carcinoma
 
Medical History taking form introductory course
Medical History taking form introductory courseMedical History taking form introductory course
Medical History taking form introductory course
 
Male reproductive system - Anatomy
Male reproductive system - AnatomyMale reproductive system - Anatomy
Male reproductive system - Anatomy
 
Testicular tumors
Testicular tumors Testicular tumors
Testicular tumors
 
Vesico ureteral reflux
Vesico ureteral reflux Vesico ureteral reflux
Vesico ureteral reflux
 
Wilms tumors
Wilms tumorsWilms tumors
Wilms tumors
 
Bladder injuries
Bladder injuriesBladder injuries
Bladder injuries
 
Urethral trauma
Urethral traumaUrethral trauma
Urethral trauma
 
The cystoscope and accessories
The cystoscope  and accessoriesThe cystoscope  and accessories
The cystoscope and accessories
 
Physicochemistry of renal stones
Physicochemistry of renal stonesPhysicochemistry of renal stones
Physicochemistry of renal stones
 
Postobstructive diuresis
Postobstructive diuresisPostobstructive diuresis
Postobstructive diuresis
 
Renal physiology
Renal physiology Renal physiology
Renal physiology
 
Peri-operative Anaphylaxis
Peri-operative Anaphylaxis Peri-operative Anaphylaxis
Peri-operative Anaphylaxis
 
Hepatic adenoma vs focal nodular hyperplasia
Hepatic adenoma vs focal nodular hyperplasiaHepatic adenoma vs focal nodular hyperplasia
Hepatic adenoma vs focal nodular hyperplasia
 
Hepatic artery varients
Hepatic artery varientsHepatic artery varients
Hepatic artery varients
 
ABSITE Review: Practice Questions, Second Edition 2nd edition by FIser,
 ABSITE Review: Practice Questions, Second Edition 2nd edition by FIser,  ABSITE Review: Practice Questions, Second Edition 2nd edition by FIser,
ABSITE Review: Practice Questions, Second Edition 2nd edition by FIser,
 
نحن نعيش في عالم
نحن نعيش في عالمنحن نعيش في عالم
نحن نعيش في عالم
 
اللغة العربية و المصطلح العلمي
اللغة العربية و المصطلح العلمياللغة العربية و المصطلح العلمي
اللغة العربية و المصطلح العلمي
 

Descriptive Statistics-I: Measures of Central Tendency

  • 1. Descriptive Statistics-I Dr Mahmoud Alhussami
  • 2. Biostatistics  What is the biostatistics? A branch of applied math. that deals with collecting, organizing and interpreting data using well-defined procedures.  Types of Biostatistics:  Descriptive Statistics. It involves organizing, summarizing & displaying data to make them more understandable.  Inferential Statistics. It reports the degree of confidence of the sample statistic that predicts the value of the population parameter.
  • 3. Data Interpretation  Consideration: 1. Accuracy 1. critical view of the data 2. investigating evidence of the results 3. consider other studies’ results 4. peripheral data analysis 5. conduct power analysis: type I & type II True False True Correct Type-II Type -I Correct False
  • 4. Types of Errors If You…… When the Null Then You Hypothesis is… Have……. Reject the null True (there really Made a Type I hypothesis are no difference) Error Reject the null False (there really ☻ hypothesis are difference) Accept the null False (there really Made Type II hypothesis are difference) Error Accept the null True (there really ☻ hypothesis are no difference)
  • 5.  alpha : the level of significance used for establishing type-I error  β : the probability of type-II error  1 – β : is the probability of obtaining significance results ( power)  Effect size: how much we can say that the intervention made a significance difference
  • 6. 2. Meaning of the results - translation of the results and make it understandable 3. Importance: - translation of the significant findings into practical findings 4. Generalizability: - how can we make the findings useful for all the population 5. Implication: - what have we learned related to what has been used during study
  • 7. Definition  Data is any type of information  Raw data is a data collected as they receive.  Organize data is data organized either in ascending, descending or in a grouped data. March 28, 2013 7
  • 8. Data Example Weight in pounds of 57 school children at a day-care center: 68 63 42 27 30 36 28 32 79 27 22 23 24 25 44 65 43 23 74 51 36 42 28 31 28 25 45 12 57 51 12 32 49 38 42 27 31 50 38 21 16 24 69 47 23 22 43 27 49 28 23 19 46 30 43 49 12 March 28, 2013 8
  • 9. Descriptive Statistics  In this lecture we discuss using descriptive statistics, as opposed to inferential statistics.  Here we are interested only in summarizing the data in front of us, without assuming that it represents anything more.  We will look at both numerical and categorical data.  We will look at both quantitative and graphical techniques.  The basic overall idea is to turn data into information.
  • 10. Numerical Data-Single Variable  Numerical data may be summarized according to several characteristics. We will work with each in turn and then all of them together.  Measures of Location  Measures of central tendency: Mean; Median; Mode  Measures of noncentral tendency - Quantiles  Quartiles; Quintiles; Percentiles  Measures of Dispersion  Range  Interquartile range  Variance  Standard Deviation  Coefficient of Variation  Measures of Shape  Skewness  Standardizing Data
  • 11. Measures of Location  Measures of location place the data set on the scale of real numbers.  It refers to the location of a typical data value-the data value around which other scores to cluster.  Measures of central tendency (i.e., central location) help find the approximate center of the dataset.  Researchers usually do not use the term average, because there are three alternative types of average.  These include the mean, the median, and the mode. In a perfect world, the mean, median & mode would be the same.  However, the world is not perfect & very often, the mean, median and mode are not the same
  • 12. Summation Notation  Ways to summarize the data other than using tables and plots.  Setup: dataset consists of {8,2,3,5}  These numbers are denoted as xi’s x1=8 x2=2 x3=3 x4=5 4 Sum= ∑x i =1 i = x1 + x2 + x3 + x4 = 8 + 2 + 3 + 5 = 18 Where the Greek letter Σ is the summation sign. March 28, 2013 12
  • 13. 13
  • 14. Notes on Sample Mean X n  Formula ∑X i X= i=1 n  Summation Sign  In the formula to find the mean, we use the “summation sign” — ∑  This is just mathematical shorthand for “add up all of the observations” n ∑X i=1 i = X 1 + X 2 + X 3 + ....... + X n March 28, 2013 14
  • 15. Notes on Sample Mean  Also called sample average or arithmetic mean  Mean for the sample = X or M, Mean for population = mew (μ)  Sensitive to extreme values  One data point could make a great change in sample mean  Why is it called the sample mean?  To distinguish it from population mean March 28, 2013 15
  • 16. 16
  • 17.  The median is the middle value of the ordered data  To get the median, we must first rearrange the data into an ordered array (in ascending or descending order). Generally, we order the data from the lowest value to the highest value.  Therefore, the median is the data value such that half of the observations are larger and half are smaller. It is also the 50th percentile.  If n is odd, the median is the middle observation of the ordered array. If n is even, it is midway between the two central observations. 17
  • 18. Example: 0 2 3 5 20 99 100 Note: Data has been ordered from lowest to highest. Since n is odd (n=7), the median is the (n+1)/2 ordered observation, or the 4th observation. Answer: The median is 5. The mean and the median are unique for a given set of data. There will be exactly one mean and one median. Unlike the mean, the median is not affected by extreme values. Q: What happens to the median if we change the 100 to 5,000? Not a thing, the median will still be 5. Five is still the middle value of the data set. 18
  • 19. 10 20 30 40 50 60 Example: Note: Data has been ordered from lowest to highest. Since n is even (n=6), the median is the (n+1)/2 ordered observation, or the 3.5th observation, i.e., the average of observation 3 and observation 4. Answer: The median is 35. Descriptive Statistics I 19
  • 20. 20
  • 21. Mean vs. Median  Advantage: The median is less affected by extreme values.  Disadvantage:  The median takes no account of the precise magnitude of most of the observations and is therefore less efficient than the mean  If two groups of data are pooled the median of the combined group can not be expressed in terms of the medians of the two original groups but the sample mean can. n1 x1 + n2 x2 x pooled = n1 + n2 March 28, 2013 21
  • 22. The mode is the value of the data that occurs with the greatest frequency.  Unstable index: values of modes tend to fluctuate from one sample to another drawn from the same population Example. 1, 1, 1, 2, 3, 4, 5 Answer. The mode is 1 since it occurs three times. The other values each appear only once in the data set. Example. 5, 5, 5, 6, 8, 10, 10, 10. Answer. The mode is: 5, 10. There are two modes. This is a bi-modal dataset. 22
  • 23.  The mode is different from the mean and the median in that those measures always exist and are always unique. For any numeric data set there will be one mean and one median.  The mode may not exist.  Data: 1, 2, 3, 4, 5, 6, 7, 8, 9, 0  Here you have 10 observations and they are all different.  The mode may not be unique.  Data: 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7  Mode= 1, 2, 3, 4, 5, and 6. There are six modes. 23
  • 24. Comparison of the Mode, the Median, and the Mean  In a normal distribution, the mode , the median, and the mean have the same value.  The mean is the widely reported index of central tendency for variables measured on an interval and ratio scale.  The mean takes each and every score into account.  It also the most stable index of central tendency and thus yields the most reliable estimate of the central tendency of the population.
  • 25. Comparison of the Mode, the Median, and the Mean  The mean is always pulled in the direction of the long tail, that is, in the direction of the extreme scores.  For the variables that positively skewed (like income), the mean is higher than the mode or the median. For negatively skewed variables (like age at death) the mean is lower.  When there are extreme values in the distribution (even if it is approximately normal), researchers sometimes report means that have been adjusted for outliers.  To adjust means one must discard a fixed percentage (5%) of the extreme values from either end of the distribution.
  • 26. Shapes of Distributions  Symmetric (Right and left sides are mirror images)  Left tail looks like right tail  Mean = Median = Mode Mean Median Mode March 28, 2013 26
  • 27. Distribution Characteristics  Mode: Peak(s)  Median: Equal areas point  Mean: Balancing point Mode Mean Median March 28, 2013 27
  • 28. Shapes of Distributions  Right skewed (positively skewed)  Long right tail  Mean > Median Mode Mean Median March 28, 2013 28
  • 29. Shapes of Distributions  Left skewed (negatively skewed)  Long left tail  Mean < Median Mean Median Mode March 28, 2013 29
  • 30. SPSS Functions  If you are given the “raw data” (i.e. data not summarized in a frequency table) you can calculate the measures of location in SPSS.  Descriptive Command  Option Command  Selection of mean, median and mode March 28, 2013 30
  • 31. Measuresof non-central location used to summarize a set of data Examples of commonly used quantiles:  Quartiles  Quintiles  Deciles  Percentiles Descriptive Statistics I 31
  • 32. Quartiles split a set of ordered data into four parts.  Imagine cutting a chocolate bar into four equal pieces… How many cuts would you make? (yes, 3!)  Q1 is the First Quartile  25% of the observations are smaller than Q1 and 75% of the observations are larger  Q2 is the Second Quartile  50% of the observations are smaller than Q2 and 50% of the observations are larger. Same as the Median. It is also the 50th percentile.  Q3 is the Third Quartile  75% of the observations are smaller than Q3and 25% of the observations are larger Descriptive Statistics I 32
  • 33. A quartile, like the median, either takes the value of one of the observations, or the value halfway between two observations.  If n/4 is an integer, the first quartile (Q1) has the value halfway between the (n/4)th observation and the next observation.  If n/4 is not an integer, the first quartile has the value of the observation whose position corresponds to the next highest integer.  The method we are using is an approximation. If you solve this in MS Excel, which relies on a formula, you may get an answer that is slightly different. Descriptive Statistics I 33
  • 34. 34
  • 35.  Similar to what we just learned about quartiles, where 3 quartiles split the data into 4 equal parts,  There are 9 deciles dividing the distribution into 10 equal portions (tenths).  There are four quintiles dividing the population into 5 equal portions.  … and 99 percentiles (next slide)  Inall these cases, the convention is the same. The point, be it a quartile, decile, or percentile, takes the value of one of the observations or it has a value halfway between two adjacent observations. It is never necessary to split the difference between two observations more 35 Descriptive Statistics I
  • 36.  We use 99 percentiles to divide a data set into 100 equal portions.  Percentilesare used in analyzing the results of standardized exams. For instance, a score of 40 on a standardized test might seem like a terrible grade, but if it is the 99th percentile, don’t worry about telling your parents. ☺  Which percentile is Q1? Q2 (the median)? Q3?  We will always use computer software to obtain the percentiles. 36
  • 37. Data (n=16): 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 10 Compute the mean, median, mode, quartiles. Answer. 1 1 2 2 ┋ 2 2 3 3 ┋ 4 4 5 5 ┋ 6 7 8 10 Mean = 65/16 = 4.06 Median = 3.5 Mode = 2 Q1 = 2 Q2 = Median = 3.5 Q3 = 5.5 37
  • 38. Data – number of absences (n=13) : 0, 5, 3, 2, 1, 2, 4, 3, 1, 0, 0, 6, 12 Compute the mean, median, mode, quartiles. Answer. First order the data: 0, 0, 0,┋ 1, 1, 2, 2, 3, 3, 4,┋ 5, 6, 12 Mean = 39/13 = 3.0 absences Median = 2 absences Mode = 0 absences Q1 = .5 absences Q3 = 4.5 absences Descriptive Statistics I 38
  • 39. Data: Reading Levels of 16 eighth graders. 5, 6, 6, 6, 5, 8, 7, 7, 7, 8, 10, 9, 9, 9, 9, 9 Answer. First, order the data: 5 5 6 6 ┋ 6 7 7 7 ┋ 8 8 9 9 ┋ 9 9 9 10 Sum=120. Mean= 120/16 = 7.5 This is the average reading level of the 16 students. Median = Q2 = 7.5 Q1 = 6, Q3 = 9 Mode = 9 Descriptive Statistics I 39
  • 40. 5 5 6 6 ┋ 6 7 7 7 ┋ 8 8 9 9 ┋ 9 9 9 10 Alternate method: The data can also be set up as a frequency distribution. Reading Level - Frequency - Xi fi 5 2 6 3 Measures can be 7 3 computed using methods 8 2 for grouped data. 9 5 10 1 16 Note that the sum of the frequencies, ∑f i = n = 16 Descriptive Statistics I 40
  • 41. Xi fi (Xi)(fi) 5 2 10 6 3 18 7 3 21 8 2 16 9 5 45 10 1 10 16 120 Descriptive Statistics I 41
  • 42.  It refers to how spread out the scores are.  In other words, how similar or different participants are from one another on the variable. It is either homogeneous or heterogeneous sample.  Why do we need to look at measures of dispersion?  Consider this example: A company is about to buy computer chips that must have an average life of 10 years. The company has a choice of two suppliers. Whose chips should they buy? They take a sample of 10 chips from each of the suppliers and test them. See the data on the next slide. 42
  • 43. We see that supplier B’s chips have a longer average life. However, what if the company offers a 3-year warranty? Then, computers manufactured using the chips from supplier A will have no returns while using supplier B will result in 4/10 or 40% returns. 43
  • 44. Normal Distribution Standard Deviation Standard Deviation Mean Mean
  • 45. We will study these five measures of dispersion  Range  Interquartile Range  Standard Deviation  Variance  Coefficient of Variation  Relative Standing. 45
  • 46. The Range  Is the simplest measure of variability, is the difference between the highest score and the lowest score in the distribution.  In research, the range is often shown as the minimum and maximum value, without the abstracted difference score.  It provides a quick summary of a distribution’s variability.  It also provides useful information about a distribution when there are extreme values.  The range has two values, it is highly unstable.
  • 47. Range = Largest Value – Smallest Value Example: 1, 2, 3, 4, 5, 8, 9, 21, 25, 30 Answer: Range = 30 – 1 = 29. Pros:  Easy to calculate Cons:  Value of range is only determined by two values  The interpretation of the range is difficult.  One problem with the range is that it is influenced by extreme values at either end. 47
  • 48. 48
  • 50. Standard Deviation  The smaller the standard deviation, the better is the mean as the summary of a typical score. E.g. 10 people weighted 150 pounds, the SD would be zero, and the mean of 150 would communicate perfectly accurate information about all the participants wt. Another example would be a heterogeneous sample 5 people 100 pounds and another five people 200 pounds. The mean still 150, but the SD would be 52.7.  In normal distribution there are 3 SDs above the mean and 3 SDs below the mean.
  • 52. Xi Yi 1 0 2 0 3 0 4 5 5 10 52
  • 53. 53
  • 54. 54
  • 55. 55
  • 56. Relationship between SD and frequency distribution SD -2.5 -2 -1 mean +1 +2 +2.5 68% 95% 99%
  • 57. Standard Normal Distribution 95% of probability in here 2.5% of probability 2.5% of probability in here in here Standard Normal Distribution with 95% area marked March 28, 2013 57
  • 58.  The problem with s2 and s is that they are both, like the mean, in the “original” units.  This makes it difficult to compare the variability of two data sets that are in different units or where the magnitude of the numbers is very different in the two sets. For example,  Suppose you wish to compare two stocks and one is in dollars and the other is in yen; if you want to know which one is more volatile, you should use the coefficient of variation.  It is also not appropriate to compare two stocks of vastly different prices even if both are in the same units.  The standard deviation for a stock that sells for around $300 is going to be very different from one with a price of around $0.25.  The coefficient of variation will be a better measure of dispersion in these cases than the standard deviation (see example on the next slide). 58
  • 60. Which stock is more volatile? Closing prices over the last 8 months: Stock A Stock B JAN $1.00 $180 $1.62 FEB 1.50 175 MAR 1.90 182 $1.70 CVA = x 100% = 95.3% APR MAY 60. 3.00 186 188 JUN 40. 190 JUL 5.00 200 $11.33 AUG 20. 210 $188.88 Mean $1.70 $188.88 s2 2.61 128.41 CVB = x 100% = 6.0% s $1.62 $11.33 Answer: The standard deviation of B is higher than for A, but A is more volatile: Descriptive Statistics I 60
  • 61. 61
  • 62. Relative Standing  It provides information about the position of an individual score value within a distribution scores.  Two types:  Percentile Ranks.  Standard Scores.
  • 63. Percentile Ranks  It is the percentage of scores in the distribution that fall at or below a given value.  A percentile is a score value above which and below which a certain percentage of value in a distribution fall.  P = Number of scores less than a given score divided by total number scores X 100. E.g. suppose you received a score of 90 on a test given to a class of 50 people. Of your classmates, 40 had scores lower than 90. P = 40/50 X 100 = 80. YOU achieved a higher score than 80% of the people who took the test, which also means that almost 20% who took the test did better than you.  Percentiles are symbolized by the letter P, with a subscript indicating the percentage below the score value. Hence, P 60 refers to the 60th percentile and stands for the score below which 60% of values fall.
  • 64. Percentile Ranks  The statement P40= 55 means that 40% of the values in the distribution fall below the score 55.  The are several interpercentile measures of variability. The most common being the Interquartile range (IQR).
  • 65. IQR  The Interquartile range (IQR) is the score at the 75th percentile or 3rd quartile (Q3) minus the score at the 25th percentile or first quartile (Q1). Are the most used to define outliers.  It is not sensitive to extreme values.
  • 66.  IQR = Q3 – Q1  Example (n = 15): 0, 0, 2, 3, 4, 7, 9, 12, 17, 18, 20, 22, 45, 56, 98 Q1 = 3, Q3 = 22 IQR = 22 – 3 = 19 (Range = 98)  This is basically the range of the central 50% of the observations in the distribution.  Problem: The Interquartile range does not take into account the variability of the total data (only the central 50%). We are “throwing out” half of the data. 66
  • 67. Standard Scores  There are scores that are expressed in terms of their relative distance from the mean. It provides information not only about rank but also distance between scores.  It often called Z-score.
  • 68. Z Score Is a standard score that indicates how many SDs from the mean a particular values lies.  Z = Score of value – mean of scores divided by standard deviation.
  • 69. Standard Normal Scores  How many standard deviations away from the mean are you? Observation – mean Standard Score (Z) = Standard deviation “Z” is normal with mean 0 and standard deviation of 1. March 28, 2013 69
  • 71. Standard Normal Scores A standard score of:  Z = 1: The observation lies one SD above the mean  Z = 2: The observation is two SD above the mean  Z = -1: The observation lies 1 SD below the mean  Z = -2: The observation lies 2 SD below the mean March 28, 2013 71
  • 72. Standard Normal Scores  Example: Male Blood Pressure, mean = 125, s = 14 mmHg  BP = 167 mmHg 167 − 125 Z= = 3.0  BP = 97 mmHg 14 97 −125 Z= = −2.0 14 March 28, 2013 72
  • 73. What is the Usefulness of a Standard ?Normal Score  It tells you how many SDs (s) an observation is from the mean  Thus, it is a way of quickly assessing how “unusual” an observation is  Example: Suppose the mean BP is 125 mmHg, and standard deviation = 14 mmHg  Is 167 mmHg an unusually high measure?  If we know Z = 3.0, does that help us? March 28, 2013 73
  • 76. Original data   Change 7 to 97   Change 23 to 93 X Z X Z X Z 65 0.45- 65 0.81- 65 1.40- Data: Exam Scores 73 0.11- 73 0.38- 73 0.79- 78 0.10 78 0.10- 78 0.40-  69 0.28- 69 0.60- 69 1.09- 78 0.10 78 0.10- 78 0.40- 7 2.89- => 97 0.94 97 1.07 23 2.21- 23 3.12- => 93 0.76 98 0.94 98 0.99 98 1.14 99 0.99 99 1.05 99 1.22 99 0.99 99 1.05 99 1.22 97 0.90 97 0.94 97 1.07 99 0.99 99 1.05 99 1.22 75 0.02- 75 0.27- 75 0.63- 79 0.14 79 0.05- 79 0.32- 85 0.40 85 0.28 85 0.14 63 0.53- 63 0.92- 63 1.56- 67 0.36- 67 0.70- 67 1.25- 72 0.15- 72 0.43- 72 0.86- 73 0.11- 73 0.38- 73 0.79- 93 0.73 93 0.72 93 0.76 95 0.82 95 0.83 95 0.91 Mean 75.57 Mean 79.86 Mean 83.19 s 23.75 s 18.24 .s 12.96 Descriptive Statistics II 76
  • 77.  No matter what you are measuring, a Z-score of more than +5 or less than – 5 would indicate a very, very unusual score.  For standardized data, if it is normally distributed, 95% of the data will be between ±2 standard deviations about the mean.  If the data follows a normal distribution,  95% of the data will be between -1.96 and +1.96.  99.7% of the data will fall between -3 and +3.  99.99% of the data will fall between -4 and +4.  Worst case scenario: 75% of the data are between 2 standard deviations about the mean. 77
  • 78. 2 3 8 8 9 10 10 12 15 18 22 63 Q1 Median Smallest Q3 Largest Descriptive Statistics II 78
  • 79. Major reasons for using Index ((descriptive statistics  Understanding the data.  Evaluating outliers. Outliers are often identified in relation to the value of a distribution’s IQR. A mild outlier is a data value that lies between 1.5 and 3 times the IQR below Q1 or above Q3. Extreme outlier is a data value that is more that three times the IQR below Q1 or above Q3.  Describe the research sample.  Answering research questions.