Elementary Statistics

       Davis Lazarus
    Assistant Professor
  ISIM, The IIS University
Too few categories

[Figure: histogram "Age of Spring 1998 Stat 250 Students"; x-axis: Age (in years); y-axis: Frequency (Count); n = 92 students]
Too many categories

[Figure: histogram "GPAs of Spring 1998 Stat 250 Students"; x-axis: GPA; y-axis: Frequency (Count); n = 92 students]
• Scatter Plot
• Scatter diagram
• Scattergram

[Figure: scatter plot of Y against X]
Classes    Class boundaries   Tally Marks             Freq.    x

70 – 78    69.5 – 78.5        /////                     5     74
61 – 69    60.5 – 69.5        /////                     5     65
52 – 60    51.5 – 60.5                                  0     56
43 – 51    42.5 – 51.5        //                        2     47
34 – 42    33.5 – 42.5        /////-//                  7     38
25 – 33    24.5 – 33.5        /////-/////-////         14     29
16 – 24    15.5 – 24.5        /////-/////-/////-//     17     20
A frequency distribution
 table lists categories of
  scores along with their
corresponding frequencies.
The frequency for a
 particular category or
 class is the number of
original scores that fall
     into that class.
The classes or
categories refer to the
    groupings of a
   frequency table
• The range is the difference
  between the highest value
  and the lowest value.


R = highest value – lowest value
The class width is the
  difference between two
consecutive lower class limits
    or class boundaries.
The class limits are the
  smallest or the largest
 numbers that can actually
belong to different classes.
• Lower class limits are the
  smallest numbers that can
  actually belong to the different
  classes.
• Upper class limits are the
  largest numbers that can
  actually belong to the different
  classes.
• The class boundaries are obtained by
  increasing the upper class limits and
  decreasing the lower class limits by the
  same amount so that there are no gaps
  between consecutive classes. The
  amount to be added or subtracted is ½
  the difference between the upper limit of
  one class and the lower limit of the
  following class.
Essential Question :

• How do we construct a
  frequency distribution
  table?
Process of Constructing
  a Frequency Table
• STEP 1. Determine the range.

R = Highest Value – Lowest Value
• STEP 2. Determine the
  tentative number of classes (k)

       k = 1 + 3.322 log N

• Always round – off
• Note: The number of classes should be between
  5 and 20. The actual number of classes may be
  affected by convenience or other subjective
  factors
• STEP 3. Find the class width
  by dividing the range by the
  number of classes.

  class width = Range / number of classes, that is, $c = \frac{R}{k}$

           (Always round – off )
• STEP 4. Write the classes or
  categories starting with the
  lowest score. Stop when the class
  already includes the highest
  score.
• Add the class width to the starting point to get the
  second lower class limit. Add the class width to the
  second lower class limit to get the third, and so on. List
  the lower class limits in a vertical column and enter the
  upper class limits, which can be easily identified at this
  stage.
• STEP 5. Determine the
  frequency for each class by
  referring to the tally columns
  and present the results in a
  table.
When constructing frequency
 tables, the following guidelines
       should be followed.
• The classes must be mutually
  exclusive. That is, each score
  must belong to exactly one
  class.
• Include all classes, even if the
  frequency might be zero.
• All classes should have the
  same width, although it is
  sometimes impossible to avoid
  open – ended intervals such as
  “65 years or older”.
• The number of classes should
  be between 5 and 20.
Let’s Try!!!
•    Time magazine collected
    information on all 464 people who
    died from gunfire in the Philippines
    during one week. Here are the ages
    of 50 men randomly selected from
    that population. Construct a
    frequency distribution table.
19   18   30   40 41 33 73 25
23   25   21   33 65 17 20 76
47   69   20    31 18 24 35 24
17   36   65    70 22 25 65 16
24   29   42    37 26 46 27 63
21   27   23    25 71 37 75 25
27   23
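The five steps above can be sketched in Python for these 50 ages. This is a minimal sketch under stated assumptions, not the procedure as the slides would code it: the function name `frequency_table` is hypothetical, the logarithm in k = 1 + 3.322 log N is taken as base 10, and both k and the class width are rounded up, which is one reading of "always round off".

```python
import math

ages = [19, 18, 30, 40, 41, 33, 73, 25,
        23, 25, 21, 33, 65, 17, 20, 76,
        47, 69, 20, 31, 18, 24, 35, 24,
        17, 36, 65, 70, 22, 25, 65, 16,
        24, 29, 42, 37, 26, 46, 27, 63,
        21, 27, 23, 25, 71, 37, 75, 25,
        27, 23]

def frequency_table(data):
    """Return (lower limit, upper limit, frequency) rows for integer data."""
    data_range = max(data) - min(data)                  # Step 1: range
    k = math.ceil(1 + 3.322 * math.log10(len(data)))    # Step 2: number of classes (assumed base-10 log, rounded up)
    width = math.ceil(data_range / k)                   # Step 3: class width (rounded up)
    rows = []
    lower = min(data)                                   # Step 4: start at the lowest score
    while lower <= max(data):                           # stop once the highest score is covered
        upper = lower + width - 1
        freq = sum(lower <= x <= upper for x in data)   # Step 5: count scores in the class
        rows.append((lower, upper, freq))
        lower += width
    return rows

table = frequency_table(ages)
for lower, upper, freq in table:
    print(f"{lower:>3} - {upper:<3} {freq:>3}")
```

With these rounding choices the sketch gives 7 classes of width 9, starting at 16, listed from the lowest class upward.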
Using Table:
• What is the lower class limit
  of the highest class? Upper
  class limit of the lowest class?
• Find the class mark of the
  class 43 – 51.
• What is the frequency of the
  class 16 – 24?
Classes    Class boundaries   Tally Marks             Freq.    x

70 – 78    69.5 – 78.5        /////                     5     74
61 – 69    60.5 – 69.5        /////                     5     65
52 – 60    51.5 – 60.5                                  0     56
43 – 51    42.5 – 51.5        //                        2     47
34 – 42    33.5 – 42.5        /////-//                  7     38
25 – 33    24.5 – 33.5        /////-/////-////         14     29
16 – 24    15.5 – 24.5        /////-/////-/////-//     17     20
Example 1
The manager of Hudson Auto would like to have a better
understanding of the cost of parts used in the engine
tune-ups performed in the shop.
She examines 50 customer invoices
for tune-ups. The costs of parts,
rounded off to the nearest dollar,
are listed on the next slide.

91     78    93   57     75   52     99    80    97      62
71     69    72   89     66   75     79    75    72      76
104    74    62   68     97   105    77    65    80      109
85     97    88   68     83   68     71    69    67      74
62     82    98   101    79   105    79    69    62      73
CUMULATIVE FREQUENCY
         DISTRIBUTION
• The less than cumulative frequency
  distribution (F<) is constructed by adding the
  frequencies from the lowest to the highest
  interval while the more than cumulative
  frequency distribution (F>) is constructed by
  adding the frequencies from the highest class
  interval to the lowest class interval.
Tabular Summary
Frequency Distribution of engine tune-ups

                                              Cumulative Frequency
Cost ($)   Frequency   Relative Frequency   less than   more than

 50-59         2             0.04                2          50
 60-69        13             0.26               15          48
 70-79        16             0.32               31          35
 80-89         7             0.14               38          19
 90-99         7             0.14               45          12
100-109        5             0.10               50           5
Total         50             1.00

Less than: 15 = 2 + 13. More than: 12 = 5 + 7.
45 tune-ups cost less than $100; 12 tune-ups cost more than $89.
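The relative and cumulative columns can be recomputed from the class frequencies alone. The following is a small sketch using only the standard library; the variable names are illustrative, not from the slides.

```python
from itertools import accumulate

classes = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
freq = [2, 13, 16, 7, 7, 5]

n = sum(freq)
relative = [f / n for f in freq]

# "Less than" (F<): running total from the lowest class upward.
less_than = list(accumulate(freq))

# "More than" (F>): running total from the highest class downward.
more_than = list(accumulate(reversed(freq)))[::-1]

for row in zip(classes, freq, relative, less_than, more_than):
    print("{:>8} {:>3} {:>5.2f} {:>3} {:>3}".format(*row))
```

Adding from the highest class downward gives 7 + 7 + 5 = 19 for the 80-89 row of the "more than" column.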
Graphical Summary: Histogram

[Figure: histogram of tune-up parts cost; x-axis: Cost ($) in classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109; y-axis: Frequency]

Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent classes.
Ogive

[Figure: "less than" ogive and "more than" ogive of the tune-up costs; x-axis: Tune-up Cost ($); y-axis: Frequency; the point where the two curves cross is labelled "median"]
Stem-and-Leaf Display

    5 | 2 7
    6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
    7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
    8 | 0 0 2 3 5 8 9
    9 | 1 3 7 7 7 8 9
   10 | 1 4 5 5 9

The number to the left of the bar is a stem; each digit to the right is a leaf.

A single digit is used to define each leaf.
Leaf units may be 100, 10, 1, 0.1, and so on.
Where the leaf unit is not shown, it is assumed to equal 1.
In the above example, the leaf unit was 1.
Leaf Unit = 0.1    8.6   11.7   9.4   9.1   10.2   11.0   8.8

    8 | 6 8
    9 | 1 4
   10 | 2
   11 | 0 7


Leaf Unit = 10    1806 1717 1974 1791 1682 1910 1838

   16 | 8
   17 | 1 9
   18 | 0 3
   19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.
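The leaf-unit rule can be sketched as a short function. This is an illustrative sketch only: the name `stem_and_leaf` is hypothetical, values are assumed non-negative, and digits below the leaf unit are dropped (rounded down), as in the 1682 example.

```python
import math
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its sorted list of single-digit leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        # Express the value in leaf units; round() guards against
        # floating-point noise such as 8.6 / 0.1 = 85.99999999999999.
        scaled = math.floor(round(v / leaf_unit, 6))
        stems[scaled // 10].append(scaled % 10)
    return dict(stems)

tenths = stem_and_leaf([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], leaf_unit=0.1)
tens = stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10)

for stem, leaves in sorted(tens.items()):
    print(f"{stem:>4} | {' '.join(map(str, leaves))}")
```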
Measures of Central Tendency
Arithmetic Mean, Weighted Mean, Geometric Mean,
Median, Mode, Partition Values – Quartiles, Deciles and
Percentiles

Measures of Dispersion
Range, Mean deviation, Standard deviation, Variance,
Co-efficient of variation

Measures of Position
Quartile deviation
• What is the “location” or “centre” of the data? (measures
  of location or central tendency)
• How do the data vary? (measures of variability or
  dispersion)


Mean: the average obtained by finding the sum of the
numbers and dividing by how many numbers were added.

Median: When the numbers are listed from highest to lowest
or lowest to highest, the median is the number found
in the middle. If there is an even number of data values, find the
average of the middle two numbers.

Mode: The number that occurs the most often.
Mean is the most widely used measure of location and
shows the central value of the data.

Population mean: $\mu = \frac{\sum X_i}{N}$
   µ is the population mean
   N is the population size
   X_i is a particular population value
   Σ indicates the operation of adding

Sample mean: $\bar{X} = \frac{\sum x_i}{n}$
   X̄ is the sample mean
   n is the sample size
   x_i is a particular sample value

Properties of the mean:
   •   all values are used
   •   unique
   •   sum of the deviations from the mean is 0
   •   affected by unusually large or small data values
The Median is the midpoint of the values after they
have been ordered from the smallest to the largest.

For an even number of values, the median is the
arithmetic average of the two middle numbers. In either case it is
found at the (n+1)/2 ranked observation.


There are as many values above the median as below it
in the data array.

   unique
   not affected by extremely large or small values
⇒ good measure of location when such values occur
The   Mode is another measure of location and represents
the value of the observation that appears most frequently.


 Data can have more than one mode.
 If it has two modes, it is referred to as bimodal, three
 modes, trimodal, and the like.
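The three measures of location can be checked with Python's standard `statistics` module. As an illustration, this sketch reuses the 50 tune-up parts costs from the Hudson Auto example; `multimode` is used so that more than one mode would be reported if the data had several.

```python
import statistics

costs = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
         71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
         104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
         85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
         62, 82, 98, 101, 79, 105, 79, 69, 62, 73]

mean = statistics.mean(costs)        # sum of the values divided by n
median = statistics.median(costs)    # average of the 25th and 26th ordered values (n is even)
modes = statistics.multimode(costs)  # list of the most frequent value(s)

print(mean, median, modes)
```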
Weighted Mean of a set of numbers X_1, X_2, ..., X_n
with corresponding weights w_1, w_2, ..., w_n:

$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$


Geometric Mean of a set of n numbers is
defined as the nth root of the product of the n numbers.

$GM = \sqrt[n]{(X_1)(X_2)(X_3)\dots(X_n)}$

   GM is used to average percents, indexes, and relatives.
Example 1

 The interest rates on three bonds were 5, 21, and 4 percent.
 The arithmetic mean is (5+21+4) / 3 = 10.0
 The geometric mean is

 $GM = \sqrt[3]{(5)(21)(4)} = 7.49$

  The GM gives a more conservative profit figure because
  it is not heavily weighted by the rate of 21%
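The comparison in this example can be reproduced in a few lines. This is a minimal sketch; `math.prod` requires Python 3.8 or later.

```python
import math

rates = [5, 21, 4]  # interest rates in percent

arithmetic_mean = sum(rates) / len(rates)
# nth root of the product of the n values
geometric_mean = math.prod(rates) ** (1 / len(rates))

print(arithmetic_mean, round(geometric_mean, 2))
```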
Example 2

Another use of GM is to determine the percent increase in
sales, production or other business or economic series
from one time period to another.

$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$

[Figure: "Growth in Sales 1999-2004"; x-axis: Year (1999 to 2004); y-axis: Sales in Millions ($)]
Example 3

  The total number of females enrolled in American
  colleges increased from 755,000 in 1992 to 835,000 in
  2000. That is, the geometric mean rate of increase is
  1.27%.

  $GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = .0127$
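The rate-of-increase formula can be sketched as a small function. The name `average_growth_rate` is hypothetical; `periods` is the number of yearly steps between the two values (8 from 1992 to 2000).

```python
def average_growth_rate(start, end, periods):
    """Geometric mean rate of increase per period."""
    return (end / start) ** (1 / periods) - 1

rate = average_growth_rate(755_000, 835_000, 8)
print(f"{rate:.4f}")  # about 0.0127, i.e. 1.27% per year
```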
Measures of Dispersion

   • Range
   • Mean Deviation
   • Quartile Deviation
   • Standard Deviation
   • Variance
   • Co-efficient of Variation
Dispersion refers to the spread or variability in the data.

[Figure: frequency plot of data values from 0 to 12 with the mean marked]

   Range = Largest value – Smallest value
Range                                        Example

 The following represents the current year’s Return on
 Equity of the 25 companies in an investor’s portfolio.

        -8.1        3.2         5.9        8.1       12.3
        -5.1        4.1         6.3        9.2       13.3
        -3.1        4.6         7.9        9.5       14.0
        -1.4        4.8         7.9        9.7       15.0
         1.2        5.7         8.0       10.3       22.1

 Highest value: 22.1      Lowest value: -8.1

  Range = Highest value – lowest value
        = 22.1-(-8.1)
        = 30.2
Mean Deviation
The arithmetic mean of the absolute values of the
deviations from the arithmetic mean.

$MD = \frac{\sum |X - \bar{X}|}{n}$

   • All values are used in the calculation.
   • It is not unduly influenced by large or small values.
   • The absolute values are difficult to manipulate.
Example 5

 The weights of a sample of crates containing books for
 the bookstore (in pounds) are: 103, 97, 101, 106, 103

 $\bar{X} = 102$

 $MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4$
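The mean deviation for the five crate weights can be verified directly. This is a short sketch of the definition; the absolute deviations from the mean of 102 are 1, 5, 1, 4, and 1, which sum to 12.

```python
weights = [103, 97, 101, 106, 103]  # pounds

mean = sum(weights) / len(weights)
abs_deviations = [abs(x - mean) for x in weights]
mean_deviation = sum(abs_deviations) / len(weights)

print(mean, abs_deviations, mean_deviation)
```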
Standard deviation and Variance

Variance: the arithmetic mean of the squared deviations from the mean.
Standard deviation = √(variance)

Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

X is the value of an observation in the population
μ is the arithmetic mean of the population
N is the number of observations in the population

Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$
Example 6

 In Example 4, the variance and standard deviation are:

$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

$\sigma^2 = \frac{(-8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \dots + (22.1 - 6.62)^2}{25}$

$\sigma^2 = 42.227$, $\sigma = 6.498$

Sample variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Sample standard deviation: $s = \sqrt{s^2}$
Example 7

The hourly wages earned by a sample of five students are
$7, $5, $11, $8, $6.

$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$

$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$

$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$
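The difference between the population formula (divide by N) and the sample formula (divide by n - 1) can be seen with the standard `statistics` module. This sketch applies the sample functions to the five hourly wages and the population functions to the 25 Return on Equity values from the range example.

```python
import statistics

wages = [7, 5, 11, 8, 6]  # sample of five students

s2 = statistics.variance(wages)  # sample variance, divides by n - 1
s = statistics.stdev(wages)      # sample standard deviation

roe = [-8.1, 3.2, 5.9, 8.1, 12.3,
       -5.1, 4.1, 6.3, 9.2, 13.3,
       -3.1, 4.6, 7.9, 9.5, 14.0,
       -1.4, 4.8, 7.9, 9.7, 15.0,
       1.2, 5.7, 8.0, 10.3, 22.1]  # treated as a whole population

sigma2 = statistics.pvariance(roe)  # population variance, divides by N
sigma = statistics.pstdev(roe)      # population standard deviation

print(round(s2, 2), round(s, 2), round(sigma2, 3), round(sigma, 3))
```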
Example:
Data: X = {6, 10, 5, 4, 9, 8};      N = 6

       X         X − X̄        (X − X̄)²

       6          -1              1
      10           3              9
       5          -2              4
       4          -3              9
       9           2              4
       8           1              1
  Total: 42                   Total: 28

Mean: $\bar{X} = \frac{\sum X}{N} = \frac{42}{6} = 7$

Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{28}{6} = 4.67$

Standard Deviation: $s = \sqrt{s^2} = \sqrt{4.67} = 2.16$
Empirical Rule:
For any symmetrical, bell-shaped distribution


About 68% of the observations will lie within 1s of the mean

About 95% of the observations will lie within 2s of the
mean

Nearly all the observations will be within 3s of the mean
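Interpretation and Uses of the Standard Deviation

[Figure: bell-shaped curve with x-axis marked at µ−3σ, µ−2σ, µ−1σ, µ, µ+1σ, µ+2σ, µ+3σ; 68% of the area lies within µ±1σ, 95% within µ±2σ, and 99.7% within µ±3σ]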
Fractiles

Quartiles: Q1, Q2, Q3 divide ranked data into four equal parts

[Figure: ranked data split into four parts of 25% each, separated by Q1, Q2, Q3]

Deciles: D1, D2, D3, D4, D5, D6, D7, D8, D9 divide ranked data into ten equal parts

[Figure: ranked data split into ten parts of 10% each, separated by D1 to D9]

Percentiles: 99 percentiles divide ranked data into 100 equal parts
Relative Standing
   Percentiles


percentile of value x = ((number of values < x) / total number of values) * 100
        (round the result to the nearest whole number)
   Suppose that in a class of 25 people we have the following averages
   (ordered in ascending order)


  42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77, 78, 78, 79, 80, 81, 84, 85, 87, 89,
  91, 94, 98
  If you received a 77, what percentile are you?


    percentile of 77 = (12/25)*100 = 48
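The percentile formula can be applied directly to the 25 class averages. This is a minimal sketch; the function name `percentile_of` is illustrative.

```python
grades = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
          78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

def percentile_of(x, values):
    """Percent of values strictly below x, rounded to a whole number."""
    below = sum(v < x for v in values)
    return round(below / len(values) * 100)

print(percentile_of(77, grades))  # 12 of the 25 values are below 77
```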
Relative Standing
Quartiles


   Instead of finding the percentile of a single data value as we did on
   the previous page, it is often useful to group the data into 4, or more,
   (nearly) equal groups. When grouping the data into four equal
   groupings, we call these groupings quartiles.




    Let        n = number of items in the data set
               k = percent desired (ex. k = 25)
               L = locator: the position of the value separating
               the first k percent of the data from the rest

          L = (k/100) * n
Relative Standing
  Let’s separate the 25 class grades into four quartiles.

     • Step 1 – order the data in ascending order

42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77, 78, 78, 79, 80, 81, 84, 85, 87, 89,
91, 94, 98

Now find the 3 locators L25, L50, L75, rounding any fractional part up to the next integer:

    L25 = (25/100) * 25 = 6.25, rounded up to 7, so Q1 = 70 (the 7th value)
    L50 = (50/100) * 25 = 12.5, rounded up to 13, so Q2 = 77 (the 13th value)
    L75 = (75/100) * 25 = 18.75, rounded up to 19, so Q3 = 84 (the 19th value)
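The locator rule can be sketched as follows. The function name `value_at_percent` is hypothetical, and k is assumed to lie strictly between 0 and 100. The slide only covers fractional locators; the branch for a whole-number locator (average that value with the next one) is an assumption based on a common textbook convention.

```python
import math

grades = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
          78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

def value_at_percent(sorted_values, k):
    """Value separating the first k percent of the ordered data from the rest."""
    n = len(sorted_values)
    locator = k / 100 * n
    if locator != int(locator):
        # Fractional locator: round up to the next integer position (1-based).
        position = math.ceil(locator)
        return sorted_values[position - 1]
    # Whole-number locator (assumed convention): average with the next value.
    position = int(locator)
    return (sorted_values[position - 1] + sorted_values[position]) / 2

q1, q2, q3 = (value_at_percent(grades, k) for k in (25, 50, 75))
print(q1, q2, q3)
```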
Relative Standing

 Other measures of relative standing
 include
 • Interquartile range (IQR) = Q3 - Q1
 • Semi-interquartile range = (Q3 - Q1)/2
 • Midquartile = (Q3 + Q1)/2
 • 10 – 90 percentile range = P90 - P10

For the data on the previous page we have:

    IQR = 84 – 70 = 14                     (measure of variation)
    Semi IQR = (84 – 70)/2 = 7             (measure of variation)
    Midquartile = (84 + 70)/2 = 77         (measure of central tendency)
Box Diagram

      65, 67, 68, 68, 69, 69, 71, 71, 71, 72, 72, 72, 73, 73, 73,
      74, 74, 75, 75, 75, 75, 76, 76, 77, 77, 77, 77, 77, 77, 78,
      78, 78, 78, 79, 79, 79, 79, 80, 81, 81, 81, 81, 81, 81, 81,
      81, 82, 82, 83, 84, 85, 85, 85, 86, 86, 87, 87, 88, 89, 92

L25, the median, and L75 fall at the ends of the first, second, and third rows.

To construct a box diagram to illustrate the extent to which the
extreme data values lie beyond the interquartile range, draw a line
with the low and high value highlighted at the two ends. Mark the
gradations between these two extremes, then locate the quartile
boundaries Q1, Med., and Q3 on this line. Construct a box about
these values.

Q1 = (73 + 74)/2 = 73.5

[Figure: box diagram on a line running from 65 to 92 with gradations at 69, 73, 77, 81, 85, 89; the box is marked at Q1, M, and Q3]
Percentile of score a = (number of scores less than a / total number of scores) * 100

Relation between the different fractiles

   • Q1 = P25          D1 = P10
   • Q2 = P50          D2 = P20
   • Q3 = P75          D3 = P30
                       ...
                       D9 = P90

Interquartile Range:   Q3 – Q1
Box plot: a graphical display, based on quartiles,
that helps to picture a set of data.

  Five pieces of data are needed to construct a box plot:
Minimum Value,
First Quartile, Q1
Median,
Third Quartile, Q3
Maximum Value.

The box represents the interquartile range, which contains the
middle 50% of values.
The whiskers represent the range; they extend from the box to the
highest and lowest values, excluding outliers.
A line across the box indicates the median.
Example 8
 Based on a sample of 20 deliveries, Buddy’s Pizza
 determined the following information. The minimum
 delivery time was 13 minutes and the maximum 30 minutes.
 The first quartile was 15 minutes, the median 18 minutes, and
 the third quartile 22 minutes. Develop a box plot for the
 delivery times.

[Figure: box plot of delivery times on an axis from 12 to 32 minutes, marking Min, Q1, Median, Q3, and Max, with a distance of 1.5 times the interquartile range indicated on each side of the box]
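The distances of 1.5 times the interquartile range marked in the figure can be computed from the five-number summary. This is a sketch under one assumption: values farther than 1.5 × IQR from the box are treated as outliers, which is a common convention and not stated explicitly on the slide. The names `lower_fence` and `upper_fence` are illustrative.

```python
# Five-number summary for the delivery times (minutes)
minimum, q1, median, q3, maximum = 13, 15, 18, 22, 30

iqr = q3 - q1                 # length of the box
lower_fence = q1 - 1.5 * iqr  # assumed outlier cut-off below the box
upper_fence = q3 + 1.5 * iqr  # assumed outlier cut-off above the box

# Under this convention the whiskers reach the minimum and maximum
# only if both lie inside the fences.
no_outliers = lower_fence <= minimum and maximum <= upper_fence

print(iqr, lower_fence, upper_fence, no_outliers)
```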
Skewness
measurement of the lack of symmetry of the distribution.


Symmetric distribution: A distribution having the same
shape on either side of the centre


Skewed distribution: One whose shapes on either side of
the center differ; a nonsymmetrical distribution.


Can be positively or negatively skewed, or bimodal
Relative Positions of the Mean, Median, and
Mode in a Symmetric Distribution

[Figure: symmetric curve with Mean, Median, and Mode all marked at the centre]
Relative Positions of the Mean, Median, and Mode in a Right
Skewed or Positively Skewed Distribution

                                       Mean > Median > Mode

[Figure: right-skewed curve; from left to right: Mode, Median, Mean]
The Relative Positions of the Mean, Median, and Mode in a Left
Skewed or Negatively Skewed Distribution

  Mean < Median < Mode

[Figure: left-skewed curve; from left to right: Mean, Median, Mode]
The coefficient of skewness can range from -3.00 up to 3.00


A value of 0 indicates a symmetric distribution.



Example 9

Using the twelve stock prices, we find the mean to be
84.42, standard deviation, 7.18, median, 84.5.

 $sk = \frac{3(\bar{X} - \text{Median})}{s} = -.035$
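The coefficient of skewness formula can be evaluated from the three summary values. This is a sketch using the rounded figures quoted in the example; the function name `pearson_skewness` is an illustrative label. With these rounded inputs the result is about -0.033, so the slide's -.035 was presumably computed from unrounded values.

```python
def pearson_skewness(mean, median, std_dev):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

sk = pearson_skewness(mean=84.42, median=84.5, std_dev=7.18)
print(round(sk, 3))  # slightly negative: the mean is just below the median
```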
Kurtosis
• derived from the Greek word κυρτός, kyrtos or kurtos,
meaning bulging
• measure of the "peakedness" of the probability
distribution of a real-valued random variable
• higher kurtosis means more of the variance is due to
infrequent extreme deviations, as opposed to frequent
modestly-sized deviations.
A distribution with positive kurtosis is called leptokurtic,
or leptokurtotic.
In terms of shape, a leptokurtic distribution has a more acute
"peak" around the mean (that is, a higher probability than a
normally distributed variable of values near the mean) and
"fat tails" (that is, a higher probability than a normally
distributed variable of extreme values).

A distribution with negative kurtosis is called platykurtic,
or platykurtotic.
In terms of shape, a platykurtic distribution has a smaller
"peak" around the mean (that is, a lower probability than a
normally distributed variable of values near the mean) and
"thin tails" (that is, a lower probability than a normally
distributed variable of extreme values).
[Figure: a leptokurtic distribution compared with the normal (mesokurtic) distribution]

[Figure: a platykurtic distribution compared with the normal (mesokurtic) distribution]
Comparing Standard Deviations

Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57

[Figure: the three data sets plotted on a common axis from 11 to 21]
Co-efficient of variation

$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$

• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Is used to compare two or more sets of data measured in different
  units

 When the mean value is near zero, the coefficient of
 variation is sensitive to change in the standard deviation,
 limiting its usefulness.
Stock A:
   Average price last year = $50
   Standard deviation = $5

   $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$50}\right) \cdot 100\% = 10\%$

Stock B:
   Average price last year = $100
   Standard deviation = $5

   $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$100}\right) \cdot 100\% = 5\%$

 
lesson 3 presentation of data and frequency distribution
lesson 3 presentation of data and frequency distributionlesson 3 presentation of data and frequency distribution
lesson 3 presentation of data and frequency distributionNerz Baldres
 
Classification Systems
Classification SystemsClassification Systems
Classification SystemsJohn Reiser
 
Applied 40S March 25, 2009
Applied 40S March 25, 2009Applied 40S March 25, 2009
Applied 40S March 25, 2009Darren Kuropatwa
 
Descriptive Statistics Part II: Graphical Description
Descriptive Statistics Part II: Graphical DescriptionDescriptive Statistics Part II: Graphical Description
Descriptive Statistics Part II: Graphical Descriptiongetyourcheaton
 

Semelhante a Elementary statistics (20)

Measures of Position - Elementary Statistics
Measures of Position - Elementary StatisticsMeasures of Position - Elementary Statistics
Measures of Position - Elementary Statistics
 
Presentation of data
Presentation of dataPresentation of data
Presentation of data
 
Frequency distribution
Frequency distributionFrequency distribution
Frequency distribution
 
Ejercicio resuelto-de-estadc3adstica-descriptiva1
Ejercicio resuelto-de-estadc3adstica-descriptiva1Ejercicio resuelto-de-estadc3adstica-descriptiva1
Ejercicio resuelto-de-estadc3adstica-descriptiva1
 
frequency distribution table
frequency distribution tablefrequency distribution table
frequency distribution table
 
Practice test1 solution
Practice test1 solutionPractice test1 solution
Practice test1 solution
 
Graphical presentation of data
Graphical presentation of dataGraphical presentation of data
Graphical presentation of data
 
Final Lecture - 2.ppt
Final Lecture - 2.pptFinal Lecture - 2.ppt
Final Lecture - 2.ppt
 
3_-frequency_distribution.pptx
3_-frequency_distribution.pptx3_-frequency_distribution.pptx
3_-frequency_distribution.pptx
 
Staisticsii
StaisticsiiStaisticsii
Staisticsii
 
Understanding data through presentation_contd
Understanding data through presentation_contdUnderstanding data through presentation_contd
Understanding data through presentation_contd
 
Statistics
StatisticsStatistics
Statistics
 
The more we get together
The more we get togetherThe more we get together
The more we get together
 
measure of variability (windri). In research include example
measure of variability (windri). In research include examplemeasure of variability (windri). In research include example
measure of variability (windri). In research include example
 
3.3 Measures of Variation
3.3 Measures of Variation3.3 Measures of Variation
3.3 Measures of Variation
 
lesson 3 presentation of data and frequency distribution
lesson 3 presentation of data and frequency distributionlesson 3 presentation of data and frequency distribution
lesson 3 presentation of data and frequency distribution
 
Classification Systems
Classification SystemsClassification Systems
Classification Systems
 
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributions
 
Applied 40S March 25, 2009
Applied 40S March 25, 2009Applied 40S March 25, 2009
Applied 40S March 25, 2009
 
Descriptive Statistics Part II: Graphical Description
Descriptive Statistics Part II: Graphical DescriptionDescriptive Statistics Part II: Graphical Description
Descriptive Statistics Part II: Graphical Description
 

Elementary statistics

  • 1. Elementary Statistics Davis Lazarus Assistant Professor ISIM, The IIS University
  • 2. Too few categories. [Figure: histogram, "Age of Spring 1998 Stat 250 Students"; x-axis: Age (in years); y-axis: Frequency (Count); n = 92 students.]
  • 3. Too many categories. [Figure: histogram, "GPAs of Spring 1998 Stat 250 Students"; x-axis: GPA; y-axis: Frequency (Count); n = 92 students.]
  • 4. Scatter plot (also called a scatter diagram or scattergram). [Figure: scatter plot of Y against X.]
  • 5. Frequency distribution table:

       Classes    Class boundaries   Tally marks             Freq.   Class mark (x)
       70 – 78    69.5 – 78.5        /////                     5         74
       61 – 69    60.5 – 69.5        /////                     5         65
       52 – 60    51.5 – 60.5                                  0         56
       43 – 51    42.5 – 51.5        //                        2         47
       34 – 42    33.5 – 42.5        /////-//                  7         38
       25 – 33    24.5 – 33.5        /////-/////-////         14         29
       16 – 24    15.5 – 24.5        /////-/////-/////-//     17         20
  • 6. A frequency distribution table lists categories of scores along with their corresponding frequencies.
  • 7. The frequency for a particular category or class is the number of original scores that fall into that class.
  • 8. The classes or categories refer to the groupings of a frequency table.
  • 9. The range is the difference between the highest value and the lowest value: R = highest value – lowest value.
  • 10. The class width is the difference between two consecutive lower class limits or class boundaries.
  • 11. The class limits are the smallest or the largest numbers that can actually belong to different classes.
  • 12. • Lower class limits are the smallest numbers that can actually belong to the different classes. • Upper class limits are the largest numbers that can actually belong to the different classes.
  • 13. The class boundaries are obtained by increasing the upper class limits and decreasing the lower class limits by the same amount so that there are no gaps between consecutive classes. The amount to be added or subtracted is ½ the difference between the upper limit of one class and the lower limit of the following class.
  • 14. Essential Question : • How do we construct a frequency distribution table?
  • 15. Process of Constructing a Frequency Table • STEP 1: Determine the range. R = Highest Value – Lowest Value
  • 16. STEP 2. Determine the tentative number of classes (k): \( k = 1 + 3.322 \log_{10} N \), where N is the number of observations. Always round off. Note: The number of classes should be between 5 and 20. The actual number of classes may be affected by convenience or other subjective factors.
  • 17. STEP 3. Find the class width by dividing the range by the number of classes: class width = range / number of classes, that is, \( c = \dfrac{R}{k} \) (always round off).
  • 18. • STEP 4. Write the classes or categories starting with the lowest score. Stop when the class already includes the highest score. • Add the class width to the starting point to get the second lower class limit. Add the class width to the second lower class limit to get the third, and so on. List the lower class limits in a vertical column and enter the upper class limits, which can be easily identified at this stage.
  • 19. • STEP 5. Determine the frequency for each class by referring to the tally columns and present the results in a table.
  • 20. When constructing frequency tables, the following guidelines should be followed. • The classes must be mutually exclusive. That is, each score must belong to exactly one class. • Include all classes, even if the frequency might be zero.
  • 21. • All classes should have the same width, although it is sometimes impossible to avoid open – ended intervals such as “65 years or older”. • The number of classes should be between 5 and 20.
  • 22. Let’s Try!!! • Time magazine collected information on all 464 people who died from gunfire in the Philippines during one week. Here are the ages of 50 men randomly selected from that population. Construct a frequency distribution table.
  • 23. 19 18 30 40 41 33 73 25 23 25 21 33 65 17 20 76 47 69 20 31 18 24 35 24 17 36 65 70 22 25 65 16 24 29 42 37 26 46 27 63 21 27 23 25 71 37 75 25 27 23
  • 24. Using Table: • What is the lower class limit of the highest class? Upper class limit of the lowest class? • Find the class mark of the class 43 – 51. • What is the frequency of the class 16 – 24?
  • 25. Frequency distribution table:

       Classes    Class boundaries   Tally marks             Freq.   Class mark (x)
       70 – 78    69.5 – 78.5        /////                     5         74
       61 – 69    60.5 – 69.5        /////                     5         65
       52 – 60    51.5 – 60.5                                  0         56
       43 – 51    42.5 – 51.5        //                        2         47
       34 – 42    33.5 – 42.5        /////-//                  7         38
       25 – 33    24.5 – 33.5        /////-/////-////         14         29
       16 – 24    15.5 – 24.5        /////-/////-/////-//     17         20
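The five construction steps can be sketched in Python, applied to the 50 ages from the exercise. This is a minimal sketch under stated assumptions: "always round off" is read as rounding to the nearest integer, the data are whole numbers (so each upper limit is the next lower limit minus 1), and the function name `frequency_table` is illustrative rather than anything from the slides.

```python
import math

ages = [19, 18, 30, 40, 41, 33, 73, 25, 23, 25,
        21, 33, 65, 17, 20, 76, 47, 69, 20, 31,
        18, 24, 35, 24, 17, 36, 65, 70, 22, 25,
        65, 16, 24, 29, 42, 37, 26, 46, 27, 63,
        21, 27, 23, 25, 71, 37, 75, 25, 27, 23]

def frequency_table(data):
    """Return (lower limit, upper limit, frequency, class mark) rows, lowest class first."""
    data_range = max(data) - min(data)              # STEP 1: range
    k = round(1 + 3.322 * math.log10(len(data)))    # STEP 2: tentative number of classes
    width = round(data_range / k)                   # STEP 3: class width
    rows = []
    lower = min(data)                               # STEP 4: start at the lowest score
    while True:
        upper = lower + width - 1                   # assumes integer-valued data
        freq = sum(lower <= x <= upper for x in data)   # STEP 5: count scores in the class
        rows.append((lower, upper, freq, (lower + upper) / 2))
        if upper >= max(data):                      # stop once the highest score is included
            break
        lower += width
    return rows

table = frequency_table(ages)
for lower, upper, freq, mark in reversed(table):    # highest class first, as in the slide
    print(f"{lower} – {upper}   {freq:>2}   {mark:g}")
```

For these data the sketch gives k = 7 and a class width of 9, and it reproduces the frequencies and class marks in the table above.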
  • 26. Example 1. The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded off to the nearest dollar, are listed below.

        91   78   93   57   75   52   99   80   97   62
        71   69   72   89   66   75   79   75   72   76
       104   74   62   68   97  105   77   65   80  109
        85   97   88   68   83   68   71   69   67   74
        62   82   98  101   79  105   79   69   62   73
  • 27. CUMULATIVE FREQUENCY DISTRIBUTION • The less than cumulative frequency distribution (F<) is constructed by adding the frequencies from the lowest to the highest interval while the more than cumulative frequency distribution (F>) is constructed by adding the frequencies from the highest class interval to the lowest class interval.
  • 28. Tabular Summary: Frequency Distribution of engine tune-ups

       Cost ($)   Frequency   Relative    Cumulative frequency
                              frequency   less than   more than
       50-59          2         0.04          2          50
       60-69         13         0.26         15          48
       70-79         16         0.32         31          35
       80-89          7         0.14         38          19
       90-99          7         0.14         45          12
       100-109        5         0.10         50           5
       Total         50         1.00

       For example, the less-than entry 15 is 2 + 13 and the more-than entry 12 is 5 + 7. 45 tune-ups cost less than $100, and 12 tune-ups cost more than $89.
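The relative and cumulative columns follow mechanically from the class frequencies. A short Python sketch, assuming the six frequencies of the tune-up table as input (the variable names are illustrative):

```python
from itertools import accumulate

classes = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
freq = [2, 13, 16, 7, 7, 5]

total = sum(freq)
relative = [f / total for f in freq]

# F<: add frequencies from the lowest class upward.
less_than = list(accumulate(freq))
# F>: add frequencies from the highest class downward, then restore class order.
more_than = list(accumulate(reversed(freq)))[::-1]

for row in zip(classes, freq, relative, less_than, more_than):
    print("{:<8} {:>3} {:>5.2f} {:>3} {:>3}".format(*row))
```

Each less-than entry is the previous entry plus the current frequency, so the more-than entry for 80-89 is 12 + 7 = 19.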
  • 29. Graphical Summary: Histogram. [Figure: histogram of Frequency against Cost ($) for the six cost classes.] Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
  • 30. Ogive. [Figure: less than ogive and more than ogive, cumulative Frequency against Tune-up Cost ($); the point where the two curves cross is marked as the median.]
  • 31. Stem-and-Leaf Display

        5 | 2 7
        6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
        7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
        8 | 0 0 2 3 5 8 9
        9 | 1 3 7 7 7 8 9
       10 | 1 4 5 5 9

       The number to the left of the bar is a stem; each digit to the right is a leaf. A single digit is used to define each leaf. Leaf units may be 100, 10, 1, 0.1, and so on. Where the leaf unit is not shown, it is assumed to equal 1. In the above example, the leaf unit was 1.
  • 32. Leaf Unit = 0.1. Data: 8.6 11.7 9.4 9.1 10.2 11.0 8.8

        8 | 6 8
        9 | 1 4
       10 | 2
       11 | 0 7

       Leaf Unit = 10. Data: 1806 1717 1974 1791 1682 1910 1838

       16 | 8
       17 | 1 9
       18 | 0 3
       19 | 1 7

       The 82 in 1682 is rounded down to 80 and is represented as an 8.
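A stem-and-leaf display can be built by splitting each value into a stem and a single-digit leaf. The sketch below assumes whole-number data and a whole-number leaf unit (1, 10, 100, ...); digits below the leaf unit are dropped, matching the "rounded down" note above. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

costs = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
         71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
         104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
         85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
         62, 82, 98, 101, 79, 105, 79, 69, 62, 73]

def stem_and_leaf(data, leaf_unit=1):
    """Map each stem to its sorted list of single-digit leaves."""
    stems = defaultdict(list)
    for x in sorted(data):
        units = x // leaf_unit              # drop digits below the leaf unit
        stems[units // 10].append(units % 10)
    return dict(stems)

display = stem_and_leaf(costs)
for stem in sorted(display):
    print(f"{stem:>2} | {' '.join(map(str, display[stem]))}")

# Leaf unit = 10: 1682 becomes stem 16, leaf 8.
print(stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10))
```

Decimal leaf units such as 0.1 are left out of this sketch because floor division on floats can misplace a leaf; scaling the data to whole numbers first avoids that.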
  • 33. Measures of Central Tendency: Arithmetic Mean, Weighted Mean, Geometric Mean, Median, Mode, Partition Values (Quartiles, Deciles and Percentiles). Measures of Dispersion: Range, Mean deviation, Standard deviation, Variance, Co-efficient of variation. Measures of Position: Quartile deviation.
  • 34. • What is the “location” or “centre” of the data? (measures of location or central tendency) • How do the data vary? (measures of variability or dispersion) Mean: the average obtained by finding the sum of the numbers and dividing by the number of numbers in the sum. Median: when the numbers are listed from highest to lowest or lowest to highest, the median is the number found in the middle. If there is an even number of data values, find the average of the middle two numbers. Mode: the number that occurs most often.
  • 35. Mean is the most widely used measure of location and shows the central value of the data.
       Population mean: \( \mu = \dfrac{\sum X_i}{N} \), where µ is the population mean, N is the population size, X_i is a particular population value, and Σ indicates the operation of adding.
       Sample mean: \( \bar{X} = \dfrac{\sum x_i}{n} \), where X̄ is the sample mean, n is the sample size, and x_i is a particular sample value.
       Properties: • all values are used • unique • sum of the deviations from the mean is 0 • affected by unusually large or small data values
  • 36. The Median is the midpoint of the values after they have been ordered from the smallest to the largest. It is found at the (n+1)/2 ranked observation; for an even number of values this position falls between the two middle numbers, and the median is their arithmetic average. There are as many values above the median as below it in the data array. • unique • not affected by extremely large or small values ⇒ good measure of location when such values occur
  • 37. The Mode is another measure of location and represents the value of the observation that appears most frequently. Data can have more than one mode. If it has two modes, it is referred to as bimodal, three modes, trimodal, and the like.
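The three measures of location can be checked with Python's standard `statistics` module. The sample values below are illustrative only and do not come from the slides.

```python
import statistics

sample = [2, 3, 3, 5, 7, 10]           # illustrative values, not from the slides

mean = statistics.mean(sample)         # sum of the values divided by their count
median = statistics.median(sample)     # even count: average of the two middle values
mode = statistics.mode(sample)         # most frequent value

# Property from the slides: deviations from the mean sum to 0.
deviation_sum = sum(x - mean for x in sample)

# A data set can have more than one mode; multimode returns all of them.
modes = statistics.multimode([1, 1, 2, 2, 3])   # bimodal example

print(mean, median, mode, deviation_sum, modes)
```

Replacing the 10 with 100 would move the mean but leave the median at 4, which illustrates why the median is preferred when extreme values occur.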
  • 38. Weighted Mean of a set of numbers X_1, X_2, ..., X_n with corresponding weights w_1, w_2, ..., w_n: \( \bar{X}_w = \dfrac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n} \). Geometric Mean of a set of n numbers is defined as the nth root of the product of the n numbers: \( GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)} \). GM is used to average percents, indexes, and relatives.
  • 39. Example 1. The interest rates on three bonds were 5, 21, and 4 percent. The arithmetic mean is (5 + 21 + 4) / 3 = 10.0. The geometric mean is \( GM = \sqrt[3]{(5)(21)(4)} = 7.49 \). The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21%.
  • 40. Example 2. Another use of GM is to determine the percent increase in sales, production or other business or economic series from one time period to another: \( GM = \sqrt[n]{\dfrac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1 \). [Figure: chart, "Growth in Sales 1999-2004"; x-axis: Year; y-axis: Sales in Millions ($).]
  • 41. Example 3. The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000. \( GM = \sqrt[8]{\dfrac{835{,}000}{755{,}000}} - 1 = 0.0127 \). That is, the geometric mean rate of increase is 1.27%.
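Both uses of the geometric mean can be reproduced in a few lines of Python. The function names are illustrative; `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))

def mean_growth_rate(start, end, periods):
    """Average rate of change per period between a starting and an ending value."""
    return (end / start) ** (1 / periods) - 1

bond_gm = geometric_mean([5, 21, 4])                  # Example 1: bond interest rates
enrol_rate = mean_growth_rate(755_000, 835_000, 8)    # Example 3: 1992 to 2000

print(round(bond_gm, 2), round(enrol_rate, 4))
```

The number of periods in Example 3 is 8 because 1992 to 2000 spans eight yearly steps.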
  • 42. Measures of Dispersion •Range • Mean Deviation •Quartile Deviation •Standard Deviation •Variance •Co-efficient of Variation
  • 43. Dispersion refers to the spread or variability in the data. Range = Largest value – Smallest value. [Figure: plot of data values with the mean marked.]
  • 44. Range Example. The following represents the current year’s Return on Equity of the 25 companies in an investor’s portfolio.

       -8.1    3.2    5.9    8.1   12.3
       -5.1    4.1    6.3    9.2   13.3
       -3.1    4.6    7.9    9.5   14.0
       -1.4    4.8    7.9    9.7   15.0
        1.2    5.7    8.0   10.3   22.1

       Highest value: 22.1. Lowest value: -8.1. Range = Highest value – lowest value = 22.1 – (-8.1) = 30.2
  • 45. Mean Deviation: the arithmetic mean of the absolute values of the deviations from the arithmetic mean, \( MD = \dfrac{\sum |X - \bar{X}|}{n} \). • All values are used in the calculation. • It is not unduly influenced by large or small values. • The absolute values are difficult to manipulate.
  • 46. Example 5. The weights of a sample of crates containing books for the bookstore (in pounds) are: 103, 97, 101, 106, 103. \( \bar{X} = 102 \). \( MD = \dfrac{\sum |X - \bar{X}|}{n} = \dfrac{|103 - 102| + \dots + |103 - 102|}{5} = \dfrac{1 + 5 + 1 + 4 + 1}{5} = \dfrac{12}{5} = 2.4 \)
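The mean deviation of Example 5 can be verified directly. A minimal sketch; the function name `mean_deviation` is illustrative.

```python
def mean_deviation(data):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

weights = [103, 97, 101, 106, 103]      # crate weights in pounds (Example 5)
md = mean_deviation(weights)
print(md)
```

The absolute deviations are 1, 5, 1, 4 and 1, which total 12.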
  • 47. Standard deviation and Variance. Variance: the arithmetic mean of the squared deviations from the mean. Standard deviation = √(variance). Population Variance: \( \sigma^2 = \dfrac{\sum (X - \mu)^2}{N} \), where X is the value of an observation in the population, μ is the arithmetic mean of the population, and N is the number of observations in the population. Population Standard Deviation: \( \sigma = \sqrt{\sigma^2} \).
  • 48. Example 6. In Example 4 (the Return on Equity data of the Range Example), the variance and standard deviation are: \( \sigma^2 = \dfrac{\sum (X - \mu)^2}{N} = \dfrac{(-8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \dots + (22.1 - 6.62)^2}{25} = 42.227 \), \( \sigma = \sqrt{42.227} = 6.498 \). Sample variance: \( s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1} \). Sample standard deviation: \( s = \sqrt{s^2} \).
  • 49. Example 7. The hourly wages earned by a sample of five students are $7, $5, $11, $8, $6. \( \bar{X} = \dfrac{\sum X}{n} = \dfrac{37}{5} = 7.40 \). \( s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1} = \dfrac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \dfrac{21.2}{5 - 1} = 5.30 \). \( s = \sqrt{s^2} = \sqrt{5.30} = 2.30 \)
  • 50. Example. Data: X = {6, 10, 5, 4, 9, 8}; N = 6.

         X     X − X̄    (X − X̄)²
         6      -1         1
        10       3         9
         5      -2         4
         4      -3         9
         9       2         4
         8       1         1
       Total: 42        Total: 28

       Mean: \( \bar{X} = \dfrac{\sum X}{N} = \dfrac{42}{6} = 7 \). Variance: \( s^2 = \dfrac{\sum (X - \bar{X})^2}{N} = \dfrac{28}{6} = 4.67 \). Standard Deviation: \( s = \sqrt{s^2} = \sqrt{4.67} = 2.16 \).
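The two divisors (N for a population, n − 1 for a sample) correspond to separate functions in Python's standard `statistics` module. The sketch below applies each to the example that uses it; note that the last example divides by N even though it writes the result as s².

```python
import statistics

wages = [7, 5, 11, 8, 6]            # Example 7: a sample, so divide by n - 1
sample_var = statistics.variance(wages)
sample_sd = statistics.stdev(wages)

scores = [6, 10, 5, 4, 9, 8]        # last example: divides by N
pop_var = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)

print(round(sample_var, 2), round(sample_sd, 2))
print(round(pop_var, 2), round(pop_sd, 2))
```

Treating the six scores as a sample instead would give 28 / 5 = 5.6 for the variance, so it matters which divisor is intended.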
  • 51. Empirical Rule: For any symmetrical, bell-shaped distribution: about 68% of the observations will lie within 1s of the mean; about 95% of the observations will lie within 2s of the mean; nearly all the observations will be within 3s of the mean.
  • 52. Interpretation and Uses of the Standard Deviation. [Figure: bell curve with the axis marked µ − 3σ, µ − 2σ, µ − 1σ, µ, µ + 1σ, µ + 2σ, µ + 3σ; 68% of the area lies within µ ± 1σ, 95% within µ ± 2σ, and 99.7% within µ ± 3σ.]
  • 53. Fractiles. Quartiles Q1, Q2, Q3 divide ranked data into four equal parts (25% each). Deciles D1, D2, D3, D4, D5, D6, D7, D8, D9 divide ranked data into ten equal parts (10% each). The 99 percentiles divide ranked data into 100 equal parts.
  • 54. Relative Standing: Percentiles. percentile of value x = ((number of values < x) / total number of values) * 100 (round the result to the nearest whole number). Suppose that in a class of 25 people we have the following averages (in ascending order): 42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77, 78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98. If you received a 77, what percentile are you? percentile of 77 = (12/25) * 100 = 48
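The percentile formula counts the values strictly below the score of interest. A minimal Python sketch using the 25 class averages; the function name `percentile_of` is illustrative.

```python
grades = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
          78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

def percentile_of(value, data):
    """Percent of values strictly below `value`, rounded to a whole number."""
    below = sum(x < value for x in data)
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return round(below / len(data) * 100)

p77 = percentile_of(77, grades)
print(p77)
```

Twelve of the 25 averages are below 77, which gives the 48th percentile.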
  • 55. Relative Standing: Quartiles. Instead of finding the percentile of a single data value as we did on the previous page, it is often useful to group the data into 4, or more, (nearly) equal groups. When grouping the data into four equal groupings, we call these groupings quartiles. Let n = number of items in the data set; k = percent desired (e.g. k = 25); L = locator, the position of the value separating the first k percent of the data from the rest. Then L = (k/100) * n.
  • 56. Relative Standing. Let’s separate the 25 class grades into four quartiles. Step 1: order the data in ascending order: 42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77, 78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98. Now find the 3 locators L25, L50, L75, rounding any fractional part up to the next integer: L25 = (25/100) * 25 = 6.25 → 7; L50 = (50/100) * 25 = 12.5 → 13; L75 = (75/100) * 25 = 18.75 → 19. The values at these positions are Q1 = 70, Q2 = 77, and Q3 = 84.
  • 57. Relative Standing. Other measures of relative standing include • Interquartile range (IQR) = Q3 - Q1 • Semi-interquartile range = (Q3 - Q1)/2 • Midquartile = (Q3 + Q1)/2 • 10 – 90 percentile range = P90 - P10. For the data on the previous page we have: IQR = 84 – 70 = 14 and Semi IQR = (84 – 70)/2 = 7 (measures of variation); Midquartile = (84 + 70)/2 = 77 (measure of central tendency).
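The locator method can be sketched in Python for the 25 class grades. One assumption is labeled in the code: when L is a whole number, the sketch averages the L-th and (L+1)-th values, which is how the Box Diagram slide obtains Q1 = (73 + 74)/2; the locator slides themselves only cover the fractional case. The function name `value_at_percent` is illustrative, and k is taken to be a whole number with 0 < k < 100.

```python
grades = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
          78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

def value_at_percent(sorted_data, k):
    """Value separating the first k percent of the ordered data from the rest."""
    n = len(sorted_data)
    whole, remainder = divmod(k * n, 100)    # L = k*n/100, kept in integer arithmetic
    if remainder == 0:
        # Assumption: whole-number locator -> average the L-th and (L+1)-th values.
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    return sorted_data[whole]                # fractional locator: round up (1-based whole + 1)

q1 = value_at_percent(grades, 25)
q2 = value_at_percent(grades, 50)
q3 = value_at_percent(grades, 75)

iqr = q3 - q1
semi_iqr = iqr / 2
midquartile = (q3 + q1) / 2
print(q1, q2, q3, iqr, semi_iqr, midquartile)
```

Other textbooks and software use different quartile conventions, so results from a library routine may differ slightly from this locator rule.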
  • 58. Box Diagram. Data (60 values in ascending order):

       65, 67, 68, 68, 69, 69, 71, 71, 71, 72, 72, 72, 73, 73, 73,
       74, 74, 75, 75, 75, 75, 76, 76, 77, 77, 77, 77, 77, 77, 78,
       78, 78, 78, 79, 79, 79, 79, 80, 81, 81, 81, 81, 81, 81, 81,
       81, 82, 82, 83, 84, 85, 85, 85, 86, 86, 87, 87, 88, 89, 92

       The locators L25, median, and L75 fall at the ends of the first, second, and third rows. Here L25 = 15 is a whole number, so Q1 is the average of the 15th and 16th values: Q1 = (73 + 74)/2 = 73.5. To construct a box diagram to illustrate the extent to which the extreme data values lie beyond the interquartile range, draw a line with the low and high value highlighted at the two ends. Mark the gradations between these two extremes, then locate the quartile boundaries Q1, Med., and Q3 on this line. Construct a box about these values. [Figure: box diagram on a scale from 65 to 92 with Q1, M, and Q3 marked.]
  • 59. Percentile of score a = (number of scores less than a / total number of scores) * 100. Relation between the different fractiles: D1 = P10, D2 = P20, D3 = P30, ..., D9 = P90; Q1 = P25, Q2 = P50, Q3 = P75. Interquartile Range: Q3 – Q1
  • 60. Box plot: a graphical display, based on quartiles, that helps to picture a set of data. Five pieces of data are needed to construct a box plot: Minimum Value, First Quartile (Q1), Median, Third Quartile (Q3), and Maximum Value. The box represents the interquartile range, which contains the middle 50% of values. The whiskers represent the range; they extend from the box to the highest and lowest values, excluding outliers. A line across the box indicates the median.
  • 61. Example 8. Based on a sample of 20 deliveries, Buddy’s Pizza determined the following information. The minimum delivery time was 13 minutes and the maximum 30 minutes. The first quartile was 15 minutes, the median 18 minutes, and the third quartile 22 minutes. Develop a box plot for the delivery times. [Figure: box plot on a scale from 12 to 32 minutes with Min, Q1, Median, Q3, and Max marked, and a distance of 1.5 times the interquartile range marked on each side of the box.]
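The distance of 1.5 times the interquartile range marked in Example 8 is commonly used to decide which values count as outliers. The sketch below assumes that convention (values more than 1.5 × IQR beyond the box are outliers); the slides mention the distance but do not state the rule, and the function name `outlier_fences` is illustrative.

```python
def outlier_fences(q1, q3):
    """Lower and upper limits lying 1.5 * IQR beyond the box."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Five-number summary for the delivery times (Example 8).
minimum, q1, median, q3, maximum = 13, 15, 18, 22, 30

lower, upper = outlier_fences(q1, q3)
has_outliers = minimum < lower or maximum > upper
print(lower, upper, has_outliers)
```

Under this assumption both extremes fall inside the limits, so the whiskers would run all the way from 13 to 30 minutes.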
  • 62. Skewness: a measurement of the lack of symmetry of the distribution. Symmetric distribution: a distribution having the same shape on either side of the centre. Skewed distribution: one whose shapes on either side of the centre differ; a nonsymmetrical distribution. Can be positively or negatively skewed, or bimodal.
  • 63.
  • 64. Relative Positions of the Mean, Median, and Mode in a Symmetric Distribution. [Figure: symmetric curve with the mean, median, and mode marked.]
  • 65. Relative Positions of the Mean, Median, and Mode in a Right Skewed or Positively Skewed Distribution: Mean > Median > Mode. [Figure: right-skewed curve with the mode, median, and mean marked.]
  • 66. The Relative Positions of the Mean, Median, and Mode in a Left Skewed or Negatively Skewed Distribution: Mean < Median < Mode. [Figure: left-skewed curve with the mean, median, and mode marked.]
  • 67. The coefficient of skewness can range from -3.00 up to 3.00. A value of 0 indicates a symmetric distribution. Example 9. Using the twelve stock prices, we find the mean to be 84.42, standard deviation, 7.18, median, 84.5. \( sk = \dfrac{3(\bar{X} - \text{Median})}{s} = -0.035 \)
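The coefficient of skewness in Example 9 is a one-line calculation. The sketch below uses the rounded summary figures quoted on the slide, which give about -0.033; the quoted -0.035 may come from unrounded inputs, but that is an assumption, since the twelve stock prices are not shown. The function name `skewness_coefficient` is illustrative.

```python
def skewness_coefficient(mean, median, sd):
    """sk = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

sk = skewness_coefficient(mean=84.42, median=84.5, sd=7.18)
print(round(sk, 3))
```

Either way the value is very close to 0, so the distribution of the stock prices is nearly symmetric, with a slight negative skew.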
  • 68. Kurtosis • derived from the Greek word κυρτός, kyrtos or kurtos, meaning bulging • measure of the "peakedness" of the probability distribution of a real-valued random variable • higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly-sized deviations.
  • 69. A distribution with positive kurtosis is called leptokurtic, or leptokurtotic. In terms of shape, a leptokurtic distribution has a more acute "peak" around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and "fat tails" (that is, a higher probability than a normally distributed variable of extreme values). A distribution with negative kurtosis is called platykurtic, or platykurtotic. In terms of shape, a platykurtic distribution has a smaller "peak" around the mean (that is, a lower probability than a normally distributed variable of values near the mean) and "thin tails" (that is, a lower probability than a normally distributed variable of extreme values).
  • 70. [Figure: two panels, each comparing the normal (mesokurtic) distribution with another distribution: one leptokurtic, the other platykurtic.]
  • 71. Comparing Standard Deviations. [Figure: three dot plots on a common scale from 11 to 21.] Data A: Mean = 15.5, s = 3.338. Data B: Mean = 15.5, s = .9258. Data C: Mean = 15.5, s = 4.57.
  • 72. Co-efficient of variation: \( CV = \left( \dfrac{S}{\bar{X}} \right) \times 100\% \) • Measures relative variation • Always in percentage (%) • Shows variation relative to mean • Is used to compare two or more sets of data measured in different units. When the mean value is near zero, the coefficient of variation is sensitive to change in the standard deviation, limiting its usefulness.
  • 73. Stock A: Average price last year = $50, Standard deviation = $5. \( CV = \left( \dfrac{S}{\bar{X}} \right) \times 100\% = \left( \dfrac{\$5}{\$50} \right) \times 100\% = 10\% \). Stock B: Average price last year = $100, Standard deviation = $5. \( CV = \left( \dfrac{S}{\bar{X}} \right) \times 100\% = \left( \dfrac{\$5}{\$100} \right) \times 100\% = 5\% \).
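The stock comparison can be reproduced with a small helper. A minimal sketch; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=5, mean=50)     # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)    # Stock B
print(f"Stock A: {cv_a:g}%  Stock B: {cv_b:g}%")
```

Both stocks have the same standard deviation, but relative to its average price Stock A varies twice as much as Stock B.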

Editor's Notes

  1. 1.5 times the interquartile range