Statistics

  1. STATISTICS (2013/05/22)
     X-Kit Textbook Chapter 9. Precalculus Textbook Appendix B: Concepts in Statistics, Par B.2.

     CONTENT: THE GOAL
     Look at ways of summarising a large amount of sample data in just one or two key numbers. Two important aspects of a set of data:
     • The LOCATION
     • The SPREAD

     MEASURES OF CENTRAL TENDENCY (LOCATION)
     • Arithmetic Mean (average)
     • Mode (the highest point/frequency)
     • Median (the middle observation)

     Number of fraudulent cheques received at a bank each week for 30 weeks

     | Week    | 1 | 2 | 3 | 4 | 5 | 6 | 7  | 8 | 9 | 10 |
     |---------|---|---|---|---|---|---|----|---|---|----|
     | Cheques | 5 | 3 | 8 | 3 | 3 | 1 | 10 | 4 | 6 | 8  |

     | Week    | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
     |---------|----|----|----|----|----|----|----|----|----|----|
     | Cheques | 3  | 5  | 4  | 7  | 6  | 6  | 9  | 3  | 4  | 5  |

     | Week    | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
     |---------|----|----|----|----|----|----|----|----|----|----|
     | Cheques | 7  | 9  | 4  | 5  | 8  | 6  | 4  | 4  | 10 | 4  |

     ARITHMETIC MEAN
     • $\bar{x} = \frac{164}{30} \approx 5.47$
     • To calculate the MEAN, add all the data points in the sample and divide by the number of data points (the sample size).
     • The MEAN can be a value that doesn't actually match any observation.
     • The MEAN gives us useful information about the location of our frequency distribution.
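The mean calculation on slide 1 can be checked with a few lines of Python. This is only a minimal sketch: the list `cheques` is the weekly data copied from the slide, and the helper name `arithmetic_mean` is an illustrative choice, not something from the textbook.

```python
# Weekly counts of fraudulent cheques, copied from the slide (weeks 1 to 30).
cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]


def arithmetic_mean(values):
    """Add all data points and divide by the number of data points."""
    return sum(values) / len(values)


mean = arithmetic_mean(cheques)
print(sum(cheques), len(cheques), round(mean, 2))  # 164 30 5.47
```

Note that 5.47 is not one of the observed values, which illustrates the point that the mean need not match any observation.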
  2. [Graph: frequency of each number of fraudulent cheques per week (values 1 to 10)]

     CALCULATE THE MEAN
     • Raw data: $\bar{x} = \frac{\sum x}{n}$, where $x$ is the data points and $n$ is the number of observations.
     • Frequency table: $\bar{x} = \frac{\sum x f}{n}$, where $x$ is the data points, $n$ is the number of observations and $f$ is the frequency.
     • Frequency table (intervals): $\bar{x} = \frac{\sum x f}{n}$, where $x$ is the midpoints of the intervals, $n$ is the number of observations and $f$ is the frequency.

     CALCULATE THE MEAN - FREQUENCY TABLE: NUMBER OF FRAUDULENT CHEQUES PER WEEK

     | Distinct value | Tally marks | Frequency |
     |----------------|-------------|-----------|
     | 1              | /           | 1         |
     | 2              |             | 0         |
     | 3              | /////       | 5         |
     | 4              | ///// //    | 7         |
     | 5              | ////        | 4         |
     | 6              | ////        | 4         |
     | 7              | //          | 2         |
     | 8              | ///         | 3         |
     | 9              | //          | 2         |
     | 10             | //          | 2         |

     Truck data: weights (in tonnes) of 20 fully loaded trucks

     | Truck  | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   |
     |--------|------|------|------|------|------|------|------|------|------|------|
     | Weight | 4.54 | 3.81 | 4.29 | 5.16 | 2.51 | 4.63 | 4.75 | 3.98 | 5.04 | 2.80 |

     | Truck  | 11   | 12   | 13   | 14   | 15   | 16   | 17   | 18   | 19   | 20   |
     |--------|------|------|------|------|------|------|------|------|------|------|
     | Weight | 2.52 | 5.88 | 2.95 | 3.59 | 3.87 | 4.17 | 3.30 | 5.48 | 4.26 | 3.53 |

     CALCULATE THE MEAN - GROUPED FREQUENCY TABLE: truck data, weights (in tonnes) of 20 fully loaded trucks

     | Class interval  | Frequency | Midpoint                |
     |-----------------|-----------|-------------------------|
     | 2.5 ≤ x ≤ 3.0   | 4         | (2.5 + 3.0) ÷ 2 = 2.75  |
     | 3.0 < x ≤ 3.5   | 1         | 3.25                    |
     | 3.5 < x ≤ 4.0   | 5         | 3.75                    |
     | 4.0 < x ≤ 4.5   | 3         | 4.25                    |
     | 4.5 < x ≤ 5.0   | 3         | 4.75                    |
     | 5.0 < x ≤ 5.5   | 3         | 5.25                    |
     | 5.5 < x ≤ 6.0   | 1         | 5.75                    |

     MODE
     • The mode is the value (or, for grouped data, the class interval) with the HIGHEST FREQUENCY.
     • There can be two or more modes in a set of data; the mode is then not a good measure of central tendency.
     • MULTI-MODAL data have more than one mode.
     • UNI-MODAL data have only one mode.
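The frequency-table formulas and the mode rule on slide 2 can be sketched as follows. The dictionaries hold the two frequency tables from the slide; the function names (`mean_from_table`, `midpoints`, `modes`) are hypothetical helpers introduced only for this illustration.

```python
# Ungrouped frequency table from the slide: distinct value -> frequency.
cheque_freq = {1: 1, 2: 0, 3: 5, 4: 7, 5: 4, 6: 4, 7: 2, 8: 3, 9: 2, 10: 2}

# Grouped frequency table from the slide: (lower, upper) limit -> frequency.
truck_freq = {(2.5, 3.0): 4, (3.0, 3.5): 1, (3.5, 4.0): 5, (4.0, 4.5): 3,
              (4.5, 5.0): 3, (5.0, 5.5): 3, (5.5, 6.0): 1}


def mean_from_table(freq):
    """Mean = sum(x * f) / n, where n is the total frequency."""
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n


def midpoints(grouped):
    """Replace each class interval by its midpoint (lower + upper) / 2."""
    return {(lower + upper) / 2: f for (lower, upper), f in grouped.items()}


def modes(freq):
    """Return every value (or class) that attains the highest frequency."""
    top = max(freq.values())
    return [x for x, f in freq.items() if f == top]


print(round(mean_from_table(cheque_freq), 2))            # 5.47
print(round(mean_from_table(midpoints(truck_freq)), 3))  # 4.075
print(modes(cheque_freq))                                # [4]
print(modes(truck_freq))                                 # [(3.5, 4.0)]
```

The grouped mean is an approximation, because every observation in a class is treated as if it sat at the class midpoint. `modes` returns a list so that multi-modal data show up as more than one entry.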
  3. [Graph: frequency of each number of fraudulent cheques per week (values 1 to 10). The MODE = 4]

     Call centre data: waiting times (in seconds) for 35 randomly selected customers

     | Customer         | 1  | 2  | 3  | 4  | 5  | 6  | 7   | 8   | 9  | 10 | 11 | 12 |
     |------------------|----|----|----|----|----|----|-----|-----|----|----|----|----|
     | Waiting time (s) | 75 | 37 | 13 | 90 | 45 | 23 | 104 | 135 | 30 | 73 | 34 | 12 |

     | Customer         | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
     |------------------|----|----|----|----|----|----|----|----|----|----|----|----|
     | Waiting time (s) | 38 | 40 | 22 | 47 | 26 | 57 | 65 | 33 | 9  | 85 | 87 | 16 |

     | Customer         | 25  | 26  | 27 | 28 | 29  | 30 | 31 | 32 | 33 | 34 | 35 |
     |------------------|-----|-----|----|----|-----|----|----|----|----|----|----|
     | Waiting time (s) | 102 | 115 | 68 | 29 | 142 | 5  | 15 | 10 | 25 | 41 | 49 |

     FREQUENCY TABLE: the MODAL CLASS is the interval 25 < x ≤ 50

     | Class interval  | Tally marks   | Frequency |
     |-----------------|---------------|-----------|
     | 0 ≤ x ≤ 25      | ///// /////   | 10        |
     | 25 < x ≤ 50     | ///// ///// / | 11        |
     | 50 < x ≤ 75     | ///// /       | 6         |
     | 75 < x ≤ 100    | ///           | 3         |
     | 100 < x ≤ 125   | ///           | 3         |
     | 125 < x ≤ 150   | //            | 2         |

     [Histogram: frequency per class interval [0;25], (25;50], (50;75], (75;100], (100;125], (125;150]. MODAL CLASS (25;50]]

     THE MEDIAN - RAW DATA: number of fraudulent cheques received at a bank each week for 30 weeks (the data from slide 1)

     MEDIAN
     • Median = 5
     • Put all observations in order from smallest to largest; the middle observation is the MEDIAN.
       1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10
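The "sort, then take the middle observation" rule for the median can be sketched in Python as below. The function name `median` is an illustrative choice; for an even number of observations it averages the two middle values, which is the usual convention.

```python
# Weekly counts of fraudulent cheques (weeks 1 to 30), as on slide 1.
cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]


def median(values):
    """Middle observation of the sorted data (average of the two middle
    observations when the number of observations is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sorted(cheques)[14], sorted(cheques)[15])  # 15th and 16th values: 5 5
print(median(cheques))                           # 5.0
```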
  4. DON'T FALL INTO THE COMMON TRAP
     • The median is NOT the middle of the range of observations. For example, for the data 1, 1, 1, 1, 1, 3, 9 the median is 1 (the middle observation), while the middle of the range from 1 to 9 is (1 + 9) ÷ 2 = 5. Big difference!

     MEDIAN
     • Odd number of observations (for example 7): the median position is $\frac{n+1}{2}$.
     • Even number of observations (for example 30): the median position is half-way between $\frac{n}{2}$ and $\frac{n}{2} + 1$.

     FIND THE MEDIAN - FREQUENCY TABLE: NUMBER OF FRAUDULENT CHEQUES PER WEEK

     | Distinct value | Frequency | Cumulative frequency |
     |----------------|-----------|----------------------|
     | 1              | 1         | 1                    |
     | 2              | 0         | 1                    |
     | 3              | 5         | 6                    |
     | 4              | 7         | 13                   |
     | 5              | 4         | 17                   |
     | 6              | 4         | 21                   |
     | 7              | 2         | 23                   |
     | 8              | 3         | 26                   |
     | 9              | 2         | 28                   |
     | 10             | 2         | 30                   |

     FIND THE MEDIAN - GROUPED FREQUENCY TABLE: truck data, weights (in tonnes) of 20 fully loaded trucks

     | Class interval  | Frequency | Midpoint                |
     |-----------------|-----------|-------------------------|
     | 2.5 ≤ x ≤ 3.0   | 4         | (2.5 + 3.0) ÷ 2 = 2.75  |
     | 3.0 < x ≤ 3.5   | 1         | 3.25                    |
     | 3.5 < x ≤ 4.0   | 5         | 3.75                    |
     | 4.0 < x ≤ 4.5   | 3         | 4.25                    |
     | 4.5 < x ≤ 5.0   | 3         | 4.75                    |
     | 5.0 < x ≤ 5.5   | 3         | 5.25                    |
     | 5.5 < x ≤ 6.0   | 1         | 5.75                    |

     FIND THE MEDIAN FROM A GROUPED FREQUENCY TABLE
     • Which observation is the median (the middle observation)?
     • Find the class interval in which that observation lies.

     CALCULATIONS
     • Raw data: mean, mode, median
     • Frequency table (ungrouped data): mean, mode, median
     • Frequency table (grouped data): mean, mode, median
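Reading the median off the cumulative frequency column can be sketched like this. The lists reproduce the ungrouped cheque table from slide 4; `value_at_position` is a hypothetical helper that returns the observation sitting at a given position in the ordered data.

```python
from itertools import accumulate

# Ungrouped frequency table for the cheque data (distinct values and counts).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freqs = [1, 0, 5, 7, 4, 4, 2, 3, 2, 2]

# Running total of the frequencies: the cumulative frequency column.
cumulative = list(accumulate(freqs))


def value_at_position(position):
    """Observation at a 1-based position in the ordered data: the first
    distinct value whose cumulative frequency reaches that position."""
    for value, cum in zip(values, cumulative):
        if position <= cum:
            return value
    raise ValueError("position is larger than the number of observations")


n = cumulative[-1]
if n % 2 == 1:
    median = value_at_position((n + 1) // 2)
else:
    # Even n: half-way between observations n/2 and n/2 + 1.
    median = (value_at_position(n // 2) + value_at_position(n // 2 + 1)) / 2

print(cumulative)  # [1, 1, 6, 13, 17, 21, 23, 26, 28, 30]
print(median)      # 5.0
```

Observations 15 and 16 both fall in the row for the value 5 (cumulative frequency 14 to 17), so the median is 5, in agreement with the raw-data result.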
  5. HOW TO CHOOSE THE BEST MEASURE OF LOCATION?
     • When choosing the best measure of location, we need to look at the SHAPE of the distribution.
     • For nearly symmetric data, the mean is the best choice.
     • For very skewed (asymmetric) data, the mode or median is better.
     • The mean moves further along the tail than the median because it is more sensitive to values far from the centre.

     SYMMETRIC histogram: Mean = Median = Mode
     A POSITIVELY SKEWED (skewed to the right) histogram has a longer tail on the right side: Mode < Median < Mean
     A NEGATIVELY SKEWED (skewed to the left) histogram has a longer tail on the left side: Mean < Median < Mode

     PROBLEM
     • Two very different data sets (one distribution very spread out and another very concentrated) can have EQUAL measures of central tendency.
     • To get a true idea of our sample, we also have to MEASURE THE SPREAD OF A DISTRIBUTION, called the spread or dispersion.

     MEASURES OF SPREAD (DISPERSION)
     • Interquartile Range
     • Variance
     • Standard Deviation
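As a quick illustration of how the mean is pulled along the tail, the sketch below compares the mean and median of the call-centre waiting times from slide 3, using Python's standard `statistics` module. Choosing this data set as the skewed example is an assumption made here for illustration; the slides themselves do not work this case.

```python
import statistics

# Call-centre waiting times in seconds for 35 customers (slide 3).
waiting = [75, 37, 13, 90, 45, 23, 104, 135, 30, 73, 34, 12,
           38, 40, 22, 47, 26, 57, 65, 33, 9, 85, 87, 16,
           102, 115, 68, 29, 142, 5, 15, 10, 25, 41, 49]

mean = statistics.mean(waiting)
median = statistics.median(waiting)

# A few long waits (over 100 seconds) form a tail on the right,
# which drags the mean above the median.
print(round(mean, 2), median)  # 51.34 40
```

Median < Mean is the ordering the slide gives for a positively skewed distribution.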
  6. MEASURING SPREAD
     • Think of a distribution in terms of percentages: a horizontal axis divided into 100 equal percentiles.
     • The 10th percentile marks the point below which 10% of the observations fall, and above which 90% of the observations fall.
     • The 50th percentile, below which 50% of the observations lie, is the median.

     WORKING WITH A PERCENTILE
     • $p\%$ of the observations fall below the $p$th percentile:
       $\text{Position} = \frac{p}{100}(n + 1)$
     • Working with the example on fraudulent cheques:
       1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10
       $\text{Position of } P_{50} = \frac{50}{100}(30 + 1) = 15.5$
     • 15.5 tells us where to find our 50th percentile.
     • 15 tells us which observation to go to, and 0.5 tells us how far to move along the space between that observation and the next highest one.

     FORMULA
     • $P_{50} = x_{15} + 0.5\,(x_{16} - x_{15})$
     • In general: $P_p = x_k + d\,(x_{k+1} - x_k)$
     • $P$ means percentile
     • $p$ tells us which percentile
     • $k$ is the whole number calculated from the position
     • $d$ is the decimal fraction calculated from the position

     WORKING WITH PERCENTILES FROM UNGROUPED FREQUENCY DATA: NUMBER OF FRAUDULENT CHEQUES PER WEEK

     | Distinct value | Frequency | Cumulative frequency |
     |----------------|-----------|----------------------|
     | 1              | 1         | 1                    |
     | 2              | 0         | 0 + 1 = 1            |
     | 3              | 5         | 1 + 5 = 6            |
     | 4              | 7         | 6 + 7 = 13           |
     | 5              | 4         | 13 + 4 = 17          |
     | 6              | 4         | 17 + 4 = 21          |
     | 7              | 2         | 21 + 2 = 23          |
     | 8              | 3         | 23 + 3 = 26          |
     | 9              | 2         | 26 + 2 = 28          |
     | 10             | 2         | 28 + 2 = 30          |

     WORKING WITH PERCENTILES (AND MEDIAN) FROM GROUPED DATA
     • To identify the class interval $L < x \le U$ containing the $p$th percentile:
       $\text{Position} = \frac{p}{100}(n + 1)$
     • The decimal fraction for grouped data is:
       $d = \frac{\text{Position} - \text{sum of class frequencies up to } L}{\text{frequency of class } L < x \le U}$
     • Calculate the $p$th percentile:
       $P_p \approx L + d\,(U - L)$

     FIND THE MEDIAN - GROUPED FREQUENCY TABLE: truck data, weights (in tonnes) of 20 fully loaded trucks

     | Class interval  | Frequency | Cumulative frequency |
     |-----------------|-----------|----------------------|
     | 2.5 ≤ x ≤ 3.0   | 4         | 4                    |
     | 3.0 < x ≤ 3.5   | 1         | 5                    |
     | 3.5 < x ≤ 4.0   | 5         | 10                   |
     | 4.0 < x ≤ 4.5   | 3         | 13                   |
     | 4.5 < x ≤ 5.0   | 3         | 16                   |
     | 5.0 < x ≤ 5.5   | 3         | 19                   |
     | 5.5 < x ≤ 6.0   | 1         | 20                   |
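The raw-data percentile formula on slide 6 can be sketched as follows. The function name `percentile` is illustrative, and the clamping to the smallest or largest observation when the position falls outside 1..n is an assumption added here; the slides do not say what to do in that case.

```python
# Weekly counts of fraudulent cheques (weeks 1 to 30), as on slide 1.
cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]


def percentile(values, p):
    """P_p = x_k + d * (x_{k+1} - x_k), with position = p/100 * (n + 1),
    k the whole-number part of the position and d the decimal fraction."""
    ordered = sorted(values)
    n = len(ordered)
    position = p / 100 * (n + 1)
    k = int(position)
    d = position - k
    # Assumption (not in the slides): clamp positions outside 1..n.
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    # ordered is 0-based, so x_k is ordered[k - 1] and x_{k+1} is ordered[k].
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])


print(percentile(cheques, 50))  # position 15.5  -> 5.0
print(percentile(cheques, 25))  # position 7.75  -> 4.0
print(percentile(cheques, 75))  # position 23.25 -> 7.25
```

Other textbooks and software packages use slightly different position rules, so their quartiles can differ a little from these values.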
  7. FIND THE MEDIAN - GROUPED FREQUENCY TABLE: truck data, weights (in tonnes) of 20 fully loaded trucks
     • Position of the 50th percentile: $\text{Position} = \frac{50}{100}(20 + 1) = 10.5$, which lies in the class interval $4.0 < x \le 4.5$.
     • The decimal fraction for grouped data is: $d = \frac{10.5 - 10}{3} = \frac{1}{6}$
     • Calculate the percentile: $P_{50} \approx 4.0 + d\,(4.5 - 4.0) = 4.08333$

     MEASURING SPREAD
     • If we measure the DIFFERENCE in value between one percentile and another, this gives us an idea of how widely our data is spread out.
     • INTERQUARTILE RANGE (IQR) = 75th percentile - 25th percentile
     • The bigger the IQR, the more spread out the data.
     • The 75th percentile ≥ the 25th percentile, therefore the IQR ≥ 0.
     • We tend to use the MEDIAN (as measure of central tendency) together with the IQR.

     FIND THE IQR - GROUPED FREQUENCY TABLE: truck data, weights (in tonnes) of 20 fully loaded trucks

     | Class interval  | Frequency | Cumulative frequency |
     |-----------------|-----------|----------------------|
     | 2.5 ≤ x ≤ 3.0   | 4         | 4                    |
     | 3.0 < x ≤ 3.5   | 1         | 5                    |
     | 3.5 < x ≤ 4.0   | 5         | 10                   |
     | 4.0 < x ≤ 4.5   | 3         | 13                   |
     | 4.5 < x ≤ 5.0   | 3         | 16                   |
     | 5.0 < x ≤ 5.5   | 3         | 19                   |
     | 5.5 < x ≤ 6.0   | 1         | 20                   |

     FIND THE 75TH PERCENTILE - GROUPED FREQUENCY TABLE
     • Position of the 75th percentile: $\text{Position} = \frac{75}{100}(20 + 1) = 15.75$, which lies in the class interval $4.5 < x \le 5.0$.
     • The decimal fraction for grouped data is: $d = \frac{15.75 - 13}{3} = 0.917$
     • Calculate the percentile: $P_{75} \approx 4.5 + d\,(5.0 - 4.5) = 4.958$

     FIND THE 25TH PERCENTILE - GROUPED FREQUENCY TABLE
     • Position of the 25th percentile: $\text{Position} = \frac{25}{100}(20 + 1) = 5.25$, which lies in the class interval $3.5 < x \le 4.0$.
     • The decimal fraction for grouped data is: $d = \frac{5.25 - 5}{5} = 0.05$
     • Calculate the percentile: $P_{25} \approx 3.5 + d\,(4.0 - 3.5) = 3.525$
     • IQR = 4.958 - 3.525 = 1.433

     MEASURING SPREAD
     • When we use the MEAN as our measure of central tendency, we usually choose A MEASURE OF HOW FAR THE DATA IS SPREAD OUT AROUND THE MEAN.
     • Two measures of spread that are based on the mean are the VARIANCE and the STANDARD DEVIATION.
     • An advantage of the standard deviation is that it is measured in the same units as the original observations.
     • The variance and standard deviation are closely related.
     • The variance ($s^2$ or $\sigma^2$) is the square of the standard deviation ($s$ or $\sigma$).
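The three grouped-data steps on slide 7 (position, decimal fraction, interpolation inside the class) can be sketched in Python. The `classes` list is the truck table from the slides; `grouped_percentile` is a hypothetical helper, and its treatment of a position that lands exactly on a class boundary (it is assigned to the lower class) is an assumption, since the slides do not cover that case.

```python
# Truck data as (lower limit, upper limit, frequency), from the grouped table.
classes = [(2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
           (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1)]


def grouped_percentile(classes, p):
    """Approximate the p-th percentile as L + d * (U - L), where
    d = (position - cumulative frequency below L) / class frequency."""
    n = sum(f for _, _, f in classes)
    position = p / 100 * (n + 1)
    below = 0  # sum of class frequencies up to the current lower limit L
    for lower, upper, f in classes:
        if f > 0 and position <= below + f:
            d = (position - below) / f
            return lower + d * (upper - lower)
        below += f
    # Assumption: positions beyond the last class map to the top limit.
    return classes[-1][1]


p25 = grouped_percentile(classes, 25)
p50 = grouped_percentile(classes, 50)
p75 = grouped_percentile(classes, 75)
print(round(p50, 5))        # 4.08333
print(round(p75, 3))        # 4.958
print(round(p25, 3))        # 3.525
print(round(p75 - p25, 3))  # IQR: 1.433
```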
  8. VARIANCE & STANDARD DEVIATION
     • Variance is roughly the average of all the squared distances from the mean:
       $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
       or
       $s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right]$
     • Variance is never negative (it is zero only when all observations are equal).

     Number of fraudulent cheques received at a bank each week for 30 weeks (the data from slide 1)

     VARIANCE & STANDARD DEVIATION FROM RAW DATA ($\bar{x} = 5.47$)

     | Distinct value $x$ | $x - \bar{x}$    | $(x - \bar{x})^2$     | $f\,(x - \bar{x})^2$ |
     |--------------------|------------------|-----------------------|----------------------|
     | 1                  | 1 - 5.47 = -4.47 | (-4.47)^2 = 19.9809   | 19.9809              |
     | 2                  | -3.47            | 12.0409               | 0                    |
     | 3                  | -2.47            | 6.1009                | 30.5045              |
     | 4                  | -1.47            | 2.1609                | 15.1263              |
     | 5                  | -0.47            | 0.2209                | 0.8836               |
     | 6                  | 0.53             | 0.2809                | 1.1236               |
     | 7                  | 1.53             | 2.3409                | 4.6818               |
     | 8                  | 2.53             | 6.4009                | 19.2027              |
     | 9                  | 3.53             | 12.4609               | 24.9218              |
     | 10                 | 4.53             | 20.5209               | 41.0418              |
     | Total              |                  | 82.509                | 157.467              |

     Over all 30 observations the deviations sum to zero, $\sum (x - \bar{x}) = 0$ (up to the rounding of $\bar{x}$).

     CALCULATE THE VARIANCE & STANDARD DEVIATION - FREQUENCY TABLE: NUMBER OF FRAUDULENT CHEQUES PER WEEK

     | Distinct value | Frequency | Squared observation |
     |----------------|-----------|---------------------|
     | 1              | 1         | 1                   |
     | 2              | 0         | 4                   |
     | 3              | 5         | 9                   |
     | 4              | 7         | 16                  |
     | 5              | 4         | 25                  |
     | 6              | 4         | 36                  |
     | 7              | 2         | 49                  |
     | 8              | 3         | 64                  |
     | 9              | 2         | 81                  |
     | 10             | 2         | 100                 |

     VARIANCE & STANDARD DEVIATION FROM UNGROUPED FREQUENCY DATA
     $s^2 = \frac{1}{n - 1}\left[\sum f x^2 - \frac{\left(\sum f x\right)^2}{n}\right]$
     • Variance: $s^2 = \frac{1}{30 - 1}\left[1054 - \frac{164^2}{30}\right] = 5.4299$
     • Standard deviation: $s = \sqrt{5.4299} \approx 2.33$

     FIND THE VARIANCE & STANDARD DEVIATION - GROUPED FREQUENCY TABLE: truck data, weights (in tonnes) of 20 fully loaded trucks

     | Class interval  | Frequency | Midpoint | Squared midpoint |
     |-----------------|-----------|----------|------------------|
     | 2.5 ≤ x ≤ 3.0   | 4         | 2.75     | 7.5625           |
     | 3.0 < x ≤ 3.5   | 1         | 3.25     | 10.5625          |
     | 3.5 < x ≤ 4.0   | 5         | 3.75     | 14.0625          |
     | 4.0 < x ≤ 4.5   | 3         | 4.25     | 18.0625          |
     | 4.5 < x ≤ 5.0   | 3         | 4.75     | 22.5625          |
     | 5.0 < x ≤ 5.5   | 3         | 5.25     | 27.5625          |
     | 5.5 < x ≤ 6.0   | 1         | 5.75     | 33.0625          |
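The frequency-table variance formula can be sketched in Python as below. The two dictionaries hold the cheque table and the truck class midpoints from the slides; `variance_from_table` is an illustrative helper name, and applying it to midpoints gives only an approximation for the grouped data.

```python
import math

# Cheque data: distinct value -> frequency.
cheque_freq = {1: 1, 2: 0, 3: 5, 4: 7, 5: 4, 6: 4, 7: 2, 8: 3, 9: 2, 10: 2}

# Truck data: class midpoint -> frequency.
truck_freq = {2.75: 4, 3.25: 1, 3.75: 5, 4.25: 3, 4.75: 3, 5.25: 3, 5.75: 1}


def variance_from_table(freq):
    """s^2 = (sum(f * x^2) - (sum(f * x))^2 / n) / (n - 1)."""
    n = sum(freq.values())
    sum_fx = sum(f * x for x, f in freq.items())
    sum_fx2 = sum(f * x * x for x, f in freq.items())
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


cheque_var = variance_from_table(cheque_freq)
truck_var = variance_from_table(truck_freq)

# The standard deviation is the square root of the variance.
print(round(cheque_var, 4), round(math.sqrt(cheque_var), 2))  # 5.4299 2.33
print(round(truck_var, 4), round(math.sqrt(truck_var), 2))    # 0.8757 0.94
```

For the cheque data the intermediate sums are 1054 for the squared observations and 164 for the observations, matching the worked variance on the slide.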
  9. VARIANCE & STANDARD DEVIATION FROM GROUPED DATA
     $s^2 = \frac{1}{n - 1}\left[\sum f x^2 - \frac{\left(\sum f x\right)^2}{n}\right]$, where $x$ is the class midpoint
     • Variance: $s^2 = \frac{1}{20 - 1}\left[348.75 - \frac{81.5^2}{20}\right] = 0.8757$
     • Standard deviation: $s = \sqrt{0.8757} \approx 0.94$

     CALCULATIONS
     • Raw data: IQR, variance & standard deviation
     • Frequency table (ungrouped data): IQR, variance & standard deviation
     • Frequency table (grouped data): IQR, variance & standard deviation

     BOX-AND-WHISKER DIAGRAM (5-POINT SUMMARY)
     • Minimum value
     • $Q_1 = P_{25}$
     • Median
     • $Q_3 = P_{75}$
     • Maximum value

     EXAMPLE
     Consider the following set of 23 scores:
     0 3 4 8 9 12 14 15 16 16 16 18 19 21 22 25 32 34 39 43 54 67 77
     1. Find the 5-point summary.
     2. Draw a box-and-whisker plot to illustrate the values.

     5-POINT SUMMARY
     0 3 4 8 9 12 14 15 16 16 16 18 19 21 22 25 32 34 39 43 54 67 77

     HOMEWORK
     • Example: X-Kit textbook, pages 218 to 223.
     • "Practise for your exams", page 224, numbers 1 & 2.
     • Par B.2 (page B5): all odd-numbered questions.
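A possible way to work the 5-point summary for the 23 scores is sketched below, using the position rule from slide 6, Position = p/100 × (n + 1). The helper names are illustrative, and other quartile conventions may give slightly different values, so treat the result as one worked reading of the example rather than the official answer.

```python
scores = [0, 3, 4, 8, 9, 12, 14, 15, 16, 16, 16, 18,
          19, 21, 22, 25, 32, 34, 39, 43, 54, 67, 77]


def percentile(values, p):
    """P_p = x_k + d * (x_{k+1} - x_k), with position = p/100 * (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    position = p / 100 * (n + 1)
    k = int(position)
    d = position - k
    # Assumption (not in the slides): clamp positions outside 1..n.
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])


def five_point_summary(values):
    """Minimum, Q1 = P25, median = P50, Q3 = P75, maximum."""
    return (min(values), percentile(values, 25), percentile(values, 50),
            percentile(values, 75), max(values))


# With n = 23 the positions are 6, 12 and 18, so no interpolation is needed.
print(five_point_summary(scores))  # (0, 12.0, 18.0, 34.0, 77)
```

The box of the box-and-whisker plot would then run from 12 to 34 with the median line at 18, and the whiskers would extend to 0 and 77.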
