# Statistics

### Statistics

#### STATISTICS

2013/05/22

- X-Kit Textbook, Chapter 9
- Precalculus Textbook, Appendix B: Concepts in Statistics, Par. B.2

#### CONTENT: THE GOAL

Look at ways of summarising a large amount of sample data in just one or two key numbers.

Two important aspects of a set of data:

- The LOCATION
- The SPREAD

#### MEASURES OF CENTRAL TENDENCY (LOCATION)

- Arithmetic mean (average)
- Mode (the value with the highest frequency, i.e. the highest point of the frequency graph)
- Median (the middle observation)

Number of fraudulent cheques received at a bank each week for 30 weeks:

| Week | Cheques | Week | Cheques | Week | Cheques |
|---|---|---|---|---|---|
| 1 | 5 | 11 | 3 | 21 | 7 |
| 2 | 3 | 12 | 5 | 22 | 9 |
| 3 | 8 | 13 | 4 | 23 | 4 |
| 4 | 3 | 14 | 7 | 24 | 5 |
| 5 | 3 | 15 | 6 | 25 | 8 |
| 6 | 1 | 16 | 6 | 26 | 6 |
| 7 | 10 | 17 | 9 | 27 | 4 |
| 8 | 4 | 18 | 3 | 28 | 4 |
| 9 | 6 | 19 | 4 | 29 | 10 |
| 10 | 8 | 20 | 5 | 30 | 4 |

#### ARITHMETIC MEAN

$$\bar{x} = \frac{164}{30} \approx 5.47$$

- To calculate the MEAN, add all the data points in the sample and divide by the number of data points (the sample size).
- The MEAN can be a value that doesn't actually match any observation (here, 5.47 cheques).
- The MEAN gives us useful information about the location of the frequency distribution.
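A minimal Python sketch of the mean calculation above, using the 30 weekly counts from the table. The names `cheques` and `mean` are illustrative and do not come from the slides.

```python
# Number of fraudulent cheques received each week, weeks 1 to 30.
cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]


def mean(values):
    """Arithmetic mean: add all the data points and divide by the sample size."""
    return sum(values) / len(values)


x_bar = mean(cheques)
print(sum(cheques), len(cheques), round(x_bar, 2))  # 164 30 5.47

# The mean need not match any actual observation.
print(x_bar in cheques)  # False
```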
#### GRAPH

[Graph: frequency (vertical axis, 0 to 8) of each number of fraudulent cheques per week (horizontal axis, 1 to 10)]

#### CALCULATE THE MEAN

- Raw data: $\bar{x} = \dfrac{\sum x}{n}$, where $x$ denotes the data points and $n$ is the number of observations.
- Frequency table: $\bar{x} = \dfrac{\sum xf}{n}$, where $x$ denotes the data points, $f$ is the frequency and $n = \sum f$ is the number of observations.
- Frequency table (intervals): $\bar{x} = \dfrac{\sum xf}{n}$, where $x$ is the midpoint of each interval, $f$ is the frequency and $n$ is the number of observations.

#### CALCULATE THE MEAN - FREQUENCY TABLE: NUMBER OF FRAUDULENT CHEQUES PER WEEK

| Distinct values | Tally marks | Frequency |
|---|---|---|
| 1 | / | 1 |
| 2 |  | 0 |
| 3 | ///// | 5 |
| 4 | ///// // | 7 |
| 5 | //// | 4 |
| 6 | //// | 4 |
| 7 | // | 2 |
| 8 | /// | 3 |
| 9 | // | 2 |
| 10 | // | 2 |

Truck data: weights (in tonnes) of 20 fully loaded trucks

| Truck | Weight (tonnes) | Truck | Weight (tonnes) |
|---|---|---|---|
| 1 | 4.54 | 11 | 2.52 |
| 2 | 3.81 | 12 | 5.88 |
| 3 | 4.29 | 13 | 2.95 |
| 4 | 5.16 | 14 | 3.59 |
| 5 | 2.51 | 15 | 3.87 |
| 6 | 4.63 | 16 | 4.17 |
| 7 | 4.75 | 17 | 3.30 |
| 8 | 3.98 | 18 | 5.48 |
| 9 | 5.04 | 19 | 4.26 |
| 10 | 2.80 | 20 | 3.53 |

#### CALCULATE THE MEAN - GROUPED FREQUENCY TABLE: TRUCK DATA

Truck data: weights (in tonnes) of 20 fully loaded trucks

| Class interval | Frequency | Midpoint |
|---|---|---|
| $2.5 \le x \le 3.0$ | 4 | $(2.5 + 3.0) \div 2 = 2.75$ |
| $3.0 < x \le 3.5$ | 1 | 3.25 |
| $3.5 < x \le 4.0$ | 5 | 3.75 |
| $4.0 < x \le 4.5$ | 3 | 4.25 |
| $4.5 < x \le 5.0$ | 3 | 4.75 |
| $5.0 < x \le 5.5$ | 3 | 5.25 |
| $5.5 < x \le 6.0$ | 1 | 5.75 |

#### MODE

- The mode is the value with the HIGHEST FREQUENCY; for grouped data, the interval with the highest frequency is the modal class.
- There can be two or more modes in a set of data; the mode is then not a good measure of central tendency.
- MULTI-MODAL data have more than one mode.
- UNI-MODAL data have only one mode.
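Both frequency-table formulas can be checked with a short Python sketch. The grouped version is only an estimate, since it replaces every weight in a class by the class midpoint; the last line compares it with the mean of the 20 raw truck weights. Function and variable names are illustrative assumptions.

```python
# Frequency table for the fraudulent-cheque data: distinct value -> frequency.
cheque_freq = {1: 1, 2: 0, 3: 5, 4: 7, 5: 4, 6: 4, 7: 2, 8: 3, 9: 2, 10: 2}

# Raw truck weights (tonnes) and the grouped table built from them.
truck_weights = [4.54, 3.81, 4.29, 5.16, 2.51, 4.63, 4.75, 3.98, 5.04, 2.80,
                 2.52, 5.88, 2.95, 3.59, 3.87, 4.17, 3.30, 5.48, 4.26, 3.53]
truck_classes = [  # (lower bound L, upper bound U, frequency f)
    (2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
    (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1),
]


def mean_from_frequencies(table):
    """x-bar = sum(x * f) / n, where n = sum(f)."""
    n = sum(table.values())
    return sum(x * f for x, f in table.items()) / n


def mean_from_grouped(classes):
    """Estimate x-bar = sum(x * f) / n, using each midpoint (L + U) / 2 as x."""
    n = sum(f for _, _, f in classes)
    return sum((lower + upper) / 2 * f for lower, upper, f in classes) / n


print(round(mean_from_frequencies(cheque_freq), 2))       # 5.47
print(round(mean_from_grouped(truck_classes), 3))         # 4.075 (estimate)
print(round(sum(truck_weights) / len(truck_weights), 3))  # 4.053 (raw data)
```

The small gap between 4.075 and 4.053 is the price of grouping: the individual weights inside each class are no longer known.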
#### GRAPH: The MODE = 4

[Graph: frequency (vertical axis, 0 to 8) of each number of fraudulent cheques per week (horizontal axis, 1 to 10)]

Call centre data: waiting times (in seconds) for 35 randomly selected customers

| Customer | Wait (s) | Customer | Wait (s) | Customer | Wait (s) |
|---|---|---|---|---|---|
| C1 | 75 | C13 | 38 | C25 | 102 |
| C2 | 37 | C14 | 40 | C26 | 115 |
| C3 | 13 | C15 | 22 | C27 | 68 |
| C4 | 90 | C16 | 47 | C28 | 29 |
| C5 | 45 | C17 | 26 | C29 | 142 |
| C6 | 23 | C18 | 57 | C30 | 5 |
| C7 | 104 | C19 | 65 | C31 | 15 |
| C8 | 135 | C20 | 33 | C32 | 10 |
| C9 | 30 | C21 | 9 | C33 | 25 |
| C10 | 73 | C22 | 85 | C34 | 41 |
| C11 | 34 | C23 | 87 | C35 | 49 |
| C12 | 12 | C24 | 16 |  |  |

#### FREQUENCY TABLE: The MODAL CLASS is the interval $25 < x \le 50$

| Class interval | Tally marks | Frequency |
|---|---|---|
| $0 \le x \le 25$ | ///// ///// | 10 |
| $25 < x \le 50$ | ///// ///// // | 12 |
| $50 < x \le 75$ | ///// | 5 |
| $75 < x \le 100$ | /// | 3 |
| $100 < x \le 125$ | /// | 3 |
| $125 < x \le 150$ | // | 2 |

#### HISTOGRAM: MODAL CLASS $(25; 50]$

[Histogram: frequency (0 to 12) of waiting times for the intervals [0; 25], (25; 50], (50; 75], (75; 100], (100; 125], (125; 150]]

#### THE MEDIAN - RAW DATA: NUMBER OF FRAUDULENT CHEQUES RECEIVED AT A BANK EACH WEEK FOR 30 WEEKS

(The data are the same 30 weekly counts listed at the start.)

#### MEDIAN

- Median = 5
- Put all observations in order from smallest to largest; the middle observation is then the MEDIAN. With 30 observations, the median lies half-way between the 15th and 16th ordered observations, which are both 5:

1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10
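A minimal Python sketch of the mode, the modal class and the raw-data median. The class rule mirrors the frequency table's boundaries: the first class includes both end points, later classes exclude their lower bound. Counting the 35 waiting times directly gives frequencies 10, 12, 5, 3, 3, 2, so the modal class is $25 < x \le 50$. The helper names `modes` and `class_frequencies` are hypothetical.

```python
from collections import Counter
import statistics

cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]
waiting = [75, 37, 13, 90, 45, 23, 104, 135, 30, 73, 34, 12,
           38, 40, 22, 47, 26, 57, 65, 33, 9, 85, 87, 16,
           102, 115, 68, 29, 142, 5, 15, 10, 25, 41, 49]


def modes(values):
    """All values sharing the highest frequency (more than one means multi-modal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


def class_frequencies(values, edges):
    """Count values per class: edges[0] <= x <= edges[1], then edges[i] < x <= edges[i+1]."""
    freqs = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(freqs)):
            lower, upper = edges[i], edges[i + 1]
            if (lower <= v if i == 0 else lower < v) and v <= upper:
                freqs[i] += 1
                break
    return freqs


edges = [0, 25, 50, 75, 100, 125, 150]
freqs = class_frequencies(waiting, edges)
m = freqs.index(max(freqs))  # index of the modal class

print(modes(cheques))                                   # [4]
print(freqs)                                            # [10, 12, 5, 3, 3, 2]
print(f"modal class: {edges[m]} < x <= {edges[m + 1]}")  # modal class: 25 < x <= 50
print(sorted(cheques)[14:16], statistics.median(cheques))  # [5, 5] 5.0
```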
#### DON’T FALL INTO THE COMMON TRAP

- The median is NOT the middle of the range of observations. For example, for 1, 1, 1, 1, 1, 3, 9 the median is 1 (the middle observation), but the middle of the range, $(1 + 9) \div 2$, is 5. Big difference!

#### MEDIAN

- Odd number of observations (for example, 7): the median position is $\dfrac{n+1}{2}$.
- Even number of observations (for example, 30): the median lies half-way between positions $\dfrac{n}{2}$ and $\dfrac{n}{2} + 1$.

#### FIND THE MEDIAN - FREQUENCY TABLE: NUMBER OF FRAUDULENT CHEQUES PER WEEK

| Distinct values | Frequency | Cumulative frequency |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 0 | 1 |
| 3 | 5 | 6 |
| 4 | 7 | 13 |
| 5 | 4 | 17 |
| 6 | 4 | 21 |
| 7 | 2 | 23 |
| 8 | 3 | 26 |
| 9 | 2 | 28 |
| 10 | 2 | 30 |

#### FIND THE MEDIAN - GROUPED FREQUENCY TABLE: TRUCK DATA

Truck data: weights (in tonnes) of 20 fully loaded trucks

| Class interval | Frequency | Midpoint |
|---|---|---|
| $2.5 \le x \le 3.0$ | 4 | $(2.5 + 3.0) \div 2 = 2.75$ |
| $3.0 < x \le 3.5$ | 1 | 3.25 |
| $3.5 < x \le 4.0$ | 5 | 3.75 |
| $4.0 < x \le 4.5$ | 3 | 4.25 |
| $4.5 < x \le 5.0$ | 3 | 4.75 |
| $5.0 < x \le 5.5$ | 3 | 5.25 |
| $5.5 < x \le 6.0$ | 1 | 5.75 |

#### FIND THE MEDIAN FROM A GROUPED FREQUENCY TABLE

- Median (middle observation)?
- Find the class interval in which that observation lies.

?

#### CALCULATIONS

- Raw data: mean, mode, median
- Frequency table (ungrouped data): mean, mode, median
- Frequency table (grouped data): mean, mode, median
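The median-position rules and the cumulative-frequency lookup can be sketched in Python as follows. For the grouped table the sketch assumes, as the grouped-percentile method does, that the median's position is $(n + 1)/2$; for the 20 trucks that is 10.5, which is first reached by the cumulative frequency 13 of class $4.0 < x \le 4.5$. All function names are hypothetical.

```python
from itertools import accumulate
import statistics

# Ungrouped frequency table: distinct values and their frequencies.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
counts = [1, 0, 5, 7, 4, 4, 2, 3, 2, 2]

# Grouped truck data: (lower bound L, upper bound U, frequency f).
truck_classes = [(2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
                 (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1)]


def value_at(position, values, counts):
    """Observation at a whole-number 1-based position, via cumulative frequency."""
    for value, cumulative in zip(values, accumulate(counts)):
        if cumulative >= position:
            return value
    raise ValueError("position is beyond the number of observations")


def median_from_frequencies(values, counts):
    n = sum(counts)
    if n % 2 == 1:  # odd n: position (n + 1) / 2
        return value_at((n + 1) // 2, values, counts)
    lower = value_at(n // 2, values, counts)      # even n: half-way between
    upper = value_at(n // 2 + 1, values, counts)  # positions n/2 and n/2 + 1
    return (lower + upper) / 2


def median_class(classes):
    """Class whose cumulative frequency first reaches position (n + 1) / 2."""
    n = sum(f for _, _, f in classes)
    position = (n + 1) / 2
    cumulative = 0
    for lower, upper, f in classes:
        cumulative += f
        if cumulative >= position:
            return lower, upper
    raise ValueError("empty table")


print(median_from_frequencies(values, counts))  # 5.0
print(median_class(truck_classes))              # (4.0, 4.5)

# The common trap: median versus middle of the range.
trap = [1, 1, 1, 1, 1, 3, 9]
print(statistics.median(trap), (min(trap) + max(trap)) / 2)  # 1 5.0
```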
#### HOW TO CHOOSE THE BEST MEASURE OF LOCATION?

- When choosing the best measure of location, we need to look at the SHAPE of the distribution.
- For nearly symmetric data, the mean is the best choice.
- For very skewed (asymmetric) data, the mode or median is better.
- The mean moves further along the tail than the median because it is more sensitive to values far from the centre.

- A SYMMETRIC (unimodal) histogram: Mean = Median = Mode
- A POSITIVELY SKEWED (skewed to the right) histogram has a longer tail on the right side: Mode < Median < Mean
- A NEGATIVELY SKEWED (skewed to the left) histogram has a longer tail on the left side: Mean < Median < Mode

#### PROBLEM

- We can find two very different data sets (one distribution very spread out and another very concentrated) whose measures of central tendency are EQUAL.
- To get a true idea of our sample, we also have to MEASURE THE SPREAD of the distribution, also called its dispersion.

#### MEASURES OF SPREAD (DISPERSION)

- Interquartile range
- Variance
- Standard deviation
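To see the mean being pulled along a tail, here is a short Python check on the call centre waiting times. Most customers wait under 50 seconds, but a few long waits (up to 142 s) form a right tail; reading this sample as positively skewed is an interpretation of its frequency table, not a claim made on the slides.

```python
import statistics

# Call centre waiting times in seconds for 35 customers (C1 to C35).
waiting = [75, 37, 13, 90, 45, 23, 104, 135, 30, 73, 34, 12,
           38, 40, 22, 47, 26, 57, 65, 33, 9, 85, 87, 16,
           102, 115, 68, 29, 142, 5, 15, 10, 25, 41, 49]

mean_wait = sum(waiting) / len(waiting)
median_wait = statistics.median(waiting)  # n = 35 is odd: the 18th ordered value

print(round(mean_wait, 2), median_wait)  # 51.34 40
print(mean_wait > median_wait)           # True: the long right tail pulls the mean up
```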
#### MEASURING SPREAD

- Think of a distribution in terms of percentages: a horizontal axis divided equally into 100 percentiles.
- The 10th percentile marks the point below which 10% of the observations fall, and above which 90% of the observations fall.
- The 50th percentile, below which 50% of the observations lie, is the median.

#### WORKING WITH A PERCENTILE

- $p\%$ of the observations fall below the $p$th percentile. Its position in the ordered data is:

$$\text{Position} = \frac{p}{100}(n + 1)$$

Working with the example on fraudulent cheques (ordered data):

1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10

$$\text{Position of } P_{50} = \frac{50}{100}(30 + 1) = 15.5$$

- 15.5 tells us where to find the 50th percentile.
- 15 tells us which observation to go to, and 0.5 tells us how far to move along the gap between that observation and the next highest one.

#### FORMULA

$$P_{50} = x_{15} + 0.5\,(x_{16} - x_{15}) = 5 + 0.5\,(5 - 5) = 5$$

$$P_p = x_k + d\,(x_{k+1} - x_k)$$

- $P$ means percentile
- $p$ tells us which percentile
- $k$ is the whole-number part of the position
- $d$ is the decimal fraction of the position

#### WORKING WITH PERCENTILES FROM UNGROUPED FREQUENCY DATA: NUMBER OF FRAUDULENT CHEQUES PER WEEK

| Distinct values | Frequency | Cumulative frequency |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 0 | 1 + 0 = 1 |
| 3 | 5 | 1 + 5 = 6 |
| 4 | 7 | 6 + 7 = 13 |
| 5 | 4 | 13 + 4 = 17 |
| 6 | 4 | 17 + 4 = 21 |
| 7 | 2 | 21 + 2 = 23 |
| 8 | 3 | 23 + 3 = 26 |
| 9 | 2 | 26 + 2 = 28 |
| 10 | 2 | 28 + 2 = 30 |

#### WORKING WITH PERCENTILES (AND MEDIAN) FROM GROUPED DATA

To identify the class interval $L < x \le U$ containing the $p$th percentile, find its position:

$$\text{Position} = \frac{p}{100}(n + 1)$$

The decimal fraction for grouped data is:

$$d = \frac{\text{Position} - \text{sum of class frequencies up to } L}{\text{frequency of class } L < x \le U}$$

Then calculate the $p$th percentile:

$$P_p \approx L + d\,(U - L)$$

#### FIND THE MEDIAN - GROUPED FREQUENCY TABLE: TRUCK DATA

Truck data: weights (in tonnes) of 20 fully loaded trucks

| Class interval | Frequency | Cumulative frequency |
|---|---|---|
| $2.5 \le x \le 3.0$ | 4 | 4 |
| $3.0 < x \le 3.5$ | 1 | 5 |
| $3.5 < x \le 4.0$ | 5 | 10 |
| $4.0 < x \le 4.5$ | 3 | 13 |
| $4.5 < x \le 5.0$ | 3 | 16 |
| $5.0 < x \le 5.5$ | 3 | 19 |
| $5.5 < x \le 6.0$ | 1 | 20 |
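The two percentile formulas can be sketched as small Python functions: `percentile` implements $P_p = x_k + d\,(x_{k+1} - x_k)$ on ordered raw data, and `grouped_percentile` implements $P_p \approx L + d\,(U - L)$ on a grouped table. Clamping positions that fall before the first or after the last observation is an assumption added here; the slides do not cover that case. Function names are illustrative.

```python
import math

cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8, 3, 5, 4, 7, 6,
           6, 9, 3, 4, 5, 7, 9, 4, 5, 8, 6, 4, 4, 10, 4]
truck_weights = [4.54, 3.81, 4.29, 5.16, 2.51, 4.63, 4.75, 3.98, 5.04, 2.80,
                 2.52, 5.88, 2.95, 3.59, 3.87, 4.17, 3.30, 5.48, 4.26, 3.53]
truck_classes = [(2.5, 3.0, 4), (3.0, 3.5, 1), (3.5, 4.0, 5), (4.0, 4.5, 3),
                 (4.5, 5.0, 3), (5.0, 5.5, 3), (5.5, 6.0, 1)]  # (L, U, f)


def percentile(sorted_values, p):
    """P_p = x_k + d (x_{k+1} - x_k), with Position = (p / 100)(n + 1)."""
    if not 0 < p <= 100:
        raise ValueError("p must lie in (0, 100]")
    n = len(sorted_values)
    position = p * (n + 1) / 100
    k = math.floor(position)  # whole-number part: which observation to go to
    d = position - k          # decimal fraction: how far towards the next one
    if k < 1:                 # assumption: clamp positions outside 1..n
        return sorted_values[0]
    if k >= n:
        return sorted_values[-1]
    x_k, x_next = sorted_values[k - 1], sorted_values[k]  # x_k is 1-based
    return x_k + d * (x_next - x_k)


def grouped_percentile(classes, p):
    """P_p ~ L + d (U - L), d = (Position - frequencies below L) / class frequency."""
    if not 0 < p <= 100:
        raise ValueError("p must lie in (0, 100]")
    n = sum(f for _, _, f in classes)
    position = min(p * (n + 1) / 100, n)  # assumption: clamp to the last observation
    below = 0                             # sum of class frequencies up to L
    for lower, upper, f in classes:
        if below + f >= position:
            d = (position - below) / f
            return lower + d * (upper - lower)
        below += f
    raise ValueError("empty table")


ordered = sorted(cheques)
print(percentile(ordered, 50))                           # 5.0
print(percentile(ordered, 25), percentile(ordered, 75))  # 4.0 7.25
print(round(grouped_percentile(truck_classes, 50), 3))   # 4.083 (grouped estimate)
print(round(percentile(sorted(truck_weights), 50), 3))   # 4.075 (raw data)
```

For the trucks, Position = 10.5 falls in class $4.0 < x \le 4.5$ with 10 observations below it, so $d = 0.5/3$ and $P_{50} \approx 4.0 + (0.5/3)(0.5) \approx 4.083$, close to the raw-data median of 4.075.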