Statistics lecture 4 (ch3)

NEXT LECTURE
• Please bring a scientific calculator and the
calculator manual


NUBE Test
• You will not be asked to draw graphs – simply interpret
them
• The following topics will be included in test 1:-
– Discounts, percentages and commissions
– Multiple choice questions
– Graphs
– Mean, median, mode and standard deviations
– Dispersion
– Box and whisker plot
– Probabilities
– Probability distributions
– Sampling distributions

NUBE Test
• REMEMBER - YOU WILL NOT BE GIVEN ALL OF
THE FORMULAE IN THE TEST, YOU MUST
REMEMBER THE ONES THAT ARE NOT IN THE
PRINT OUT GIVEN TO YOU:-
• Attached please find the NUBE6112 FORMULAE AND
TABLES which students will need for all tests and
exams. Please can you print these back to back and
have them laminated because these are to be used year
on year. It has also been confirmed by the IIE that any
formulas that aren’t in the sheets, students are expected
to know from their lecturers, so could you please pass
this info on to your lecturers. The IIE were only given
permission to print what is on the sheet, which is also at
the back of the textbook.

• Properties to describe numerical data:
– Central tendency
– Dispersion
– Shape
• Measures calculated for:
– Sample data
• Statistics
– Entire population
• Parameters


Measures of location
• Arithmetic mean
• Median
• Mode


UNGROUPED or raw data refers to data as
they were collected, that is, before they are
summarised or organised in any way or form

GROUPED data refers to data summarised in
a frequency table


ARITHMETIC MEAN
- This is the most commonly used measure
and is also called the mean.

Sample mean = (sum of sample observations) / (number of sample observations)

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where n = sample size

ARITHMETIC MEAN
- This is the most commonly used measure
and is also called the mean.

Population mean = (sum of observations) / (number of observations)

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

where:
– μ = population mean
– x_i = observations of the population
– ∑ = “the sum of”
– N = population size

• MEDIAN
– Half the values in the data set are smaller than the median.
– Half the values in the data set are larger than the median.
– Order the data from small to large.
• Position of median
– If n is odd:
• The median is the (n+1)/2 th observation.
– If n is even:
• Calculate (n+1)/2
• The median is the average of the values just before and
just after position (n+1)/2.

• MODE
– The observation in the data set that occurs most
frequently.
– Order the data from small to large.
– If no observation repeats there is no mode.
– If one observation occurs more frequently than all others:
• Unimodal
– If two or more observations occur the same (highest)
number of times:
• Multimodal
– Can be used for nominal scaled variables.

Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
The mean of the sample of nine measurements is given by:

$$\bar{x} = \frac{\sum_{i=1}^{9} x_i}{n} = \frac{2 + 5 + 8 + (-3) + 5 + 2 + 6 + 5 + (-4)}{9} = \frac{26}{9} \approx 2{,}89$$

2 5 8 −3 5 2 6 5 −4
The median of the sample of nine measurements is given by:
• Order the measurements (n = 9, an odd number)

Value:    −4 −3 2 2 5 5 5 6 8
Position:  1  2 3 4 5 6 7 8 9

(n+1)/2 = (9+1)/2 = 5th measurement

Median = 5

2 5 8 −3 5 2 6 5 −4 3
Determine the median of the sample of ten measurements.
• Order the measurements (n = 10, an even number)

Value:    −4 −3 2 2 3 5 5 5 6 8
Position:  1  2 3 4 5 6 7 8 9 10

(n+1)/2 = (10+1)/2 = 5,5th measurement, so average the 5th and 6th values

Median = (3+5)/2 = 4

2 5 8 −3 5 2 6 5 −4
Determine the mode of the sample of nine measurements.
•Order the measurements

−4 −3 2 2 5 5 5 6 8
Mode = 5
•Unimodal


2 5 8 −3 5 2 6 5 −4 2
Determine the mode of the sample of ten measurements.

−4 −3 2 2 2 5 5 5 6 8
Mode = 2 and 5
•Multimodal
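
The ungrouped examples above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only, and Python writes decimals with a point where the slides use a comma.

```python
from statistics import mean, median, multimode

# The nine measurements used in the worked examples
sample_nine = [2, 5, 8, -3, 5, 2, 6, 5, -4]

# The two ten-value samples: one adds 3 (median example), one adds 2 (mode example)
sample_ten_median = sample_nine + [3]
sample_ten_mode = sample_nine + [2]

mean_nine = mean(sample_nine)              # 26 / 9
median_nine = median(sample_nine)          # middle value when n is odd
median_ten = median(sample_ten_median)     # average of the two middle values when n is even
modes_nine = multimode(sample_nine)        # list of the most frequent value(s)
modes_ten = sorted(multimode(sample_ten_mode))

print(round(mean_nine, 2), median_nine, median_ten, modes_nine, modes_ten)
```

Note that `multimode` returns every value when nothing repeats, whereas the lecture's convention is that such a data set has no mode, so that case needs separate handling.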


Concept questions 1 - 12 p 64 –
Elementary Statistics for Business &
Economics


• ARITHMETIC MEAN
– Data is given in a frequency table
– Gives only an approximate value of the mean

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

where f_i = frequency of the i th class interval
x_i = class midpoint of the i th class interval

• MEDIAN
– Data is given in a frequency table.
– First cumulative frequency ≥ n/2 will indicate the
median class interval.
– Median can also be determined from the ogive.
$$M_e = l_i + \frac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i}$$

where l_i = lower boundary of the median interval
u_i = upper boundary of the median interval
F_{i-1} = cumulative frequency of the interval preceding the
median interval
f_i = frequency of the median interval

• MODE
– Class interval that has the largest frequency
value will contain the mode.
– As a rough estimate, the mode is the class midpoint of this class.
– A more precise value of the mode must be determined from the histogram.


Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.

To calculate the mean for the sample of the 48 hours, determine the class midpoints:

Number of calls    Number of hours (f_i)    Class midpoint (x_i)
[2–under 5)        3                        3,5
[5–under 8)        4                        6,5
[8–under 11)       11                       9,5
[11–under 14)      13                       12,5
[14–under 17)      9                        15,5
[17–under 20)      6                        18,5
[20–under 23)      2                        21,5
n = 48


$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{597}{48} \approx 12{,}44$$

Average number of calls per hour is 12,44.

To calculate the median for the sample of the 48 hours, determine the
cumulative frequencies:

Number of calls    Number of hours (f_i)    Cumulative frequency (F)
[2–under 5)        3                        3
[5–under 8)        4                        7
[8–under 11)       11                       18
[11–under 14)      13                       31
[14–under 17)      9                        40
[17–under 20)      6                        46
[20–under 23)      2                        48
n = 48

n/2 = 48/2 = 24
The first cumulative frequency ≥ 24 is 31, so the median class interval is [11–under 14).

Median

$$M_e = l_i + \frac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i} = 11 + \frac{(14 - 11)(24 - 18)}{13} \approx 12{,}38$$

50% of the time fewer than 12,38 calls per hour are received, and 50% of
the time more than 12,38 calls per hour.
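
The grouped mean and the interpolated median above can be reproduced as follows. This is a minimal sketch, assuming the class boundaries and frequencies from the call-centre table; the function and variable names are illustrative, not from the lecture.

```python
# Class boundaries and frequencies from the call-centre frequency table
lower = [2, 5, 8, 11, 14, 17, 20]
upper = [5, 8, 11, 14, 17, 20, 23]
freq = [3, 4, 11, 13, 9, 6, 2]

def grouped_mean(lower, upper, freq):
    """Approximate mean: sum of f * midpoint divided by sum of f."""
    midpoints = [(l + u) / 2 for l, u in zip(lower, upper)]
    return sum(f * x for f, x in zip(freq, midpoints)) / sum(freq)

def grouped_median(lower, upper, freq):
    """Interpolate inside the first class whose cumulative frequency reaches n/2."""
    n = sum(freq)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for l, u, f in zip(lower, upper, freq):
        if cumulative + f >= n / 2:
            return l + (u - l) * (n / 2 - cumulative) / f
        cumulative += f

mean_calls = grouped_mean(lower, upper, freq)
median_calls = grouped_median(lower, upper, freq)
print(round(mean_calls, 2), round(median_calls, 2))
```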

[Figure: ogive of the number of calls at a call centre. Horizontal axis: number of calls; vertical axis: cumulative number of hours.]
The median can be determined from the ogive: read across from n/2 = 48/2 = 24
to the curve and down to the horizontal axis at A.
Median = 12,4

To calculate the mode for the sample of the 48 hours, draw the histogram
of the frequency table for the call-centre data.


[Figure: histogram of the number of calls at a call centre. Horizontal axis: number of calls; vertical axis: number of hours.]
Mode = 12,3 (read at A on the horizontal axis)

Relationship between mean, median, and mode
• If a distribution is symmetrical:
– the mean, median and mode are the same
and lie at the centre of the distribution
• If a distribution is non-symmetrical:
– it is skewed to the left or to the right
– the three measures differ
[Figure: a symmetrical distribution with mean, median and mode at the centre; a positively skewed distribution (skewed to the right) with mode, median and mean in that order from left to right; a negatively skewed distribution (skewed to the left) with mean, median and mode in that order from left to right.]

MEAN – strongly affected by outliers (values that are very
small or very large relative to the majority of the values in a
dataset). The mean is therefore not the best measure where outliers
exist.
MEDIAN – not affected by outliers, so it is better to use the median
than the mean when outliers exist. A disadvantage is that its calculation
does not include all the values in a dataset.
MODE – not affected by outliers. A disadvantage is that it
only includes the values with the highest frequency in its
calculation. When a distribution is skewed, the median may
provide a better description of the data.
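
The effect of an outlier on the mean and the median can be seen in a small sketch. The numbers below are made up for illustration and do not come from the lecture.

```python
from statistics import mean, median

# Hypothetical data, not from the lecture
values = [2, 3, 3, 4, 5]
with_outlier = values + [60]   # the same data plus one very large value

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean moves a long way towards the outlier; the median barely changes
print(mean_before, median_before)
print(round(mean_after, 2), median_after)
```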


Group Classwork
• Get into groups of 4
• Read p75 – 76 Module Manual –
Choosing between the mean, median &
mode
• Read p 67 – 69 - Elementary Statistics for
Business & Economics – Relationship
between mean, median and mode &
When to use the mean, median & mode
• Complete Izimvo Exchange 1 p 83 Module
Manual

Concept questions 13 – 25 p69 –
Elementary Statistics for Business and
Economics


Measures of dispersion
• Range
• Variance
• Standard deviation
• Coefficient of variation


• Range
– The range of a set of measurements is the
difference between the largest and smallest
values in the data set.
– Its major advantage is the ease with which it
can be computed.
– Its major shortcoming is its failure to provide
information on the dispersion of the values
between the two end points.


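• Variance and standard deviation
Determine how far the observations are from their mean.

$$\text{Sample variance} = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$$\text{Sample standard deviation} = s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Where:
– x̄ = sample mean
– x = values of the sample
– n = sample size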

Determine how far the observations are from their mean.

$$\text{Population variance} = \sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

$$\text{Population standard deviation} = \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

Where:
– μ = population mean
– x = values of the population
– N = population size

• Coefficient of variation
– Measures the standard deviation relative to the
mean.
– It is expressed as a percentage.
– Used to compare samples that are measured in
different units.
$$CV = \frac{s}{\bar{x}} \times 100$$

Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
2nd: 0 1 2 3 3 4 5 5
The means are approximately the same, but the dispersion of Data set 1 is
much larger than the dispersion of Data set 2.
[Figure: both data sets plotted on a number line from −5 to 8.]

$$\bar{x}_1 = \frac{26}{9} \approx 2{,}9 \qquad \bar{x}_2 = \frac{23}{8} \approx 2{,}9$$

Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
2nd : 0 1 2 3 3 4 5 5
The range of the measurements is given by:
Largest value – smallest value
1st: 8 – (−4) = 12
2nd: 5 – 0 = 5

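1st: −4 −3 2 2 5 5 5 6 8
2nd: 0 1 2 3 3 4 5 5
The variance of the measurements is given by:

$$s_1^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{132{,}89}{8} \approx 16{,}61 \qquad s_2^2 = \frac{22{,}88}{7} \approx 3{,}27$$

The standard deviation of the measurements is given by:

$$s_1 = \sqrt{16{,}61} \approx 4{,}08 \qquad s_2 = \sqrt{3{,}27} \approx 1{,}81$$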

1st: −4 −3 2 2 5 5 5 6 8
2nd: 0 1 2 3 3 4 5 5
The coefficient of variation of the measurements is
given by:

$$CV_1 = \frac{s}{\bar{x}} \times 100\% = \frac{4{,}08}{2{,}9} \times 100 = 140{,}69\%$$

$$CV_2 = \frac{s}{\bar{x}} \times 100\% = \frac{1{,}81}{2{,}9} \times 100 = 62{,}41\%$$
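
The dispersion measures for the two data sets can be computed in one pass. This is a minimal sketch using the sample (n − 1) versions from Python's `statistics` module; the `describe` helper is a hypothetical name introduced here.

```python
from statistics import mean, stdev, variance

first = [-4, -3, 2, 2, 5, 5, 5, 6, 8]
second = [0, 1, 2, 3, 3, 4, 5, 5]

def describe(data):
    """Range, sample variance, sample standard deviation and CV (in percent)."""
    s = stdev(data)   # sample standard deviation, divisor n - 1
    return {
        "range": max(data) - min(data),
        "variance": variance(data),
        "std_dev": s,
        "cv_percent": s / mean(data) * 100,
    }

first_stats = describe(first)
second_stats = describe(second)
for label, stats in (("1st", first_stats), ("2nd", second_stats)):
    print(label, {name: round(value, 2) for name, value in stats.items()})
```

The coefficients of variation come out slightly higher than the values on the slide (roughly 141% and 63%) because the script divides unrounded standard deviations by unrounded means, while the slide uses the rounded figures 4,08, 1,81 and 2,9.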

P75 Elementary Statistics for Business and Economics

By applying the value of the std dev in combination with the value of the mean, we are
able to define where the majority of the data values are clustered using
CHEBYCHEFF’s THEOREM

•At least 75% of the values in any sample will be within k=2 std dev of the
sample mean

•At least 89% of the values in any sample will be within k=3 std dev of the
sample mean

•At least 94% of the values in any sample will be within k=4 std dev of the
sample mean

NOTE: k = the number of std dev distances to either side of the mean. In general,
at least (1 − 1/k²) × 100% of the values lie within k std dev of the mean, for any k > 1.


EXAMPLE

Assume a data set has a mean of 50 and a std dev of
5.

Then at least 75% of the values in the data set occur in the
interval:-

Mean ± 2 std dev = 50 ± 2(5)

= 50 ± 10
= from 40 to 60
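
The percentages quoted for k = 2, 3 and 4 follow from the general bound 1 − 1/k². The sketch below computes the bound and the interval for the example; the function names are illustrative only.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def chebyshev_interval(mean, std_dev, k):
    """Interval mean - k*std_dev to mean + k*std_dev."""
    return mean - k * std_dev, mean + k * std_dev

for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k) * 100, 2))

# The example: mean 50, std dev 5, k = 2
print(chebyshev_interval(50, 5, 2))
```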


Classwork/Homework
• Concept questions 26 -35 , p 76 –
Elementary Statistics for Business and
Economics



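$$\text{Sample variance} = s^2 = \frac{\sum f x^2 - \frac{1}{n}\left(\sum f x\right)^2}{n - 1}$$

$$\text{Sample standard deviation} = s = \sqrt{\frac{\sum f x^2 - \frac{1}{n}\left(\sum f x\right)^2}{n - 1}}$$

Where:
– f = frequencies of class intervals
– x = class midpoints of class intervals
– n = sample size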

$$\text{Population variance} = \sigma^2 = \frac{\sum f x^2 - \frac{1}{N}\left(\sum f x\right)^2}{N}$$

$$\text{Population standard deviation} = \sigma = \sqrt{\frac{\sum f x^2 - \frac{1}{N}\left(\sum f x\right)^2}{N}}$$

Where:
– f = frequencies of class intervals
– x = class midpoints of class intervals
– N = population size

Number of calls    Number of hours (f_i)    Class midpoint (x_i)
[2–under 5)        3                        3,5
[5–under 8)        4                        6,5
[8–under 11)       11                       9,5
[11–under 14)      13                       12,5
[14–under 17)      9                        15,5
[17–under 20)      6                        18,5
[20–under 23)      2                        21,5
n = 48
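
The grouped variance and standard deviation for the call-centre table can be worked out as below. This is a minimal sketch that treats the 48 hours as a sample (divisor n − 1). The results are computed here and are not quoted from the lecture, whose worked values did not survive in this text.

```python
from math import sqrt

# Frequencies and class midpoints from the call-centre table
freq = [3, 4, 11, 13, 9, 6, 2]
midpoints = [3.5, 6.5, 9.5, 12.5, 15.5, 18.5, 21.5]

n = sum(freq)
sum_fx = sum(f * x for f, x in zip(freq, midpoints))       # sum of f * x
sum_fx2 = sum(f * x**2 for f, x in zip(freq, midpoints))   # sum of f * x squared

# Sample variance for grouped data, assuming the n - 1 form
sample_variance = (sum_fx2 - sum_fx**2 / n) / (n - 1)
sample_std_dev = sqrt(sample_variance)

print(sum_fx, sum_fx2, round(sample_variance, 2), round(sample_std_dev, 2))
```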

Concept Questions 36 – 41, p 80 –
Elementary Statistics for Business and
Economics

• Quartiles
• Percentiles
• Interquartile range


• QUARTILES
– Order data in ascending order.
– Divide data set into four quarters.

Min | 25% | Q1 | 25% | Q2 | 25% | Q3 | 25% | Max

2 5 8 −3 5 2 6 5 −4
Determine Q1 for the sample of nine measurements:

Value:    −4 −3 2 2 5 5 5 6 8
Position:  1  2 3 4 5 6 7 8 9

Q1 is the $(n + 1)\left(\frac{1}{4}\right) = (9 + 1)\left(\frac{1}{4}\right) = 2{,}5$th value

Q1 = −3 + 0,5(2 − (−3)) = −0,5

2 5 8 −3 5 2 6 5 −4
Determine Q3 for the sample of nine measurements:

Value:    −4 −3 2 2 5 5 5 6 8
Position:  1  2 3 4 5 6 7 8 9

Q3 is the $(n + 1)\left(\frac{3}{4}\right) = (9 + 1)\left(\frac{3}{4}\right) = 7{,}5$th value

Q3 = 5 + 0,5(6 − 5) = 5,5

2 5 8 −3 5 2 6 5 −4
Interquartile range = Q3 – Q1
Q3 = 5,5
Q1 = −0,5
Interquartile range
= 5,5 – (−0,5)
=6

• PERCENTILES
– Order data in ascending order.
– Divide data set into a hundred equal parts.

Min | 10% | P10 | 90% | Max
Min | 80% | P80 | 20% | Max
Min | 50% | P50 = Q2 | 50% | Max

2 5 8 −3 5 2 6 5 −4
Determine P20 for the sample of nine measurements:

Value:    −4 −3 2 2 5 5 5 6 8
Position:  1  2 3 4 5 6 7 8 9

P20 is the $(n + 1)\left(\frac{p}{100}\right) = (9 + 1)\left(\frac{20}{100}\right) = 2$nd value

P20 = −3
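
The position rule (n + 1)(p/100) used for the quartiles and percentiles above can be written as one small function. This is a sketch only: the `percentile` helper is a hypothetical name, and clamping to the minimum or maximum when the position falls outside the data is an assumption not covered in the lecture.

```python
def percentile(data, p):
    """Value at position (n + 1) * p / 100 in the ordered data, interpolating linearly."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100   # 1-based position, may be fractional
    whole = int(position)          # position of the value just below
    fraction = position - whole
    if whole < 1:                  # assumption: clamp to the smallest value
        return ordered[0]
    if whole >= n:                 # assumption: clamp to the largest value
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)

sample = [2, 5, 8, -3, 5, 2, 6, 5, -4]
q1 = percentile(sample, 25)
q3 = percentile(sample, 75)
iqr = q3 - q1
p20 = percentile(sample, 20)
print(q1, q3, iqr, p20)
```

Other software may use a different position rule and give slightly different quartiles, so it is worth checking which convention a calculator or package follows.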

To calculate Q1 for the sample of the 48 hours:

Number of calls    Number of hours (f_i)    Cumulative frequency (F)
[2–under 5)        3                        3
[5–under 8)        4                        7
[8–under 11)       11                       18
[11–under 14)      13                       31
[14–under 17)      9                        40
[17–under 20)      6                        46
[20–under 23)      2                        48
n = 48

n/4 = 48/4 = 12
The first cumulative frequency ≥ 12 is 18, so Q1 lies in the interval [8–under 11).

Q1

$$Q_1 = l_{Q_1} + \frac{(u_{Q_1} - l_{Q_1})\left(\frac{n}{4} - F_{Q_1 - 1}\right)}{f_{Q_1}} = 8 + \frac{(11 - 8)(12 - 7)}{11} \approx 9{,}36$$

25% of the time fewer than 9,36 calls per hour are received, and 75% of
the time more than 9,36 calls per hour.

To calculate Q3 for the sample of the 48 hours, use the same cumulative frequencies:

3n/4 = 3(48)/4 = 36
The first cumulative frequency ≥ 36 is 40, so Q3 lies in the interval [14–under 17).

Q3

$$Q_3 = l_{Q_3} + \frac{(u_{Q_3} - l_{Q_3})\left(\frac{3n}{4} - F_{Q_3 - 1}\right)}{f_{Q_3}} = 14 + \frac{(17 - 14)(36 - 31)}{9} \approx 15{,}67$$

75% of the time fewer than 15,67 calls per hour are received, and 25% of
the time more than 15,67 calls per hour.

Q3 = 15,67
Q1 = 9,36

IQR = Q3 – Q1
= 15,67 – 9,36
= 6,31

Example – The following data represents the number of
telephone calls received for two days at a municipal call
centre. The data was measured per hour.

To calculate P60 for the sample of the 48 hours, use the same cumulative frequencies:

np/100 = 48(60)/100 = 28,8
The first cumulative frequency ≥ 28,8 is 31, so P60 lies in the interval [11–under 14).

P60

$$P_{60} = l_p + \frac{(u_p - l_p)\left(\frac{np}{100} - F_{p-1}\right)}{f_p} = 11 + \frac{(14 - 11)(28{,}8 - 18)}{13} \approx 13{,}49$$

60% of the time fewer than 13,49 calls per hour are received, and 40% of
the time more than 13,49 calls per hour.
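
The grouped-data calculations of Q1, Q3 and P60 all use the same interpolation, with np/100 as the target cumulative frequency. The sketch below wraps it in one function; `grouped_percentile` is a hypothetical name and the table values are those of the call-centre example.

```python
# Class boundaries and frequencies from the call-centre frequency table
lower = [2, 5, 8, 11, 14, 17, 20]
upper = [5, 8, 11, 14, 17, 20, 23]
freq = [3, 4, 11, 13, 9, 6, 2]

def grouped_percentile(p, lower, upper, freq):
    """Interpolate inside the first class whose cumulative frequency reaches n * p / 100."""
    n = sum(freq)
    target = n * p / 100
    cumulative = 0  # cumulative frequency of the classes before the current one
    for l, u, f in zip(lower, upper, freq):
        if cumulative + f >= target:
            return l + (u - l) * (target - cumulative) / f
        cumulative += f

q1 = grouped_percentile(25, lower, upper, freq)
q3 = grouped_percentile(75, lower, upper, freq)
p60 = grouped_percentile(60, lower, upper, freq)
print(round(q1, 2), round(q3, 2), round(p60, 2), round(q3 - q1, 2))
```

The unrounded interquartile range is about 6,30. The lecture's 6,31 comes from subtracting the already rounded quartiles 15,67 and 9,36.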

Classwork/Homework
• Concept Questions 42 – 53, p88 –
Elementary Statistics for Business and
Economics

BOX-AND-WHISKER PLOT
Me = 12,38
Q1 = 9,36
Q3 = 15,67
IQR = 6,31
LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11
UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14
[Figure: box-and-whisker plot on an axis from 0 to 28, with the box spanning the IQR and a distance of 1,5(IQR) marked on either side.]
• Any value smaller than −0,11 will be an outlier.
• Any value larger than 25,14 will be an outlier.
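
The lower and upper limits can be checked with a few lines of code. This is a minimal sketch that starts from the rounded quartiles on the slide; the `is_outlier` helper and the test values passed to it are illustrative only.

```python
# Rounded quartiles from the grouped call-centre example
q1, q3 = 9.36, 15.67
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr   # LL = Q1 - 1,5(IQR)
upper_limit = q3 + 1.5 * iqr   # UL = Q3 + 1,5(IQR), note the plus sign

def is_outlier(value):
    """A value outside the limits LL and UL is flagged as an outlier."""
    return value < lower_limit or value > upper_limit

print(lower_limit, upper_limit)
print(is_outlier(12), is_outlier(26))
```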

NORMAL CURVE
• Bell-shaped, single-peaked and symmetric
• Mean is located at the centre of a normal curve
• Total area under a normal curve = 1, with half of this area on
the left side of the mean and half on the right side
• Mean, median and mode are equal
• Two tails extend indefinitely to the left and to the right of
the mean as they approach the horizontal axis
• The two tails never touch the horizontal axis
• Completely described by its mean and its standard
deviation. The mean specifies the position of the curve on the
horizontal axis; the standard deviation specifies the shape of the curve
• The smaller the std dev, the less spread out and more
sharply peaked the curve

NORMAL CURVE & Empirical Rule
• Chebycheff’s Theorem applies to any dataset
irrespective of the underlying distribution
• Empirical Rule applies specifically to data that follows a
normal curve
• Empirical Rule states that for a normal curve, approx:-
– 68% of observations lie within 1 std dev of mean
– 95% of observations lie within 2 std dev of mean
– 99,7% of observations lie within 3 std dev of mean

NOTE: FOR A NORMAL CURVE ANY VALUE THAT IS
NOT WITHIN 3 STD DEV OF MEAN IS A SUSPECT
OUTLIER
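
The Empirical Rule percentages can be recovered from the normal curve itself. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k std dev of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

Because any normal curve can be rescaled to the standard one, the same proportions hold for every mean and standard deviation.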

Classwork/Homework
• Concept Questions 61-70, p95 –
Elementary Statistics for Business and
Economics

Classwork/Homework
1. Activity 1 & 2 – Module Manual p85 – 86
2. Revision Exercises 1,2,3 p 87 -93 Module
Manual
3. Supplementary Exercises questions 1 - 12 p
100 – Elementary Statistics for Business &
Economics
4. Self Review Test p96 - Elementary Statistics
for Business & Economics
5. Read Chapter 4 – Basic Probability, p105 –
150 - Elementary Statistics for Business &
Economics
