2. 5/8/2023 Frequency Distributions 2
Stem-and-leaf plots (stemplots)
Analyses start by exploring data with
pictures
My favorite technique is the stemplot: a
histogram-like display of data points
You can observe a lot by looking – Yogi Berra
Illustrative example: sample.sav
A simple random sample (SRS) of AGE (in years)
Data as an ordered array (n = 10):
05 11 21 24 27 28 30 42 50 52
Divide each data point into
Stem values = first one or two digits
Leaf values = next digit
In this example
Stem values = tens place
Leaf values = ones place
e.g., 21 has a stem value of 2 and leaf value of 1
Stemplot (cont.)
Draw stem-like axis from lowest to highest stem
0|
1|
2|
3|
4|
5|
×10 axis multiplier (important!)
Place each leaf next to its stem
e.g., 21 is plotted as a leaf of 1 next to the stem of 2:
2|1
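The construction steps above can be sketched in a few lines of Python. This is only an illustration for whole-number data with a ×10 axis multiplier; the function name `stemplot` is hypothetical, and the ages are retyped from the ordered array on the previous slide.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for whole numbers: stem = tens, leaf = ones."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    # Draw every stem from lowest to highest, even if it has no leaves
    return [f"{stem}|{''.join(leaves.get(stem, []))}" for stem in range(lo, hi + 1)]

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
plot = stemplot(ages)
print("\n".join(plot))
print("(x10)")  # axis multiplier
```

Empty stems are kept on the axis so that gaps in the data remain visible.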
Interpreting frequency distributions
Central Location
Gravitational center = mean
Middle value = median
Spread
Range and inter-quartile range
Standard deviation and variance (next week)
Shape
Symmetry
Modality
Kurtosis
Mean = arithmetic average
“Eye-ball method” = visualize where the plot would balance
Arithmetic method = total divided by n
    8
    7
    4     2
5 1 1 0 2 0
-----------
0 1 2 3 4 5
-----------
     ^
Grav. center
Eye-ball method balances around 25 to 30
Actual arithmetic average = 29.0
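The arithmetic method can be checked directly. A minimal sketch, assuming the ten ages from the illustrative example:

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

total = sum(ages)   # 290
n = len(ages)       # 10
mean = total / n    # total divided by n
print(total, n, mean)
```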
Middle point = median
Count from the top of the ordered array to a depth of (n + 1) ÷ 2
For illustrative data:
n = 10
Depth of median = (10 + 1) ÷ 2 = 5.5
Median = average of the 5th and 6th values = (27 + 28) ÷ 2 = 27.5
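The depth rule can be sketched as follows. The helper name `median_by_depth` is hypothetical; a fractional depth such as 5.5 is taken to mean "average the two neighboring values".

```python
import math

def median_by_depth(ordered):
    """Median of an already ordered list, using depth = (n + 1) / 2."""
    n = len(ordered)
    depth = (n + 1) / 2
    lower = ordered[math.floor(depth) - 1]  # depth counts from 1
    upper = ordered[math.ceil(depth) - 1]   # same value when depth is whole
    return depth, (lower + upper) / 2

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
depth, median = median_by_depth(ages)
print(depth, median)
```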
Spread = variability
The easiest way to describe spread is by stating the range, e.g., “from 5 to 52” (not the best way)
A better way is to divide the ordered data into a low group and a high group
Quartile 1 = median of low group
Quartile 3 = median of high group
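A short sketch of the quartile rule, using the illustrative ages. One assumption is labeled here because the slide does not say how to handle odd n: this version leaves the middle value out of both groups, and other conventions exist.

```python
import statistics

ages = sorted([5, 11, 21, 24, 27, 28, 30, 42, 50, 52])

half = len(ages) // 2
low_group = ages[:half]               # lower half of the ordered data
high_group = ages[len(ages) - half:]  # upper half (assumption: middle value dropped if n is odd)

q1 = statistics.median(low_group)
q3 = statistics.median(high_group)
iqr = q3 - q1                         # inter-quartile range
print(q1, q3, iqr)
```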
Shape = visual pattern
Skyline silhouette of plot
Symmetry
Mounds
Outliers (if any)
When n is small, it’s too difficult to describe shape accurately
    X
    X
    X     X
X X X X X X
-----------
0 1 2 3 4 5
-----------
What to look for in shape
Idealized shape =
density curve
Look for:
General pattern
Symmetry
Outliers
Kurtosis (steepness of peak)
Mesokurtic (medium)
Platykurtic (flat): skinny tails
Leptokurtic (steep): fat tails
Kurtosis can NOT be easily judged by eye
Second example (n = 8)
Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
Truncate extra digit (e.g., 1.47 → 1.4)
Stem = ones place
Leaves = tenths place
Do not plot the decimal point
|1|4
|2|03
|3|4779
|4|4
(×1)
Center: between 3.4 & 3.7 (the 4th and 5th of the 8 values)
Spread: 1.4 to 4.4
Shape: mound, no outliers
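The truncation step can be sketched as below. Working in whole hundredths is a choice made here to avoid floating-point surprises when dropping the extra digit; it is not something the slide prescribes.

```python
data = [1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42]

rows = {}
for x in sorted(data):
    hundredths = round(x * 100)      # 1.47 -> 147
    tenths = hundredths // 10        # truncate (do not round) the extra digit: 147 -> 14
    stem, leaf = divmod(tenths, 10)  # stem = ones place, leaf = tenths place
    rows.setdefault(stem, []).append(str(leaf))

plot = [f"|{stem}|{''.join(leaves)}" for stem, leaves in sorted(rows.items())]
print("\n".join(plot))
print("(x1)")
```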
Third example (pollution.sav)
Regular stemplot (first plot) is too squished
Split-stem (second plot) gives each stem two rows
First 1 on stem holds leaves 0 to 4
Second 1 on stem holds leaves 5 to 9
Regular stem:
|1|4789
|2|223466789
|3|000123445678
(×1)
Split-stem:
|1|4
|1|789
|2|2234
|2|66789
|3|00012344
|3|5678
(×1)
Note negative skew
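A sketch of the split-stem rule. The values are read back from the regular stemplot above (×1 multiplier), not taken from pollution.sav itself, so treat them as a transcription.

```python
values = [14, 17, 18, 19,
          22, 22, 23, 24, 26, 26, 27, 28, 29,
          30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38]

rows = {}
for v in sorted(values):
    stem, leaf = divmod(v, 10)
    half = 0 if leaf <= 4 else 1  # first row: leaves 0 to 4, second row: leaves 5 to 9
    rows.setdefault((stem, half), []).append(str(leaf))

plot = [f"|{stem}|{''.join(leaves)}" for (stem, half), leaves in sorted(rows.items())]
print("\n".join(plot))
print("(x1)")
```

This short version skips a half-stem that has no leaves; none is empty in these data.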
How many stem-values?
Start with between 4 and 12 stem-values
Then use trial and error to draw out the shape and find the most informative plot (use judgment)
Body weight (n = 53)
10|0166
11|009
12|0034578
13|00359
14|08
15|00257
16|555
17|000255
18|000055567
19|245
20|3
21|025
22|0
23|
24|
25|
26|0
(×10)
10|0 means “100”
Shape: Positive skew, high outlier (260)
Location: median = 165 (the 27th of the 53 values)
Spread: from 100 to 260
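The summary lines can be verified by reading the plot back into numbers. A minimal sketch, assuming each leaf is the ones digit (so 10|0 is 100); the variable name `stemplot_text` is hypothetical.

```python
import statistics

stemplot_text = """
10|0166
11|009
12|0034578
13|00359
14|08
15|00257
16|555
17|000255
18|000055567
19|245
20|3
21|025
22|0
23|
24|
25|
26|0
"""

values = []
for row in stemplot_text.strip().splitlines():
    stem, leaves = row.split("|")
    values.extend(int(stem) * 10 + int(leaf) for leaf in leaves)

n = len(values)
depth = (n + 1) / 2                 # depth of the median
median = statistics.median(values)
print(n, depth, median, min(values), max(values))
```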
Quintuple split:
Body weight data (n = 53)
1*|0000111
1t|222222233333
1f|4455555
1s|666777777
1.|888888888999
2*|0111
2t|2
2f|
2s|6
(×100)
Codes:
* for leaves 0 and 1
t for leaves two and three
f for leaves four and five
s for leaves six and seven
. for leaves eight and nine
Example:
2t|2 means a value of about 220 (2.2 × 100, ones digit truncated)
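The quintuple split can be sketched as follows. The weights are transcribed from the ×10 stemplot of the same data, and the names `rows10` and `CODES` are illustrative only.

```python
# Stems and leaves transcribed from the x10 body-weight stemplot
rows10 = {10: "0166", 11: "009", 12: "0034578", 13: "00359", 14: "08",
          15: "00257", 16: "555", 17: "000255", 18: "000055567",
          19: "245", 20: "3", 21: "025", 22: "0", 26: "0"}
weights = [stem * 10 + int(d) for stem, leaves in rows10.items() for d in leaves]

CODES = "*tfs."  # position = leaf // 2: 0-1, 2-3, 4-5, 6-7, 8-9

rows = {}
for w in sorted(weights):
    stem, rest = divmod(w, 100)  # stem = hundreds place
    leaf = rest // 10            # leaf = tens place, ones digit truncated
    rows.setdefault((stem, leaf // 2), []).append(str(leaf))

lo, hi = min(rows), max(rows)
plot = []
for stem in range(lo[0], hi[0] + 1):
    for k in range(5):
        if lo <= (stem, k) <= hi:  # keep empty rows between the extremes
            plot.append(f"{stem}{CODES[k]}|{''.join(rows.get((stem, k), []))}")

print("\n".join(plot))
print("(x100)")
```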
Frequency counts (SPSS plot)
Frequency Stem & Leaf
2.00 3 . 0
9.00 4 . 0000
28.00 5 . 00000000000000
37.00 6 . 000000000000000000
54.00 7 . 000000000000000000000000000
85.00 8 . 000000000000000000000000000000000000000000
94.00 9 . 00000000000000000000000000000000000000000000000
81.00 10 . 0000000000000000000000000000000000000000
90.00 11 . 000000000000000000000000000000000000000000000
57.00 12 . 0000000000000000000000000000
43.00 13 . 000000000000000000000
25.00 14 . 000000000000
19.00 15 . 000000000
13.00 16 . 000000
8.00 17 . 0000
9.00 Extremes (>=18)
Stem width: 1
Each leaf: 2 case(s)
Age of participants
SPSS provides frequency counts with the stemplot
Because n is large, each leaf represents 2 observations
3 . 0 means 3.0 years
Frequency tables
Frequency = count
Relative frequency = proportion or %
Cumulative frequency = % less than or equal to current value
AGE   | Freq  Rel.Freq  Cum.Freq
------+-------------------------
    3 |    2      0.3%      0.3%
    4 |    9      1.4%      1.7%
    5 |   28      4.3%      6.0%
    6 |   37      5.7%     11.6%
    7 |   54      8.3%     19.9%
    8 |   85     13.0%     32.9%
    9 |   94     14.4%     47.2%
   10 |   81     12.4%     59.6%
   11 |   90     13.8%     73.4%
   12 |   57      8.7%     82.1%
   13 |   43      6.6%     88.7%
   14 |   25      3.8%     92.5%
   15 |   19      2.9%     95.4%
   16 |   13      2.0%     97.4%
   17 |    8      1.2%     98.6%
   18 |    6      0.9%     99.5%
   19 |    3      0.5%    100.0%
------+-------------------------
Total |  654    100.0%
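The relative and cumulative columns follow from the counts alone. A minimal sketch that recomputes them, with the counts retyped from the Freq column above:

```python
from itertools import accumulate

counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
          12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}

total = sum(counts.values())
running = accumulate(counts.values())  # running totals of the counts

rows = []
for (age, freq), cum in zip(counts.items(), running):
    rel_pct = round(100 * freq / total, 1)  # relative frequency, %
    cum_pct = round(100 * cum / total, 1)   # cumulative frequency, %
    rows.append((age, freq, rel_pct, cum_pct))
    print(f"{age:>5} | {freq:>4} {rel_pct:>8.1f}% {cum_pct:>8.1f}%")
print(f"Total | {total:>4}")
```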
Class intervals
When data are sparse, group them into class intervals
Classes can be uniform or non-uniform
Uniform class intervals
Create 4 to 12 class intervals
Set end-point convention: include left boundary and exclude right boundary
e.g., first class interval includes 0 and excludes 10 (0 to 9.99 years of age)
Tally frequencies
Calculate relative frequency
Calculate cumulative frequency (demo)
Here’s the age data in sample.sav…
Class        Freq  Rel. Freq. (%)  Cum. Freq. (%)
 0 –  9.99      1              10              10
10 – 19.99      1              10              20
20 – 29.99      4              40              60
30 – 39.99      1              10              70
40 – 49.99      1              10              80
50 – 59.99      2              20             100
Total          10             100              --
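The tallying procedure for uniform class intervals can be sketched as below, using a width of 10 years and the left-inclusive, right-exclusive convention. The ages are the ten values from the illustrative example.

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
width = 10

# Tally: each age falls in the class whose left boundary is (age // width) * width
counts = {}
for a in ages:
    left = (a // width) * width  # left boundary included, right boundary excluded
    counts[left] = counts.get(left, 0) + 1

n = len(ages)
cum = 0
table = []
for left in range(0, 60, width):
    freq = counts.get(left, 0)
    cum += freq
    table.append((left, left + width, freq, 100 * freq / n, 100 * cum / n))

for left, right, freq, rel, cum_pct in table:
    print(f"[{left}, {right})  {freq:>2}  {rel:>5.0f}  {cum_pct:>5.0f}")
```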
Histogram – for quantitative data
[Figure: histogram; vertical axis ticks 0 to 5, horizontal axis = Age Class (0-9, 10-19, 20-29, 30-39, 40-49, 50-59)]
Bars are contiguous
Bar chart – for categorical data
[Figure: bar chart; vertical axis ticks 0 to 500 in steps of 50, horizontal axis = School-level (Pre-, Elem., Middle, High)]
Bars are discrete