Frequency Distributions and Data Analyses Chapter

Chapter 3
Frequency Distributions and Data Analyses
Chapter Outline
3.1 Introduction ............................................................................... 65
3.2 Tally Table for Constructing a Frequency Table ......................................... 66
3.3 Three Other Frequency Tables ............................................................ 70
3.4 Graphical Presentation of Frequency Distribution ....................................... 72
3.5 Further Economic and Business Applications ............................................ 82
3.6 Summary .................................................................................. 89
Questions and Problems ........................................................................ 89
Key Terms
Grouped data Histograms
Raw (nongrouped) data Stem-and-leaf displays
Frequency Frequency polygon
Frequency table Cumulative frequency polygon
Frequency distribution Pie chart
Cumulative frequencies Lorenz curve
Relative frequency Gini coefﬁcient
Cumulative relative frequency Absolute inequality
3.1 Introduction
Using the tabular and graphical methods discussed in Chap. 2, we will now develop
two general ways to describe data more fully. We discuss ﬁrst the tally table
approach to depicting data frequency distributions and then three other kinds of
frequency tables. Next, we explore alternative graphical methods for describing
frequency distributions. Finally, we study further applications for frequency
distributions in business and economics.
C.-F. Lee et al., Statistics for Business and Financial Economics,
DOI 10.1007/978-1-4614-5897-5_3, # Springer Science+Business Media New York 2013
65

3.2 Tally Table for Constructing a Frequency Table
Before conducting any statistical analysis, we must organize our data sets. One way
to organize data is by using a tally table as a worksheet for setting up a frequency
table. To set up a tally table for a set of data, we split the data into equal-sized
classes in such a way that each observation ﬁts into one and only one class of
numbers (i.e., the classes are mutually exclusive). Sometimes data are reported in a
frequency table with class intervals given but with actual values of observations in
the classes unknown; data presented in this manner are called grouped data. The
analyst assigns each data point to a class and enters a tally mark made by that class.
Let’s see how this works.
Example 3.1 Tallying Scores from a Statistics Exam. Suppose a statistics professor
wants to summarize how 20 students performed on an exam. Their scores are as
follows: 78, 56, 91, 59, 78, 84, 65, 97, 84, 71, 84, 44, 69, 90, 73, 77, 80, 90, 68, and
75. Data in this form are called nongrouped data or raw data. We can use a tally
table like Table 3.1 to list the number of occurrences, of frequency, of each score.
A corresponding diagram is shown in Fig. 3.1.
This table presents nongrouped data, and no pattern emerges from them. As an
alternative, the data can be grouped into classes by letter grade. If the professor uses
a straight grading scale, the classes might be 90–99, 80–89, 70–79, 60–69, and
below 60. After establishing the classes, the professor counts scores in each class
and records these numbers to obtain a tally sheet, as shown in Table 3.2 and
Fig. 3.2.
Note that each observation is included in one and only one class. The tallies are
counted, and a frequency table is constructed as shown in Table 3.3, where letter
grades are assigned to each class.
Table 3.1 Student exam
scores
Score Tallies Frequency
44 / 1
56 / 1
59 / 1
65 / 1
68 / 1
69 / 1
71 / 1
73 / 1
75 / I
77 / 1
78 // 2
80 / 1
84 /// 3
90 // 2
91 / 1
97 / 1
Total 20
66 3 Frequency Distributions and Data Analyses

Frequency
44 56 59 65 68 69 71 73 75 77 78 80 84 90 91 97
Score
3.0
2.8
2.6
2.4
2.2
2.0
1.8
1.6
1.4
1.2
1.0
.8
.6
.4
.2
0
Fig. 3.1 Bar graph for nongrouped student exam scores given in Table 3.1
Table 3.2 Tally table for
statistics exam scores
Class Tally Frequency
Below 60 /// 3
60–69 /// 3
70–79 ////// 6
80–89 //// 4
90–99 //// 4
20
Frequency
6
5
4
3
2
1
0
Below 60 60-69 70-79 80-89 90-99
Score
Fig. 3.2 Bar graph for grouped student exam scores given in Table 3.2
3.2 Tally Table for Constructing a Frequency Table 67

Example 3.2 A Frequency Distribution of Grade Point Averages. Suppose that
there are 30 students in a classroom and that they have the grade point averages
listed in Table 3.4. A tally table is constructed, in which classes are (arbitrarily)
defined at every half-point and each tally marked next to a particular class accounts
for one data entry. The entries are then counted to obtain a frequency distribution,
as shown in Table 3.5. A frequency distribution simply shows how many
observations fall into each class. We will discuss this concept in further detail in
the next section.
Generally, a data set should be divided into 5–15 classes. Having too few or too
many classes gives too little information. Imagine a frequency distribution with
only two classes: 0.0–2.0 and 2.1–4.0. With such broadly defined classes, it is
difficult to distinguish among GPAs. Similarly, if the class interval were only one-
tenth of a point, the large number of classes, each with only one or a few tallies,
would make summarizing the data almost impossible.
Table 3.3 Frequency table
for statistics exam scores
Class Grade Frequency
Below 60 F 3
60–69 D 3
70–79 C 6
80–89 B 4
90–99 A 4
20
Table 3.4 Student GPAs:
raw data
1.2 3.9 1.9
3.8 2.4 2.7
2.3 2.3 2.6
0.7 3.1 3.7
3.6 2.9 4.0
2.2 2.7 1.2
1.9 0.8 1.8
2.1 0.3 2.4
3.1 3.2 3.2
0.8 3.1 3.6
Table 3.5 Student GPAs:
tally table and frequency
distribution
Range Tallies Frequency
Below 1.5 ////// 6
1.5–1.9 /// 3
2.0–2.4 ////// 6
2.5–2.9 //// 4
3.0–3.4 ///// 5
3.5–4.0 ////// 6
Total 30

In the GPA example, it was relatively easy to construct the classes because GPA
cutoffs were used. However, in most examples, there are no natural dividing lines
between classes. The following guidelines can be used to construct classes:
1. Construct from 5 to 15 classes. This step is the most difficult, because using too
many classes defeats the purpose of grouping the data into classes, whereas
having too few classes limits the amount of information obtained from the data.
As a general rule, when the range and number of observations are large, more
classes can be defined. Fewer classes should be constructed when the number of
observations is only around 20 or 30.
2. Make sure each observation falls into only one class. This can often be accom-
plished by defining class boundaries in terms of several decimal places. If the
percentage return on stocks is carried to one decimal place, for example, then
defining the classes by using two decimal places will ensure that each observa-
tion falls into only one class.
3. Try to construct classes with equal class intervals. This may not be possible,
however, if there are outlying observations in the data set.
Example 3.3 A Frequency Distribution of 3-Month Treasury Bill Rates. Table 3.6
presents another example, and here the data presented are the interest rates on
3-month treasury bills (T-bills) from 1990 to 2009. (T-bills are debt instruments
sold by the US government to finance its budgetary needs.) The annual data for
interest rates (average daily rates for a year) are taken from Economic Report of the
President, January 2009.
As we have noted, a frequency distribution gives the total number of occurrences
in each class. In the next chapter, we will talk about using a frequency distribution
to present data.
By setting up a tally table and a frequency table, we can scrutinize data for
errors. For example, if the data value 123 appears in a column for the rate in the
T-bill example, a mistake has clearly been made – one that could be due to a
missing decimal point. Probably, the data point could be 12.3 % instead, which
makes more sense because it is in the range of the other data points. Data should
also be checked for accuracy. Otherwise, invalid conclusions could be reached.
Table 3.6 T-bill interest
rates, 1990–2009
Class (%) Tallies Frequency
0–1.49 //// 4
1.50–3.49 ///// 5
3.50–5.49 ///////// 9
5.50–6.49 / 1
6.50 and greater / 1
Total 20
3.2 Tally Table for Constructing a Frequency Table 69

3.3 Three Other Frequency Tables
In this section, using the frequency table discussed in the Sect. 3.2, we move ahead
to cumulative frequency tables, relative frequency tables, and relative cumulative
frequency tables.
Example 3.4 Frequency Distributions for Statistics Exam Scores. Suppose that for
the data listed in Table 3.3, the professor wants to know how many students receive
a C or below, the proportion of students who receive a B, and the proportion of
students who receive a D or an F. To obtain this information, she calculates
cumulative, relative, and cumulative relative frequencies.
By constructing cumulative frequencies, the professor determines the number of
students who scored in a particular class or in one of the classes before it (Table 3.7).
Obviously, the cumulative frequency for the first class is the frequency itself (3):
there are no classes before it. The cumulative frequency for the second class is
calculated by taking the frequency in the first class and adding it to the frequency in
the second class (3) to arrive at a cumulative frequency of 6. This means that 6
students were in the first two classes. Then 6 is added to the frequency of the third
class (6) to derive a cumulative frequency of 12. Thus, 12 students scored a C or a
worse grade. The remaining cumulative frequencies are calculated in a similar
manner. Note that the cumulative observation in the last class equals the total
number of sample observations, because all frequencies have occurred in that
class or in previous classes.
Another important concept is the relative frequency, which measures the pro-
portion of observations in a particular class. It is calculated by dividing the
frequency in that class by the total number of observations. For the data
summarized in Table 3.7, the relative frequency for both the first and second classes
is 0.15, and the relative frequencies for the remaining three classes are 0.30, 0.20,
and 0.20, respectively, as shown in Table 3.8. The sum of the relative frequencies
always equals 1.
This table indicates that 15 % of the class received an F, 15 % a D, 30 % a C, and
so on. The professor can calculate the cumulative relative frequency for any class
by adding the appropriate relative frequencies. Cumulative relative frequency
measures the percentage of observations in a particular class and all previous
classes. Thus, if she wants to determine what percentage of the students scored
below a B, our conscientious professor can add the relative frequencies associated
with grades C, D, and F to arrive at 60 %.
Table 3.7 Cumulative frequency table for grade distribution
Class Grade Frequency Cumulative frequency
Below 60 F 3 3
60–69 D 3 6
70–79 C 6 12
80–89 B 4 16
90–99 A 4 20

Example 3.5 Frequency Distributions of Current Ratios for JNJ and MRK. The
current ratios for JNJ and MRK from 1990 to 2009 are shown in Table 3.9.
A frequency distribution for the current ratios of Johnson and Johnson and Merck
is shown in Table 3.10. This ratio is a measure of liquidity, which (as we noted in
Chap. 2) indicates how quickly a firm can obtain cash for operations. The first
column defines the classes. Note that the use here of class boundaries ensures that
each observation will fall into only one class.
The next column shows the frequency – that is, the number of times that an
observation appears in each class. In Table 3.10, we see that JNJ experienced one
current ratio between 1.0 and 1.2, seven between 1.201 and 1.700, and so on. The
third column presents the cumulative frequency. Because there are 20 observations
in the population, the cumulative frequency for the last class is 20.
The fourth column presents the relative frequency, which measures the percentage
of observations in each class. Relative frequencies can be thought of as probabilities.
For example, the probability that an observation is in the first class is 0.1.
Table 3.9 Current ratio for
JNJ and MRK
Year JNJ MRK
1990 1.778 1.332
1991 1.835 1.532
1992 1.582 1.216
1993 1.624 0.973
1994 1.566 1.270
1995 1.809 1.515
1996 1.807 1.600
1997 1.999 1.475
1998 1.364 1.685
1999 1.771 1.285
2000 2.164 1.375
2001 2.296 1.123
2002 1.683 1.199
2003 1.710 1.205
2004 1.962 1.147
2005 2.485 1.582
2006 1.199 1.197
2007 1.510 1.227
2008 1.649 1.348
2009 1.820 1.805
Table 3.8 Relative frequency table for grade distribution
Class Grade Relative frequency Cumulative relative frequency
Below 60 F 0.15 0.15
60–69 D 0.15 0.30
70–79 C 0.30 0.60
80–89 B 0.20 0.80
90–99 A 0.20 1.00
3.3 Three Other Frequency Tables 71

The last column indicates the cumulative relative frequency, which measures the
percentage of observations in a particular class and all previous classes. The
cumulative relative frequency for Merck’s fourth class is calculated by adding
the relative frequencies of the ﬁrst four classes to arrive at 0.95. That is, 95 % of
the observations occur in the ﬁrst four classes. The cumulative relative frequency
of the last class always equals 1, because the last class includes all the observations.
3.4 Graphical Presentation of Frequency Distribution
We have spoken before of the special effectiveness of using graphs to present data.
In this section, we discuss four different graphical approaches to presenting fre-
quency distributions.
3.4.1 Histograms
Frequency distributions can be represented on a variety of graphs. The histogram,
which is one of the most commonly used types, is similar to the bar charts discussed
in Chap. 2 except that
1. Neighboring bars touch each other.
2. The area inside any bar (its height times its width) is proportional to the number
of observations in the corresponding class.
To illustrate these two points, suppose the age distribution of personnel at a
small business is as shown in Table 3.11.
Table 3.10 Frequency distributions of current ratios for JNJ and MRK
Class Frequency
Cumulative
frequency
Relative
frequency
Cumulative relative
frequency
JNJ
1.00–1.2 1 1 0.05 0.05
1.21–1.4 1 2 0.05 0.1
1.41–1.60 3 5 0.15 0.25
1.61–1.80 6 11 0.3 0.55
1.81–2.00 6 17 0.3 0.85
2.01–2.5 3 20 0.15 1.00
Total 20 1.00
MRK
0.81–1.00 1 1 0.05 0.05
1.01–1.2 4 5 0.2 0.25
1.21–1.4 8 13 0.6 0.65
1.41–1.60 5 18 0.25 0.9
1.61–1.80 1 19 0.05 0.95
1.81–2.00 1 20 0.05 1.00
Total 20 1.00

To construct a histogram, we need to enter a scale on the horizontal axis.
Because the data are discrete, there is a gap between the class intervals, say between
20 and 29 and 30–39. In such a case, we will use the midpoint between the end of
one class and the beginning of the next as our dividing point. Between the 20–29
interval and 30–39 interval, the dividing point will be (29 þ 30)/2 ¼ 29.5. We find
the dividing point between the remaining classes similarly.
To satisfy the second condition, we note that all five classes have an interval
width of 10 years. Figure 3.3 is the histogram that reflects these data.
Drawn from the data of Table 3.10, Fig. 3.4a, b are histograms of JNJ’s and
MRK’s current ratios for the years 1990–2009.The x-axis indicates the classes and
the y-axis the frequencies. As the histograms show, MRK’s current ratios have
tended to fall in the 1.0–1.4 range, whereas those of JNJ show no exact pattern, but
many can be found in the 1.61–2.00 range. (In Chap. 4, we will cover measures of
skewness, which give us more insight into the shape of a distribution.)
Table 3.11 Age distribution
of personnel
Class Frequency
20–29 3
30–39 6
40–49 7
50–59 4
60–69 1
70–79 1
Frequency
7
6
5
4
3
2
1
0
19.5 29.5 39.5 49.5 59.5 69.5 79.5
Age
Fig. 3.3 Histogram of age distribution given in Table 3.11
3.4 Graphical Presentation of Frequency Distribution 73

Most standard statistical software packages will construct a histogram from
these data. Using MINITAB, we can specify the class width and the starting class
midpoint, or we can let MINITAB select these values. The output will contain the
frequency distribution as well as a graphical representation in the form of a
histogram (without the bars). MINITAB will provide each class frequency next to
the corresponding class midpoint (not class limits). Figure 3.5a contains the neces-
sary MINITAB commands and the resulting output for the current ratio of MRK
where the class width (CW) and the midpoint of the first class were not specified.
Figure 3.5b specified CW as .2000 and the first midpoint as .905. We can use the
output as it appears or use this information to construct Fig. 3.4b, which is a
graphical representation of MRK’s current ratios as given in Table 3.10.
Fig. 3.4 (a) Frequency histogram of JNJ’s current ratios as given in Table 3.10 (b) Frequency
histogram of MRK’s current ratios as given in Table 3.10

Histograms can also be used to chart the companies’ relative and cumulative
frequencies, as shown in Figs. 3.6 and 3.7. Note the similarity between the frequency
and relative frequency histograms (Figs. 3.4 and 3.6) and between the cumulative
frequency and the relative cumulative frequency graphs (Figs. 3.7 and 3.8); the only
difference between them is the variable on the y-axis. Note also that geometrically,
the relative frequency of each class in a frequency histogram equals its area divided
Data Display
a
b
MRK
1.332 1.532 1.216 0.973 1.27 1.515 1.6 1.475 1.685
1.285 1.375 1.123 1.199 1.205 1.147 1.582 1.197 1.227
1.348 1.805
Histogram of MRK
* NOTE * The character graph commands are obsolete.
Histogram
Histogram of MRK N = 20
Midpoint Count
1.0 1 *
1.1 2 **
1.2 5 *****
1.3 4 ****
1.4 1 *
1.5 3 ***
1.6 2 **
1.7 1 *
Histogram
Histogram of MRK N = 20
Midpoint Count
0.905 1 *
1.105 4 ****
1.305 8 ********
1.505 5 *****
1.705 1 *
1.905 1 *
Fig. 3.5 (a) Histogram using MINITAB, where the class width and the midpoint of the first class
are not specified (b) Histogram using MINITAB using specified classes, where the class width is
0.2000 and the first midpoint is 0.905

by the total area of all the classes. For example, the area for the ﬁrst class for
Merck’s current ratio (Fig. 3.4b) is equal to the base of the bar times its height
(0.19 Â 1 ¼ 0.19), and the sum of all the areas is 3.8. The relative frequency for the
ﬁrst class is thus .19/3.8 ¼ .05.
3.4.2 Stem-and-Leaf Display
An alternative to histograms for the presentation of either nongrouped or grouped
data is the stem-and-leaf display. Stem-and-leaf displays were originally developed
by John Tukey of Princeton University. They are extremely useful in summarizing
data sets of reasonable size (under 100 values as a general rule), and unlike
histograms, they result in no loss of information. By this, we mean that it is possible
Fig. 3.6 (a) Relative frequency histogram of JNJ’s current ratios (b) Relative frequency histo-
gram of MRK’s current ratios

to reconstruct the original data set in a stem-and-leaf display, which we cannot do
when using a histogram.
For example, suppose a financial analyst is interested in the amount of money
spent by food product companies on advertising. He or she samples 40 of these food
product firms and calculates the amount that each spent last year on advertising as a
percentage of its total revenue. The results are listed in Table 3.12.
Let’s use this set of data to construct a stem-and-leaf display. In Fig. 3.9, each
observation is represented by a stem to the left of the vertical line and a leaf to the
right of the vertical line. For example, the stems and leaves for the first three
observations in Table 3.12 can be defined as
Fig. 3.7 (a) Cumulative Frequency histogram of JNJ’s current ratios (b) Cumulative frequency
histogram of MRK’s current ratios

Stem Leaf
12 0.5
8 0.8
11 0.5
In other words, stems are the integer portions of the observations, whereas leaves
represent the decimal portions.
The procedure used to construct a stem-and-leaf display is as follows:
1. Decide how the stems and leaves will be deﬁned.
2. List the stems in a column in ascending order.
Fig. 3.8 (a) Cumulative relative frequency histogram of JNJ’s current ratios (b) Cumulative
relative frequency histogram of MRK’s current ratios

3. Proceed through the data set, placing the leaf for each observation in the
appropriate stem row. (You may want to place the leaves of each stem in
increasing order.)
The percentage of revenues spent on advertising by 40 production firms listed in
Table 3.12 is represented by a stem-and-leaf diagram in Fig. 3.9. From this diagram,
we observe that the minimum percentage of advertising spending is 5.3 % of total
revenue, the maximum percentage of advertising spending is 13.9 %, and the
largest group of firms spends between 9.1 % and 9.8 % of total revenue on
advertising. Also, the 7 leaves in stem row 7 indicate that 7 firms’ advertising
spending is at least 7 % but less than 8 %. The 3 leaves in stem row 13 tell us at a
Table 3.12 Percentage
of total revenue spent
on advertising
Company Percentage Company Percentage
1 12.5 21 6.4
2 8.8 22 7.8
3 11.5 23 8.5
4 9.1 24 9.5
5 9.4 25 11.3
6 10.1 26 8.9
7 5.3 27 6.6
8 10.3 28 7.5
9 10.2 29 8.3
10 7.4 30 13.8
11 8.2 31 12.9
12 7.8 32 11.8
13 6.5 33 10.4
14 9.8 34 7.6
15 9.2 35 8.6
16 12.8 36 9.4
17 13.9 37 7.3
18 13.7 38 9.5
19 9.6 39 8.3
20 6.8 40 7.1
Stems Leaves Frequency
5 1
6 4 5 6 4
7 1 3 4 5 7
8 2 3 3 5 7
9 1 2 4 4 8 8
10 1 2 3
3
8
6 8 8
6 8 9
5 5 6
4 4
11 3 5 8 3
12 5 8 9 3
13 7 8 9 3
Total 40
Fig. 3.9 Stem-and-leaf
display for advertising
expenditure

glance that 3 firms spend more than 13 % of total revenue on advertising. A
MINITAB version of the stem-and-leaf diagram generated by these data is shown
in Fig. 13.10. A stem-and-leaf diagram is presented in the last portion of Fig. 3.10.
In the first column of this diagram, (8) represents the total observation in the middle
group with a stem of 9; 1, 5, 12, and 19 represent the cumulative frequencies from
the first group up to the fourth group; and 3, 6, 9, and 13 represent the cumulative
frequencies from the ninth group up to the sixth group.
3.4.3 Frequency Polygon
A frequency polygon is obtained by linking the midpoints indicated on the x-axis of
the class intervals from a frequency histogram. A cumulative frequency polygon is
derived by connecting the midpoints indicated on the x-axis of the class intervals
from a cumulative frequency histogram. Figures 3.11 and 3.12 show the frequency
polygon and the cumulative frequency polygon, respectively, for JNJ’s current
ratio. Although a histogram does demonstrate the shape of the data, perhaps the
shape can be more clearly illustrated by using a frequency polygon.
Data Display
ADV EXP
12.5 8.8 11.5 9.1 9.4 10.1 5.3 10.3
10.2 7.4 8.2 7.8 6.5 9.8 9.2 12.8
13.9 13.7 9.9 6.8 6.4 7.8 8.5 9.5
11.3 8.9 6.6 7.5 8.3 13.8 12.9 11.8
10.4 7.6 8.6 9.4 7.3 9.5 8.3 7.1
MTB > STEM AND LEAF USING ‘ADV EXP’
Character Stem-and-Leaf Display
Stem-and-leaf of ADV EXP N = 40
Leaf Unit = 0.10
1 5 3
5 6 4568
12 7 1345688
19 8 2335689
(8) 9 12445589
13 10 1234
9 11 358
6 12 589
3 13 789
Fig. 3.10 Stem-and-leaf diagram for advertising expenditure using MINITAB

3.4.4 Pie Chart
Histograms are perhaps the graphical forms most commonly used in statistics, but
other pictorial forms, such as the pie chart, are often used to present ﬁnancial and
marketing data. For example, Fig. 3.13 depicts a family’s sources of income. This
pie chart indicates that 80 % of this family’s income comes from salary.
For data already in frequency form, a pie chart is constructed by converting the
relative frequencies of each class into their respective arcs of a circle. For example,
a pie chart can be used to represent the student grade distribution data originally
presented in Table 3.3. In Table 3.13, the arcs (in degrees) for the ﬁve slices shown
in Fig. 3.14 were obtained by multiplying each relative frequency by 360
.
Fig. 3.11 Frequency polygon of JNJ’s current ratios
Fig. 3.12 Cumulative frequency polygon of JNJ’s current ratios

3.5 Further Economic and Business Applications
3.5.1 Lorenz Curve
The Lorenz curve, which represents a society’s distribution of income, is a cumula-
tive frequency curve used in economics (Fig. 3.15a). The cumulative percentage of
families (ranked by income) is measured on the x-axis, and the cumulative
Fig. 3.13 Sources of family income
Table 3.13 Grade
distribution for 20 students
Class Frequency Relative frequency Arc (degrees)
Below 60 3 0.15 54
60–69 3 0.15 54
70–79 6 0.30 108
80–89 4 0.20 72
90–99 4 0.20 72
Total 20 1.00 360
Fig. 3.14 Grade distribution pie chart

a
Cumulative Percentage
of Family Income
of Families
100
90
80
70
60
50
40
Area I
Area II
30
20
10
0
0 10 20 30 40 50 60 70 80 90 100
B
P
N
H
C
AO
of Families
of Family Income
0 10 20 30 40 50 60 70 80 90 100
100
90
80
70
60
50
40
30
20
10
0
b
P
N
HS
Fig. 3.15 (a) and (b) Lorenz curves
3.5 Further Economic and Business Applications 83

percentage of family income received is measured on the y-axis. For example,
suppose there are 100 families, and each earns $100 – that is, the distribution of
income is perfectly equal. The resulting Lorenz curve will be a 45
line (OP),
because the cumulative percentage of families (e.g., 40 %) and the cumulative share
of family income received are always equal.
Now suppose that one family receives 100 % of total family income – that is, the
income distribution is absolutely unequal. The resulting Lorenz curve (ONP)
coincides with the x-axis until point N, where there is a discontinuous jump to
point P. This is because, with the exception of that single family (represented by
point N), each family receives 0 % of total family income. Therefore, these
families’ cumulative share of total family income is also 0 %.
The shape the Lorenz curve is most likely to assume is curve H, which lies
between absolute inequality and equality. This curve indicates that the lowest-
income families, who comprise 40 % of families (point A), receive a disproportion-
ately small share (about 7 %) of total family income (point C). If every family had
the same income, the share going to the lowest 40 % would be represented by point
B (40 %).
Note that with a more equitable distribution of income, the Lorenz curve is less
bowed, or flatter. Curve S in Fig. 3.15b is the Lorenz curve after a progressive
income tax is imposed. Because S is flatter than H (which is reproduced from
Fig. 3.15a), we can conclude that the distribution of income (after taxes) is more
nearly equal than before, as would be expected.
One way to measure the inequality of income from the Lorenz curve is to use the
Gini coefficient.
Gini coefficient for curve H ¼
area I
area ðI þ IIÞ
The Gini coefficient can range from 0 (perfect equality) to 1 (absolute inequality,
wherein one family receives all the income).
Examining Fig. 3.15b reveals that the Gini coefficient will be smaller for curve S
than it is for curve H. In other words, the progressive income tax makes the
distribution of income more nearly equal.
3.5.2 Stock and Market Rate of Return
Table 3.14 presents the frequency tables for the rate of return for Johnson and
Johnson, Merck, and the stock market overall. (The data are drawn from Table 2.4
in Appendix 2 of Chap. 2.) Because the two firms have similar frequency
distributions, we can conclude that the performances of Johnson and Johnson and
Merck’s stocks have been similar. However, Johnson and Johnson’s highest class is

0.001–0.200, while Merck’s highest classes are spread but found at À0.200 and
below and at 0.201–0.400.
The stock market’s overall lowest class was found at À0.200 and below, but its
highest class was only 0.001–0.200. Thus, the overall market has fluctuated less
than the return of the two pharmaceutical firms. And although Johnson and Johnson
and Merck have a higher top class, the market suffered through fewer negative
returns. Moreover, Johnson and Johnson and Merck had 9 and 8 years, respectively,
of losses, while the market had only five. In other words, the pharmaceutical firms
offered the potential of higher returns but also threatened the investor with a greater
risk of loss.
3.5.3 Interest Rates
Histograms can be used to summarize movements in such interest rates as the prime
rate and the treasury bill rate. The prime rate is the interest rate that banks charge to
their best customers; treasury bills are short-term debt instruments issued by the US
Table 3.14 Rates of return for JNJ and MRK stock and the SP 500
Class
Frequency
(years)
Cumulative
frequency
Relative
frequency
Cumulative relative
frequency
JNJ
À0.200 and below 4 4 0.1905 0.1905
À0.199 to 0.000 5 9 0.2381 0.4286
0.001–0.200 5 14 0.2381 0.6667
0.201–0.400 5 19 0.2381 0.9048
0.401–0.600 1 20 0.0476 0.9524
0.601–1.00 1 21 0.0476 1.0000
Total 21 1.000
MRK
À0.200 and below 5 5 0.2381 0.2381
À0.199 to 0.000 3 8 0.1429 0.3810
0.001–0.200 3 11 0.1429 0.5238
0.201–0.400 5 16 0.2381 0.7619
0.401–0.600 3 19 0.1429 0.9048
0.601–1.00 2 21 0.0952 1.0000
Total 21 1.000
SP 500 (market)
À0.200 and below 1 1 0.0476 0.0476
À0.199 to 0.000 4 5 0.1905 0.2381
0.001–0.200 11 16 0.5238 0.7619
0.201–0.400 5 21 0.2381 1.0000
Total 21 1.000

government. Let us examine how these rates have moved over the period
1990–2009, as shown in Table 3.15.
As can be seen in Table 3.16 and Fig. 3.16, the prime rate is skewed to the right,
with 65 % of the observations appearing in the ranges made up of the slightly higher
midrange interest rates (6–6.9 %, 7–7.9 %, and 8–8.9 %). If you were to predict a
future value for the prime rate, your best guess would be in the 6–9 % range. This
wide range would probably not be of much use. Better methods for prediction, such
as multiple regression and time series analysis, will be discussed later (Chaps. 15
and 18).
Table 3.15 3-Month T-bill
rate and prime rate
(1990–2009)
Year 3-Month T-bill rate Prime rate
90 7.49 10.01
91 5.38 8.46
92 3.43 6.25
93 3.00 6.00
94 4.25 7.14
95 5.49 8.83
96 5.01 8.27
97 5.06 8.44
98 4.78 8.35
99 4.64 7.99
00 5.82 9.23
01 3.39 6.92
02 1.60 4.68
03 1.01 4.12
04 1.37 4.34
05 3.15 6.19
06 4.73 7.96
07 4.35 8.05
08 1.37 5.09
09 0.15 3.25
Table 3.16 Frequency distributions of interest rates
T-bill Prime rate
Class (%) Frequency Relative frequency Frequency Relative frequency
0–1.99 0 0.00 5 0.25
2–2.99 0 0.00 0 0.00
3–3.99 1 0.05 4 0.20
4–4.99 3 0.15 5 0.25
5–5.99 1 0.05 5 0.25
6–6.99 4 0.20 0 0.00
7–7.99 3 0.15 1 0.05
8–8.99 6 0.30 0 0.00
9–9.99 1 0.05 0 0.00
10–10.99 1 0.05 0 0.00
Total 20 1.00 20 1.00

The frequency table for the treasury bill rate is shown in Table 3.16. This
distribution, like that of the prime rate, is skewed to the right. Fifty percent of the
observations appear in the third and fourth classes, 4–4.9 % and 5–5.9 %. This
distribution is depicted in the histogram shown in Fig. 3.17.
If you were to make a prediction of the treasury bill rate, it would probably be in
the 3–6 % range. Again, better methods for predicting observations will be
discussed later.
10.00
9.00
8.00
7.00
6.00
5.00
4.00
3.00
2.00
1.00
0.00
1 - 2.9
Frequency
3.0 - 4.9 5.0 - 6.9
Interest Rates
7.0 - 8.9 9.0 - 11
Fig. 3.16 Frequency histogram of prime lending rates given in Table 3.15
6.00
5.00
4.00
3.00
2.00
1.00
0.00
0 - 1.9 2 - 2.9 3 - 3.9 4 - 4.9
Interest Rate
Frequency
5 - 5.9 6 - 6.9 7 - 7.9
Fig. 3.17 Frequency histogram of T-bill rates given in Table 3.15

3.5.4 Quality Control
Figure 3.18 depicts the quality control data on electronic parts given in Table 3.17.
This control chart shows the percentage of defects for each sample lot. Figure 3.18
indicates that both lots 5 and 7 have exceeded the allowed maximum defect level of
3 %. Therefore, the product quality in these two lots should be improved.
Percentage Defective
4
3
2
1
0
1 2 3 4 5 6 7 8
Lot Number
Fig. 3.18 Frequency bar graph of the percentage of defects for each sample lot
Table 3.17 Quality control
report on electronic parts
Sample Lot Sample Defects Percentage
1 1,000 15 1.5
2 1,000 20 2.0
3 1,000 17 1.7
4 1,000 25 2.5
5 1,000 35 3.5
6 1,000 20 2.0
7 1,000 36 3.6
8 1,000 28 2.8
Total 8,000 196 2.45 (mean)

3.6 Summary
In this chapter, we extended the discussion of Chap. 2 by showing how data can be
grouped to make analysis easier. After the data are grouped, frequency tables,
histograms, stem-and-leaf displays, and other graphical techniques are used to
present them in an effective and memorable way.
Our ultimate goal is to use a sample to make inferences about a population.
Unfortunately, neither the tabular nor the graphical approach lends itself to mea-
suring the reliability of an inference in data analysis. To do this, we must develop
numerical measures for describing data sets. Therefore, in the next chapter, we
show how data can be described by the use of descriptive statistics such as the
mean, standard deviation, and other summary statistical measures.
Questions and Problems
1. Explain the difference between grouped and nongrouped data.
2. Explain the difference between frequency and relative frequency.
3. Explain the difference between frequency and cumulative frequency.
4. Carefully explain how the concept of cumulative frequency can be used to form
the Lorenz curve.
5. Suppose you are interested in constructing a frequency distribution for the
heights of 80 students in a class. Describe how you would do this.
6. What is a frequency polygon? Why is a frequency polygon useful in data
presentation?
7. Use the prime rate data given in Table 3.6 in the text to construct cumulative
frequency and cumulative relative frequency tables.
8. Use the percentage of total revenue spent on advertising listed in Table 3.12
of the text to draw a frequency polygon and a cumulative frequency polygon.
9. On November 17, 1991, the Home News of central New Jersey used the bar
chart given here to show that foreign investors are taxed at a lower rate than the
US citizens.
(a) Construct a table to show frequency, relative frequency, and cumulative
frequency.
(b) Draw a frequency polygon and a cumulative frequency polygon.
Questions and Problems 89

Foreign investors get big tax breaks
on money made in the U.S.
1988 tax rates, in percent.
Rate for
middle-
income* U.S.
taxpayers:
10.7%
. . . If taxed at the same
rate. granted Kuwaiti
investors in the U.S..
the American would
have paid just
$454
An American taxpayer
earning $30,000 to
$40,000 paid
$3,708 . . .
Japan6.06
5.63
5.21
4.64
4.39
4.39
3.95
3.67
2.50
2.24
1.97
1.86
1.29
1.02
.76
.14
CaymanIslands
Spain
GreatBritain
Canada
France
Italy
Netherlands
SaudiArabia
Belgium
Kuwait
Neth.Antilles
Taiwan
Singapore
Finland
UnitedArabEmirates
Source: Philadelphia Inquirer, Internal Revenue Service.
*$30,000
to $40,000
Source: Home News, November 17, 1991, Reprinted by permission of Knighi-
Ridder Tribune News
10. Use the EPS and DPS data given in Table 2.3 in Chap. 2 to construct frequency
distributions.
11. Use the data from question 10 to construct a relative frequency graph and a
cumulative relative frequency graph for both EPS and DPS.
12. On November 17, 1991, the Home News of central New Jersey used the bar
chart in the accompanying ﬁgure to show the 1980–1991 passenger trafﬁc
trends for Newark International Airport.
(a) Use these data to draw a line chart and interpret your results.
(b) Use these data to draw a stem-and-leaf diagram and interpret your results.

30
Passengers (in millions)
25
20
15
10
5
0
80 81 82
9.2
10.2
12.0
17.4
23.6
28.8
29.4
23.4
22.4
20.9
22.3
83 84 85 86 87 88 89 90
Year
Source: Port Authority of NY and NJ
91
Source: Home News, November 17, 1991. Reprinted by permission of the
publisher
13. An advertising executive is interested in the age distribution of the subscribers
to Person magazine. The age distribution is as follows:
Age Number of subscribers
18–25 10,000
26–35 25,000
36–45 28,000
46–55 19,000
56–65 10,000
Over 65 7,000
(a) Use a frequency distribution graph to present these data.
(b) Use a relative frequency distribution to present these data.
14. Use the data from question 13 to produce a cumulative frequency graph and a
cumulative relative frequency graph.

15. Construct stem-and-leaf displays for the 3-month T-bill rate and the prime rate,
using the data listed in Table 3.15.
Use the goaltenders’ salaries for the 1991 NHL season given in the following
table to answer questions 16–20.
Name Team Gross salary
Patrick Roy Montreal Canadiens $1.056Ma
Ed Belfour Chicago Blackhawks $925,000
Ron Hextall Philadelphia Flyers $735,000
Mike Richter New York Rangers $700,000
Kelly Hrudey Los Angeles Kings $550,000
Mike Liut Washington Capitals $525,000
Mike Vernon Calgary Flames $500,000
Grant Fuhr Toronto Maple Leafs $424,000
John Vanbiesbrouck New York Rangers $375,000
Ken Wregget Philadelphia Flyers $375,000
Tom Barrasso Pittsburgh Penguins $375,000
a
Roy’s salary is $500,000 Canadian, and $700,000 Canadian deferred. The salary listed is US
equivalent
16. Group the data given in the table into the following groups: $351,000–400,000;
401,000–450,000; 451,000–500,000; 501,000–550,000; 551,000–600,000;
601,000–650,000; 651,000–700,000; over 701,000.
17. Use your results from question 16 to construct a cumulative frequency table.
18. Use your results from question 16 to construct a relative frequency table and a
cumulative relative frequency table.
19. Use a bar graph to plot the frequency distribution.
20. Use a bar graph to plot the cumulative relative frequency.
21. Briefly explain why the Lorenz curve shown in Fig. 3.15b has the shape it does.
22. The students in an especially demanding history class earned the following
grades on the midterm exam: 86, 75, 92, 98, 71, 55, 63, 82, 94, 90, 80, 62, 62,
65, and 68. Use MINITAB to draw a stem-and-leaf graph of these grades.
23. Use the data given in question 22 to construct a tally table for the grades. Use
intervals 51–60, 61–70, 71–80, 81–90, and 91–100.
24. Construct a cumulative frequency table for the tally table you constructed in
question 23.
25. Use the data in question 24 to graph the cumulative frequency on a bar chart by
using Microsoft Excel.
26. Suppose the Gini coefficient in some country were equal to 0. What would that
tell us about income in this country?
27. Suppose the Gini coefficient in another country were equal to 1. What would
that tell us about income in this country?
Use the following information to answer questions 28–34. Suppose Weight
Watchers has collected the following weight loss data, in pounds, for 30 of its
clients.
15, 20, 10, 6, 8, 18, 32, 17, 19, 7, 9, 12, 14, 9, 25, 18, 21, 3, 2, 18, 12, 15, 14,
28, 34, 30, 18, 12, 11, 8

28. Construct a tally table for weight loss. Use 5-lb intervals beginning with 1–5 lb,
6–10 lb, etc.
29. Construct a cumulative frequency table for weight loss.
30. Construct a frequency histogram for weight loss using MINITAB.
31. Construct a frequency polygon for weight loss.
32. Construct a table for the relative frequencies and the cumulative relative
frequencies.
33. Graph the relative frequency.
34. Graph the cumulative relative frequency.
35. The following graph shows the Lorenz curves for two countries, Modestia and
Richardonia. Which country has the most nearly equal distribution of income?
of Family Income
of Families
Modestia
Richardonia
Use the following information to answer questions 36–41. Suppose a class of
high school seniors had the following distribution of SAT scores in English.
SAT score Number of students
401–450 8
451–500 10
501–550 15
551–600 6
601–650 4
651–700 1
36. Construct a cumulative frequency table.
37. Use a histogram to graph the cumulative frequencies.
38. Construct a frequency polygon.
39. Compute the relative frequencies and the cumulative relative frequencies.
40. Construct a relative frequency histogram.
41. Construct a cumulative relative frequency histogram.
Use the following prices of Swiss stocks to answer questions 42 through 49.

Switzerland (in Swiss francs) Close Prev. close
1. Alusuisse 976 982
2. Brown Boveri 3,960 4,080
3. Ciba-Geigy br 3,190 3,240
4. Ciba-Geigy reg 3,080 3,110
5. Ciba-G ptc ctf 3,020 3,040
6. CS Holding 1,920 1,915
7. Hof LaRoch br 8,280 8,300
8. Roce div rt 5,360 5,330
9. Nestle bearer 8,420 8,450
10. Nestle reg 8,310 8,310
11. Nestle ptc ctf 1,570 1,585
12. Sandoz 2,390 2,410
13. Sulzer 465 470
14. Swiss Bank Cp 301 299
15. Swiss Reinsur 2,520 2,530
16. Swissair 667 680
17. UBS 3,230 3,230
18.Winterthur 3,390 3,420
19. Zurich Ins 4,080 4,090
Source: Wall Street Journal, November 1, 1991
42. Construct a tally table for the closing stock prices “Close” column. Use 1,000-
point intervals beginning with 301–1,300, 1,301–2,301, etc.
43. Compute the change in prices by subtracting the previous closing price from
the current closing price.
44. Use your answer to question 43 to construct a tally table. Use 30-point intervals
beginning with À120 ~ À91,–90 ~ À61, etc.
45. Use your answer to question 44 to compute the cumulative frequencies.
46. Use your answer to question 44 to compute the relative and cumulative relative
frequencies.
47. Use your answer to question 46 to graph the relative frequency.
48. Use your answer to question 46 to graph the cumulative frequency.
49. Create a frequency polygon using data from question 44.
50. Draw the stem-and-leaf display of DPS of JNJ and Merck during the period
1988–2009 using Table 2.3, in which data on EPS, DPS, and PPS for JNJ,
Merck, and SP 500 during the period 1988–2009 are given.
51. Refer to Table 2.5, in which the balance sheet of JNJ company for the year 2008
and 2009 are given. Draw the pie chart of the composition of the total current
asset of JNJ for the year 2008 and 2009, respectively.
Using Table 2.8, in which the 7 ﬁnancial ratios of JNJ and Merck during the
period 1990–2009 are given.
52. Construct a frequency, cumulative frequency, and relative frequency table for
the “price–earnings ratio” (PER) of the JNJ company using class boundaries:
À20.000 PER 0.000, 0.000 PER 5.000, 5.000 PER 10.000,
10.000 PER 15.000, 15.000 PER 20.000, 20.000 PER 27.000.
53. Draw the histogram and frequency polygon of the above frequency distribution.

Frequency Distributions and Data Analyses Chapter

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Semelhante a Frequency Distributions and Data Analyses Chapter

Semelhante a Frequency Distributions and Data Analyses Chapter (20)

Mais de Springer

Mais de Springer (20)

Último

Último (20)

Frequency Distributions and Data Analyses Chapter