Using the disk that came with your textbook and Excel, access the file
called MOVIES.xls. The path for obtaining this file is Computer > D: / (Or the drive that holds
your CD)>
App B Data Sets > Excel > Movies. Please refer to page 807 for a description of the variables.
Based on
this data set, answer the following questions.
1. Which variables can be classified as quantitative? Which variables can be classified as
qualitative?
2. What is the level of measurement for gross sales (in millions)?
3. Calculate the mean and standard deviation of the Length variable for the 35 movies. (Round to
two decimal places.) Interpret your results.
4. Create a chart of Title vs. Length. Paste the chart into the document and discuss what the chart
is
showing. Are there any outliers? If so, which movie(s)?
5. Create a frequency distribution of the viewer ratings using 5 classes. Your first class should be
0
Solution
We can distinguish between two types of variables according to the level of measurement:
A quantitative variable is one in which the variates differ in magnitude, e.g. income, age, GNP,
etc. A qualitative variable is one in which the variates differ in kind rather than in magnitude,
e.g. marital status, gender, nationality, etc.
Continuous variables can be classified into three categories:
Interval scale data has order and equal intervals. Interval scale variables are measured on a linear
scale, and can take on positive or negative values. It is assumed that the intervals keep the same
importance throughout the scale. They allow us not only to rank order the items that are
measured but also to quantify and compare the magnitudes of differences between them. We can
say that the temperature of 40°C is higher than 30°C, and an increase from 20°C to 40°C is twice
as much as the increase from 30°C to 40°C. Counts are interval scale measurements, such as
counts of publications or citations, years of education, etc.
They occur when the measurements are continuous, but one is not certain whether they are on a
linear scale, the only trustworthy information being the rank order of the observations. For
example, if a scale is transformed by an exponential, logarithmic or any other nonlinear
monotonic transformation, it loses its interval - scale property. Here, it would be expedient to
replace the observations by their ranks.
These are continuous positive measurements on a nonlinear scale. A typical example is the
growth of bacterial population (say, with a growth function AeBt.). In this model, equal time
intervals multiply the population by the same ratio. (Hence, the name ratio - scale).
Ratio data are also interval data, but they are not measured on a linear scale. . With interval data,
one can perform logical operations, add, and subtract, but one cannot multiply or divide. For
instance, if a liquid is at 40 degrees and we add 10 degrees, it will be 50 degrees. However, a
liquid at 40 degrees does not have twice the temperature of a liquid at 20 degrees .
Using the disk that came with your textbook and Excel, access the fi.pdf
1. Using the disk that came with your textbook and Excel, access the file
called MOVIES.xls. The path for obtaining this file is Computer > D: / (Or the drive that holds
your CD)>
App B Data Sets > Excel > Movies. Please refer to page 807 for a description of the variables.
Based on
this data set, answer the following questions.
1. Which variables can be classified as quantitative? Which variables can be classified as
qualitative?
2. What is the level of measurement for gross sales (in millions)?
3. Calculate the mean and standard deviation of the Length variable for the 35 movies. (Round to
two decimal places.) Interpret your results.
4. Create a chart of Title vs. Length. Paste the chart into the document and discuss what the chart
is
showing. Are there any outliers? If so, which movie(s)?
5. Create a frequency distribution of the viewer ratings using 5 classes. Your first class should be
0
Solution
We can distinguish between two types of variables according to the level of measurement:
A quantitative variable is one in which the variates differ in magnitude, e.g. income, age, GNP,
etc. A qualitative variable is one in which the variates differ in kind rather than in magnitude,
e.g. marital status, gender, nationality, etc.
Continuous variables can be classified into three categories:
Interval scale data has order and equal intervals. Interval scale variables are measured on a linear
scale, and can take on positive or negative values. It is assumed that the intervals keep the same
importance throughout the scale. They allow us not only to rank order the items that are
measured but also to quantify and compare the magnitudes of differences between them. We can
say that the temperature of 40°C is higher than 30°C, and an increase from 20°C to 40°C is twice
as much as the increase from 30°C to 40°C. Counts are interval scale measurements, such as
counts of publications or citations, years of education, etc.
They occur when the measurements are continuous, but one is not certain whether they are on a
linear scale, the only trustworthy information being the rank order of the observations. For
example, if a scale is transformed by an exponential, logarithmic or any other nonlinear
2. monotonic transformation, it loses its interval - scale property. Here, it would be expedient to
replace the observations by their ranks.
These are continuous positive measurements on a nonlinear scale. A typical example is the
growth of bacterial population (say, with a growth function AeBt.). In this model, equal time
intervals multiply the population by the same ratio. (Hence, the name ratio - scale).
Ratio data are also interval data, but they are not measured on a linear scale. . With interval data,
one can perform logical operations, add, and subtract, but one cannot multiply or divide. For
instance, if a liquid is at 40 degrees and we add 10 degrees, it will be 50 degrees. However, a
liquid at 40 degrees does not have twice the temperature of a liquid at 20 degrees because 0
degrees does not represent "no temperature" -- to multiply or divide in this way we would have
to use the Kelvin temperature scale, with a true zero point (0 degrees Kelvin = -273.15 degrees
Celsius). In social sciences, the issue of "true zero" rarely arises, but one should be aware of the
statistical issues involved.
There are three different ways to handle the ratio-scaled variables.
Discrete variables are also called categorical variables. A discrete variable, X, can take on a
finite number of numerical values, categories or codes. Discrete variables can be classified into
the following categories:
Nominal variables allow for only qualitative classification. That is, they can be measured only in
terms of whether the individual items belong to certain distinct categories, but we cannot
quantify or even rank order the categories: Nominal data has no order, and the assignment of
numbers to categories is purely arbitrary. Because of lack of order or equal intervals, one cannot
perform arithmetic (+, -, /, *) or logical operations (>, <, =) on the nominal data. Typical
examples of such variables are:
Gender:
1. Male
2. Female
Marital Status:
1. Unmarried
2. Married
3. Divorcee
4. Widower
A discrete ordinal variable is a nominal variable, but its different states are ordered in a
meaningful sequence. Ordinal data has order, but the intervals between scale points may be
uneven. Because of lack of equal distances, arithmetic operations are impossible, but logical
operations can be performed on the ordinal data. A typical example of an ordinal variable is the
socio-economic status of families. We know 'upper middle' is higher than 'middle' but we
3. cannot say 'how much higher'. Ordinal variables are quite useful for subjective assessment of
'quality; importance or relevance'. Ordinal scale data are very frequently used in social and
behavioral research. Almost all opinion surveys today request answers on three-, five-, or seven-
point scales. Such data are not appropriate for analysis by classical techniques, because the
numbers are comparable only in terms of relative magnitude, not actual magnitude.
Consider for example a questionnaire item on the time involvement of scientists in the
'perception and identification of research problems'. The respondents were asked to indicate
their involvement by selecting one of the following codes:
1 = Very low or nil
2 = Low
3 = Medium
4 = Great
5 = Very great
Here, the variable 'Time Involvement' is an ordinal variable with 5 states.
Ordinal variables often cause confusion in data analysis. Some statisticians treat them as nominal
variables. Other statisticians treat them as interval scale variables, assuming that the underlying
scale is continuous, but because of the lack of a sophisticated instrument, they could not be
measured on an interval scale.
A quantitative variable can be transformed into a categorical variable, called a dummy variable
by recoding the values. Consider the following example: the quantitative variable Age can be
classified into five intervals. The values of the associated categorical variable, called dummy
variables, are 1, 2,3,4,5:
[Up to 25]
1
[25, 40 ]
2
[40, 50]
3
[50, 60]
4
[Above 60]
5
Preference variables are specific discrete variables, whose values are either in a decreasing or
increasing order. For example, in a survey, a respondent may be asked to indicate the importance
of the following nine sources of information in his research and development work, by using the
code [1] for the most important source and [9] for the least important source:
4. Note that preference data are also ordinal. The interval distance from the first preference to the
second preference is not the same as, for example, from the sixth to the seventh preference.
Multiple response variables are those, which can assume more than one value. A typical example
is a survey questionnaire about the use of computers in research. The respondents were asked to
indicate the purpose(s) for which they use computers in their research work. The respondents
could score more than one category.
Gender:
1. Male
2. Female
Marital Status:
1. Unmarried
2. Married
3. Divorcee
4. Widower