A Simple Guide to the Analysis of Quantitative Data
An Introduction with hypotheses, illustrations and references
By
Paul Andrew Bourne
Health Research Scientist, the University of the West Indies,
Mona Campus
Department of Community Health and Psychiatry
Faculty of Medical Sciences
The University of the West Indies, Mona Campus, Kingston, Jamaica
Table of Contents
Page
Preface 8
Menu bar – Contents of the Menu bar in SPSS 11
Function - Purposes of the different things on the menu bar 12
Mathematical symbols (numeric operations), in SPSS 13
Listing of Other Symbols 14
The whereabouts of some SPSS functions, or commands 16
Disclaimer 19
Coding Missing Data 20
Computing Date of Birth 21
List of Figures 26
List of Tables 29
How do I obtain access to the SPSS PROGRAM? 35
1. INTRODUCTION ……………………………………………………………........ 43
1.1.0a: steps in the analysis of hypothesis…………………………………… 45
1.1.1a Operational definitions of a variable………………………………… 47
1.1.1b Typologies of variable ………………..………………………………. 49
1.1.1 Levels of measurement………..………………………………………... 50
1.1.3 Conceptualizing descriptive and inferential statistics ……………….. 59
2. DESCRIPTIVE STATISTICS ANALYZED ….……………………………........ 62
2.1.1 Interpreting data based on their levels of measurement………..……. 64
2.1.2 Treating missing (i.e. non-response) cases…………………….………. 84
3. HYPOTHESES: INTRODUCTION …………………………….………………. 87
3.1.1 Definitions of Hypotheses………………..……..………………………. 88
3.1.2: Typologies of Hypothesis……………………………………………… 89
3.1.3: Directional and non-Directional Hypotheses………………………….. 90
3.1.4 Outliers (i.e. skewness)…………………………….……………………. 91
3.1.5 Statistical approaches for treating skewness…………….……………… 93
4. Hypothesis 1…[using Cross tabulations and Spearman ranked ordered correlation]
……………………………………………………….. 96
A1. Physical and social factors and instructional resources will directly influence the
academic performance of students who will write the Advanced Level Accounting
Examination;
A2. Physical and social factors and instructional resources positively influence the
academic performance of students who write the Advanced level Accounting
examination and that the relationship varies according to gender;
B1. Pass successes in Mathematics, Principles of Accounts and English Language at the
Ordinary/CXC General level will positively influence success on the Advanced level
Accounting examination;
B2. Pass successes in Mathematics, Principles of Accounts and English Language at the
Ordinary.
5. Hypothesis 2…………[using Crosstabulations]..…………………………….. 152
There is a relationship between religiosity, academic performance, age and marijuana
smoking of post-primary school students, and this relationship varies based on
gender.
6. Hypothesis 3……….…..…[Paired Sample t-test]…….……………………… 164
There is a statistical difference between the pre-Test and the post-Test scores.
7. Hypothesis 4….………[using Pearson Product Moment Correlation]…..…........ 184
Ho: There is no statistical relationship between expenditure on social programmes (public
expenditure on education and health) and levels of development in a country; and
H1: There is a statistical association between expenditure on social programmes (i.e.
public expenditure on education and health) and levels of development in a country
8. Hypothesis 5….. ………[using Logistic Regression]…………………………........ 199
The health care seeking behaviour of Jamaicans is a function of educational level,
poverty, union status, illnesses, duration of illnesses, gender, per capita consumption,
ownership of health insurance policy, and injuries. [ Health Care Seeking Behaviour =
f( educational levels, poverty, union status, illnesses, duration of illnesses, gender, per
capita consumption, ownership of health insurance policy, injuries)]
9. Hypothesis 6….. ……[using Linear Regression] ….………………………….. 207
There is a negative correlation between access to tertiary level education and
poverty controlled for sex, age, area of residence, household size, and educational level
of parents
10. Hypothesis 7….. ……[using Pearson Product Moment Correlation Coefficient and
Crosstabulations]………………………....................... 223
There is an association between the introduction of the Inventory Readiness Test and
the Performance of Students in Grade 1
11. Hypothesis 8….…………[using Spearman rho]……………………………….... 232
People who perceive themselves to be in the upper and middle classes, more so than
those in the lower (or working) class, strongly believe that acts of
incivility are only caused by persons in garrison communities
12. Hypothesis 9………………………………………………………………........ 235
Various cross tabulations
13. Hypothesis 10………[using Pearson and Crosstabulations]………………........ 249
There is no statistical difference between the typology of workers in the construction
industry and how they view the 10 top productivity outcomes
14. Hypothesis 11….…[using Crosstabulations and Linear Regression]……........ 265
Determinants of the academic performance of students
15. Hypothesis 12….……[using Spearman ranked ordered correlation]…........ 278
People who perceive themselves to be within the lower social status (i.e. class) are
more likely to be uncivil than those of the upper classes.
16. Data Transformation…………………………………………………........ 281
Recoding 291
Dummying variables 309
Summing similar variables 331
Data reduction 340
Glossary……………..….. ………………………………………………………........ 350
Reference…..………….…………………………………………………………........ 352
Appendices…………..….. ………………………………………………………........ 356
Appendix 1- Labeling non-responses 356
Appendix 2- Statistical errors in data 357
Appendix 3- Research Design 359
Appendix 4- Example of Analysis Plan 366
Appendix 5- Assumptions in regression 367
Appendix 6- Steps in running a bivariate cross tabulation 368
Appendix 7- Steps in running a trivariate cross tabulation 380
Appendix 8- What is placed in a cross tabulations table, using the above SPSS output
394
Appendix 9- How to run a Regression in SPSS 395
Appendix 10- Running Regression in SPSS 396
Appendix 11a- Interpreting strength of associations 407
Appendix 11b - Interpreting strength of association 408
Appendix 12- Selecting cases 409
Appendix 13- ‘UNDO’ selecting cases 417
Appendix 14- Weighting cases 420
Appendix 15- ‘Undo’ weighting cases 429
Appendix 15- Statistical symbolisms 440
Appendix 16 – Converting from ‘string’ to ‘numeric’ data –
Apparatus One – Converting from string data to numeric data 443
Apparatus Two – Converting from alphabetic and numeric data
to all ‘numeric’ data 447
Appendix 17- Steps in running Spearman rho 454
Appendix 18- Steps in running Pearson’s Product Moment Correlation 459
Appendix 19-Sample sizes and their appropriate sampling error 464
Appendix 20 – Calculating sample size from sampling error(s) 465
Appendix 21 – Sample sizes and their sampling errors 467
Appendix 22 - Sample sizes and their sampling errors 468
Appendix 23 – If conditions 469
Appendix 24 – The meaning of ρ value 477
Appendix 25 – Explaining Kurtosis and Skewness 478
Appendix 26 – Sampled Research Papers 479-560
PREFACE
One of the complexities for many undergraduate students and first-time researchers is how
to blend their socialization with the systematic rigours of scientific inquiry. For some, the
socialization process has embedded in them hunches, faith, family authority and even
‘hearsay’ as acceptable modes of establishing the existence of certain phenomena. These are
not principles or approaches rooted in academic theorizing or critical thinking. Despite
overwhelming scientific evidence gathered through empiricism, some perspectives that
students hold are difficult to change even after they have been falsified, as students still
want to hold ‘true’ to their previous ways of gaining knowledge. Even when time clearly shows
those beliefs to be obsolete or even ‘mythological’, students tend to adhere to information that
they garnered in their early socialization. The difficulty with objectivism is not the ‘truths’
that it claims to provide, or how we must relate to these realities; it is how young researchers
abandon their preferred socialization in favour of research findings. Furthermore, the
difficulty for humans, and even more so for upcoming scholars, is how to reconcile their
socialization with research findings in the presence of empiricism.
Against this background, social researchers must understand that ethics
must govern the reporting of their findings, irrespective of the results and of their own value
systems. Ethical principles, in social or natural research, are not ‘good’ because of their
inherent construction; rather, they protect the subjects (participants) from researchers
who may think that the study’s contribution outweighs any harm that the interviewees may
suffer from taking part in the study. Then there is the issue of confidentiality, which may
sometimes conflict with the personal situation faced by the researcher. Put simply, which of
the two takes precedence depends on the code of conduct that guides the profession.
Hence, undergraduate students should be made aware that findings must
be reported without any form of alteration. This then gives rise to the question, ‘How do we
systematically investigate social phenomena?’
The age-old discourse on the correctness of quantitative versus qualitative research
will not be explored in this work, as such a debate is obsolete and rehashing it here would be
pointless. Instead, this textbook puts forward illustrations of how to analyze
quantitative data without including any qualitative interpretation techniques. I believe that the
problems students face in interpreting statistical (i.e. quantitative) data must be
addressed, as the complexities are many but can be overcome in a short time with assistance.
My rationale for using ‘hypotheses’ as the premise upon which to build an analysis is
embedded in the logic of how to explore social or natural happenings. I know that
hypothesis testing is not the only approach to examining current germane realities, but it is
one that uses more ‘pure’ science techniques than other approaches.
Hypothesis testing is not simply about the null hypothesis, Ho (no statistical relationship),
or the alternative hypothesis, Ha; it is a systematic approach to the investigation of observable
phenomena. To help undergraduate students recognize the rich annals of
hypothesis testing and how paramount it is to the discovery of social facts, I
recommend that we begin by reading Thomas S. Kuhn (The Structure of Scientific Revolutions),
Emile Durkheim (study on suicide), W.E.B. Du Bois (The Philadelphia Negro) and the
works of Garth Lipps, which clearly depict the knowledge base garnered from its usage.
In writing this book, I tried not to assume that readers have grasped the intricacies of
quantitative data analysis; as such, I have provided the apparatus and the solutions that are
needed to analyze data from stated hypotheses. The purpose of this approach is for junior
researchers to understand the material thoroughly while recognizing the importance of
hypothesis testing in scientific inquiry.
Paul Andrew Bourne, Dip Ed, BSc, MSc, PhD
Health Research Scientist
Department of Community Health and Psychiatry
Faculty of Medical Sciences
The University of the West Indies
Mona-Jamaica.
ACKNOWLEDGEMENT
This textbook would not have materialized without the assistance of a number of people
(scholars, associates, and students) who took time from their busy schedules to guide,
proofread and make invaluable suggestions on the initial manuscript. These individuals
include Drs. Ikhalfani Solan, Samuel McDaniel and Lawrence Nicholson, who proofread the
manuscript and made suggestions as to its appropriateness, simplicity and reach to those it
intends to serve. Furthermore, Mr. Maxwell S. Williams is largely responsible for fomenting
the idea in my mind for a book of this nature. Special thanks must be extended to Mr. Douglas
Clarke, an associate, who directed my thoughts in times of frustration and bewilderment, and
on occasion gave me insight into the material and how it could be made better for the students.
In addition, I would like to extend my heartiest appreciation to Professor Anthony
Harriott and Dr. Lawrence Powell, both of the Department of Government, UWI, Mona-Jamaica,
who are my mentors and have provided me with guidance and scope for the material,
and who also offered their expert advice on the initial manuscript.
Also, I would like to take this opportunity to acknowledge all the students of
Introduction to Political Science (GT24M) of the 2006/07 class who used the introductory
manuscript and made suggestions for its improvement, in particular Ms. Nina Mighty.
Menu Bar
Content:
A social researcher should not only be cognizant of the statistical techniques and modalities of
his/her discipline, but also needs a comprehensive grasp of the various functions within the
‘menu’ of the SPSS program. What is contained within the ‘menu bar’, where is each item
found, and what are the functions of its contents?
The ‘menu bar’ contains the following:
- File
- Edit
- View
- Data
- Transform
- Analyze
- Graphs
- Utilities
- Add-ons
- Window
- Help
The functions of the various contents of the ‘menu bar’ are explored below.
Box 1: Menu Function
Menu Bar
Functions: Purposes of the different things on the menu bar
File – This menu deals with the different functions associated with files, such as (i) opening …,
(ii) reading …, (iii) saving …, and (iv) exiting.
Edit – This menu stores functions such as (i) copying, (ii) pasting, (iii) finding, and (iv)
replacing.
View – Within this menu lie functions that are screen related.
Data – This menu operates several functions, such as (i) defining, (ii) configuring, (iii)
entering data, (iv) sorting, (v) merging files, (vi) selecting and weighting cases, and
(vii) aggregating files.
Transform – Transformation is concerned with previously entered data, including (i) recoding,
(ii) computing, (iii) reordering, and (iv) addressing missing cases.
Analyze – This houses all forms of data analysis apparatus, accessed with a simple click of the
Analyze command.
Graphs – Creation of graphs or charts can begin with a click on the Graphs command.
Utilities – This deals with sophisticated ways of making complex data operations easier, as
well as simply viewing the description of the entered data.
MATHEMATICAL SYMBOLS (NUMERIC OPERATIONS) IN SPSS
NUMERIC OPERATIONS   FUNCTIONS
+                    Add
-                    Subtract
*                    Multiply
/                    Divide
**                   Raise to a power
()                   Order of operations
<                    Less than
>                    Greater than
<=                   Less than or equal to
>=                   Greater than or equal to
=                    Equal
~=                   Not equal to
&                    And: both relations must be true
|                    Or: either relation may be true
~                    Negation: true becomes false, false becomes true
Box 2: Mathematical symbols and their Meanings
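For readers who also work outside SPSS, the symbols in Box 2 can be lined up against their nearest Python equivalents. This is only a minimal sketch: the dictionary `SPSS_TO_PYTHON` and the helpers `apply_operator` and `negate` are hypothetical names introduced for illustration and are not part of SPSS or of this text.

```python
import operator

# Assumed mapping from the SPSS symbols in Box 2 to Python operations.
# The names below are illustrative only; they are not SPSS functions.
SPSS_TO_PYTHON = {
    "+": operator.add,        # add
    "-": operator.sub,        # subtract
    "*": operator.mul,        # multiply
    "/": operator.truediv,    # divide
    "**": operator.pow,       # raise to a power
    "<": operator.lt,         # less than
    ">": operator.gt,         # greater than
    "<=": operator.le,        # less than or equal to
    ">=": operator.ge,        # greater than or equal to
    "=": operator.eq,         # equal (Python itself writes this as ==)
    "~=": operator.ne,        # not equal to (Python writes this as !=)
    "&": lambda a, b: bool(a) and bool(b),  # and: both relations must be true
    "|": lambda a, b: bool(a) or bool(b),   # or: either relation may be true
}


def apply_operator(symbol, left, right):
    """Apply the operation that the SPSS symbol stands for to two values."""
    return SPSS_TO_PYTHON[symbol](left, right)


def negate(value):
    """Negation (~): true becomes false, false becomes true."""
    return not value


print(apply_operator("**", 2, 3))         # 2 raised to the power 3
print(apply_operator("~=", 1, 2))         # 1 is not equal to 2
print(apply_operator("&", 5 > 3, 2 > 4))  # both relations must be true
print(negate(True))
```

The parentheses in Box 2 need no entry of their own, since Python also uses `()` to set the order of operations.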
LISTING OF OTHER SYMBOLS
SYMBOLS                          MEANINGS
YRMODA (i.e. year, month, day)   Date of birth (e.g. 1968, 12, 05)
a                                Y intercept
b                                Coefficient of slope (or regression)
f                                Frequency
n                                Sample size
N                                Population size
R                                Coefficient of correlation, Spearman’s
r                                Coefficient of correlation, Pearson’s
Sy                               Standard error of estimate
W or Wt                          Weight
µ                                Mu or population mean
β                                Beta coefficient
3 or χ                           Measure of skewness
∑                                Summation
σ                                Standard deviation
χ²                               Chi-square; this is the value used to test for goodness of fit
CC                               Coefficient of Contingency
fa                               Frequency of class interval above modal group
fb                               Frequency of class interval below modal group
X                                A single value or variable
R̄                                Adjusted r, which is the coefficient of correlation corrected
                                 for the number of cases
X̄ or Ȳ                           Arithmetic mean of X or Y
RND                              Round off to the nearest integer
SYSMIS                           This denotes system-missing values
MISSING                          All missing values
Type I Error                     Claiming that events are related (or means are different)
                                 when they are not
Type II Error                    Claiming that events are not related (or means are not
                                 different) when they are
Φ                                Phi coefficient
r²                               The proportion of variation in the dependent variable
                                 explained by the independent variable(s)
P(A)                             Probability of event A
P(A|B)                           Probability of event A given that event B has happened
CV                               Coefficient of variation
SE                               Standard error
O                                Observed frequency
X                                Independent (explanatory, predictor) variable in regression
Y                                Dependent (outcome, response, criterion) variable in regression
df                               Degrees of freedom
t                                Symbol for the t ratio (the critical ratio that follows a
                                 t distribution)
R²                               Squared multiple correlation in multiple regression
FURTHER INFORMATION ON TYPE I and TYPE II Error
                                          The Real world: the null hypothesis is really…
Finding from your Survey                  True              False
You found that the null hypothesis is:
  True                                    No Problem        Type 2 Error
  False                                   Type 1 Error      No Problem
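The decision table above can also be read as a small rule. The sketch below is only an illustration of that rule; the function name `classify_decision` and its two arguments are hypothetical and were chosen for this example.

```python
def classify_decision(null_is_true, null_found_true):
    """Classify a test outcome using the Type I / Type II error table.

    null_is_true    -- what holds in the real world (True or False)
    null_found_true -- what the survey found (True = null hypothesis retained,
                       False = null hypothesis rejected)
    """
    if null_is_true and not null_found_true:
        # The null hypothesis is really true, but the survey found it false.
        return "Type 1 Error"
    if not null_is_true and null_found_true:
        # The null hypothesis is really false, but the survey found it true.
        return "Type 2 Error"
    # The finding from the survey agrees with the real world.
    return "No Problem"


# Walk through all four cells of the table.
for real_world in (True, False):
    for finding in (True, False):
        print(real_world, finding, classify_decision(real_world, finding))
```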
THE WHEREABOUTS OF SOME SPSS FUNCTIONS
Functions or Commands                      Whereabouts, in SPSS (the process in arriving at
                                           various commands)
Mean, mode, median, standard deviation,    Analyze → Descriptive Statistics → Frequencies →
skewness or kurtosis, range,               Statistics
minimum or maximum
Chi-square                                 Analyze → Descriptive Statistics → Crosstabs
Pearson’s Product Moment Correlation       Analyze → Correlate → Bivariate
Spearman’s rho                             Analyze → Correlate → Bivariate (ensure that you
                                           deselect Pearson’s, and select Spearman’s rho)
Linear Regression                          Analyze → Regression → Linear
Logistic Regression                        Analyze → Regression → Binary
Discriminant Analysis                      Analyze → Classify → Discriminant
Mann-Whitney U Test                        Analyze → Nonparametric Tests → 2 Independent Samples
Independent-samples t-test                 Analyze → Compare Means → Independent-Samples T Test
Wilcoxon matched-pairs test or             Analyze → Nonparametric Tests → 2 Related Samples
Wilcoxon signed-rank test
t-test                                     Analyze → Compare Means
Paired-samples t-test                      Analyze → Compare Means → Paired-Samples T Test
One-sample t-test                          Analyze → Compare Means → One-Sample T Test
One-way analysis of variance               Analyze → Compare Means → One-Way ANOVA
Factor Analysis                            Analyze → Data Reduction → Factor
Descriptive (for a single metric           Analyze → Descriptive Statistics → Descriptives
variable)
Pie chart, bar charts, histogram           Graphs → (select the appropriate type)
Scatter plots                              Graphs → Scatter…
Weighting cases                            Data → Weight Cases… → select ‘Weight cases by’
Selecting cases                            Data → Select Cases… → ‘If condition is satisfied’ → If…
Replacing missing values                   Transform → Replace Missing Values…
Box 3: The whereabouts of some SPSS Functions
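To make the first row of Box 3 concrete, the sketch below computes most of the same descriptive statistics outside SPSS, using Python's standard `statistics` module. It is only an illustration under stated assumptions: the function `describe` and the list `ages` are hypothetical, the standard deviation is the sample version (n − 1 denominator), and skewness and kurtosis are left out because the standard library does not provide them.

```python
import statistics


def describe(values):
    """Return common descriptive statistics for a single metric variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": statistics.stdev(values),   # sample standard deviation
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }


# Hypothetical ages of five respondents, used only for illustration.
ages = [21, 23, 23, 25, 30]
summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```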
Disclaimer
I am a trained Demographer and, as such, I have undertaken an extensive review of
various aspects of the SPSS program. However, I would like to make it unequivocally clear
that this text does not represent the SPSS (Statistical Product and Service Solutions, formerly
Statistical Package for the Social Sciences) brand. Thus, this text is not sponsored or approved
by SPSS, and any errors in it are not the responsibility of the brand.
SPSS is a registered trademark of SPSS Inc. Should you need more
information on the SPSS program or other related products, enquiries may be forwarded to:
SPSS UK Ltd., First Floor, St. Andrews House, West Street, Woking GU21 1EB, United
Kingdom.
Coding Missing Data
The coding of data for survey research is not limited to responses, as we also need to code
missing data. Several codes indicate missing values, and the researcher should know them
and the context in which each applies in the coding process. For example, ‘no answer’ in a
survey indicates something other than the respondent’s refusal to answer or failure to remember
the answer. The fundamental issue here is that there is no information for the respondent, as the
information is missing.
Table: Missing Data codes for Survey Research
Question                             Refused answer   Didn’t know answer   No answer recorded
Less than 6 categories               7                8                    9
More than 7 and less than 3 digits   97               98                   99
More than 3 digits                   997              998                  999
Note
Less than 6 categories – a question asked of a respondent may have many options (or
responses). If the question has 6 options or fewer, refusal can be coded 7, didn’t know 8
and no answer 9.
Some researchers do not make a distinction between the missing categories, and use 999
(or 99) in all cases of missing values.
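The coding convention in the table can be sketched as a short routine. This is an illustration under an assumption of mine, not a rule stated in the table: the codes are taken to be the smallest run of 7, 8 and 9 endings (7/8/9, then 97/98/99, then 997/998/999) that cannot collide with any valid response code. The names `missing_codes` and `recode_missing` are hypothetical.

```python
def missing_codes(max_valid_code):
    """Pick refused / didn't know / no answer codes above the largest valid code.

    Assumption: use the narrowest width (7-8-9, 97-98-99, 997-998-999, ...)
    whose 'refused' code is still larger than every valid response code.
    """
    width = 1
    while 10 ** width - 3 <= max_valid_code:
        width += 1
    top = 10 ** width - 1  # 9, 99, 999, ...
    return {"refused": top - 2, "dont_know": top - 1, "no_answer": top}


def recode_missing(values, codes):
    """Replace every missing-data code with None so it is left out of analysis."""
    missing = set(codes.values())
    return [None if value in missing else value for value in values]


# A question with 6 response options: codes 7, 8 and 9 are free for missing data.
codes = missing_codes(6)
print(codes)

# Hypothetical responses, including one refusal (7) and one 'no answer' (9).
responses = [1, 4, 7, 2, 9, 6]
print(recode_missing(responses, codes))
```

A researcher who does not distinguish between the missing categories could pass the same single code (for example 99) for all three keys instead.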
Computing Date of Birth – If you are only given year of birth
Step 1: First, select ‘Transform’, and then ‘Compute’.
Step 2: On selecting ‘Compute Variable’, SPSS will provide the ‘Compute Variable’
dialogue box.
Step 3: In the ‘Target Variable’ box, write the word which the researcher wants to use to
represent the idea.
Step 4: If the version of the SPSS program is later than 12.0 (i.e. 13–17), the next process is
to select ‘All’ in the ‘Function group’ box. In order to convert year of birth to actual ‘age’,
select ‘Xdate.Year’.
Step 5: Having selected ‘Xdate.Year’, use the arrow to take it into the ‘Numeric Expression’
box, and then replace the ‘?’ mark with the variable in the dataset.
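The arithmetic behind these steps can be sketched in a few lines. This is a minimal illustration, assuming that age is approximated as a reference year (for example, the year of the survey) minus the year of birth; with only the year of birth available, the result can be off by one year for respondents whose birthday had not yet passed. The function name `age_from_birth_year` and the reference year used below are hypothetical.

```python
def age_from_birth_year(birth_year, reference_year):
    """Approximate age in whole years from year of birth alone."""
    if birth_year > reference_year:
        raise ValueError("birth_year cannot be later than reference_year")
    return reference_year - birth_year


# Hypothetical example: a respondent born in 1968, surveyed in 2008.
print(age_from_birth_year(1968, 2008))
```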
LISTING OF FIGURES AND TABLES
Listing of Figures
Figure 1.1.1: Flow Chart: How to Analyze Quantitative Data?
Figure 1.1.2: Properties of a Variable.
Figure 1.1.3: Illustration of Dichotomous Variables
Figure 1.1.4: Ranking of the Levels of Measurement
Figure 1.1.5: Levels of Measurement
Figure 2.1.0: Steps in Analyzing Non-Metric Data
Figure 2.1.1: Respondents’ Gender
Figure 2.1.2: Respondents’ Gender
Figure 2.1.3: Social Class of Respondents
Figure 2.1.4: Social Class of Respondents
Figure 2.1.5: Steps in Analyzing Metric Data
Figure 2.1.6: ‘Running’ SPSS for a Metric Variable
Figure 2.1.7: ‘Running’ SPSS for a Metric Variable
Figure 2.1.8: ‘Running’ SPSS for a Metric Variable
Figure 2.1.9: ‘Running’ SPSS for a Metric Variable
Figure 2.1.10: ‘Running’ SPSS for a Metric Variable
Figure 2.1.11: ‘Running’ SPSS for a Metric Variable
Figure 2.1.12: ‘Running’ SPSS for a Metric Variable
Figure 2.1.13: ‘Running’ SPSS for a Metric Variable
Figure 2.1.14: ‘Running’ SPSS for a Metric Variable
Figure 2.1.15: ‘Running’ SPSS for a Metric Variable
Figure 2.1.16: ‘Running’ SPSS for a Metric Variable
Figure 4.1.1: Age - Descriptive Statistics
Figure 4.1.2: Gender of Respondents
Figure 4.1.3: Respondent’s parent educational level
Figure 4.1.4: Parental/Guardian Composition for Respondents
Figure 4.1.5: Home Ownership of Respondent’s Parent/Guardian
Figure 4.1.6: Respondents’ Affected by Mental and/or Physical Illnesses
Figure 4.1.7: Suffering from mental illnesses
Figure 4.1.8: Affected by at least one Physical Illnesses
Figure 4.1.9: Dietary Consumption for Respondents
Figure 6.1.2: Typology of Previous School
Figure 6.1.3: Skewness of Examination i (i.e. Test i)
Figure 6.1.4: Skewness of Examination ii (i.e. Test ii)
Figure 6.1.5: Perception of Ability
Figure 6.1.6: Self-perception
Figure 6.1.7: Perception of task
Figure 6.1.8: Perception of utility
Figure 6.1.9: Class environment influence on performance
Figure 6.1.10: Perception of Ability
Figure 6.1.11: Self-perception
Figure 6.1.12: Self-perception
Figure 6.1.13: Perception of task
Figure 6.1.14: Perception of Utility
Figure 6.1.15: Class Environment influence on Performance
Figure 7.1.1: Frequency distribution of total expenditure on health as % of GDP
Figure 7.1.2: Frequency distribution of total expenditure on education as % of GNP
Figure 7.1.3: Frequency distribution of the Human Development Index
Figure 7.1.4: Running SPSS for social expenditure on social programme
Figure 7.1.5: Running bivariate correlation for social expenditure on social programme
Figure 7.1.6: Running bivariate correlation for social expenditure on social programme
Figure 13.1.1: Categories that describe Respondents’ Position
Figure 13.1.2: Company’s Annual Work Volume
Figure 13.1.3: Company’s Labour Force – ‘on an average per year’
Figure 13.1.4: Respondents’ main Area of Construction Work
Figure 13.1.5: Percentage of work ‘self-performed’ in contrast to ‘sub-contracted’
Figure 13.1.6: Percentage of work ‘self-performed’ in contrast to ‘sub-contracted’
Figure 13.1.7: Years of Experience in Construction Industry
Figure 13.1.8: Geographical Area of Employment
Figure 13.1.9: Duration of service with current employer
Figure 13.1.10: Productivity changes over the past five years
Figure 14.1.1: Characteristic of Sampled Population
Figure 14.1.2: Employment Status of Respondents
Listing of Tables
Table 1.1.1: Synonyms for the different Levels of measurement
Table 1.1.2: Appropriateness of Graphs, from different Levels of measurement
Table 1.1.3: Levels of measurement with examples and other characteristics
Table 1.1.4: Levels of measurement, and measures of central tendency and measures of
variability
Table 1.1.5: Combinations of Levels of measurement, and types of statistical Test which are
applicable
Table 1.1.6a: Statistical Tests and their Levels of Measurement
Table 1.1.6b:
Table 2.1.1a: Gender of Respondents
Table 2.1.1b: General happiness
Table 2.1.2: Social Status
Table 2.1.3: Descriptive Statistics on the Age of the Respondents
Table 2.1.4:“From the following list, please choose what the most important characteristic of
democracy …are for you”
Table 4.1.1: Respondents’ Age
Table 4.1.2 (a) Univariate Analysis of the explanatory Variables
Table 4.1.2(b): Univariate Analysis of explanatory
Table 4.1.2 (c): Univariate Analysis of explanatory
Table 4.1.3: Bivariate Relationships between academic performance and subjective Social
Class (n=99)
Table 4.1.4: Bivariate Relationships between comparative academic performance and
subjective Social Class (n=108)
Table 4.1.5: Bivariate Relationships between academic performance and physical exercise (n=
111)
Table 4.1.6 (i): Bivariate Relationships between academic performance and instructional
materials (n=113)
Table 4.1.6 (ii) Relationship between academic performance and materials among students
who will be writing the A’ Level Accounting Examination, 2004
Table 4.1.7: Bivariate Relationships between academic performance and Class attendance (n=
106)
Table 4.1.8: Bivariate Relationship between academic performance and attendance
Table 4.1.9: Bivariate Relationships between academic performance and breakfast
consumption, (n=114)
Table 4.1.10: Relationship between academic performances and breakfasts consumption
among A’ Level Accounting students, controlling for Gender
Table 4.1.11: Bivariate Relationships between academic performance and
migraine (n=116)
Table 4.1.12: Bivariate Relationships between academic performance and mental illnesses,
(n=116)
Table 4.1.13: Bivariate Relationships between academic performance and physical illnesses,
(n=116)
Table 4.1.14: Bivariate Relationships between academic performance and illnesses (n=116)
Table 4.1.15. Bivariate Relationships between current academic performance and past
performance in CXC/GCE English language Examination, (n= 112)
Table 4.1.16: Bivariate Relationships between academic performance and past performance in
CXC/GCE English language Examination, controlling for Gender
Table 4.1.17: Bivariate Relationships between academic performance and past performance in
CXC/GCE Mathematics Examination n=
Table 4.1.18 (i): Bivariate Relationships between academic performance and past performance
in CXC/GCE principles of accounts Examination (n= 114)
Table 4.1.19 (ii): Bivariate Relationships between academic performance and past
performance in CXC/GCE POA Examination, controlling for Gender
Table 4.1.20: Bivariate Relationships between academic performance and Self-Concept (n=
112)
Table 4.1.21: Bivariate Relationships between academic performance and Dietary
Requirements (n=116)
Table 4.1.22: Summary of Tables
Table 5.1.1: Frequency and percent Distributions of explanatory model Variables
Table 5.1.2: Relationship between Religiosity and Marijuana Smoking (n=7,869)
Table 5.1.3: Relationship between Religiosity and Marijuana Smoking controlled for Gender
Table 5.1.4: Relationship between Age and marijuana smoking (n=7,948)
Table 5.1.5: Relationship between marijuana smoking and Age of Respondents, controlled
for sex
Table 5.1.6: Relationship between academic performances and marijuana smoking,
(n=7,808)
Table 5.1.7: Relationship between academic performances and marijuana smoking,
controlled for Gender
Table 5.1.8: Summary of Tables
Table 6.1.1: Age Profile of respondent
Table 6.1.2: Examination Scores
Table 6.1.3(a): Class Distribution by Gender
Table 6.1.3(b): Class Distribution by Age Cohorts
Table 6.1.3(c): Pre-Test Score by Typology of Group
Table 6.1.3(c): Pre-Test Score by Typology of Group
Table 6.1.4: Comparison of Examination I and Examination II
Table 6.1.5: Comparison across the Group by Tests
Table 6.1.6: Analysis of Factors influence on Test ii Scores
Table 6.1.7: Cross-Tabulation of Test ii Scores and Factors
Table 6.1.8: Bivariate Relationship between student’s Factors and Test ii Scores
Table 7.1.1: Descriptive Statistics - total expenditure on public health (as Percentage of GNP,
HRD, 1994)
Table 7.1.2: Descriptive Statistics of expenditure on public education (as Percentage of GNP,
HRD, 1994)
Table 7.1.3: Descriptive Statistics of Human Development (proxy for development)
Table 7.1.4: Bivariate Relationships between dependent and independent Variables
Table 7.1.5: Summary of Hypotheses Analysis
Table 8.1.1: Age Profile of Respondents (n = 16,619)
Table 8.1.2: Logged Age Profile of Respondents (n = 16,619)
Table 8.1.3: Household Size (all individuals) of Respondents
Table 8.1.4: Union Status of the sampled Population (n=16,619)
Table 8.1.5: Other Univariate Variables of the Explanatory Model
Table 8.1.6: Variables in the Logistic Equation
Table 8.1.7: Classification Table
Table 8.1.1: Univariate Analyses
Table 8.1.2: Frequency Distribution of Educational Level by Quintile
Table 8.1.3: Frequency Distribution of Jamaica’s Population by Quintile and Gender
Table 8.1.4: Frequency Distribution of Educational Level by Quintile
Table 8.1.5: Frequency Distribution of Pop. Quintile by Household Size
Table 8.1.6: Bivariate Analysis of access to Tertiary Edu. and Poverty Status
Table 8.1.7: Bivariate Analysis of access to Tertiary Edu. and Geographic Locality of
Residents
Table 8.1.8: Bivariate Analysis of geographic locality of residents and poverty Status
Table 8.1.9: Bivariate Relationship between access to tertiary level education by Gender
Table 8.1.10: Bivariate Relationship between Access to Tertiary Level Education by Gender
controlled for Poverty Status
Table 8.1.11: Regression Model Summary
Table 10.1.1: Univariate Analysis of Parental Information
Table 10.1.2: Descriptive on Parental Involvement
Table 10.1.3: Univariate Analysis of Teacher’s Information
Table 10.1.4: Univariate Analysis of ECERS-R Profile
Table 10.1.5: Bivariate Analysis of Self-reported Learning Environment and Mastery on
Inventory Test
Table 10.1.6: Relationship between Educational Involvement, Psychosocial and Environment
involvement and Inventory Test
Table 10.1.6: Relationship between Educational Involvement, Psychosocial and Environment
Involvement and Inventory Test
Table 10.1.8: School Type by Inventory Readiness Score
Table 11.1.1: Incivility and Subjective Social Status
Table 12.1.2: Have you or someone in your family known of an act of Corruption in the last 12 months?
Table 12.1.3: Gender of Respondent
Table 12.1.4: In what Parish do you live?
Table 12.1.5: Suppose that you, or someone close to you, have been a victim of a crime. What would
you do...?
Table 12.1.6: What is your highest level of Education?
Table 12.1.7: In terms of Work, which of these best describes your Present situation?
Table 12.1.8: Which best represents your Present position in Jamaica Society?
Table 12.1.9: Age on your last Birthday?
Table 12.1.10: Age categorization of Respondents
Table 12.1.11: Suppose that you, or someone close to you, have been a victim of a crime. What would
you do... by Gender of respondent Cross Tabulation
Table 12.1.12: If involved in a dispute with neighbour and repeated discussions have not made a
difference, would you...? by Gender of respondent Cross Tabulation
Table 12.1.13: Do you believe that corruption is a serious problem in Jamaica? by Gender of
respondent Cross Tabulation
Table 12.1.14: Have you or someone in your family known of an act of corruption in the last 12
months? by Gender of respondent Cross Tabulation
Table 14.1.1: Marital Status of Respondents
Table 14.1.2: Marital Status of Respondents by Gender
Table 14.1.3: Marital Status by Gender by Age cohort
Table 14.1.4: Marital Status by Gender by Age Cohort
Table 14.1.5: Educational Level by Gender by Age Cohorts
Table 14.1.6: Income Distribution of Respondents
Table 14.1.7: Parental Attitude Toward School
Table 14.1.8: Parent Involving Self
Table 14.1.9: School Involving Parent
Table 14.1.8: Regression Model Summary
Table 15.1.1: Correlations
Table 15.1.2: Cross Tabulation between incivility and social status
How do I obtain access to the SPSS PROGRAM?
Step One:
In order to access the SPSS program, the student should select ‘START’ at the
bottom left-hand corner of the computer monitor. This is followed by selecting
‘All Programs’ (see below).
[Screenshot callout: Select ‘START’ and then ‘All Programs’]
Step Two:
The next step is to select ‘SPSS for Windows’. Having chosen ‘SPSS for
Windows’, a dialogue box appears to the right with the following options:
SPSS for Windows; SPSS 12.0 (or 13.0…or, 15.0); SPSS Map Geo-dictionary
Manager Ink; and, last, SPSS Manager.
[Screenshot callout: Select ‘SPSS for Windows’]
Step Three:
Having done step two, the student will select SPSS 12.0 (or 13.0, or 14.0 or 15.0) for
Windows, as this is the program with which he/she will be working.
[Screenshot callout: Select SPSS 12.0 (or 13.0, or 14.0 or 15.0) for Windows]
Step Four:
On selecting ‘SPSS for Windows’ in step three, the dialogue box below appears. The
next step is to select ‘OK’, which results in what appears in step five.
[Screenshot callout: Select ‘OK’]
Step Seven:
What is the difference here? Look to the bottom left-hand corner of the spreadsheet
and you will see two terms: (1) ‘Data View’ and (2) ‘Variable View’. The ‘Data
View’ accommodates the entering of the data (i.e. responses from the
questionnaires) once the template has been established in the ‘Variable View’.
Ergo, the student must ensure that he/she has established the template in the
‘Variable View’ before any typing can be done in the ‘Data View’.
[Screenshot callout: Data View. Observe what the ‘Data View’ window looks like]
[Screenshot callout: Variable View. Observe what the ‘Variable View’ window looks like]
CHAPTER 1
1.1.0a: INTRODUCTION
This book is a response to an associate’s request for material that would
provide simple illustrations of ‘How to analyze quantitative data in the Social
Sciences from actual hypotheses’. He contended that all the currently available textbooks,
despite providing some degree of analysis of quantitative data, fail to provide actual
illustrations of cases in which hypotheses are given and a comprehensive assessment is made to
answer issues surrounding the appropriate univariate, bivariate and/or multivariate processes of
analysis. Hence, I began a quest to peruse the textbooks that presently exist on ‘Research Methods
in Social Sciences’, ‘Research Methods in Political Sciences’, ‘Introductory Statistics’,
‘Statistical Methods’, ‘Multivariate Statistics’, and ‘Course materials on Research Methods’,
which revealed that a void existed in this regard.
Hence, I have consulted a plethora of academic sources in order to formulate this text.
In wanting to comprehensively fulfill my friend’s request, I have used a number of datasets that
I have analyzed over the past 6 years, along with key terminologies which are
applicable to understanding the various hypotheses.
I am cognizant that a need exists to provide some information on ‘Simple Quantitative
Data Analysis’, and this text is in keeping with the demand to make materials available for
aiding the interpretation of ‘quantitative data’; it is not intended to unveil any new materials
in the discipline. The rationale behind this textbook is embedded in the simple reality that many
undergraduate students are faced with the complex task of ‘how to choose the most appropriate
statistical test’. This becomes problematic for them, as the issue of wanting to complete an
assignment, and knowing that it is properly done, will plague the pupil. The answer to this
question lies in the fundamental issues of (1) the nature of the variables (continuous or
discrete); (2) the purpose of the analysis, that is, whether it is mere description or
statistical inference; and (3) whether any of the independent variables are covariates². Nevertheless,
the materials provided here are a range of research projects, which will give new information
on particular topics from the hypothesis to the univariate analysis and the bivariate or
multivariate analyses.
² “If the effects of some independent variables are assessed after the effects of other independent variables are
statistically removed…” (Tabachnick and Fidell 2001, 17)
1.1.0b: STEPS IN ANALYZING A HYPOTHESIS
One of the challenges faced by a social researcher is how to succinctly conceptualize (i.e.
define) his/her variables, which will also be operationalized (measured) for the purpose of the
study. Having written a hypothesis, the researcher should identify the number of variables
which are present, and then distinguish the dependent from the independent variables.
Following this, he/she should recognize the level of measurement to which each variable
belongs, and then decide which statistical test is appropriate based on the combination of the
levels of measurement of the variables. The figure below is a flow chart depicting the steps in
analyzing data when given a hypothesis.
The production of this text is in response to the need for a simple book which
would address the concerns of undergraduate students who must analyze a hypothesis. Among
the issues raised in this book are (1) the systematic steps involved in
analyzing a hypothesis, (2) definitions of a hypothesis, (3) typologies of hypothesis, (4)
conceptualization of a variable, (5) types of variables, (6) levels of measurement, (7)
illustrations of how to perform SPSS operations for the description of different levels of
measurement and for inferential statistics, (8) Type I and II errors, (9) arguments on the treatment
of missing cases as well as outliers, (10) how to transform selected quantitative data, and (11)
other pertinent matters.
The primary reason behind the use of many of the illustrations, conceptualizations and
peripheral issues rests squarely on the fact that the reader should gain a thorough understanding of
how the entire process is done, and the rationale for the method used.
FLOW CHART: ANALYZING QUANTITATIVE DATA
STEP ONE: Write your hypothesis.
STEP TWO: Identify the variables from the hypothesis.
STEP THREE: Define and operationalize each variable selected from the hypothesis.
STEP FOUR: Decide on the level of measurement for each variable.
STEP FIVE: Decide which variable is the DV, and which the IV.
STEP SIX: Check for skewness and/or outliers in metric variables.
STEP SEVEN: Do descriptive statistics for the chosen variables.
STEP EIGHT: If statistical association, causality or predictability is needed, continue; if not, stop!
STEP NINE: If statistical inference is needed, look at the combination of DV and IV(s).
STEP TEN: Choose the appropriate statistical test based on the combination of DV and IVs, and …
STEP ELEVEN: Having used the test, analyze the data carefully, based on the statistical test.
FIGURE 1.1.1: FLOW CHART: HOW TO ANALYZE QUANTITATIVE DATA?
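Step six of the flow chart asks for a check of skewness in metric variables. A minimal sketch of such a check follows, assuming the simple moment-based formulas (statistical packages such as SPSS report slightly different, sample-adjusted values). The function name and the ages are hypothetical and were invented for illustration only.

```python
from statistics import fmean


def moment_skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) computed from central moments.

    Both are zero for a perfectly normal distribution; a positive skewness
    means the cases pile up on the left with a long tail to the right.
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (population form)
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Hypothetical ages, invented for illustration only.
symmetric_ages = [20, 25, 30, 35, 40]
right_skewed_ages = [20, 21, 22, 23, 60]

print(moment_skewness_kurtosis(symmetric_ages))
print(moment_skewness_kurtosis(right_skewed_ages))
```

A skewness near zero suggests a roughly symmetric variable, whereas a clearly positive or negative value signals that a test for skewed data may be the safer choice.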
This entire text is about ‘how to analyze quantitative data from hypotheses’. Based on Figure
1.1.1, it may appear that the research process begins with a hypothesis, but this is not the case.
I emphasize the interpretation of hypotheses because this monograph
starts from an actual hypothesis. Thus, before I provide you with operational definitions of
variables, I will provide some contextualization of ‘what is a variable?’; then the steps will be
worked out.
1.1.1a: DEFINITIONS OF A VARIABLE
Undergraduates and first-time researchers should be aware that quantitative data analysis is
primarily based on (1) empirical literature, (2) typologies of variables within the hypothesis,
(3) conceptualization and operationalization of the variables, and (4) the level of measurement for
each variable. It should be noted that defining a variable is not simply the collation of a
group of words because we feel so inclined; each variable requires two critical
characteristics in order that it is defined properly (see Figure 1.1.2).
PROPERTIES OF A VARIABLE: MUTUAL EXCLUSIVITY; EXHAUSTIVENESS
FIGURE 1.1.2: PROPERTIES OF A VARIABLE.
In order to provide a comprehensive outlook on a variable, I will use the definitions of
various scholars so as to give a clear understanding of what it is.
“Variables are empirical indicators of the concepts we are researching. Variables, as their
name implies, have the ability to take on two or more values...The categories of each variable
must have two requirements. They should be both exhaustive and mutually exclusive. By
exhaustive, we mean that the categories of each variable must be comprehensive enough that it
is possible to categorize every observation” (Babbie, Halley, and Zaino 2003, 11).
“... Exclusive refers to the fact that every observation should fit into only one category”
(Babbie, Halley and Zaino 2003, 12)
“A variable is therefore something which can change and can be measured.” (Boxill, Chambers
and Wint 1997, 22)
“The definition of a variable, then, is any attribute or characteristic of people, places, or events
that takes on different values.” (Furlong, Lovelace, Lovelace 2000, 42)
“A variable is a characteristic or property of an individual population unit” (McClave, Benson
and Sincich 2001, 5)
“Variable. A concept or its empirical measure that can take on multiple values” (Neuman
2003, 547).
“Variables are, therefore, the quantification of events, people, and places in order to measure
observations which are categorical (i.e. nominal and ordinal data) and non-categorical (i.e.
metric) in an attempt to be informed about the observation in reality. Each variable must fulfil
two basic conditions – (i) exhaustiveness – the variable must be so defined that all tenets are
captured, as it is comprehensive enough to include all the observations, and (ii) mutual
exclusivity – the variable should be so defined that it applies to one event and one event only
(i.e. every observation should fit into only one category)” (Bourne 2007).
One of the difficulties of social research is not the identification of a variable or
variables in the study but the conceptualization and, oftentimes, the operationalization of
the chosen construct. Thus, whereas the conceptualization (i.e. the definition) of the variable may
(or may not) be complex, it is the ‘how do you measure such a concept (i.e. variable)?’ which
oftentimes poses the problem for researchers. This must be done properly, bearing in
mind the attributes of a variable, because it is this operational definition which you will be testing in
the study (see Typologies of Variables, below). Thus, the testing of hypotheses is embedded
within variables and empiricism, which is used to guide present studies. Hypothesis
testing is a technique that is frequently employed by demographers, statisticians, economists and
psychologists, to name a few practitioners, who are concerned about the testing of theories,
the verification of truths about reality, and the modification of social realities within particular times,
spaces and settings. With this being said, researchers must ensure that a variable is properly
defined in an effort to ensure that the stated phenomenon is so defined and measured.
1.1.1b: TYPOLOGIES OF VARIABLE (examples, using Figure 1.1.2, above)
Health care seeking behaviour: This is defined as people visiting a health practitioner or health
consultant, such as a doctor, nurse, pharmacist or healer, for care and/or advice.
Levels of education: This is denominated in the number of years of formal schooling that
one has completed.
Union status: This is a social arrangement between or among individuals. This arrangement
may include a ‘conjugal’ union or a social state for an individual.
Gender: A sociological state of being male or female.
Per capita income: This is used as a proxy for the income of the individual by analyzing the
consumption pattern.
Ownership of health insurance: Individuals who possess one or more insurance policies.
Injuries: A state of being physically hurt. The examples here are incidences of disability,
impairments, and chronic or acute cuts and bruises.
Illness: A state of unwellness.
Age: The number of years lived up to the last birthday.
Household size: The number of individuals who share at least one common meal, use
common sanitary conveniences and live within the same dwelling.
Now that the premise has been formed in regard to the definition of a variable, the next
step in the process is to establish the category to which each variable belongs. Thus, the researcher
needs to know the level of measurement for each variable: nominal, ordinal, interval, or ratio (see
1.1.2a).
1.1.2a: LEVELS OF MEASUREMENT³: Examples and definitions
Nominal - The naming of events, peoples, institutions, and places, which are coded numerically
by the researcher because the variable has no natural numerical attributes. This
variable may be either (i) dichotomous, or (ii) non-dichotomous.
Dichotomous variable – The categorization of a variable which has only two
sub-groupings (for example, gender – male and female; capital punishment –
permissive and restrictive; religious involvement – involved and not involved).
Non-dichotomous variable – The naming of events which span more than two
sub-categories (for example, Counties in Jamaica – Cornwall, Middlesex and Surrey;
Party Identification – Democrat, Independent, Republican; Ethnicity – Caucasian,
Blacks, Chinese, Indians; Departments in the Faculty of Social Sciences –
Management Studies, Economics, Sociology, Psychology and Social Work,
Government; Political Parties in Jamaica – People’s National Party (PNP),
Jamaica Labour Party (JLP), and the National Democratic Movement (NDM);
Universities in Jamaica – University of the West Indies; University of
Technology, Jamaica; Northern Caribbean University; University College of the
Caribbean; et cetera).
Ordinal - Rank-categorical variables: Variables which name categories that, by their very
nature, indicate a position or arrange the attributes in some rank ordering (the
examples here are as follows: i) Level of Educational Institutions –
Primary/Preparatory, All-Age, Secondary/High, Tertiary; ii) Attitude toward gun
control – strongly oppose, oppose, favour, strongly favour; iii) Social status –
upper-upper, upper-middle, middle-middle, lower-middle, lower class; iv)
Academic achievement – A, B, C, D, F).
Interval or ratio - These variables share all the characteristics of a nominal and an ordinal
variable, along with an equal distance between each category and, in the case of a ratio
variable, a ‘true’ zero value (for example, age; weight; height; temperature; fertility;
votes in an election; mortality; population; population growth; migration rates).
Now that the definitions and illustrations have been provided for the levels of measurement,
the student should understand the position of these measures (see 1.1.2b).
³ Stanley S. Stevens is credited with the development of the typologies of scales (levels of measurement): (i)
nominal, (ii) ordinal, (iii) interval and (iv) ratio (see Stevens 1946, 1948, 1968; Downie and Heath 1970).
Dichotomy (or dichotomous variable), illustrated by pairs of categories:
Typologies of book: Fictional; Non-fictional
Gender: Male; Female
Science: Pure; Applied
Alive; Dead
Induction; Deduction
Burial; Non-burial
Parametric statistics; Non-parametric statistics
Religious service; Non-religious service
Decomposed; Non-decomposed
Use primary data; Use secondary data
Figure 1.1.3: Illustration of dichotomous variables
1.1.2b: RANKING LEVELS OF MEASUREMENT
RATIO (highest)
INTERVAL
ORDINAL
NOMINAL (lowest)
Figure 1.1.4: Ranking of the levels of measurement
The very nature of the levels of measurement allows for (or does not allow for) data manipulation. If
the level of measurement is nominal (for example fiction and non-fiction books), then the
researcher cannot reconstruct this variable at a lower level, because no level lies
below it. If the level of measurement, however, is ordinal (for example no formal education,
primary, secondary and tertiary), then one may decide to use a lower level of measurement (for
example below secondary versus secondary and above). The same is possible with an interval
variable. The social scientist may want to use one level down, ordinal, or two levels down,
nominal. The same is true of a ratio variable. Thus, the further one goes up the
pyramid, the more scope exists for data transformation.
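The downward recoding described above can be sketched as follows. This is only an illustration: the cut-off ages, the category labels and the function names are assumptions chosen for the example and do not come from the text.

```python
def age_to_cohort(age):
    """Ratio -> ordinal: collapse age in years into ranked cohorts.

    The cut-off points (30 and 60) are hypothetical.
    """
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "elderly"


def cohort_to_dichotomy(cohort):
    """Ordinal -> nominal (dichotomous): elderly versus non-elderly."""
    return "elderly" if cohort == "elderly" else "non-elderly"


# Hypothetical ages, invented for illustration only.
ages = [18, 24, 37, 45, 62, 71]
cohorts = [age_to_cohort(a) for a in ages]
groups = [cohort_to_dichotomy(c) for c in cohorts]

print(cohorts)
print(groups)
```

The recoding runs in one direction only: once the ages have been collapsed into cohorts, the original years cannot be recovered from the categories.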
Table 1.1.1: Synonyms for the different Levels of measurement
Levels of Measurement    Other terms
Nominal                  Categorical; qualitative; discrete⁴
Ordinal                  Qualitative; discrete; rank-ordered; categorical
Interval/Ratio           Numerical; continuous⁵; quantitative; scale; metric; cardinal
Table 1.1.2: Appropriateness of Graphs for different levels of measurement
Levels of Measurement        Bar chart   Pie chart   Histogram   Line graph
Nominal                      √           √           __          __
Ordinal                      √           √           __          __
Interval/Ratio (or metric)   __          __          √           √
⁴ Discrete variables take on a finite and usually small number of values, and there is no smooth transition from
one value or category to the next (for example, gender, social class, types of community, undergraduate courses).
⁵ Continuous variables are measured on a scale that changes values smoothly rather than in steps.
Table 1.1.3: Levels of measurement⁶ with Examples and Other Characteristics
Examples
Nominal: Gender; Religion; Political Parties; Race/Ethnicity; Political Ideologies
Ordinal: Social class; Preference; Level of education; Gender equity; Levels of fatigue; Noise level; Job satisfaction
Interval: Temperature; Shoe size; Life span
Ratio: Age; Height; Weight; Reaction time; Income; Score on an Exam; Fertility; Population of a country; Population growth; Crime rates
Mathematical properties
Nominal: Identity
Ordinal: Identity; Magnitude
Interval: Identity; Magnitude; Equal interval
Ratio: Identity; Magnitude; Equal interval; True zero
Mathematical operation(s)
Nominal: None
Ordinal: Ranking
Interval: Addition; Subtraction
Ratio: Addition; Subtraction; Division; Multiplication
Compiled: Paul A. Bourne, 2007; a modification of Furlong, Lovelace and Lovelace 2000, 74
⁶ “Levels of measurement concern the essential nature of a variable, and it is important to know this because it determines what one can do with a variable”
(Burnham, Gilland, Grant and Layton-Henry 2004, 114)
Table 1.1.4: Levels of measurement, Measure of Central Tendency and Measure of Variability
                          Measures of central tendency    Measures of variability
Levels of Measurement     Mean    Mode    Median          Mean deviation    Standard deviation
Nominal                   NA      √       NA              NA                NA
Ordinal                   NA      √       √               NA                NA
Interval/Ratio⁷           √       √       √               √                 √
NA denotes Not Applicable
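The central-tendency half of Table 1.1.4 can be sketched in code. The example below is a minimal illustration under stated assumptions: the function name, the level labels (‘nominal’, ‘ordinal’, ‘metric’) and all the data are hypothetical, and for ordinal data the lower of the two middle categories is reported when the number of cases is even.

```python
from statistics import mean, median, median_low, multimode


def central_tendency(values, level, order=None):
    """Return only the measures that Table 1.1.4 marks as applicable.

    level: 'nominal', 'ordinal' or 'metric' (interval/ratio).
    order: for ordinal data, the categories listed from lowest to highest.
    """
    result = {"mode": multimode(values)}  # the mode applies at every level
    if level == "nominal":
        return result
    if level == "ordinal":
        if order is None:
            raise ValueError("ordinal data need the category order, lowest first")
        ranks = [order.index(v) for v in values]
        # median_low always returns an observed rank, hence a real category
        result["median"] = order[median_low(ranks)]
        return result
    if level == "metric":
        result["median"] = median(values)
        result["mean"] = mean(values)
        return result
    raise ValueError(f"unknown level of measurement: {level!r}")


# Hypothetical responses, invented for illustration only.
gender = ["male", "female", "male", "male"]
education = ["primary", "secondary", "secondary", "tertiary", "primary", "secondary"]
ages = [21, 25, 25, 30, 44]

print(central_tendency(gender, "nominal"))
print(central_tendency(education, "ordinal", order=["primary", "secondary", "tertiary"]))
print(central_tendency(ages, "metric"))
```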
⁷ The ratio variable is the highest level of measurement, with nominal being first (i.e. lowest); ordinal, second; and interval, third.
Table 1.1.5: Combinations of Levels of measurement, and types of Statistical test which are applicable⁸
Dependent Variable   Independent Variable   Statistical Test
Nominal              Nominal                Chi-square
Nominal              Ordinal                Chi-square; Mann-Whitney
Nominal              Interval/ratio         Binomial distribution; ANOVA; Logistic Regression; Kruskal-Wallis; Discriminant Analysis
Ordinal              Nominal                Chi-square
Ordinal              Ordinal                Chi-square; Spearman rho
Ordinal              Interval/ratio         Kruskal-Wallis H; ANOVA
Interval/ratio       Nominal                ANOVA
Interval/ratio       Ordinal
Interval/ratio       Interval/ratio         Pearson r; Multiple Regression; Independent-sample t test
Table 1.1.5 shows which statistical test applies to each combination of levels of measurement: for example, a nominal
dependent variable combined with a nominal independent variable calls for the chi-square test.
⁸ One of the fundamental issues in analyzing quantitative data is not merely to combine and then interpret data, but to use each variable appropriately. This
is further explained below.
STATISTICAL TESTS AND THEIR LEVELS OF MEASUREMENT
Test                         Independent variable                   Dependent variable
Chi-Square (χ²)              Nominal, Ordinal                       Nominal, Ordinal
Mann-Whitney U test          Dichotomous                            Nominal, Ordinal
Kruskal-Wallis H test        Non-dichotomous, Ordinal               Ordinal, or skewed⁹ Metric
Pearson’s r                  Normally distributed¹⁰ Metric          Normally distributed Metric
Linear Regression            Normally distributed Metric, dummy     Normally distributed Metric
Independent Samples T-test   Dichotomous                            Normally distributed Metric
ANOVA                        Nominal, Ordinal (non-dichotomous¹¹)   Normally distributed Metric
Logistic regression          Metric, dummy                          Dichotomous (skewed values or otherwise)
Discriminant analysis        Metric, dummy                          Dichotomous (normally distributed value)
Notes to Table 1.1.6b
Chi-Square (χ²)              Used to test for associations between two variables
Mann-Whitney U test          Used to determine differences between two groups
Kruskal-Wallis H test        Used to determine differences between three or more groups
Pearson’s r                  Used to determine strength and direction of a relationship between two values
Linear Regression            Used to determine strength and direction of a relationship between two or more values
Independent Samples T-test   Used to determine difference between two groups
ANOVA                        Used to determine difference between three or more groups
Logistic regression          Used to predict relationship between many values
Discriminant analysis        Used to predict relationship between many values
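The four group-comparison rows of the table and its notes can be read as a small decision rule. The sketch below is a simplification and not part of the original text: the function name and its two parameters are hypothetical, and the rule deliberately ignores the other rows (chi-square, correlation, regression and discriminant analysis).

```python
def group_difference_test(n_groups, dv_is_normal_metric):
    """Suggest a test for differences between groups.

    n_groups: number of categories in the independent (grouping) variable.
    dv_is_normal_metric: True when the dependent variable is metric and
        approximately normally distributed; False otherwise (for example,
        an ordinal or a skewed metric dependent variable).
    """
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    if n_groups == 2:
        # dichotomous independent variable
        if dv_is_normal_metric:
            return "Independent-samples t-test"
        return "Mann-Whitney U test"
    # non-dichotomous independent variable (three or more groups)
    if dv_is_normal_metric:
        return "ANOVA"
    return "Kruskal-Wallis H test"


print(group_difference_test(2, True))
print(group_difference_test(3, False))
```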
⁹ Skewness indicates that there is a ‘pileup’ of cases to the left or right tail of the distribution.
¹⁰ Normality is observed whenever the values of skewness and kurtosis are zero.
¹¹ Non-dichotomous (i.e. polytomous) denotes having many (i.e. several) categories.
LEVELS OF MEASUREMENT AND THEIR MEASURES OF ASSOCIATION
NOMINAL: Lambda; Cramer’s V; Contingency coefficient; Phi
ORDINAL: Gamma; Somers’ D; Kendall’s tau-b; Kendall’s tau-c
INTERVAL/RATIO: Pearson’s r
Figure 1.1.5: Levels of measurement
Lambda (λ) – This is a measure of the statistical relationship between two nominal
variables.
Phi (Φ) – This is a measure of association between two dichotomous
variables (i.e. dichotomous dependent and dichotomous independent): $\Phi = \sqrt{\chi^2 / N}$
Cramer’s V (V) – This is a measure of association between two nominal
variables: $V = \sqrt{\chi^2 / [N(k - 1)]}$, where k is the smaller of the number of rows and
the number of columns. In the event that there is a dichotomous dependent and a
dichotomous independent variable, V is identical to phi.
Gamma (γ) – This is used to measure the statistical association between two
ordinal variables.
Contingency coefficient (cc) – This is used for association in which the matrix is larger than
2 × 2 (i.e. 2 for the dependent and 2 for the independent; for example 2 × 3, 3 × 2,
3 × 3 …): $cc = \sqrt{\chi^2 / (\chi^2 + N)}$
Pearson’s r – This is used for non-skewed metric variables:
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
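The chi-square based measures and Pearson’s r listed above can be computed directly from their formulas. The following is a minimal sketch: the function names and the data are hypothetical, and k in Cramer’s V is taken to be the smaller of the number of rows and the number of columns.

```python
from math import sqrt


def chi_square(table):
    """Return (chi-square statistic, N) for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def phi(table):
    chi2, n = chi_square(table)
    return sqrt(chi2 / n)


def cramers_v(table):
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))  # assumption: smaller table dimension
    return sqrt(chi2 / (n * (k - 1)))


def contingency_coefficient(table):
    chi2, n = chi_square(table)
    return sqrt(chi2 / (chi2 + n))


def pearson_r(x, y):
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    denominator = sqrt(
        (n * sum(a * a for a in x) - sum(x) ** 2)
        * (n * sum(b * b for b in y) - sum(y) ** 2)
    )
    return numerator / denominator


# Hypothetical 2 x 2 cross-tabulation and paired scores, for illustration only.
crosstab = [[30, 10], [10, 30]]
print(phi(crosstab), cramers_v(crosstab), contingency_coefficient(crosstab))
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

For a 2 × 2 table the sketch returns the same value for phi and Cramer’s V, which matches the statement that the two are identical when both variables are dichotomous.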
1.1.3: CONCEPTUALIZING DESCRIPTIVE AND INFERENTIAL STATISTICS
Research is not done in isolation from the reality of the wider society. Thus, the social
researcher needs to understand whether his/her study is descriptive and/or inferential, as this
guides the selection of certain statistical tools. Furthermore, an understanding of the two
constructs dictates which techniques the analyst will employ, as there is a clear
demarcation between descriptive and inferential statistics. In order to grasp this
distinction, I will provide a number of authors’ perspectives on each terminology.
“Descriptive statistics describe samples of subjects in terms of variables or combination
of variables” (Tabachnick and Fidell 2001, 7)
“Numerical descriptive measures are commonly used to convey a mental image of
pictures, objects, tables and other phenomenon. The two most common numerical
descriptive measures are: measures of central tendencies and measures of variability”
(McDaniel 1999, 29; see also Watson, Billingsley, Croft and Huntsberger 1993, 71)
“Techniques such as graphs, charts, frequency distributions, and averages may be used
for description and these have much practical use” (Yamane 1973, 2; see also Blaikie
2003, 29; Crawshaw and Chambers 1994, Chapter 1)
“Descriptive statistics – statistics which help in organizing and describing data, including
showing relationships between variables” (Boxill, Chambers and Wint 1997, 149)
“We’ll see that there are two areas of statistics: descriptive statistics, which focuses on
developing graphical and numerical summaries that describe some…phenomenon, and
inferential statistics, which uses these numerical summaries to assist in making…
decisions” (McClave, Benson and Sincich 2001, 1)
“Descriptive statistics utilizes numerical and graphical methods to look for patterns in a
data set, to summarize the information revealed in a data set, and to present the
information in a convenient form” (McClave, Benson and Sincich 2001, 2)
“Inferential statistics utilizes sample data to make estimates, decisions, predictions, or
other generalizations about a larger set of data” (McClave, Benson and Sincich 2001, 2)
“The phrase statistical inference will appear often in this book. By this we mean, we
want to “infer” or learn something about the real world by analyzing a sample of data.
The ways in which statistical inference are carried out include: estimating…parameters;
predicting…outcomes, and testing…hypothesis …” (Hill, Griffiths and Judge 2001, 9).
Inferential statistics is not only about ‘causal’ relationships; King, Keohane and
Verba argue that it is categorized into two broad areas: (1) descriptive, and (2) causal
inference. Thus, descriptive inference speaks to the description of a population from
what the sample makes possible. Burnham, Gilland, Grant and
Layton-Henry (2004) state that:
“Causal inferences differ from descriptive ones in one very significant way: they
take a ‘leap’ not only in terms of description, but in terms of some specific causal
process [i.e. predictability of the variables]” (Burnham, Gilland, Grant and Layton-Henry
2004, 148).
In order that this textbook can be helpful and simple, I will provide operational
definitions of concepts as well as illustrations of particular terminologies, along with the
appropriateness of statistical techniques based on the typologies of variable and the level
of measurement (see Tables 1.1.1 – 1.1.6).
CHAPTER 2
2.1.0: DESCRIPTIVE STATISTICS
The interpretation of quantitative data commences with an overview (i.e. background
information on the survey or study; this is normally demographic information) of the
general dataset in an attempt to provide a contextual setting for the research (descriptive
statistics, see above), upon which any association may be established (inferential
statistics, see above). Hence, this chapter provides the reader with the analysis of
univariate data (descriptive statistics), with appropriate illustrations of how various levels
of measurement may be interpreted, and/or diagrams chosen based on their suitability.
A variable may be non-metric (i.e. nominal or ordinal) or metric (i.e. scale,
interval/ratio). It is based on this premise that particular descriptive statistics are provided.
In keeping with this background, I will begin this process with non-metric, then metric
data. The first part of this chapter will provide a thorough outline of how nominal and/or
ordinal variables are analyzed. Then, the second part will analyze metric variables.
HOW TO DO DESCRIPTIVE STATISTICS FOR A NON-METRIC VARIABLE?
STEP ONE: Ensure that the variable is non-metric (e.g. Gender, general happiness).
STEP TWO: Select Analyze.
STEP THREE: Select descriptive statistics.
STEP FOUR: Select frequency.
STEP FIVE: Select the non-metric variable.
STEP SIX: Select statistics at the end.
STEP SEVEN: Select mode, or mode and median (based on whether the variable is nominal or ordinal, respectively).
STEP EIGHT: Select Chart.
STEP NINE: Choose bar or pie graphs.
STEP TEN: Select paste or ok.
STEP ELEVEN: Analyze the output (use Table 2.1.1a).
Figure 2.1.0: Steps in Analyzing Non-metric data
2.1.1a: INTERPRETING NON-METRIC (or Categorical) DATA
NOMINAL VARIABLE (when there are no missing cases)
Table 2.1.1a: Gender of respondents
Gender     Frequency   Percent   Valid Percent
Male       150         69.4      69.4
Female     66          30.6      30.6
Total      216         100.0     100.0
Identifying Non-missing Cases: When there are no differences between the percent
column and the valid percent column, then there are no missing cases.
How is the table analyzed? Of the sampled population (n=216¹²), 69.4% were males
compared to 30.6% females.
¹² The total number of persons interviewed for the study. It is advisable that valid percents are used in
descriptive statistics, as there may be instances when missing cases are present within the dataset, which
makes the percent figure different from the valid percent (Table 2.1.1b).
NOMINAL VARIABLE (when there are missing cases)
Table 2.1.1b: General Happiness
General Happiness   Frequency   Percent   Valid Percent
Very happy          467         30.8      31.1
Pretty happy        872         57.5      58.0
Not too happy       165         10.9      11.0
Missing Cases       13          0.9       -
Total               1,517       100.0     100.0
Identifying Missing Cases: Missing data (which indicate that some of the respondents
did not answer the specified question) show up as a disparity between the values in the
percent column and those in the valid percent column. In this case, 13 of 1,517
respondents did not answer the question on ‘general happiness’. Where there is a
difference between the two aforementioned columns (i.e. percent and valid percent), the
student should remember to use the valid percent. The rationale is simple: the research
is about those persons who have answered, and they are the ones captured in the valid
percent column. Hence, it is recommended that the student use the valid percent column
at all times in analyzing quantitative data.
Interpretation: Of the sampled population (n=1,517), the response rate is 99.1%
(n=1,504)[13]. Of the valid responses (n=1,504), 31.1% (n=467) indicated that they were
‘very happy’ and 58.0% (n=872) reported being ‘pretty happy’, compared to 11.0%
(n=165) who said ‘not too happy’.
[13] Because missing cases are within the dataset (13 or 0.9%), there is a difference between percent and valid
percent. Thus, care should be taken when analyzing data. This is overcome when the valid percents are
used.
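The arithmetic behind the percent, valid percent and response rate figures can be checked by hand or with a few lines of code. The following is a minimal Python sketch, not the statistical package's own procedure; the function name `frequency_table` and the variable names are illustrative assumptions, while the counts come from Table 2.1.1b.

```python
# Counts taken from Table 2.1.1b; the 13 non-respondents are held separately.
counts = {"Very happy": 467, "Pretty happy": 872, "Not too happy": 165}
missing = 13


def frequency_table(counts, missing=0):
    """Return per-category frequency, percent and valid percent.

    percent       = share of ALL sampled cases (missing cases included)
    valid percent = share of the cases that actually answered
    """
    valid_total = sum(counts.values())
    total = valid_total + missing
    rows = {}
    for label, n in counts.items():
        rows[label] = {
            "frequency": n,
            "percent": round(100 * n / total, 1),
            "valid_percent": round(100 * n / valid_total, 1),
        }
    return rows, total, valid_total


rows, total, valid_total = frequency_table(counts, missing)

# Response rate: valid responses as a share of the sampled population.
response_rate = round(100 * valid_total / total, 1)

# Missing cases are present whenever the two percent columns disagree.
has_missing = any(r["percent"] != r["valid_percent"] for r in rows.values())

for label, r in rows.items():
    print(label, r["frequency"], r["percent"], r["valid_percent"])
print("Response rate (%):", response_rate, "| missing cases present:", has_missing)
```

Setting `missing=0` makes the two percent columns identical, which is the situation shown in Table 2.1.1a.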
Owing to the typology of the variable (i.e. nominal), the data may be presented graphically by
either a pie graph or a bar graph.
Pie graph (Male, 69.4%; Female, 30.6%)
Figure 2.1.1: Respondents’ gender
OR
Bar graph (categories: Male, Female; vertical axis from 0 to 70)
Figure 2.1.2: Respondents’ gender
ORDINAL VARIABLE
Table 2.1.2: Subjective (or self-reported) Social Class
Social class:    Frequency    Percent    Valid Percent
Lower            100          46.3       46.3
Middle           104          48.1       48.1
Upper            12           5.6        5.6
Total            216          100.0      100.0
Interpreting the Data in Table 2.1.2:
When the respondents were asked to select what best describes their social standing, of the
sampled population (n=216), 46.3% reported lower (working) class and 48.1% reported
middle class, compared to 5.6% who said upper middle class. Based on the typology of the
variable (i.e. ordinal), the graphical options are (i) pie graph and/or (ii) bar graph.
Note: Where there is no difference between the percent column and the valid percent
column, researchers seldom report both. The column normally used is valid percent, as
it reflects those persons who actually responded to the specified question. When it is
reported, it is usually labelled simply ‘percent’ rather than ‘valid percent’.
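For an ordinal variable, the mode and the median can both be read off the frequency table. The sketch below, written in Python purely for illustration, uses the counts from Table 2.1.2; the function names are assumptions and the categories must be listed from lowest to highest.

```python
# Categories in their natural order (lowest to highest); counts from Table 2.1.2.
ordered_counts = [("Lower", 100), ("Middle", 104), ("Upper", 12)]


def mode_category(ordered_counts):
    """Mode: the category with the highest frequency."""
    return max(ordered_counts, key=lambda pair: pair[1])[0]


def median_category(ordered_counts):
    """Median: the category that contains the middle-ranked case.

    Simplification: with an even number of cases the two middle cases could
    fall in adjacent categories; this sketch reports the category reached
    once the cumulative frequency passes the midpoint.
    """
    total = sum(n for _, n in ordered_counts)
    midpoint = (total + 1) / 2  # rank of the middle case
    running = 0
    for label, n in ordered_counts:
        running += n  # cumulative frequency so far
        if running >= midpoint:
            return label
    raise ValueError("ordered_counts must contain at least one case")


print("Mode:", mode_category(ordered_counts))
print("Median:", median_category(ordered_counts))
```

For a nominal variable such as gender only `mode_category` is meaningful, because the categories have no inherent order.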
Bar graph (Lower class, 46.3; Middle class, 48.1; Upper middle class, 5.6; vertical axis from 0 to 50)
Figure 2.1.3: Social class of respondents
Or
Pie graph (Lower class, 46.3; Middle class, 48.1; Upper middle class, 5.6)
Figure 2.1.4: Social class of respondents
2.1.1b: STEPS IN INTERPRETING METRIC VARIABLE:
METRIC (i.e. scale or interval/ratio)
HOW TO DO DESCRIPTIVE STATISTICS FOR A METRIC VARIABLE?
STEP ONE: Know the metric variable (Age)
STEP TWO: Select Analyze
STEP THREE: Select descriptive statistics
STEP FOUR: select frequency
STEP FIVE: select the metric variable
STEP SIX: select statistics at the end
STEP SEVEN: select mean, standard deviation, skewness
STEP EIGHT: select Chart
STEP NINE: Choose histogram with normal curve
STEP TEN: select paste or ok
STEP ELEVEN: Analyze the output (use Table 2.1.3)
Figure 2.1.5: Steps in Analyzing Metric data
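The three statistics requested for a metric variable (mean, standard deviation and skewness) can be illustrated with a short Python sketch. The ages below are hypothetical values invented for the example, not data from the study, and the skewness formula (the adjusted Fisher-Pearson coefficient) is an assumption, since the text does not state which formula the software applies.

```python
import statistics

# Hypothetical ages, for illustration only (not taken from the study's dataset).
ages = [18, 19, 19, 20, 21, 21, 22, 23, 25, 30, 34]


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(z**3),
    where z is each value's deviation from the mean in sample-SD units."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)


mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)
skew_age = sample_skewness(ages)

print("Mean:", round(mean_age, 2))
print("Standard deviation:", round(sd_age, 2))
print("Skewness:", round(skew_age, 2))
```

A skewness near zero suggests a roughly symmetric distribution, a positive value indicates a longer right tail, and a negative value a longer left tail; the histogram with a normal curve gives the same information visually.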