2. Analysing Quantitative Data
• Data collected on its own does not answer your research
question(s)
• The data needs to be interpreted – to do this we need to
organise it and analyse it!
• Most students panic at this stage – statistics work!!!!
• Have you come across terms such as “multiple
regression”, “t-test” or “ANOVA” in your reading?
3. Statistics
• The more exercise classes attended by an individual the fitter they get.
• There is a positive and statistically significant relationship between
student attendance and academic achievement
• There is a positive and statistically significant relationship between 5km
and 10km PBs
• There is a statistically significant difference in mortality rates between
owners of red cars and owners of cars of other colours
4. • Statistics can be split into 2 forms:
1. Descriptives – organise the data...describe it!
2. Inferentials – allow you to make inferences...what does it mean?
• You need to ask yourself:
1. What exactly do I need to find out from this data to answer my
research question?
2. What statistical test will give me this info?
3. What do the results from this test mean?
• Statistics themselves have no meaning! The importance lies in
how you interpret them!!!
5. Computer Software
• Most common to use software to conduct the tests
• Most common is SPSS
(statistical package for the social sciences).
• Globally used in health and social sciences!
• Knowledge of SPSS is a very valuable transferable skill to
have on your C.V.!
6. Statistical Inference
• Work of statistician = making predictions about a group based on
collected data from a small sample of that population
• (exploring differences, relationships and making statements about the
meaningfulness)
• Stats allow us to make a statement and then cite the odds that it is
correct!
• Computers make stats easier: organise, analyse, and display data
much faster than we can.
• BUT, the PC is only an extension of you. It will only perform if you
enter the data correctly and understand its output.
• Before a PC is useful to you, you must know what you want it to do
and what is expected. That is where this module comes in!
7. Descriptive statistics
• Important to know the full range of information for different
variables
• PB, seasons best, distance, age, BMI etc
• We need to know the mean, median & mode to describe
the data!
• 1st type: central tendency (M, M, M)
• If these are known we can interpret the value of a single
score (e.g., 2nd place) by comparing it to the mean, median
and mode!
8. • Suppose we collect data on the monthly wage of
employees from two specific companies.
Company one   Company two
1000          1890
1200          2135
1215          786
1300          980
990           1200
875           768
1345          1000
1000          1890
1200          2142
1215          1390
850           9800
970           1200
1875          3256
1345          1000
Such RAW DATA on its own is not particularly informative.
We usually need to SUMMARISE/DESCRIBE our data.
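As a rough illustration of what "summarising" means here, the sketch below reduces each company's wage list to a handful of descriptive statistics. It uses Python's standard `statistics` module rather than SPSS, and the function name `summarise` is an assumption made for this example only.

```python
from statistics import mean, median

# Monthly wages copied from the table above
company_one = [1000, 1200, 1215, 1300, 990, 875, 1345,
               1000, 1200, 1215, 850, 970, 1875, 1345]
company_two = [1890, 2135, 786, 980, 1200, 768, 1000,
               1890, 2142, 1390, 9800, 1200, 3256, 1000]

def summarise(wages):
    """Return a few descriptive statistics for a list of wages."""
    return {
        "n": len(wages),
        "mean": mean(wages),
        "median": median(wages),
        "min": min(wages),
        "max": max(wages),
    }

summary_one = summarise(company_one)
summary_two = summarise(company_two)
print("Company one:", summary_one)
print("Company two:", summary_two)
```

Five numbers per company are far easier to compare than fourteen raw values each. Note how the single wage of 9800 pulls the mean of company two well above its median.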
Central Tendency
9. • Mean (m)
– Average of a particular set of scores
• Median
– Central value (mid-point)
– E.g., if weekly hrs spent training for a sport were 2, 2, 4,
5, 6, 10, 10, 11, 15 the median = 6
– If you want to compare two groups on a measure
(e.g., high v low fear of failure) you can create them with a
median split: scores above the median form one group and
scores below it the other – to make a comparison!
• Mode
– Most frequent number
(e.g., most common age for people who drop out from
sport)
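The three measures can be checked on the weekly training hours from the median example. This is a minimal sketch using Python's standard `statistics` module. The median split at the end follows one common convention (scores equal to the median are left out of both groups), which is an assumption and not something the slides specify.

```python
from statistics import mean, median, multimode

# Weekly hours spent training, from the median example above
hours = [2, 2, 4, 5, 6, 10, 10, 11, 15]

hours_mean = mean(hours)        # sum of scores / number of scores
hours_median = median(hours)    # middle value of the ordered scores
hours_modes = multimode(hours)  # every value tied for most frequent

# Median split (assumption: scores equal to the median are dropped)
low_group = [h for h in hours if h < hours_median]
high_group = [h for h in hours if h > hours_median]

print(hours_mean, hours_median, hours_modes)
print(low_group, high_group)
```

In this particular data set two values (2 and 10) are tied for most frequent, so `multimode` returns both.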
10. Name               Club                Annual Salary
    Robin van Persie   Arsenal             £4.5 million
    Darren Bent        Aston Villa         £3.5 million
    Nicolas Anelka     Chelsea             £3 million
    Didier Drogba      Chelsea             £3.5 million
    Mario Balotelli    Manchester City     £4 million
    Michael Owen       Manchester United   £2.5 million
    Carlos Tevez       Manchester City     £13 million
    Tuncay Sanli       Stoke City          £3 million
    Darren Bent        Sunderland          £2.25 million
    Jermain Defoe      Tottenham           £3 million
    Total: £42.25m     Mean = £4.225m
• What has happened is that an outlier (extreme value) has
distorted the information carried by the mean.
However, averages can be distorted by outliers!
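To see how much the one extreme salary moves the mean, the sketch below recomputes it with and without the largest value and compares both with the median. The figures are the salaries from the table (in £ millions); dropping the maximum is only an illustration of the effect, not a recommendation to delete outliers from real data.

```python
from statistics import mean, median

# Annual salaries in £ millions, in the order listed in the table
salaries = [4.5, 3.5, 3.0, 3.5, 4.0, 2.5, 13.0, 3.0, 2.25, 3.0]

mean_all = mean(salaries)      # pulled upwards by the £13m salary
median_all = median(salaries)  # middle of the ordered salaries

# Same calculation with the single largest salary left out
largest = max(salaries)
without_outlier = [s for s in salaries if s != largest]
mean_without = mean(without_outlier)

print(f"mean (all players):      £{mean_all:.3f}m")
print(f"median (all players):    £{median_all:.3f}m")
print(f"mean (without outlier):  £{mean_without:.3f}m")
```

With these figures the median of all ten salaries happens to equal the mean of the nine non-outlier salaries, which is why the median is often the better summary when an outlier is present.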
11. The Median
• In some instances the MEDIAN may be a better measure of
central tendency
• The median is basically the central value of a data set when
that set is numerically ordered
• Sometimes the median is very simple to find...
£100,000 £125,000 £200,000 £225,000 £300,000
12. The Median
• At times, the median can be slightly more difficult to calculate
because there may not be a single middle value
• In such instances we take the MEAN of the TWO MIDDLE
VALUES to be the MEDIAN
£100,000 £150,000 £200,000 £250,000 £300,000 £500,000
Median = (£200,000 + £250,000) / 2 = £225,000
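Both cases (a single middle value, or two middle values that are averaged) can be written as one short function. This is a hand-rolled sketch for illustration; the name `median_of` is made up here, and in practice `statistics.median` does the same job.

```python
def median_of(values):
    """Median: middle value of the ordered data, or the mean of the
    two middle values when the number of values is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# The two data sets from the median slides (in £)
odd_set = [100_000, 125_000, 200_000, 225_000, 300_000]
even_set = [100_000, 150_000, 200_000, 250_000, 300_000, 500_000]

print(median_of(odd_set))   # single middle value
print(median_of(even_set))  # mean of the two middle values
```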
13. The Mode
• The mode is the most frequently occurring value in a
given data set:
• 7, 4, 8, 8, 9, 2, 4, 5, 7, 8, 4, 8, 8, 6
Here the mode would be 8
• But what about this data set?
• 3, 4, 5, 6, 4, 5, 4, 5, 6, 9, 1, 2
Such data sets have two modes and are usually referred to
as bimodal (here the modes are 4 & 5).
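Finding the mode is just a matter of counting. The sketch below counts each value and returns every value tied for the highest count, so it handles the bimodal case as well. The function name `modes` is an assumption for this example.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# The two data sets from the slide
first_set = [7, 4, 8, 8, 9, 2, 4, 5, 7, 8, 4, 8, 8, 6]
second_set = [3, 4, 5, 6, 4, 5, 4, 5, 6, 9, 1, 2]

print(modes(first_set))   # one mode
print(modes(second_set))  # two modes (bimodal)
```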
14. Information can be presented as...
• Numerical
– Conveys info about the degree of your measure
• Graphical (e.g., box plot)
– Contains detailed info about the distribution of scores
• Usual to use both in your study!
15. Measures of Dispersion/Variance
• Central tendency alone does not always provide an adequate
summary of our data
– the dispersion or variability of scores within a data set gives
us supplementary information about the data
• We often need an idea of how our data values vary
around the central measure
• For example, we might know the mean of a data set, but people
might vary quite dramatically around that central value
• Suppose the manager asked for a comparison of the wages of
his four most featured strikers and his four most featured
midfielders...
16. Strikers   £/week      Midfielders   £/week
    A          £130,000    1             £175,000
    B          £250,000    2             £160,000
    C          £125,000    3             £150,000
    D          £125,000    4             £150,000
    MEAN = £157,500        MEAN = £158,750
If we present the manager with the means alone, we do not
give him the full story…
• Clearly the variability in the first data set far outweighs that of the
second.
17. Variability of Data
• A statistic that allows the spread (dispersion) of the data
to be appreciated is the range.
• The range is simply the difference between the smallest
and largest values in the data set.
18. Range
Strikers   £/week      Midfielders   £/week
A          £130,000    1             £175,000
B          £250,000    2             £160,000
C          £125,000    3             £150,000
D          £125,000    4             £150,000
MEAN = £157,500        MEAN = £158,750
RANGE = £125,000       RANGE = £25,000
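The range figures can be reproduced in a couple of lines. This sketch uses the weekly wages from the table; `value_range` is a name chosen for this example.

```python
from statistics import mean

# Weekly wages in £, from the table
strikers = [130_000, 250_000, 125_000, 125_000]
midfielders = [175_000, 160_000, 150_000, 150_000]

def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

for label, wages in (("Strikers", strikers), ("Midfielders", midfielders)):
    print(f"{label}: mean = £{mean(wages):,.0f}, "
          f"range = £{value_range(wages):,}")
```

The two means differ by only £1,250, yet the strikers' range is five times that of the midfielders.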
19. BUT…..The range alone does not tell us the full story of how
much variability there is on average around the mean
Standard deviation does.
• It is a measure of the extent to which scores deviate from the
mean
• You will very frequently see it mentioned in research papers:
Descriptive statistics suggested that males (M = 4.4, SD = 0.8) had higher
levels of confidence than females (M = 3.6, SD = 0.5)
• If SD is large, then the Mean may not be a good
representation of the data
20. • Say two samples have identical means:
• BUT....They can have different standard deviations (Spread
of scores around the mean)
• This tells the researcher that the measures from the sample
with the larger standard deviation are likely to deviate
further from the mean score
• i.e., the scores are more spread out.
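As a sketch of how the standard deviation is obtained, the function below follows the usual steps: take each score's deviation from the mean, square it, sum, divide by n − 1, and take the square root. Using the sample formula (n − 1) is an assumption, since the slides do not say which version is intended. The data are the striker and midfielder wages from the earlier slide, whose means are close but not identical.

```python
from math import sqrt
from statistics import mean, stdev

def sample_sd(values):
    """Sample standard deviation, dividing by n - 1 (an assumption)."""
    m = mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / (len(values) - 1))

# Weekly wages in £, from the strikers/midfielders slide
strikers = [130_000, 250_000, 125_000, 125_000]
midfielders = [175_000, 160_000, 150_000, 150_000]

sd_strikers = sample_sd(strikers)
sd_midfielders = sample_sd(midfielders)

print(f"Strikers:    M = {mean(strikers):,.0f}, SD = {sd_strikers:,.0f}")
print(f"Midfielders: M = {mean(midfielders):,.0f}, SD = {sd_midfielders:,.0f}")

# The hand-written function agrees with the standard library
assert abs(sd_strikers - stdev(strikers)) < 1e-6
```

Similar means, very different SDs: the strikers' wages are far more spread out around their mean.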
21. Presenting Descriptive Statistics
• Generally presented in tables and graphs
• Tables should be included where the information is appropriate
to the research question
• Include notation to show significance
23. Coding Data: to find descriptives of groups
• SPSS only deals with numbers and NOT words!!!
• Sometimes (quite often in sport) we need to CODE our data
• Coding = translating responses into common categories
each with an assigned numerical value to allow you to run
some statistics
It is very easy!
• If you get non-numerical data (e.g., gender, level of
participation, sport played etc) you need to give each group
a code number (e.g., 1 for male 2 for female).
24. For example...
• All males are coded 1 and females 0
• All football players are coded 0, rugby players 1 and hockey
players 2 etc.
• The computer then knows what is 0 and what is 1
• Then when you run your descriptives, SPSS will be able to
give you them for each group and not just the sample as a
whole. Therefore it allows you to compare...
• Then when you run the inferential statistics you can
properly compare the results across groups!
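The idea of coding is the same outside SPSS. The sketch below uses the sport codes from the slide (football 0, rugby 1, hockey 2) and then computes a descriptive statistic for each coded group. The responses and the training-hours variable are hypothetical, invented purely to show the mechanics.

```python
from statistics import mean

# Codes as given on the slide
SPORT_CODES = {"football": 0, "rugby": 1, "hockey": 2}

# Hypothetical responses: (sport played, weekly training hours)
responses = [
    ("football", 4), ("rugby", 6), ("hockey", 5),
    ("football", 6), ("rugby", 8), ("hockey", 3),
]

# Step 1: translate each word into its numerical code
coded = [(SPORT_CODES[sport], hours) for sport, hours in responses]

# Step 2: collect the scores for each code
groups = {}
for code, hours in coded:
    groups.setdefault(code, []).append(hours)

# Step 3: descriptives per group rather than for the whole sample
group_means = {code: mean(scores) for code, scores in groups.items()}
print(group_means)
```

The numbers are only labels: code 2 does not mean "more" than code 0, so the codes themselves should never be averaged.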
26. Descriptives
Social Physique Anxiety, by grouping variable

Grouping variable: exercise dependent but good body image/no MD
                                   Statistic   Std. Error
Mean                               26.5625     1.32906
95% Confidence Interval for Mean
  Lower Bound                      23.7297
  Upper Bound                      29.3953
5% Trimmed Mean                    26.5139
Median                             24.0000
Variance                           28.263
Std. Deviation                     5.31625
Minimum                            20.00
Maximum                            34.00
Range                              14.00
Interquartile Range                11.25
Skewness                           .544        .564
Kurtosis                           -1.370      1.091

Grouping variable: Exdependent and MD
                                   Statistic   Std. Error
Mean                               56.5238     .86084
95% Confidence Interval for Mean
  Lower Bound                      54.7281
  Upper Bound                      58.3195
5% Trimmed Mean                    56.8598
Median                             58.0000
Variance                           15.562
Std. Deviation                     3.94486
Minimum                            47.00
Maximum                            60.00
Range                              13.00
Interquartile Range                6.00
Skewness                           -1.416      .501
Kurtosis                           1.445       .972
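One way to get comfortable reading output like this is to check how the rows relate to each other. The sketch below takes the reported values for the two groups and confirms that the variance is the standard deviation squared and that the range is maximum minus minimum. It also recovers each group's size from the fact that the standard error of the mean equals SD / √n. The group sizes are not printed in the output, so the recovered n values are an inference and not a reported figure.

```python
# Values copied from the two blocks of the Descriptives output
groups = {
    "first group": {"sd": 5.31625, "variance": 28.263, "se_mean": 1.32906,
                    "minimum": 20.0, "maximum": 34.0, "range": 14.0},
    "second group": {"sd": 3.94486, "variance": 15.562, "se_mean": 0.86084,
                     "minimum": 47.0, "maximum": 60.0, "range": 13.0},
}

implied_n = {}
for name, g in groups.items():
    # Variance is the standard deviation squared (allowing for rounding)
    assert abs(g["sd"] ** 2 - g["variance"]) < 0.001
    # Range is maximum minus minimum
    assert g["maximum"] - g["minimum"] == g["range"]
    # SE of the mean = SD / sqrt(n), so n = (SD / SE) squared
    implied_n[name] = round((g["sd"] / g["se_mean"]) ** 2)

print(implied_n)
```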
27. Inferential Statistics
• Used to draw inferences (logical conclusions) about a
population from a sample
• E.g., we want to explore the effects of sleep deprivation on
performance.
• 10 subjects who performed a task after 24 hrs of sleep
deprivation scored 12 pts lower than 10 subjects who performed
the task after ‘normal’ sleep.
• Is the difference real or due to chance?
• Tests of significant difference: t-test, ANOVA etc.
• Tests of association: correlation
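To make the "real or due to chance?" question concrete, here is a minimal sketch of an independent-samples t-test for the sleep deprivation example, written with the standard library only. The individual scores are hypothetical: they were invented so that the two groups of 10 differ by 12 points on average, as in the example. The function name `independent_t` is also an assumption, and the pooled-variance form used here assumes the two groups have similar variances.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance independent-samples t statistic and its df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df

# Hypothetical task scores (not real data)
normal_sleep = [62, 58, 65, 60, 55, 63, 59, 61, 57, 60]
sleep_deprived = [50, 45, 52, 47, 44, 51, 46, 49, 48, 48]

t_value, df = independent_t(normal_sleep, sleep_deprived)

# Two-tailed critical value of t for df = 18 at the .05 level
CRITICAL_T = 2.101
print(f"t({df}) = {t_value:.2f}")
print("significant at .05" if abs(t_value) > CRITICAL_T else "not significant")
```

If the calculated t exceeds the critical value, a difference this large would be unlikely to arise by chance alone were there no true difference in the population. SPSS reports an exact p value instead of comparing against a table value.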
29. 2 Types of Inferential Tests
• Inferential tests test a null hypothesis (i.e., there will be
no relationship or difference between two variables).
1. Parametric tests – used on data that meet strict
criteria
2. Non-Parametric tests – used on data that do not meet
the strict criteria
• We will be exploring these criteria next week!
30. Summary
• Statistics used to describe data (descriptive stats)
• Also used to discern what data mean (inferential)
• The type of test used is determined by the experimental
design
• First step in data analysis is exploring the data
• What is the effect of one variable on another?