DESCRIPTIVE STATISTICS
BADR EDDINE IBN YAHIA
OUTLINE
• Introduction
• Frequency Distribution
• Measures of Central Tendency
• Measures of Variability
• Describing Interval and Ratio Data (Numerical Scores)
• Describing Non-numerical Data from Nominal and Ordinal Scales of Measurement
• Using Graphs to Summarize Data
• Correlations
• Regression
• Multiple Regression
• The general goal of descriptive statistics is to organize or summarize a set of
scores. Two general techniques are used to accomplish this goal.
• 1. Organize the entire set of scores into a table or a graph that allows researchers
(and others) to see the whole set of scores.
• 2. Compute one or two summary values (such as the average) that describe the
entire group.
FREQUENCY DISTRIBUTION
• A frequency distribution is an overview of all distinct values in some variable
and the number of times they occur.
• It is a tabulation of the number of individuals in each category on the scale of measurement, and it has two elements:
• 1. The set of categories that make up the scale of measurement.
• 2. The number of individuals with scores in each of the categories.
• The advantage of a frequency distribution is that it allows a researcher to view the
entire set of scores. It presents raw data in an organized, easy-to-read format.
• The disadvantage is that constructing a frequency distribution without the aid of a computer can be somewhat tedious, especially with large sets of data. Another drawback, particularly when scores are grouped into intervals, is the loss of detail.
TABLE 15.1 IS A FREQUENCY DISTRIBUTION TABLE SUMMARIZING THE SCORES FROM A 5-POINT QUIZ GIVEN TO A CLASS OF N = 15 STUDENTS.
• In this example, one person had a perfect score of X = 5 on the quiz, and three people had scores of X = 4.
• As another example, 183 students filled out a questionnaire. One of the questions asked which study major they were following.
• The resulting table shows how frequencies are distributed over values (study majors in this example) and hence is a frequency distribution.
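A frequency distribution table can be built in a few lines of Python. This is a minimal sketch, not the deck's own procedure: since the full Table 15.1 data are not reproduced here, it reuses the 12 scores from the mean example later in the deck, and `frequency_distribution` is a hypothetical helper name. Categories are listed from highest to lowest, as in a typical frequency table.

```python
from collections import Counter

# Scores reused from the deck's mean/mode example (illustrative input).
scores = [4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1]


def frequency_distribution(values):
    """Return (category, frequency) pairs, listing the highest category first."""
    counts = Counter(values)  # maps each distinct value to how often it occurs
    return sorted(counts.items(), reverse=True)


table = frequency_distribution(scores)
print("X  f")
for x, f in table:
    print(f"{x}  {f}")
```

The frequencies always add up to the number of scores, which is a quick check that no score was missed.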
RELATIVE FREQUENCIES
• Optionally, a frequency distribution
may contain relative frequencies:
frequencies relative to (divided by)
the total number of values. Relative
frequencies are often shown as
percentages or proportions.
• Relative frequencies provide easy insight into frequency distributions. They also make comparisons easier, for example between samples of different sizes.
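Relative frequencies are each category's frequency divided by the total number of values. The sketch below assumes the same illustrative list of 12 scores used in the mean example; `relative_frequencies` is a hypothetical helper, and each result is shown both as a proportion and as a percentage.

```python
from collections import Counter

scores = [4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1]


def relative_frequencies(values):
    """Return {category: proportion}, where proportion = f / n."""
    n = len(values)
    counts = Counter(values)
    return {x: counts[x] / n for x in sorted(counts, reverse=True)}


rel = relative_frequencies(scores)
for x, p in rel.items():
    # Show each relative frequency as a proportion and as a percentage.
    print(f"X = {x}: p = {p:.3f} ({100 * p:.1f}%)")
```

Because every frequency is divided by the same n, the proportions sum to 1 (100%).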
FREQUENCY DISTRIBUTION GRAPHS
• The graph shows the scale of measurement (set of categories) along the
horizontal axis and the frequencies on the vertical axis.
• When the measurement scale (scores) consists of numerical values (interval or
ratio scale of measurement), there are two options for graphing the frequency
distribution.
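• A histogram displays the frequency (or relative frequency) of each score or interval of a single variable as a bar; adjacent bars touch one another.
• A frequency polygon places a dot above the midpoint of each score or interval (bin), at a height equal to its frequency, and joins the dots with straight lines.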
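To make the two graph types concrete without a plotting library, the sketch below prints a text histogram and lists the (score, frequency) vertices of a frequency polygon. It assumes integer scores one unit apart and reuses the illustrative 12 scores from the mean example; closing the polygon with a zero-frequency point at each end is one common convention. `text_histogram` and `polygon_vertices` are hypothetical names.

```python
from collections import Counter

scores = [4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1]
counts = Counter(scores)


def text_histogram(counts):
    """One row per score, with a bar of '#' whose length equals the frequency."""
    lo, hi = min(counts), max(counts)
    return [f"{x} | {'#' * counts.get(x, 0)}" for x in range(lo, hi + 1)]


def polygon_vertices(counts):
    """(score, frequency) points to join with straight lines.

    The polygon is closed by adding a zero-frequency point one step below the
    lowest score and one step above the highest score.
    """
    lo, hi = min(counts), max(counts)
    return [(x, counts.get(x, 0)) for x in range(lo - 1, hi + 2)]


print("\n".join(text_histogram(counts)))
print(polygon_vertices(counts))
```

Passing the vertex list to any plotting tool as x and y values would draw the polygon.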
• Figure 15.1a is a traditional histogram
with a bar above each category.
Traditional histogram (a).
• In Figure 15.1b, the histogram is modified slightly by changing each bar into a stack of blocks, with one block for each individual.
• The modification helps emphasize the concept of a frequency distribution.
A modified histogram (b)
• Figure 15.1c presents the same data
in a polygon.
A polygon (c)
• Each of these graphs shows how frequencies are distributed over values.
• When the categories on the scale of measurement are nominal or ordinal, the frequency distribution is presented as a bar graph, in which a space separates adjacent bars.
• Frequency distribution graphs also make it easy to see extreme scores that are very different from the rest of the group.
Bar Graph Showing the Frequency Distribution of Academic Majors in an Introductory Psychology Class.
• Frequency distributions, especially graphs, can be a very effective method for
presenting information about a set of scores.
• The distribution shows whether the scores are clustered together or spread out
across the scale.
• However, a frequency distribution is generally considered to be a preliminary
method of statistical analysis.
MEASURES OF CENTRAL TENDENCY
• A measure of central tendency is a single value that attempts to describe a set of
data by identifying the central position within that set of data.
• As such, measures of central tendency are sometimes called measures of central
location. They are also classed as summary statistics.
• The goal is to find the average, or the most typical, score for the entire set.
THE MEAN, MEDIAN AND MODE
• The mean, median and mode are all valid measures of central tendency, but
under different conditions, some measures of central tendency become more
appropriate to use than others.
MEAN (ARITHMETIC)
• The mean (or average) can be used with both discrete and continuous data,
although its use is most often with continuous data.
• The mean is equal to the sum of all the values in the data set divided by the
number of values in the data set. (The mean is computed by adding the scores
and dividing the sum by the number of individuals).
• So, if a data set contains n values $x_1, x_2, \ldots, x_n$, the sample mean, usually denoted by x̄ (pronounced "x bar") or by the letter M, is:
• $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$ or $M = \frac{\Sigma X}{n}$
• To compute the mean, you first find the sum of the scores (represented by ΣX)
and then divide by the number of scores (represented by n).
• Scores: 4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1
• ΣX=32 and n=12.
• The mean is M=32/12=2.67.
• In statistics, samples and populations have very different meanings and these
differences are very important, even if, in the case of the mean, they are
calculated in the same way.
• To acknowledge that we are calculating the population mean and not the sample
mean, we use the Greek lower case letter "mu", denoted as μ:
• $\mu = \frac{\Sigma X}{N}$, where N is the number of scores in the population.
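The mean calculation above (ΣX = 32, n = 12, M = 2.67) can be reproduced directly. This is a minimal sketch; `mean` here is a small hand-written helper, not a library function, and the same formula gives μ when the list holds a whole population.

```python
scores = [4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1]


def mean(values):
    """Arithmetic mean: the sum of the scores (ΣX) divided by their count (n)."""
    return sum(values) / len(values)


sum_x, n = sum(scores), len(scores)
M = mean(scores)
print(f"ΣX = {sum_x}, n = {n}, M = {M:.2f}")  # ΣX = 32, n = 12, M = 2.67
```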
WHEN NOT TO USE THE MEAN
• The mean has one main disadvantage: it is particularly susceptible to the
influence of outliers. These are values that are unusual compared to the rest of
the data set by being especially small or large in numerical value.
• For example, consider the wages of staff at a factory below:
Staff 1 2 3 4 5 6 7 8 9 10
Salary 15k 18k 16k 14k 15k 15k 12k 17k 90k 95k
• The mean salary for these ten staff is $30.7k.
• The mean is being skewed by the two large salaries. Therefore, in this situation,
we would like to have a better measure of central tendency.
• The mean should not be used for nominal data or, in general, for ordinal data (that is, when we are dealing with qualitative characteristics).
• For example, a researcher may use the value 0 for a male and the value 1 for a
female (nominal measurements are coded with numerical values).
• In this situation, it is possible to compute a mean; however, the result is a
meaningless number.
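Returning to the factory salaries above, a quick comparison shows how the two large salaries pull the mean away from the typical value while the median (introduced in the next section) stays near it. This sketch uses Python's standard `statistics` module and assumes salaries are entered in thousands of dollars.

```python
from statistics import mean, median

# Salaries (in $1000s) for the ten factory staff in the table above.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print(f"mean   = {mean(salaries):.1f}k")    # pulled up by the two large salaries
print(f"median = {median(salaries):.1f}k")  # average of the 5th and 6th ordered values
```

The mean comes out at 30.7k, higher than eight of the ten salaries, while the median is 15.5k.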
MEDIAN
• The median is the middle score for a set of data that has been arranged in order
of magnitude.
• The median is the score that divides a distribution in half.
• In order to calculate the median:
• 65 55 89 56 35 14 56 55 87 45 92
• We first need to rearrange that data into order of magnitude (smallest first):
• 14 35 45 55 55 56 56 65 87 89 92
• Our median mark is the middle mark - in this case, 56. This works fine when you
have an odd number of scores.
• When you have an even number of scores, take the middle two scores and average them. For example:
• 65 55 89 56 35 14 56 55 87 45
• We again rearrange that data into order of magnitude (smallest first):
• 14 35 45 55 55 56 56 65 87 89
• This time we take the 5th and 6th scores in the data set and average them:
• The median is (55 + 56)/2 = 55.5.
• In a distribution with a few extreme scores, for example, the extreme values can
displace the mean so that it is not a central value.
• In this situation, the median often provides a better measure of central tendency.
Thus, you can think of the median as a backup measure of central tendency that
is used in situations in which the mean does not work well.
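Both median cases worked through above (odd and even number of scores) fit into one small function. This is a hand-written sketch for illustration; in practice `statistics.median` from the standard library does the same job.

```python
def median(values):
    """Middle score of the ordered data; average of the two middle scores if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_set = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even_set = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]
print(median(odd_set))   # 56
print(median(even_set))  # 55.5
```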
THE MODE
• The mode is the score or category with the greatest frequency.
• The mode is simply the most frequently occurring score.
• Scores: 4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1
• There are more scores of X=2 than any other value. The mode is 2.
• It corresponds to the highest bar in a bar chart or histogram, so you can sometimes think of the mode as the most popular option.
• The mode identifies the location of
the peak (highest point) in the
distribution.
• Normally, the mode is used for
categorical data where we wish to
know which is the most common
category.
TYPES OF MODE
• Example:
• For a data set (3, 7, 3, 9, 9, 3, 5, 1, 8,
5), the unique mode is 3.
• A distribution with a single mode is
said to be unimodal.
• Example:
• Similarly, for a data set (2, 4, 9, 6, 4, 6, 6, 2,
8, 2), there are two modes: 2 and 6.
• A distribution with more than one mode is
said to be bimodal, trimodal, etc., or in
general, multimodal.
• However, one of the problems with
the mode is that it will not provide us
with a very good measure of central
tendency when the most common
mark is far away from the rest of the
data in the data set.
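Because a distribution can have more than one mode, a helper that returns every most-frequent value handles the unimodal and bimodal examples above alike. `modes` is a hypothetical name for this sketch; Python's `statistics.multimode` offers similar behavior.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)


print(modes([4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1]))  # [2]
print(modes([3, 7, 3, 9, 9, 3, 5, 1, 8, 5]))        # [3]     (unimodal)
print(modes([2, 4, 9, 6, 4, 6, 6, 2, 8, 2]))        # [2, 6]  (bimodal)
```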
• The mean is a measure of central tendency obtained by adding the individual
scores, then dividing the sum by the number of scores. The mean is the arithmetic
average.
• The median measures central tendency by identifying the score that divides the
distribution in half. If the scores are listed in order, 50% of the individuals have
scores at or below the median.
• The mode measures central tendency by identifying the most frequently
occurring score in the distribution.
SUMMARY OF WHEN TO USE THE MEAN, MEDIAN AND MODE
Type of Variable | Best Measure of Central Tendency
Nominal | Mode
Ordinal | Median
Interval/Ratio (not skewed) | Mean
Interval/Ratio (skewed) | Median
• Goals scored over the last 7 games.
• 1 3 4 6 6 7 8
• Mean (average) 5
• Mode (most common) 6
• Median (middle) 6
MEASURES OF VARIABILITY
• Variability describes the spread of the scores in a distribution.
• When variability is small, it means that the scores are all clustered close together.
• Large variability means that there are big differences between individuals and the
scores are spread across a wide range of values.
• To introduce the idea of variability, consider this example. Two vending machines A and B drop
candies when a quarter is inserted. The number of pieces of candy one gets is random. The
following data are recorded for six trials at each vending machine:
• Pieces of candy from vending machine A:
• 1, 2, 3, 3, 5, 4
• mean = 3, median = 3, mode = 3
• Pieces of candy from vending machine B:
• 2, 3, 3, 3, 3, 4
• mean = 3, median = 3, mode = 3
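• The dot plots for vending machines A and B show that, although the two machines have the same mean, median, and mode, the numbers of candies from machine A are more spread out (from 1 to 5) than those from machine B (from 2 to 4).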
• There are many ways to describe variability or spread including:
• Range
• Interquartile range (IQR)
• Variance and Standard Deviation
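The vending-machine example can be checked numerically: the three measures of central tendency agree, but the measures of spread do not. This sketch uses Python's `statistics` module; `stdev` is the sample standard deviation (divisor n - 1), which is one of several reasonable choices here.

```python
from statistics import mean, median, mode, stdev

candy = {
    "A": [1, 2, 3, 3, 5, 4],
    "B": [2, 3, 3, 3, 3, 4],
}

summary = {}
for machine, pieces in candy.items():
    summary[machine] = {
        "mean": mean(pieces),
        "median": median(pieces),
        "mode": mode(pieces),
        "range": max(pieces) - min(pieces),
        "sd": stdev(pieces),  # sample standard deviation (divisor n - 1)
    }
    print(machine, {k: round(v, 3) for k, v in summary[machine].items()})
```

Both machines share a center of 3, yet machine A has twice the range and a noticeably larger standard deviation.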
RANGE
• The range is the difference between the maximum and minimum values of a data set. The maximum is the largest value in the data set and the minimum is the smallest value. The range is easy to calculate, but it is strongly affected by extreme values.
• Range=Maximum-Minimum
• Goals scored over the last 7 games.
• 1 3 4 6 6 7 8
• Range (largest-smallest) 7
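The goals example can be summarized in one pass: mean, median, and mode from the earlier slide, plus the range. This sketch uses the standard `statistics` module; `value_range` is a hypothetical helper name chosen to avoid shadowing Python's built-in `range`.

```python
from statistics import mean, median, mode

goals = [1, 3, 4, 6, 6, 7, 8]  # goals scored over the last 7 games


def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


print("mean:", mean(goals))          # 5
print("median:", median(goals))      # 6
print("mode:", mode(goals))          # 6
print("range:", value_range(goals))  # 7
```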
INTERQUARTILE RANGE (IQR)
• Like the range, the IQR is a measure of
variability, but you must find the
quartiles in order to compute its value.
• The interquartile range, denoted IQR, is the difference between the upper and lower quartiles:
• IQR = Q3 - Q1 = 75th percentile - 25th percentile
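Computing the IQR requires a rule for locating the quartiles, and textbooks and software do not all use the same one. This sketch applies Python's `statistics.quantiles` to the goals data under two of its built-in methods to show that the choice matters; `iqr` is a hypothetical helper name.

```python
from statistics import quantiles

goals = [1, 3, 4, 6, 6, 7, 8]


def iqr(values, method="exclusive"):
    """IQR = Q3 - Q1, with quartiles from statistics.quantiles (n=4 gives 3 cut points)."""
    q1, _q2, q3 = quantiles(values, n=4, method=method)
    return q3 - q1


print(iqr(goals))                      # default 'exclusive' method
print(iqr(goals, method="inclusive"))  # a different quartile convention
```

For this small data set the two conventions give IQRs of 4.0 and 3.0, so it is worth reporting which method was used.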
VARIANCE AND STANDARD DEVIATION
• Standard deviation uses the mean of the
distribution as a reference point and
measures variability by measuring the
distance between each score and the mean.
Conceptually, standard deviation measures
the average distance from the mean.
• When the scores are clustered close to the
mean, the standard deviation is small; when
the scores are scattered widely around the
mean, the standard deviation is large.
• The calculation of standard deviation begins by computing the average squared
distance from the mean. This average squared value is called variance.
• Variance is the average squared distance from the mean and is usually identified
with the symbol s². The calculation of variance involves two steps:
STEP 1:
• Compute the distance from the mean, or the deviation, for each score; square each deviation; then add the squared deviations. The result is called SS, the sum of squared deviations: $SS = \Sigma(X - M)^2$. An equivalent computational formula is:
• $SS = \Sigma X^2 - \frac{(\Sigma X)^2}{n}$
• X = 5, 6, 1, 5, 3, so ΣX = 20
• X² = 25, 36, 1, 25, 9, so ΣX² = 96
• SS = 96 - 20²/5 = 96 - 80 = 16
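The definitional and computational SS formulas give the same answer, which this sketch checks on the five scores above. Both function names are hypothetical; the computational form is convenient by hand but can lose floating-point precision for very large values.

```python
data = [5, 6, 1, 5, 3]


def ss_definitional(xs):
    """SS = Σ(X - M)²: sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def ss_computational(xs):
    """SS = ΣX² - (ΣX)²/n: algebraically equal to the definitional formula."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


print(ss_definitional(data), ss_computational(data))  # 16.0 16.0
```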
STEP 2:
• The sample variance is obtained by dividing SS (the sum of squared deviations) by n - 1.
• SS = 16 and n = 5
• $s^2 = \frac{SS}{n - 1} = \frac{16}{4} = 4$
• When we calculate the sample variance, we estimate the population mean with the sample mean. Dividing by n - 1 rather than n compensates for this and gives s² a special property: it is an unbiased estimator of the population variance.
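The unbiasedness claim can be checked exactly on a tiny example. This sketch treats the five scores above as a whole population (an assumption made only for illustration), enumerates every ordered sample of size 2 drawn with replacement, and averages the sample variance computed with divisor n - 1 and with divisor n. Exact fractions avoid rounding noise; `variance_with_divisor` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# Treat the five scores from the SS example as a whole (tiny) population.
population = [5, 6, 1, 5, 3]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance, divisor N


def variance_with_divisor(sample, divisor):
    """SS of the sample divided by the chosen divisor (n - 1 or n)."""
    m = Fraction(sum(sample), len(sample))
    ss = sum((x - m) ** 2 for x in sample)
    return ss / divisor


print("s² of the five scores as a sample:", variance_with_divisor(population, N - 1))

n = 2
# Every ordered sample of size n drawn with replacement is equally likely,
# so averaging over all of them gives the exact expected value.
samples = list(product(population, repeat=n))
avg_s2 = sum(variance_with_divisor(s, n - 1) for s in samples) / len(samples)
avg_biased = sum(variance_with_divisor(s, n) for s in samples) / len(samples)

print("population variance:", sigma2)
print("average s² (divide by n - 1):", avg_s2)
print("average when dividing by n:", avg_biased)
```

On average, the n - 1 version lands exactly on the population variance (16/5), while dividing by n underestimates it by the factor (n - 1)/n.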
STANDARD DEVIATION (SD)
• Standard deviation is approximately the average distance of the values in a data set from the mean; it is the square root of the variance.
• $s = \sqrt{s^2} = \sqrt{4} = 2$
• Standard deviation = √ Variance
• Variance = (Standard deviation)²
• Standard deviation provides a measure of the standard distance from the mean. A
small value for standard deviation indicates that the individual scores are
clustered close to the mean and a large value indicates that the scores are spread
out relatively far from the mean.
• Variance also provides a measure of distance. A small variance indicates that the
scores are clustered close together; a large variance means that the scores are
widely scattered.
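Since the standard deviation is the square root of the variance, the worked example (s² = 4, s = 2) can be confirmed with the standard `statistics` module. The last line shows the population version (`pstdev`, divisor N) for contrast, assuming the same five scores were an entire population.

```python
import math
from statistics import pstdev, stdev, variance

data = [5, 6, 1, 5, 3]

s2 = variance(data)   # sample variance, SS / (n - 1)
s = math.sqrt(s2)     # standard deviation = square root of the variance
print(s2, s, stdev(data))
print(pstdev(data))   # population SD, divisor N instead of n - 1
```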
DESCRIBING INTERVAL AND RATIO DATA (NUMERICAL SCORES)
• The figure shows a frequency distribution graph with the mean and standard deviation displayed.
• As a general rule, for a roughly symmetrical, bell-shaped distribution, about 68% of the scores are within one standard deviation of the mean and about 95% of the scores are within two standard deviations.
• The mean (M) and standard deviation
are two values that are probably the
most commonly reported descriptive
statistics, and they should provide
enough information to construct a
good picture of the entire set of
scores.
Figure: M = 45, SD = 6
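With M = 45 and SD = 6, the one- and two-standard-deviation bands are 39 to 51 and 33 to 57. The sketch below computes those bands and, under the assumption that the scores are normally distributed, the share of scores expected inside each, using the standard library's `statistics.NormalDist`; `band` is a hypothetical helper.

```python
from statistics import NormalDist

M, SD = 45, 6  # mean and standard deviation from the figure


def band(mean, sd, k):
    """Interval mean ± k·sd, plus the share of a normal distribution inside it."""
    low, high = mean - k * sd, mean + k * sd
    dist = NormalDist(mean, sd)
    return low, high, dist.cdf(high) - dist.cdf(low)


for k in (1, 2):
    low, high, share = band(M, SD, k)
    print(f"within {k} SD: {low} to {high} (about {share:.1%} if normal)")
```

For real data that are skewed or have heavy tails, the actual percentages can differ from these normal-curve values.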
DESCRIBING NON-NUMERICAL DATA FROM NOMINAL AND ORDINAL SCALES OF MEASUREMENT
• A researcher may simply classify participants by placing them in separate nominal
or ordinal categories.
• Classification of people by gender (male or female).
• Classification of attitude (agree or disagree).
• Classification of self-esteem (high, medium, or low).
• Report the proportion or percentage in each category.
• These values can be used to describe a single sample or to compare separate
samples.
• For example, a report might describe a sample of voters by stating that 43%
prefer candidate Green, 28% prefer candidate Brown, and 29% are undecided.
• A research report might compare two groups by stating that 80% of the 6-year-old children were able to successfully complete the task, but only 34% of the 4-year-olds were successful.
• In addition to percentages and proportions, you also can use the mode as a
measure of central tendency for data from a nominal scale.
• For example, if the modal response to a survey question is “no opinion,” you can
probably conclude that the people surveyed do not care much about the issue.
USING GRAPHS TO SUMMARIZE DATA
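• Graphs are especially useful for summarizing the results of studies with two independent variables (two-factor studies). For example, a researcher may want to examine the effects of heat and humidity on performance.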
• For this study, both the temperature
(variable 1) and the humidity
(variable 2) would be manipulated,
and performance would be evaluated
under a variety of different
temperature and humidity
conditions.
• As a general rule, graphs for two-
factor studies are constructed by
listing the values of one of the
independent variables on the
horizontal axis and listing the values
for the dependent variable on the
vertical axis.
• Notice that the top line presents the
means in the top row of the data
matrix and the bottom line shows the
means from the bottom row.
• The result is a graph that displays all
six means from the experiment, and
allows comparison of means and
mean differences.
CORRELATIONS
• A correlation is a statistical value that measures and describes the direction and
degree of relationship between two variables.
• The sample correlation coefficient is typically denoted as r. It is also known as
Pearson’s r.
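• $r = \frac{SP}{\sqrt{SS_X \cdot SS_Y}}$
• Note that the two variables are labeled X and Y.
• SP is the sum of the products of the deviations, $SP = \Sigma(X - M_X)(Y - M_Y)$, and $SS_X$ and $SS_Y$ are the sums of squared deviations for X and Y.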
• For this example, the researcher
computes a correlation that measures
and describes the relationship
between self-esteem and
performance.
Participant | Self-Esteem Score (X) | Performance Score (Y)
A | 62 | 13
B | 84 | 20
C | 89 | 22
D | 73 | 16
E | 66 | 11
F | 75 | 18
G | 71 | 14
H | 80 | 21
Two Separate Scores for Each Participant
A Scatter Plot Showing the Data
CALCULATION
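• $\bar{x} = \frac{62 + 84 + 89 + 73 + 66 + 75 + 71 + 80}{8} = \frac{600}{8} = 75$
• $\bar{y} = \frac{13 + 20 + 22 + 16 + 11 + 18 + 14 + 21}{8} = \frac{135}{8} = 16.875$
• $SS_X = \Sigma(x - \bar{x})^2 = (62-75)^2 + (84-75)^2 + (89-75)^2 + (73-75)^2 + (66-75)^2 + (75-75)^2 + (71-75)^2 + (80-75)^2 = 572$
• $SS_Y = \Sigma(y - \bar{y})^2 = (13-16.875)^2 + (20-16.875)^2 + (22-16.875)^2 + (16-16.875)^2 + (11-16.875)^2 + (18-16.875)^2 + (14-16.875)^2 + (21-16.875)^2 = 112.875$
• $SP = \Sigma(x - \bar{x})(y - \bar{y}) = (62-75)(13-16.875) + (84-75)(20-16.875) + (89-75)(22-16.875) + (73-75)(16-16.875) + (66-75)(11-16.875) + (75-75)(18-16.875) + (71-75)(14-16.875) + (80-75)(21-16.875) = 237$
• Dividing SP by n - 1 gives the sample covariance, $S_{xy} = \frac{\Sigma(x - \bar{x})(y - \bar{y})}{n - 1}$; the correlation itself uses SP directly:
• $r = \frac{SP}{\sqrt{SS_X \cdot SS_Y}} = \frac{237}{\sqrt{572 \times 112.875}} = 0.9327$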
• Results of the Pearson correlation indicated a significant, large positive relationship between self-esteem (X) and performance (Y), r = .933, p < .001.
• The p-value is the probability of obtaining a correlation at least as extreme as the one observed if the population correlation were in fact zero (the null hypothesis). If this probability is lower than the conventional 5% (p < .05), the correlation coefficient is called statistically significant.
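The hand calculation above can be reproduced in a few lines. This sketch follows the SS/SP route used in the slides; `pearson_r` is a hypothetical helper. It also computes the t statistic commonly used to test r against zero (df = n - 2); turning t into a p-value requires a t table or a statistics package, which this stdlib-only sketch does not include.

```python
import math

# Self-esteem (X) and performance (Y) scores for participants A-H.
self_esteem = [62, 84, 89, 73, 66, 75, 71, 80]
performance = [13, 20, 22, 16, 11, 18, 14, 21]


def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y), using deviations from each mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / math.sqrt(ss_x * ss_y), ss_x, ss_y, sp


r, ss_x, ss_y, sp = pearson_r(self_esteem, performance)
n = len(self_esteem)
# t statistic for testing r against zero, with df = n - 2.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"SS_X = {ss_x}, SS_Y = {ss_y}, SP = {sp}, r = {r:.4f}, t({n - 2}) = {t:.2f}")
```

The printed SS_X, SS_Y, SP, and r match the values in the calculation above (572, 112.875, 237, 0.9327).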
PROPERTIES OF THE CORRELATION COEFFICIENT, R
• +1 ≥ r ≥ -1, i.e. r takes values between -1 and +1, inclusive.
• The sign of the correlation provides the direction of the linear relationship. The sign
indicates whether the two variables are positively or negatively related.
• A correlation of +1.00 or -1.00 indicates a perfectly consistent relationship, and a correlation of 0.00 indicates no consistent linear relationship whatsoever.
• There are no units attached to r.
• The closer the magnitude of r is to 1, the stronger the linear relationship.
• The closer the magnitude of r is to 0, the weaker the linear relationship.
• The correlation value is the same regardless of which variable we define as X and which as Y.
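Several of these properties can be checked numerically on the self-esteem data: r stays within [-1, +1], is unchanged when X and Y swap roles, is unchanged by a linear change of scale for X (r has no units), and changes sign when X is reversed. The rescaling `10 * v + 5` is an arbitrary illustrative choice, and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / math.sqrt(ss_x * ss_y)


x = [62, 84, 89, 73, 66, 75, 71, 80]
y = [13, 20, 22, 16, 11, 18, 14, 21]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                           # swap the roles of X and Y
r_rescaled = pearson_r([10 * v + 5 for v in x], y)  # new units and origin for X
r_flipped = pearson_r([-v for v in x], y)        # reverse the direction of X
print(r_xy, r_yx, r_rescaled, r_flipped)
```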
• The following four graphs illustrate
four possible situations for the values
of r.
• Graph (d) shows a strong relationship between y and x even though r = 0. Note that the absence of a linear relationship does not imply that no relationship exists!
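A simple way to see the situation in graph (d) is a U-shaped relationship. In this sketch (illustrative data, not from the deck), y is completely determined by x through y = x², yet Pearson's r is exactly zero because the relationship is not linear.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / math.sqrt(ss_x * ss_y)


x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]   # y is completely determined by x, but not linearly
print(pearson_r(x, y))    # 0.0
```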
SPEARMAN CORRELATION
• The Spearman correlation is also referred to as the Spearman rank correlation or Spearman's "rho".
• It is typically denoted either with the Greek letter rho (ρ) or with $r_s$, and it is simply the Pearson correlation applied to ordinal data (ranks). If the original scores are numerical values from an interval or ratio scale, it is possible to rank the scores and then compute a Spearman correlation.
• $r_s = \frac{SP}{\sqrt{SS_X \cdot SS_Y}}$, with SP, $SS_X$, and $SS_Y$ computed from the ranks rather than from the original scores.
• In this case, the Spearman correlation measures the degree to which the
relationship is consistently one-directional, or monotonic.
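Following the definition above, a Spearman correlation can be sketched as "rank both variables, then apply Pearson's formula to the ranks". Applying it to the self-esteem data is an illustration, not a result from the deck. Tied scores receive the average of the ranks they span, which is the usual convention; `average_ranks` and `spearman_rho` are hypothetical names.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


self_esteem = [62, 84, 89, 73, 66, 75, 71, 80]
performance = [13, 20, 22, 16, 11, 18, 14, 21]
print(round(spearman_rho(self_esteem, performance), 4))
```

Here the ranks agree closely (only two adjacent pairs swap), so ρ is about 0.95, slightly higher than Pearson's r for the raw scores.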
REGRESSION
• Linear regression attempts to model the relationship between two variables by
fitting a linear equation to observed data. One variable is considered to be an
explanatory variable, and the other is considered to be a dependent variable.
• A linear regression line has an equation of the form $Y = a + bX$, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
• $b = r\frac{S_Y}{S_X}$ or $b = \frac{SP}{SS_X}$, and $a = M_Y - bM_X$
• r is the Pearson correlation, $S_X$ is the standard deviation for the X scores, $S_Y$ is the standard deviation for the Y scores, and $M_X$ and $M_Y$ are the means of X and Y.
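The slope and intercept formulas can be applied to the self-esteem (X) and performance (Y) data from the correlation section; doing so is an illustration, not a result reported in the deck. The sketch computes b both ways (SP/SS_X and r·S_Y/S_X) to show they agree, using sample standard deviations (divisor n - 1); `predict` is a hypothetical helper.

```python
import math

x = [62, 84, 89, 73, 66, 75, 71, 80]  # self-esteem (explanatory variable)
y = [13, 20, 22, 16, 11, 18, 14, 21]  # performance (dependent variable)
n = len(x)

mx, my = sum(x) / n, sum(y) / n
ss_x = sum((xi - mx) ** 2 for xi in x)
ss_y = sum((yi - my) ** 2 for yi in y)
sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

b = sp / ss_x                  # slope from b = SP / SS_X
r = sp / math.sqrt(ss_x * ss_y)
s_x = math.sqrt(ss_x / (n - 1))
s_y = math.sqrt(ss_y / (n - 1))
b_alt = r * s_y / s_x          # slope from b = r * (S_Y / S_X); should match b
a = my - b * mx                # intercept: a = M_Y - b * M_X


def predict(x_value):
    """Predicted Y on the fitted line Y = a + bX."""
    return a + b * x_value


print(f"Y = {a:.3f} + {b:.4f}X")
```

A useful check is that the fitted line always passes through the point (M_X, M_Y), here (75, 16.875).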
• The figure shows a scatter plot of X
and Y values with a straight line
drawn through the center of the data
points.
• The straight line is valuable because
it makes the relationship easier to see
and it can be used for prediction.
• Before fitting a regression line, first determine whether there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other (for example, higher SAT scores do not cause higher college grades), but that there is some significant association between the two variables.
• A scatterplot can be a helpful tool in determining the strength of the relationship
between two variables. If there appears to be no association between the
proposed explanatory and dependent variables (i.e., the scatterplot does not
indicate any increasing or decreasing trends), then fitting a linear regression
model to the data probably will not provide a useful model.
MULTIPLE REGRESSION
• Multiple linear regression (MLR), also known simply as multiple regression, is a
statistical technique that uses several explanatory variables to predict the
outcome of a response variable.
• The goal of multiple linear regression is to model the linear relationship between
the explanatory (independent) variables and response (dependent) variables.
• The formula for a multiple linear regression with k independent variables $x_1, x_2, \ldots, x_k$ is:
• $y = b_1x_1 + b_2x_2 + \cdots + b_kx_k + a$
• With two independent variables, this becomes $y = b_1x_1 + b_2x_2 + a$.
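For the two-predictor case, the least-squares coefficients have a closed form in terms of sums of squares and sums of products, extending the SP/SS notation used for simple regression. This is a sketch under clear assumptions: the data are hypothetical and noise-free, built from y = 2x₁ + 3x₂ + 1 so the recovered coefficients can be checked, and `two_predictor_regression` is a hypothetical name. Real analyses with more predictors usually rely on a statistics library.

```python
def two_predictor_regression(x1, x2, y):
    """Least-squares b1, b2, a for y = b1*x1 + b2*x2 + a (two predictors only)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    ss1 = sum(v * v for v in d1)
    ss2 = sum(v * v for v in d2)
    sp12 = sum(u * v for u, v in zip(d1, d2))
    sp1y = sum(u * v for u, v in zip(d1, dy))
    sp2y = sum(u * v for u, v in zip(d2, dy))
    denom = ss1 * ss2 - sp12 ** 2  # zero if x1 and x2 are perfectly collinear
    b1 = (sp1y * ss2 - sp12 * sp2y) / denom
    b2 = (sp2y * ss1 - sp12 * sp1y) / denom
    a = my - b1 * m1 - b2 * m2
    return b1, b2, a


# Hypothetical, noise-free data built from y = 2*x1 + 3*x2 + 1.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [2 * u + 3 * v + 1 for u, v in zip(x1, x2)]

b1, b2, a = two_predictor_regression(x1, x2, y)
print(b1, b2, a)  # 2.0 3.0 1.0
```

With real, noisy data the coefficients would only approximate the underlying relationship, and the formula fails when one predictor is an exact linear function of the other.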
• Example: A researcher decides to study the performance of students at a school over a period of time. He observed that as lectures moved online, the students' performance started to decline.
• The dependent variable, “decrease in performance,” may be explained by various independent variables, such as lack of attention, more internet addiction, neglecting studies, and much more.
• The multiple regression equation would be:
• $Y = b_1 \cdot \text{attention} + b_2 \cdot \text{internet addiction} + b_3 \cdot \text{technology support} + \cdots + b_kX_k + a$, where k is the number of independent variables.
• Multiple regression helps us study several predictor variables at the same time.
• It can increase reliability by avoiding dependence on a single variable, because more than one independent variable is used to explain the outcome.
• Multiple regression analysis also lets you test more elaborate hypotheses than are possible with a single predictor.
REFERENCE
• Gravetter, F. J., & Forzano, L. B. (2011). Research Methods for the Behavioral Sciences (4th ed.), Descriptive Statistics (pp. 434–451). Wadsworth Publishing.

 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 

Último (20)

Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 

Descriptive Statistics.pptx

  • 2. OUTLINE • Introduction • Frequency Distribution • Measures of Central Tendency • Measures of Variability • Describing Interval and Ratio Data (Numerical Scores) • Describing Non-numerical Data from Nominal and Ordinal Scales of Measurement • Using Graphs to Summarize Data • Correlations • Regression • Multiple Regression
  • 3. • The general goal of descriptive statistics is to organize or summarize a set of scores. Two general techniques are used to accomplish this goal. • 1. Organize the entire set of scores into a table or a graph that allows researchers (and others) to see the whole set of scores. • 2. Compute one or two summary values (such as the average) that describe the entire group.
  • 4. FREQUENCY DISTRIBUTION • A frequency distribution is an overview of all distinct values in some variable and the number of times they occur. • It consists of a tabulation of the number of individuals in each category on the scale of measurement: • 1. The set of categories that make up the scale of measurement. • 2. The number of individuals with scores in each of the categories.
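A frequency distribution can be tabulated by counting how often each category occurs. This is a minimal Python sketch. It assumes the quiz scores from the mean example on slide 22 (4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1) and a 0 to 5 scale. The helper name `frequency_table` is hypothetical. Every category on the scale is listed, including those with a frequency of 0, and the categories run from highest to lowest as in a traditional table.

```python
from collections import Counter

# Quiz scores reused from the mean example (slide 22); scale assumed to be 0-5.
scores = [4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1]

def frequency_table(values, categories):
    """Return (category, frequency) pairs, including categories with a count of 0."""
    counts = Counter(values)              # Counter returns 0 for missing keys
    return [(c, counts[c]) for c in categories]

# List the categories from the highest score (5) down to the lowest (0).
table = frequency_table(scores, categories=range(5, -1, -1))
for x, f in table:
    print(f"X={x}  f={f}")
```

The frequencies always add up to n, the number of scores, unless some values are missing.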
  • 5. FREQUENCY DISTRIBUTION • The advantage of a frequency distribution is that it allows a researcher to view the entire set of scores. It presents raw data in an organized, easy-to-read format. • The disadvantage is that constructing a frequency distribution without the aid of a computer can be somewhat tedious, especially with large sets of data. The primary drawback of frequency distributions is the loss of detail.
  • 6. TABLE 15.1 IS A FREQUENCY DISTRIBUTION TABLE SUMMARIZING THE SCORES FROM A 5-POINT QUIZ GIVEN TO A CLASS OF N = 15 STUDENTS. • In this example, one person had a perfect score of X = 5 on the quiz, and three people had scores of X = 4.
  • 7. • In another example, 183 students filled out a questionnaire. One of the questions asked which major they were studying.
  • 8. • The resulting table shows how frequencies are distributed over values (study majors, in this example) and hence is a frequency distribution.
  • 9. RELATIVE FREQUENCIES • Optionally, a frequency distribution may contain relative frequencies: frequencies relative to (divided by) the total number of values. Relative frequencies are often shown as percentages or proportions. • Relative frequencies make a frequency distribution easier to interpret at a glance. They also make it easier to compare groups of different sizes.
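Relative frequencies are produced by dividing each frequency by the total number of values. The sketch below assumes the same illustrative quiz scores as slide 22. The function name `relative_frequencies` is hypothetical. It returns proportions, and it multiplies each proportion by 100 to print a percentage.

```python
from collections import Counter

scores = [4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1]

def relative_frequencies(values):
    """Map each distinct value to its proportion of the total (frequency / n)."""
    n = len(values)
    counts = Counter(values)
    return {value: count / n for value, count in sorted(counts.items())}

props = relative_frequencies(scores)
for value, p in props.items():
    print(f"X={value}  proportion={p:.3f}  percent={100 * p:.1f}%")
```

The proportions always sum to 1, and the percentages sum to 100%. This is why relative frequencies can be compared directly across samples of different sizes.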
  • 10. FREQUENCY DISTRIBUTION GRAPHS • The graph shows the scale of measurement (set of categories) along the horizontal axis and the frequencies on the vertical axis. • When the measurement scale (scores) consists of numerical values (interval or ratio scale of measurement), there are two options for graphing the frequency distribution.
  • 12. • A histogram is a graph that shows the frequency (or relative frequency) of each score or interval of a single numerical variable. • A polygon is a graph constructed by using lines to join the midpoints of each interval, or bin.
  • 13. FREQUENCY DISTRIBUTION GRAPHS • Figure 15.1a is a traditional histogram with a bar above each category. Traditional histogram (a).
  • 14. FREQUENCY DISTRIBUTION GRAPHS • In Figure 15.1b, the histogram is modified slightly by changing each bar into a stack of blocks. • This modification helps emphasize the concept of a frequency distribution. A modified histogram (b)
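The block-style histogram can be approximated in plain text by drawing one block per individual. This sketch is only illustrative. It assumes the slide 22 quiz scores and uses `#` as the block character. The stacks run sideways, one row per score, rather than upward as in Figure 15.1b. The function name `block_histogram` is hypothetical.

```python
from collections import Counter

def block_histogram(values, categories, block="#"):
    """Return one text row per category; each block represents one individual."""
    counts = Counter(values)
    return [f"{c}: {block * counts[c]}" for c in categories]

scores = [4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1]
rows = block_histogram(scores, categories=range(0, 6))
for row in rows:
    print(row)
```

Each block represents one individual, so the length of each row is the frequency and no separate frequency axis is needed.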
  • 15. FREQUENCY DISTRIBUTION GRAPHS • Figure 15.1c presents the same data in a polygon. A polygon (c)
  • 16. FREQUENCY DISTRIBUTION GRAPHS • It shows how frequencies are distributed over values.
  • 17. FREQUENCY DISTRIBUTION GRAPHS • When the categories on the scale of measurement are nominal or ordinal scales, the frequency distribution is presented as a bar graph. • Also, it is easy to see the extreme scores that are very different from the rest of the group. Bar Graph Showing the Frequency Distribution of Academic Majors in an Introductory Psychology Class.
  • 18. • Frequency distributions, especially graphs, can be a very effective method for presenting information about a set of scores. • The distribution shows whether the scores are clustered together or spread out across the scale. • However, a frequency distribution is generally considered to be a preliminary method of statistical analysis.
  • 19. MEASURES OF CENTRAL TENDENCY • A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. • As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. • The goal is to find the average, or the most typical, score for the entire set.
  • 20. THE MEAN, MEDIAN AND MODE • The mean, median and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others.
  • 21. MEAN (ARITHMETIC) • The mean (or average) can be used with both discrete and continuous data, although its use is most often with continuous data. • The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. (The mean is computed by adding the scores and dividing the sum by the number of individuals).
  • 22. • So, if we have n values in a data set, $x_1, x_2, \dots, x_n$, the sample mean, usually denoted by x̄ (pronounced "x bar") or by the letter M, is: • $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$ or $M = \frac{\sum X}{n}$ • To compute the mean, you first find the sum of the scores (represented by ΣX) and then divide by the number of scores (represented by n). • Scores: 4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1 • ΣX = 32 and n = 12. • The mean is M = 32/12 = 2.67.
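The formula M = ΣX/n maps directly to code. This is a minimal Python sketch using the scores from this slide. The function name `mean` is a hypothetical helper. Python's standard `statistics.mean` returns the same value.

```python
def mean(values):
    """Arithmetic mean: the sum of the scores (ΣX) divided by the number of scores (n)."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

scores = [4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1]
M = mean(scores)
print(f"ΣX = {sum(scores)}, n = {len(scores)}, M = {M:.2f}")  # ΣX = 32, n = 12, M = 2.67
```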
  • 23. • In statistics, samples and populations have very different meanings, and these differences are important even though, in the case of the mean, the two are calculated in the same way. • To indicate that we are calculating the population mean rather than the sample mean, we use the Greek lowercase letter "mu", denoted μ: • $\mu = \frac{\sum X}{N}$, where N is the number of scores in the population.
  • 24. WHEN NOT TO USE THE MEAN • The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set because they are especially small or large. • For example, consider the wages of staff at a factory below:
Staff:   1    2    3    4    5    6    7    8    9    10
Salary:  15k  18k  16k  14k  15k  15k  12k  17k  90k  95k
  • 25. • The mean salary for these ten staff is $30.7k, even though eight of the ten earn between 12k and 18k. • The mean is being skewed by the two large salaries. Therefore, in this situation, we would like to have a better measure of central tendency.
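To show how far the two large salaries pull the mean, this sketch computes both the mean and the median with Python's standard `statistics` module. The salaries come from the table above and are expressed in thousands.

```python
import statistics

# Salaries in thousands for staff 1-10 (from the table above).
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_k = statistics.mean(salaries)      # pulled upward by the two large salaries
median_k = statistics.median(salaries)  # middle of the ordered data, resistant to outliers
print(f"mean = {mean_k:.1f}k, median = {median_k:.1f}k")  # mean = 30.7k, median = 15.5k
```

The median (15.5k) sits among the typical salaries. The mean (30.7k) is higher than eight of the ten values.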
  • 26. • The mean is not an appropriate measure for nominal or ordinal data (when we are dealing with qualitative characteristics). • For example, a researcher may use the value 0 for male and the value 1 for female (nominal categories coded with numerical values). • In this situation, it is possible to compute a mean; however, the result is a meaningless number.
  • 27. MEDIAN • The median is the middle score for a set of data that has been arranged in order of magnitude. • The median is the score that divides a distribution in half.
  • 28. • In order to calculate the median: • 65 55 89 56 35 14 56 55 87 45 92 • We first need to rearrange that data into order of magnitude (smallest first): • 14 35 45 55 55 56 56 65 87 89 92 • Our median mark is the middle mark - in this case, 56. This works fine when you have an odd number of scores.
  • 29. • When there is an even number of scores, take the middle two scores and average them. For example: • 65 55 89 56 35 14 56 55 87 45 • We again rearrange the data into order of magnitude (smallest first): • 14 35 45 55 55 56 56 65 87 89 • Now we take the 5th and 6th scores in the data set and average them. • The median is (55 + 56)/2 = 55.5.
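The two cases above (odd and even numbers of scores) can be combined in one function. This is a minimal sketch. The name `median` is a hypothetical helper, and Python's standard `statistics.median` follows the same rule. It is applied to the two examples from these slides.

```python
def median(values):
    """Middle score of the ordered data; the average of the two middle scores when n is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)          # arrange in order of magnitude, smallest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: a single middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle scores

odd_case = median([65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92])
even_case = median([65, 55, 89, 56, 35, 14, 56, 55, 87, 45])
print(odd_case, even_case)  # 56 55.5
```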
  • 30. • In a distribution with a few extreme scores, for example, the extreme values can displace the mean so that it is not a central value. • In this situation, the median often provides a better measure of central tendency. Thus, you can think of the median as a backup measure of central tendency that is used in situations in which the mean does not work well.
  • 31. THE MODE • The mode is the score or category with the greatest frequency. • The mode is simply the most frequently occurring score. • Scores: 4, 2, 1, 5, 2, 2, 3, 4, 3, 2, 3, 1 • There are more scores of X = 2 than of any other value. The mode is 2. • It corresponds to the highest bar in a bar chart or histogram. You can, therefore, sometimes think of the mode as the most popular option.
  • 32. • The mode identifies the location of the peak (highest point) in the distribution.
  • 33. • Normally, the mode is used for categorical data where we wish to know which is the most common category.
  • 34. TYPES OF MODE • Example: • For a data set (3, 7, 3, 9, 9, 3, 5, 1, 8, 5), the unique mode is 3. • A distribution with a single mode is said to be unimodal.
  • 35. • Example: • Similarly, for a data set (2, 4, 9, 6, 4, 6, 6, 2, 8, 2), there are two modes: 2 and 6. • A distribution with more than one mode is said to be bimodal, trimodal, etc., or in general, multimodal.
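Python's standard library can report every mode, so it handles both unimodal and multimodal data. The sketch below uses `statistics.multimode`, which requires Python 3.8 or later. It returns all of the most frequent values in the order in which they first appear, and it is applied to the two data sets from these slides.

```python
import statistics

unimodal = [3, 7, 3, 9, 9, 3, 5, 1, 8, 5]
bimodal = [2, 4, 9, 6, 4, 6, 6, 2, 8, 2]

unimodal_modes = statistics.multimode(unimodal)
bimodal_modes = statistics.multimode(bimodal)
print(unimodal_modes)  # [3]
print(bimodal_modes)   # [2, 6]
```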
  • 36. • However, one of the problems with the mode is that it will not provide us with a very good measure of central tendency when the most common mark is far away from the rest of the data in the data set.
  • 37. • The mean is a measure of central tendency obtained by adding the individual scores, then dividing the sum by the number of scores. The mean is the arithmetic average. • The median measures central tendency by identifying the score that divides the distribution in half. If the scores are listed in order, 50% of the individuals have scores at or below the median. • The mode measures central tendency by identifying the most frequently occurring score in the distribution.
  • 38. SUMMARY OF WHEN TO USE THE MEAN, MEDIAN AND MODE
Type of Variable               Best Measure of Central Tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median
  • Goals scored over the last 7 games: 1 3 4 6 6 7 8 • Mean (average): 5 • Mode (most common): 6 • Median (middle): 6
  • 39. MEASURES OF VARIABILITY • Variability describes the spread of the scores in a distribution. • When variability is small, it means that the scores are all clustered close together. • Large variability means that there are big differences between individuals and the scores are spread across a wide range of values.
  • 40. • To introduce the idea of variability, consider this example. Two vending machines, A and B, drop candies when a quarter is inserted. The number of pieces of candy one gets is random. The following data are recorded for six trials at each vending machine: • Pieces of candy from vending machine A: 1, 2, 3, 3, 5, 4 (mean = 3, median = 3, mode = 3) • Pieces of candy from vending machine B: 2, 3, 3, 3, 3, 4 (mean = 3, median = 3, mode = 3) • The two machines have identical measures of central tendency, yet their results differ in how spread out they are.
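This sketch summarizes both vending machines with Python's standard `statistics` module. All three measures of central tendency agree, but the range and the sample standard deviation (`statistics.stdev`, which divides by n - 1) show that machine A's results are more spread out. The helper name `summarize` is hypothetical.

```python
import statistics

machine_a = [1, 2, 3, 3, 5, 4]
machine_b = [2, 3, 3, 3, 3, 4]

def summarize(data):
    """Central tendency and spread for one machine's trials."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "sd": statistics.stdev(data),  # sample standard deviation (n - 1)
    }

summary_a = summarize(machine_a)
summary_b = summarize(machine_b)
print("A:", summary_a)
print("B:", summary_b)
```

Machine A has a range of 4 and SD ≈ 1.41. Machine B has a range of 2 and SD ≈ 0.63.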
  • 41. • The dot plot for the pieces of candy from vending machine A and vending machine B is displayed:
  • 42. • There are many ways to describe variability or spread including: • Range • Interquartile range (IQR) • Variance and Standard Deviation
  • 43. RANGE • The range is the difference between the maximum and minimum values of a data set. The maximum is the largest value in the data set and the minimum is the smallest value. The range is easy to calculate, but it is strongly affected by extreme values. • Range = Maximum - Minimum • Goals scored over the last 7 games: 1 3 4 6 6 7 8 • Range (largest - smallest) = 8 - 1 = 7
  • 44. INTERQUARTILE RANGE (IQR) • Like the range, the IQR is a measure of variability, but you must find the quartiles in order to compute its value. • The interquartile range is the difference between upper and lower quartiles and denoted as IQR. • IQR = Q3 – Q1 • = 75th percentile – 25th percentile
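The range and the IQR for the goals data can be computed with Python's standard library. There is one caveat: textbooks and software use different conventions for locating quartiles, so the IQR can vary slightly with the method used. This sketch uses `statistics.quantiles` with its default "exclusive" method, which places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4 in the ordered data.

```python
import statistics

goals = [1, 3, 4, 6, 6, 7, 8]

data_range = max(goals) - min(goals)

# Default method="exclusive"; other quartile conventions may give different Q1/Q3.
q1, q2, q3 = statistics.quantiles(goals, n=4)
iqr = q3 - q1
print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

With n = 7, the quartile positions are 2 and 6, so Q1 = 3, Q3 = 7 and IQR = 4 under this convention.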
  • 45. VARIANCE AND STANDARD DEVIATION • Standard deviation uses the mean of the distribution as a reference point and measures variability by measuring the distance between each score and the mean. Conceptually, standard deviation measures the average distance from the mean. • When the scores are clustered close to the mean, the standard deviation is small; when the scores are scattered widely around the mean, the standard deviation is large.
  • 46. • The calculation of standard deviation begins by computing the average squared distance from the mean. This average squared value is called variance. • Variance is the average squared distance from the mean and is usually identified with the symbol s². The calculation of variance involves two steps:
  • 47. STEP 1: • Compute the distance from the mean, or the deviation, for each score, square each distance, and then add the squared distances. The result is called SS, the sum of squared deviations: $SS = \sum (X - M)^2$. • An equivalent computational formula is $SS = \sum X^2 - \frac{(\sum X)^2}{n}$. • Scores: X = 5, 6, 1, 5, 3, so ΣX = 20 • Squared scores: X² = 25, 36, 1, 25, 9, so ΣX² = 96 • SS = 96 - (20)²/5 = 96 - 80 = 16
  • 48. STEP 2: • Variance is obtained by dividing SS (the sum of squared deviations) by n - 1. • SS = 16 and n = 5 • Variance = $s^2 = \frac{SS}{n - 1} = \frac{16}{4} = 4$ • When we calculate the sample variance, we must estimate the population mean with the sample mean. Dividing by (n - 1) rather than n compensates for this and gives s² a special property: it is an "unbiased estimator". • Therefore, s² is an unbiased estimator of the population variance.
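The two steps above can be checked numerically. This sketch computes SS with both the definitional and the computational formula, then divides by n - 1 to get the sample variance and takes the square root to get the standard deviation. The scores come from the slide.

```python
import math

X = [5, 6, 1, 5, 3]
n = len(X)

# Computational formula: SS = ΣX² - (ΣX)²/n
ss_computational = sum(x * x for x in X) - sum(X) ** 2 / n

# Definitional formula: SS = Σ(X - M)², the sum of squared deviations from the mean
M = sum(X) / n
ss_definitional = sum((x - M) ** 2 for x in X)

variance = ss_computational / (n - 1)  # sample variance s² = SS / (n - 1)
sd = math.sqrt(variance)               # standard deviation s = √s²
print(ss_computational, ss_definitional, variance, sd)  # 16.0 16.0 4.0 2.0
```

Both formulas give the same SS. The computational form avoids subtracting the mean from every score, while the definitional form shows more directly what SS measures.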
  • 49. STANDARD DEVIATION (SD) • The standard deviation is approximately the average distance of the values in a data set from the mean; it is the square root of the variance. • $SD = \sqrt{s^2} = \sqrt{4} = 2$ • Standard deviation = √Variance • Variance = (Standard deviation)²
  • 50. • Standard deviation provides a measure of the standard distance from the mean. A small value for standard deviation indicates that the individual scores are clustered close to the mean and a large value indicates that the scores are spread out relatively far from the mean. • Variance also provides a measure of distance. A small variance indicates that the scores are clustered close together; a large variance means that the scores are widely scattered.
  • 51. DESCRIBING INTERVAL AND RATIO DATA (NUMERICAL SCORES) • The figure shows a frequency distribution graph with the mean and standard deviation displayed as described. • As a general rule, for a distribution that is roughly normal (bell-shaped), about 68% of the scores are within one standard deviation of the mean and about 95% of the scores are within two standard deviations.
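The 68% and 95% figures come from the normal curve, and they hold only approximately for real data that are roughly bell-shaped. This sketch recovers them with Python's `statistics.NormalDist`. It then applies the rule to the figure's values, M = 45 and SD = 6.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)
print(f"within 1 SD: {within_1_sd:.1%}, within 2 SD: {within_2_sd:.1%}")

M, SD = 45, 6  # values shown in the slide's figure
print(f"about 68% of scores between {M - SD} and {M + SD}; "
      f"about 95% between {M - 2 * SD} and {M + 2 * SD}")
```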
  • 52. • The mean (M) and standard deviation are two values that are probably the most commonly reported descriptive statistics, and they should provide enough information to construct a good picture of the entire set of scores. M=45 SD=6
  • 53. DESCRIBING NON-NUMERICAL DATA FROM NOMINAL AND ORDINAL SCALES OF MEASUREMENT • A researcher may simply classify participants by placing them in separate nominal or ordinal categories. • Classification of people by gender (male or female). • Classification of attitude (agree or disagree). • Classification of self-esteem (high, medium, or low).
  • 54. • Report the proportion or percentage in each category. • These values can be used to describe a single sample or to compare separate samples. • For example, a report might describe a sample of voters by stating that 43% prefer candidate Green, 28% prefer candidate Brown, and 29% are undecided. • A research report might compare two groups by stating that 80% of the 6-year-old children were able to successfully complete the task, but only 34% of the 4-year-olds were successful.
  • 55. • In addition to percentages and proportions, you also can use the mode as a measure of central tendency for data from a nominal scale. • For example, if the modal response to a survey question is “no opinion,” you can probably conclude that the people surveyed do not care much about the issue.
  • 56. USING GRAPHS TO SUMMARIZE DATA • For example, a researcher may want to examine the effects of heat and humidity on performance. • For this study, both the temperature (variable 1) and the humidity (variable 2) would be manipulated, and performance would be evaluated under a variety of different temperature and humidity conditions.
  • 58. • As a general rule, graphs for two-factor studies are constructed by listing the values of one of the independent variables on the horizontal axis and listing the values for the dependent variable on the vertical axis.
  • 59. • Notice that the top line presents the means in the top row of the data matrix and the bottom line shows the means from the bottom row. • The result is a graph that displays all six means from the experiment, and allows comparison of means and mean differences.
  • 60. CORRELATIONS • A correlation is a statistical value that measures and describes the direction and degree of relationship between two variables. • The sample correlation coefficient is typically denoted as r. It is also known as Pearson's r. • $r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$ • Note that the two variables are labeled X and Y. • SP is the sum of the products of the deviations, $SP = \sum (X - M_X)(Y - M_Y)$, and $SS_X$ and $SS_Y$ are the sums of squared deviations for X and Y.
  • 61. • For this example, the researcher computes a correlation that measures and describes the relationship between self-esteem and performance. Two separate scores for each participant:
Participant   Self-Esteem Score (X)   Performance Score (Y)
A             62                      13
B             84                      20
C             89                      22
D             73                      16
E             66                      11
F             75                      18
G             71                      14
H             80                      21
  • 62. • Two separate scores for each participant, with a scatter plot showing the data.
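  • 63. CALCULATION
$$\bar{x} = \frac{62 + 84 + 89 + 73 + 66 + 75 + 71 + 80}{8} = 75$$
$$\bar{y} = \frac{13 + 20 + 22 + 16 + 11 + 18 + 14 + 21}{8} = 16.875$$
$$\sum (x - \bar{x})^2 = (62-75)^2 + (84-75)^2 + (89-75)^2 + (73-75)^2 + (66-75)^2 + (75-75)^2 + (71-75)^2 + (80-75)^2 = 572$$
$$\sum (y - \bar{y})^2 = (13-16.875)^2 + (20-16.875)^2 + (22-16.875)^2 + (16-16.875)^2 + (11-16.875)^2 + (18-16.875)^2 + (14-16.875)^2 + (21-16.875)^2 = 112.875$$
$$SP = \sum (x - \bar{x})(y - \bar{y}) = (62-75)(13-16.875) + (84-75)(20-16.875) + (89-75)(22-16.875) + (73-75)(16-16.875) + (66-75)(11-16.875) + (75-75)(18-16.875) + (71-75)(14-16.875) + (80-75)(21-16.875) = 237$$
Dividing SP by n - 1 gives the sample covariance, $s_{xy} = \frac{SP}{n - 1} = \frac{237}{7} \approx 33.86$. The correlation itself uses SP directly:
$$r = \frac{SP}{\sqrt{SS_X \, SS_Y}} = \frac{237}{\sqrt{572 \times 112.875}} \approx 0.9327$$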
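The calculation above can be reproduced in a few lines of Python. This sketch implements r = SP / √(SS_X · SS_Y) on the self-esteem and performance scores. The function name `pearson_r` is hypothetical. On Python 3.10 and later, the standard `statistics.correlation` computes the same coefficient.

```python
import math

self_esteem = [62, 84, 89, 73, 66, 75, 71, 80]   # X, participants A-H
performance = [13, 20, 22, 16, 11, 18, 14, 21]   # Y, participants A-H

def pearson_r(x, y):
    """Pearson correlation r = SP / sqrt(SS_X * SS_Y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of products of deviations
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

r = pearson_r(self_esteem, performance)
print(f"r = {r:.4f}")  # r = 0.9327
```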
  • 64. • Results of the Pearson correlation indicated a significant, large positive relationship between self-esteem (X) and performance (Y), r = .933, p < .001. • The p-value is the probability of obtaining a correlation at least as strong as the one observed if the true correlation in the population were zero (the null hypothesis). If it is lower than the conventional 5% level (p < 0.05), the correlation coefficient is called statistically significant.
  • 65. PROPERTIES OF THE CORRELATION COEFFICIENT, R • +1 ≥ r ≥ -1, i.e. r takes values between -1 and +1, inclusive. • The sign of the correlation gives the direction of the linear relationship: it indicates whether the two variables are positively or negatively related. • A correlation of +1.00 or -1.00 indicates a perfectly consistent linear relationship, and a correlation of 0.00 indicates no consistent linear relationship. • There are no units attached to r. • The closer the magnitude of r is to 1, the stronger the linear relationship. • The closer the magnitude of r is to 0, the weaker the linear relationship. • The correlation value is the same regardless of which variable we define as X and which as Y.
  • 66. • The following four graphs illustrate four possible situations for the values of r.
  • 67. • Graph (d) shows a strong relationship between y and x, yet r = 0. Note that the absence of a linear relationship does not imply that no relationship exists!
  • 68. SPEARMAN CORRELATION • The Spearman correlation is also referred to as the Spearman rank correlation or Spearman's "rho". • It is typically denoted either by the Greek letter rho (ρ) or by rs, and it is simply the Pearson correlation applied to ordinal data (ranks). If the original scores are numerical values from an interval or ratio scale, it is possible to rank the scores and then compute a Spearman correlation. • $r_s = \frac{SP}{\sqrt{SS_X \, SS_Y}}$, computed on the ranks rather than on the original scores. • In this case, the Spearman correlation measures the degree to which the relationship is consistently one-directional, or monotonic.
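Because the Spearman correlation is the Pearson formula applied to ranks, it can be sketched in two steps: rank each variable, then correlate the ranks. This example assumes that tied scores receive the average of the ranks they occupy, a common convention. The helper names `ranks`, `pearson_r` and `spearman_rho` are hypothetical. It reuses the self-esteem and performance data from the Pearson example.

```python
import math

def ranks(values):
    """Rank scores from 1 (smallest) upward; tied scores share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson_r(x, y):
    """Pearson correlation r = SP / sqrt(SS_X * SS_Y)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sp / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

self_esteem = [62, 84, 89, 73, 66, 75, 71, 80]
performance = [13, 20, 22, 16, 11, 18, 14, 21]
rs = spearman_rho(self_esteem, performance)
print(f"rs = {rs:.4f}")
```

For these data the ranks disagree only for a few neighbouring pairs, so rs (about 0.95) is slightly higher than Pearson's r (about 0.93).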
  • 69. REGRESSION • Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. • A linear regression line has an equation of the form $\hat{Y} = a + bX$, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0). • $b = r\frac{s_Y}{s_X}$ or $b = \frac{SP}{SS_X}$, and $a = M_Y - bM_X$ • r is the Pearson correlation, $s_X$ is the standard deviation for the X scores, $s_Y$ is the standard deviation for the Y scores, and $M_X$ and $M_Y$ are the means of X and Y.
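The formulas b = SP/SS_X and a = M_Y - bM_X can be applied to the self-esteem (X) and performance (Y) data from the correlation example. This is a minimal sketch. The function name `fit_line` is hypothetical, and predicting at X = 70 is only an illustration.

```python
x = [62, 84, 89, 73, 66, 75, 71, 80]   # self-esteem (X), from the correlation example
y = [13, 20, 22, 16, 11, 18, 14, 21]   # performance (Y)

def fit_line(x, y):
    """Least-squares line Y-hat = a + bX with b = SP / SS_X and a = M_Y - b * M_X."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sp / ss_x
    a = my - b * mx
    return a, b

a, b = fit_line(x, y)
print(f"Y-hat = {a:.3f} + {b:.3f}X")
print(f"predicted performance for X = 70: {a + b * 70:.2f}")
```

Here b = 237/572 ≈ 0.414. This matches r · s_Y/s_X, because with r = SP/√(SS_X · SS_Y) the two forms are algebraically identical.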
  • 70. • The figure shows a scatter plot of X and Y values with a straight line drawn through the center of the data points. • The straight line is valuable because it makes the relationship easier to see and it can be used for prediction.
  • 71. • First determine whether or not there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other (for example, higher SAT scores do not cause higher college grades), but that there is some significant association between the two variables. • A scatterplot can be a helpful tool in determining the strength of the relationship between two variables. If there appears to be no association between the proposed explanatory and dependent variables (i.e., the scatterplot does not indicate any increasing or decreasing trends), then fitting a linear regression model to the data probably will not provide a useful model.
  • 72. MULTIPLE REGRESSION • Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. • The goal of multiple linear regression is to model the linear relationship between the explanatory (independent) variables and the response (dependent) variable.
  • 73. • The formula for multiple linear regression, with independent variables x1, x2, and so on, is: • $y = b_1x_1 + b_2x_2 + \dots + b_nx_n + a$ • The number of independent variables can be any number n; with two independent variables, the equation is $y = b_1x_1 + b_2x_2 + a$.
  • 74. • Example: A researcher decides to study the performance of students at a school over a period of time. He observed that as lectures moved online, student performance started to decline as well. • The dependent variable, "decrease in performance," depends on various independent variables, such as lack of attention, increased internet addiction, neglect of studies, and much more. • The multiple regression equation would be: • $Y = b_1 \cdot \text{attention} + b_2 \cdot \text{internet addiction} + b_3 \cdot \text{technology support} + \dots + b_nX_n + a$
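The coefficients of a multiple regression equation are usually found by least squares. This is a minimal pure-Python sketch for illustration only, not a production method. It solves the normal equations (XᵀX)β = Xᵀy with Gaussian elimination. The data are hypothetical and were constructed so that y = 1 + 2·x1 + 0.5·x2 exactly, so the fit should recover a = 1, b1 = 2 and b2 = 0.5. The helper names `solve` and `fit_multiple` are also hypothetical. In practice, a statistics package would be used.

```python
def solve(matrix, rhs):
    """Solve a small linear system with Gaussian elimination and partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - s) / m[r][r]
    return coeffs

def fit_multiple(predictors, y):
    """Least-squares fit of y = a + b1*x1 + ... + bk*xk; returns [a, b1, ..., bk]."""
    rows = [[1.0] + list(xs) for xs in zip(*predictors)]  # leading 1 gives the intercept a
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data constructed so that y = 1 + 2*x1 + 0.5*x2 exactly.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u + 0.5 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit_multiple([x1, x2], y)
print(f"y-hat = {a:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
```

With real data the fit is not exact, and each coefficient estimates the effect of one predictor while the others are held constant.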
  • 75. • Multiple regression helps us study several predictor variables at once. • It increases reliability by avoiding dependence on a single variable, using more than one independent variable to explain the outcome. • Multiple regression analysis permits you to test more complex hypotheses than are possible with a single predictor.
  • 76. REFERENCE • Gravetter, F. J., & Forzano, L. B. (2011). Descriptive statistics. In Research methods for the behavioral sciences (4th ed., pp. 434–451). Wadsworth Publishing.

Editor's Notes

  1. That is, a frequency distribution tells how frequencies are distributed over values. A frequency distribution displays two sets of information.
  2. Depending on the method used to display the scale of measurement and the frequencies, a frequency distribution can be a table or a graph.
  3. The first column lists the entire set of possible quiz scores (categories of measurement) in order from 5 to 0; it is headed X to indicate that these are the potential scores. The second column shows the frequency of occurrence for each score.
  4. Note that the frequencies add up to our sample size of 183 students. This is always the case unless a variable contains missing values: respondents can sometimes skip a question or answer “no answer” or something similar.
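The frequency table described in notes 3 and 4 can be tallied mechanically. The sketch below, a hypothetical helper `frequency_table` run on invented quiz scores (not the slide's data), lists X from 5 down to 0, includes categories with zero frequency, and skips missing answers (marked `None`) so the frequencies sum to n minus the number of missing values.

```python
from collections import Counter

def frequency_table(scores, categories):
    """Pair each category X with its frequency f; None marks a missing answer."""
    counts = Counter(s for s in scores if s is not None)  # ignore missing values
    return [(x, counts[x]) for x in categories]  # Counter returns 0 for absent keys

# Hypothetical 0-5 quiz scores for 12 students; one student skipped the quiz.
scores = [5, 4, 4, 3, 3, 3, 2, 2, 1, 0, 3, None]
table = frequency_table(scores, categories=range(5, -1, -1))  # list X from 5 down to 0

print("X\tf")
for x, f in table:
    print(f"{x}\t{f}")

total_f = sum(f for _, f in table)
missing = scores.count(None)
print("sum of f =", total_f, "| missing =", missing, "| n =", len(scores))
```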
  5. For example, “67.5% of males and 63.2% of females graduated” is much easier to interpret than “79 out of 117 males and 120 out of 190 females graduated”.
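The percentages in note 5 follow directly from the counts: each group's count is divided by its group total and multiplied by 100. A minimal sketch, using a hypothetical helper name `percent`:

```python
def percent(part, whole):
    """Convert a count out of a group total into a percentage (one decimal place)."""
    return round(100 * part / whole, 1)

print(percent(79, 117))   # males: 67.5
print(percent(120, 190))  # females: 63.2
```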
  6. The same information that is presented in a frequency distribution table can be presented in a graph.
  7. Each block represents one individual so there is no need for a vertical axis to show the frequency for each score, and the graph shows how the individuals are distributed (piled up) along the scale of measurement.
  8. A frequency polygon is a graph constructed by placing a point above the midpoint of each interval, or bin, at a height equal to its frequency, and joining the points with straight lines. Each of the graphs gives an organized picture of the entire set of scores so you can tell at a glance where the scores are located on the scale of measurement.
  9. A bar graph is like a histogram except that a space is left between adjacent bars. In this example, the class contains 10 psychology majors, 6 biology majors, and so on.
  10. As a result, frequency distributions are rarely shown in published research reports.
  11. Although frequency distribution tables and graphs have the advantage of presenting a complete picture of a set of data, there are simpler methods for describing the scores in a sample. Perhaps the most commonly used descriptive statistic involves computing the average score for a set of data.
  12. The mean (or average) is the most popular and well-known measure of central tendency; it requires numerical scores obtained from an interval or a ratio scale of measurement. In the following sections, we will look at the mean, mode, and median, learn how to calculate them, and see under what conditions each is most appropriate.
  13. You may have noticed that the above formula refers to the sample mean. So, why have we called it a sample mean?
  14. However, inspecting the raw data suggests that this mean value might not be the best way to accurately reflect the typical salary of a worker, as most workers have salaries in the $12k to 18k range. As we will find out later, taking the median would be a better measure of central tendency in this situation.
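The effect described in note 14 can be reproduced with Python's standard `statistics` module. The salaries below are hypothetical (in thousands of dollars), not the data from the slide: eight fall between 12 and 18, and a single extreme value pulls the mean above every typical salary while the median stays inside the typical range.

```python
import statistics

# Hypothetical annual salaries in thousands of dollars: most are between 12 and 18,
# plus one very high earner that skews the distribution to the right.
salaries = [12, 13, 14, 15, 15, 16, 17, 18, 95]

mean = statistics.mean(salaries)      # pulled upward by the extreme value
median = statistics.median(salaries)  # the middle (5th) of the 9 ordered scores
print(f"mean = {mean:.1f}, median = {median}")
```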
  15. It is the middle mark because there are 5 scores before it and 5 scores after it. But what happens when you have an even number of scores? What if you had only 10 scores?
  16. Well, you simply have to take the middle two scores and average the result.
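The median rule for odd and even numbers of scores can be written out directly: sort the scores, take the middle one when n is odd, and average the two middle ones when n is even. The function name `median_of` and the scores are hypothetical.

```python
def median_of(scores):
    """Middle score of the ordered list; for an even count, average the middle two."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: exactly one middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle scores

# Hypothetical data: 11 scores (odd n) and the first 10 of them (even n).
eleven = [12, 7, 3, 15, 9, 4, 10, 6, 14, 8, 11]
ten = eleven[:10]
print(median_of(eleven))  # 9
print(median_of(ten))     # 8.5
```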
  17. When the scores consist of classifications that are not numerical values, the mode is the only available measure of central tendency.
  18. The most common form of transport, in this particular data set, is the bus.
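For non-numerical categories like forms of transport, the mode is simply the category with the highest frequency. The responses below are hypothetical; the sketch counts them with `collections.Counter` and takes the most common one.

```python
from collections import Counter

# Hypothetical survey answers: each person's usual form of transport.
transport = ["bus", "car", "bus", "bike", "walk", "bus", "car", "train", "bus"]

counts = Counter(transport)
mode, freq = counts.most_common(1)[0]  # (category, frequency) with the highest count
print(mode, freq)  # bus 4
```

If two or more categories tie for the highest frequency, `most_common(1)` reports only one of them; `statistics.multimode` (Python 3.8 and later) returns all tied categories.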
  19. In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is not representative of the data, which is mostly concentrated around the 20 to 30 value range. To use the mode to describe the central tendency of this data set would be misleading.
  20. Arithmetic is the branch of mathematics dealing with the properties and manipulation of numbers.
  21. Skewed Distributions and the Mean and Median
  22. They have the same center, but what about their spreads?
  23. Whenever the mean is used as the measure of central tendency, the standard deviation is used as the measure of variability. One way to describe spread or variability is to compute the standard deviation.
  24. n is the number of scores. Note that the value n - 1 is also called the degrees of freedom, or simply df. The topic of degrees of freedom comes up again later in the context of hypothesis tests.
  25. Variance and standard deviation are directly related by a squaring or square root operation. We computed a variance of 4 for a set of n=5 scores.
  26. The standard deviation is considered the best way to describe variability. The variance, by contrast, measures squared distance, which is not an intuitive concept, so it is not usually used to describe variability.
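The relationship in notes 24 to 26 can be checked numerically: the sample variance is the sum of squared deviations (SS) divided by df = n - 1, and the standard deviation is its square root. The five scores below are hypothetical, chosen so that the sample variance also comes out to 4; they are not the slide's actual data.

```python
import math

def sample_variance(scores):
    """SS / (n - 1): sum of squared deviations divided by the degrees of freedom."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squares, SS
    return ss / (n - 1)

def sample_sd(scores):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(scores))

scores = [3, 3, 5, 7, 7]  # hypothetical: mean 5, SS = 16, df = 4
print(sample_variance(scores), sample_sd(scores))  # 4.0 2.0
```

Python's `statistics.variance` and `statistics.stdev` also divide by n - 1, while `statistics.pvariance` and `statistics.pstdev` divide by n.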
  27. In a frequency distribution graph, the mean can be represented by a vertical line drawn through the center of the set of scores. In the same way, the standard deviation can be represented by two arrows that point out from the mean toward the opposite extremes of the frequency distribution. Notice that the standard deviation is shown as a distance from the mean and is intended to represent the standard distance. Some of the scores are closer to the mean, and some are farther away from the mean, but the arrows represent the standard or average distance.
  28. For example, if a research report describes a set of scores by stating that M = 45 and SD = 6, you should be able to visualize (or sketch) a frequency distribution graph showing the set of scores.
  29. Occasionally, the measurements or observations made by a researcher are not numerical values. In each case, the data do not consist of numerical values: there are no numbers with which to compute a mean or a standard deviation. In this case, the researcher must find some other method of describing the data.
  30. Remember, the mode simply identifies the most commonly occurring category and, therefore, describes the most typical member of a sample. However, the concept of distance between scores is meaningless with non-numerical values and it is impossible to compute a meaningful measure of variability.
  31. The structure of this type of experiment can be represented as a matrix, with one variable determining the rows and the second variable defining the columns.
  32. The matrix and the graph present hypothetical data for the temperature and humidity experiment. The figure includes a matrix showing the mean level of performance for each treatment condition and demonstrates how the means would be displayed in a graph.
  33. We list temperature values on the horizontal axis and values for the mean level of performance on the vertical axis. Then a separate line is used to present the means for each level of the second independent variable; in this case, there is a separate line for each of the two levels of humidity.
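The matrix of treatment means described in notes 31 to 33 can be built by grouping scores by both factors and averaging each cell. All values below, including the temperature labels, are hypothetical. Each printed row corresponds to one humidity level, which is one line in the graph, and each column to one temperature on the horizontal axis.

```python
from collections import defaultdict

# Hypothetical performance scores: (temperature, humidity, score) for each participant.
data = [
    ("70F", "low", 10), ("70F", "low", 12),
    ("70F", "high", 9), ("70F", "high", 11),
    ("90F", "low", 8), ("90F", "low", 6),
    ("90F", "high", 4), ("90F", "high", 2),
]

def cell_means(rows):
    """Mean score for every (temperature, humidity) treatment condition."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for temp, humidity, score in rows:
        totals[(temp, humidity)] += score
        counts[(temp, humidity)] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}

means = cell_means(data)
temps = ["70F", "90F"]
humidities = ["low", "high"]

# Rows = humidity levels (one line per level in the graph); columns = temperatures.
print("humidity", *temps, sep="\t")
for h in humidities:
    print(h, *(means[(t, h)] for t in temps), sep="\t")
```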
  34. If we want to provide a measure of the strength of the linear relationship between two quantitative variables, a good way is to report the correlation coefficient between them.
  35. This study uses a correlational strategy: it measures two variables for each participant and computes a correlation to evaluate the relationship between the variables.
  36. The scores are identified as X and Y and can be presented in a table or in a graph called a scatter plot. In the scatter plot, each individual is represented by a point in the graph; the horizontal position of the point corresponds to the value of X (self-esteem) and the vertical position is the value of Y (performance).
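One common way to compute the correlation between X and Y is the Pearson formula r = SP / sqrt(SSx · SSy), where SP is the sum of products of deviations. The formula is supplied here as the standard textbook definition rather than quoted from the slides, and the self-esteem and performance scores are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: r = SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products, SP
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical self-esteem (X) and performance (Y) scores for five participants.
self_esteem = [1, 2, 3, 4, 5]
performance = [2, 4, 5, 4, 5]
print(round(pearson_r(self_esteem, performance), 3))  # 0.775
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient.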
  37. Sum of scores
  38. The statistical process of finding the linear equation that produces the most accurate predicted values for Y using one predictor variable (X) is called regression, and the resulting equation is called the regression equation. The value of b is called the slope constant because it describes the slope of the line; b and a are both fixed constants, with a giving the Y-intercept.
  39. That is, for each value of X, the line provides a predicted value of Y. Before attempting to fit a linear model to observed data, see the next slide.
  40. Before attempting to fit a linear model to observed data, first determine whether or not there is a relationship between the variables of interest.
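For the one-predictor case in notes 38 and 39, the least-squares line can be computed with the standard textbook formulas b = SP / SSx and a = M_Y - b·M_X. These formulas are supplied here rather than quoted from the slides, and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y-hat = b*X + a for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products, SP
    ss_x = sum((x - mx) ** 2 for x in xs)                   # sum of squares for X
    b = sp / ss_x      # slope constant
    a = my - b * mx    # Y-intercept: the line passes through (M_X, M_Y)
    return b, a

# Hypothetical X and Y scores for five individuals.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, a = fit_line(xs, ys)
print("b =", round(b, 3), "a =", round(a, 3))

# The line gives a predicted Y for any value of X.
print("predicted Y at X = 3:", round(b * 3 + a, 3))
```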
  41. When more than one predictor variable is used, the process is called multiple regression.
  42. Here, the equation used to calculate the value of the dependent variable y is called the multiple regression equation.