This document defines key statistical terms such as population, sample, parameter, and statistic, and distinguishes descriptive from inferential statistics. Descriptive statistics collect and describe data without drawing inferences, while inferential statistics analyze a subset of the data (a sample) to make predictions about the entire data set (the population). The document also defines common measures of central tendency (mean, median, mode) and of variability (range, absolute deviation, variance, standard deviation), and explains how to choose an appropriate measure based on the characteristics of the data.
1. Terms
• Population: the totality of all possible values (measurements or counts) of a particular characteristic for a specified group of objects
• Sample: a part of a population selected according to some rule or plan
• Parameter: a numerical value describing a property of a population
• Statistic: any numerical value describing a characteristic of a sample
• Sampling: the process of choosing a representative portion of a population (reading assignment: SAMPLING METHODS)
• Statistical Method: a procedure used in the collection, presentation, and analysis of data
STATISTICS
- the presentation and interpretation of chance outcomes that occur in a planned or scientific investigation
- deals with either NUMERICAL DATA representing COUNTS or MEASUREMENTS, or CATEGORICAL DATA that can be classified according to some criterion
- looks for TRENDS and PATTERNS in the data
Uses of Statistics
1. Measuring probability and predicting odds
2. Maintaining quality, using a statistic as a basis or benchmark
3. Verifying claims
4. Predicting outcomes (e.g., by interpolation)
5. Verifying correlations
Two Major Categories of Statistical Methods
1. DESCRIPTIVE STATISTICS: collecting and describing a set of data; no inferences or conclusions are drawn about a larger set of data
2. INFERENTIAL STATISTICS: analyzing a subset of the data to make predictions or inferences about the entire set, i.e., using a sample to gauge the behaviour of the population
NOTE: A statistical inference is subject to uncertainty
2. Introduction to Notations
If X is the variable of interest and n measurements are taken, then the notation $X_1, X_2, X_3, \ldots, X_n$ will be used to represent the n observations.
Summation Notation
Σ (Sigma): indicates summation
If X is the variable of interest and n measurements are taken, the sum of the n observations can be written as
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \cdots + X_n$$
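In code, $\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$ is just a running total over the observations. The sketch below assumes the observations are stored in a Python list; the helper name `summation` and the data values are hypothetical, chosen only for illustration.

```python
def summation(values):
    """Return X_1 + X_2 + ... + X_n for the observations in `values`."""
    total = 0
    for x in values:  # i runs from 1 to n
        total += x
    return total

observations = [4, 7, 1, 8]  # hypothetical data: X_1 = 4, X_2 = 7, X_3 = 1, X_4 = 8
print(summation(observations))  # 20
```

Python's built-in `sum` performs the same accumulation; the explicit loop is there only to mirror the index i in the notation.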
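THEOREMS (for variables X and Y observed n times each, and a constant c):
1. $\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$
2. $\sum_{i=1}^{n} cX_i = c\sum_{i=1}^{n} X_i$
3. $\sum_{i=1}^{n} c = nc$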
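The standard summation rules, $\sum (X_i + Y_i) = \sum X_i + \sum Y_i$, $\sum cX_i = c\sum X_i$, and $\sum c = nc$ for a constant c, can be checked numerically. This is a minimal sketch on hypothetical data; a numeric check illustrates the rules but does not prove them.

```python
xs = [2, 5, 3]  # hypothetical X_1..X_3
ys = [1, 4, 6]  # hypothetical Y_1..Y_3
c = 3           # an arbitrary constant
n = len(xs)

# Rule 1: the summation of a sum is the sum of the individual summations
rule1 = sum(x + y for x, y in zip(xs, ys)) == sum(xs) + sum(ys)
# Rule 2: a constant factor can be moved outside the summation sign
rule2 = sum(c * x for x in xs) == c * sum(xs)
# Rule 3: summing a constant n times gives n times the constant
rule3 = sum(c for _ in range(n)) == n * c

print(rule1, rule2, rule3)  # True True True
```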
MODE
• Value that occurs most frequently in the data set
• Locates the point where scores occur with the greatest density
• Used less often than the mean and median
Properties
• It may not exist, or if it does, it may not be unique
• Not affected by extreme values
• Applicable to both qualitative and quantitative data
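The properties of the mode can be seen with Python's `statistics.multimode` (available since Python 3.8), which returns every most-frequent value. `multimode` itself reports all values when they are all tied, so the hypothetical helper `modes` below adds an assumed convention: if every distinct value occurs equally often, the data set is treated as having no mode.

```python
from statistics import multimode

def modes(data):
    """Return the mode(s) of `data`, or [] if the data set has no mode."""
    found = multimode(data)
    # Assumed convention: when all distinct values are tied (e.g., each occurs
    # once), there is no value that occurs "most" frequently, so no mode exists.
    if len(found) > 1 and len(found) == len(set(data)):
        return []
    return found

print(modes([3, 7, 7, 2, 9]))         # [7]
print(modes([3, 7, 7, 2, 900]))       # [7]  (an extreme value does not move it)
print(modes([1, 1, 4, 4, 6]))         # [1, 4]  (not unique: bimodal)
print(modes([2, 5, 8]))               # []  (no mode)
print(modes(["red", "blue", "red"]))  # ['red']  (qualitative data)
```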
Measures of Variability and Dispersion
RANGE
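• Distance along the number line spanned by the data
• Two forms: exclusive and inclusive range
  - Exclusive range = largest score - smallest score
  - Inclusive range = upper real limit of the largest score - lower real limit of the smallest score (for whole-number scores, largest score - smallest score + 1)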
Properties
• A rough, general measure of dispersion
• Determined only by the two extreme values, the largest and the smallest
• Does not describe the distribution of values between the extremes
• Does not depend on the number of observations
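Both forms of the range take only a line each. This sketch assumes each score is recorded to the nearest `unit` (a hypothetical parameter, 1 for whole-number scores), so its real limits lie half a unit below and above it; the function names and data are illustrative.

```python
def exclusive_range(scores):
    """Largest score minus smallest score."""
    return max(scores) - min(scores)

def inclusive_range(scores, unit=1):
    """Upper real limit of the largest score minus lower real limit of the smallest.

    Assumes scores are recorded to the nearest `unit`, so each score's real
    limits are unit / 2 below and above the recorded value.
    """
    upper_limit = max(scores) + unit / 2
    lower_limit = min(scores) - unit / 2
    return upper_limit - lower_limit

scores = [12, 15, 20, 31, 18]   # hypothetical whole-number scores
print(exclusive_range(scores))  # 19
print(inclusive_range(scores))  # 20.0
```

Only `max` and `min` enter the computation, which is why the range ignores how the values between the extremes are distributed: `[12, 31]` has the same range as `scores`.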
ABSOLUTE DEVIATION
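Average of the absolute deviations of the scores from the mean (mean deviation) or from the median. Note that the median absolute deviation (MAD) is usually defined as the median, not the average, of the absolute deviations from the median.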
Properties
• Measures the variability of the values in the data set
• Indicates how compact the group is on a given measure
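The average absolute deviation can be computed about either center. A minimal sketch, assuming a hypothetical helper `mean_absolute_deviation` and hypothetical data, using `mean` and `median` from Python's `statistics` module:

```python
from statistics import mean, median

def mean_absolute_deviation(data, center):
    """Average of |X_i - center| over all observations."""
    return sum(abs(x - center) for x in data) / len(data)

data = [2, 4, 4, 5, 10]  # hypothetical scores: mean 5, median 4

md_mean = mean_absolute_deviation(data, mean(data))      # about the mean
md_median = mean_absolute_deviation(data, median(data))  # about the median
print(md_mean, md_median)  # 2.0 1.8
```

The value about the median is never larger than the value about the mean, because the median minimizes the average absolute deviation.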
VARIANCE
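• Average of the squared deviations from the mean
• Population variance $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$ and sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, where μ is the population mean, N the population size, $\bar{X}$ the sample mean, and n the sample size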
Properties
• Adding a constant c to (or subtracting it from) each score does not change the variance
• Multiplying each score by a constant c multiplies the variance by c²
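The two variance formulas and both properties can be checked directly. In this sketch the helper names and data are hypothetical; `pvariance` and `variance` are the population and sample variance functions from Python's `statistics` module, used here only as a cross-check.

```python
from statistics import pvariance, variance

def population_variance(data):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    xbar = sum(data) / len(data)
    return sum((x - xbar) ** 2 for x in data) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean 5
c = 3                            # an arbitrary constant

print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32/7, about 4.571

# Cross-check against the standard library's statistics module
print(pvariance(data), variance(data))

# Property 1: adding c to every score leaves the variance unchanged
print(population_variance([x + c for x in data]))  # 4.0
# Property 2: multiplying every score by c multiplies the variance by c**2
print(population_variance([c * x for x in data]))  # 36.0
```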
STANDARD DEVIATION
• Square root of the average of the squared deviations from the mean, i.e., the square root of the variance
• Population standard deviation (σ) and sample standard deviation (s)
Why n-1?
• Degrees of freedom
  - A measure of how much precision an estimate of variation has
  - General rule: the degrees of freedom decrease as more parameters have to be estimated
• X̄ (the sample mean) estimates μ (the population mean)
• Using an estimated mean to find the standard deviation causes the loss of ONE degree of freedom, leaving n - 1
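The effect of the n - 1 divisor can be shown exactly on a tiny population by averaging each estimator over every possible sample. This sketch assumes sampling with replacement (so all ordered samples are equally likely) and uses a hypothetical population {1, 2, 3} with samples of size 2; exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

n = 2  # sample size
samples = list(product(population, repeat=n))  # all N**n ordered samples, with replacement

def squared_deviations(sample):
    """Sum of squared deviations from the sample mean (X-bar estimates mu)."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# Average each candidate estimator over every possible sample
avg_divide_by_n = sum(squared_deviations(s) / n for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(squared_deviations(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, avg_divide_by_n, avg_divide_by_n_minus_1)  # 2/3 1/3 2/3
```

Dividing by n underestimates σ² on average (by the factor (n - 1)/n), while dividing by n - 1 recovers it exactly, which is what "unbiased" means here.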
Properties
• Most widely used measure of variability
• Affected by the value of every observation
• Less affected by sampling fluctuations than other measures of variability, though sensitive to extreme values
• Adding a constant c to (or subtracting it from) each score does not change the standard deviation
• Multiplying each score by a constant c multiplies the standard deviation by |c|
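The shift and scale properties of the standard deviation follow from those of the variance by taking square roots, which is why the factor is |c| rather than c. A sketch on hypothetical data, using `pstdev` (population, σ) and `stdev` (sample, s, with n - 1) from Python's `statistics` module; the negative constant is chosen deliberately.

```python
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
c = -3                           # a negative constant, to show why |c| appears

print(pstdev(data))                     # 2.0  (population standard deviation)
print(pstdev([x + 10 for x in data]))   # 2.0  (shifting every score changes nothing)
print(pstdev([c * x for x in data]))    # 6.0  (scaled by |c| = 3, not by c)
print(stdev(data))                      # sample standard deviation s, uses n - 1
```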
CHOOSING A MEASURE
• Range
  - The data are too scanty or too scattered to justify more precise and laborious measures
  - Only the total spread of scores is needed
• Absolute Deviation
  - Deviations from the mean or median need to be found and weighed
  - Extreme values would unduly skew the standard deviation
• Standard Deviation
  - A measure with the greatest stability is needed
  - The effect of extreme values has been judged acceptable
  - The results will be compared or correlated with other data sets
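Extreme values weigh more heavily on the standard deviation than on the absolute deviation because squaring magnifies large deviations. This sketch uses hypothetical data with one extreme score and compares how much of each total (absolute deviations for the mean deviation, squared deviations for the variance and standard deviation) comes from that single score.

```python
from statistics import mean

data = [4, 5, 6, 7, 28]  # hypothetical scores; 28 is the extreme value
m = mean(data)           # 10

abs_devs = [abs(x - m) for x in data]   # basis of the mean deviation
sq_devs = [(x - m) ** 2 for x in data]  # basis of the variance and standard deviation

# Share of each total contributed by the single extreme score
share_abs = abs_devs[-1] / sum(abs_devs)
share_sq = sq_devs[-1] / sum(sq_devs)
print(share_abs, round(share_sq, 2))  # 0.5 0.79
```

The one extreme score accounts for half of the total absolute deviation but nearly four fifths of the total squared deviation, which is the sense in which it can unduly skew the standard deviation.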