This document provides an overview of descriptive statistics and index numbers used in data analysis. It defines descriptive statistics as methods used to describe and summarize patterns in data without making conclusions beyond what is directly observed. Various measures of central tendency like the mean, median, and mode are described as well as measures of dispersion such as range, standard deviation, and variance. Index numbers are constructed to study changes that cannot be measured directly, and weighted indexes like the Laspeyres and Paasche indexes are discussed.
2. Meaning of Descriptive Methods
• Descriptive analysis is the term given to the analysis of data that helps describe, show or summarise data in
a meaningful way such that, for example, patterns might emerge from the data.
• Descriptive statistics do not, however, permit the researcher to draw conclusions beyond the data he or she
has analysed, or to reach conclusions regarding any hypotheses he or she might have made. Descriptive research
attempts to show what already exists in a group or population.
• Descriptive statistics provide a quick method to make comparisons between different data sets and to spot
the smallest and largest values and trends or changes over a period of time.
• Descriptive statistics comprise the construction of graphs, charts and tables, and the computation of
various descriptive measures such as averages, measures of variation and percentiles.
3. Measures of Central Tendency
• A measure of central tendency is a typical value around which the other figures congregate. An average is a
single value that stands for the whole group of which it forms a part. Measures of central tendency, also
known as measures of location, are among the most widely used summary figures.
• The mean is the most common measure of central tendency and the one that can be mathematically
manipulated.
• Arithmetic mean: The arithmetic mean is the balance point in a distribution such that if you subtract each
value in the distribution from the mean and sum all of these deviation scores, the result will be zero. The
arithmetic mean is a mathematical representation of the typical value of a series of numbers, computed as
the sum of all the numbers in the series divided by the count of all numbers in the series.
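The definition and the balance-point property above can be checked with a short Python sketch. The function name and the sample scores are illustrative assumptions, not part of the original slides.

```python
def arithmetic_mean(values):
    """Sum of all the values divided by the count of values."""
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    return sum(values) / len(values)


scores = [4, 8, 15, 16, 23, 42]            # illustrative data
mean = arithmetic_mean(scores)             # 108 / 6 = 18.0

# Balance-point property: deviations from the mean sum to zero.
deviations = [mean - x for x in scores]
print(mean)
print(sum(deviations))
```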
4. Measures of Central Tendency (Contd.)
• Advantages of arithmetic mean: Perhaps the biggest benefit of the arithmetic mean as a statistical
measure is its simplicity. Anyone who can add and divide can calculate the arithmetic mean of a data set.
Of all the measures of central tendency, the arithmetic mean is the least affected by sampling
fluctuations, that is, it varies least when multiple samples are drawn from the same larger
population.
• Limitations of arithmetic mean: In data sets that are skewed or where outliers are present, computation of
the arithmetic mean often provides a misleading result.
• Weighted arithmetic mean: If the items in a data set are not all of equal importance, we may compute a
weighted arithmetic mean, in which each value is multiplied by a weight reflecting its importance and the
sum of these products is divided by the sum of the weights. A weighted arithmetic mean is a more accurate
summary when some scores matter more than others.
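A minimal Python sketch of the weighted arithmetic mean follows. The marks and the credit weights are hypothetical values chosen only to show how the weighting shifts the result.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


marks = [70, 80, 90]       # hypothetical scores
credits = [1, 2, 3]        # hypothetical importance of each score

simple = sum(marks) / len(marks)          # every score counts equally
weighted = weighted_mean(marks, credits)  # (70 + 160 + 270) / 6
print(simple, weighted)
```

Because the highest mark carries the largest weight, the weighted mean comes out above the simple mean.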
5. Measures of Central Tendency (Contd.)
• Harmonic mean: The harmonic mean of a set of observations is the reciprocal of the arithmetic mean of
the reciprocals of the observations. The harmonic mean is defined only for positive values and is used
for averaging rates while keeping one variable constant, for example, average speed over equal distances.
• Geometric mean: For inflation, price escalations, rates of return and population growth, the geometric
mean is the appropriate single figure representing an average across time periods.
• Median: The median is the middle value when the data are arranged in order. If you have an even number of
observations, you take the average of the two middle values. The median is best suited for describing the
typical value when the data are skewed or contain outliers. In research studies, the median is often used
for variables such as income and prices.
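The three measures on this slide can be sketched with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The speeds, growth factors and incomes are made-up illustrations.

```python
import statistics

# Harmonic mean: average speed over two legs of EQUAL distance
# (distance is the variable held constant).
speeds_kmh = [40, 60]
avg_speed = statistics.harmonic_mean(speeds_kmh)   # about 48, not 50

# Geometric mean: average growth factor across time periods.
growth_factors = [1.10, 1.20, 0.95]                # +10%, +20%, -5%
avg_growth = statistics.geometric_mean(growth_factors)

# Median: middle value of the ordered data; one extreme income
# pulls the mean upwards but leaves the median unchanged.
incomes = [20, 22, 25, 30, 400]
median_income = statistics.median(incomes)         # 25
mean_income = statistics.mean(incomes)

# Even number of observations: average of the two middle values.
even_median = statistics.median([1, 2, 3, 4])      # 2.5

print(avg_speed, avg_growth, median_income, mean_income, even_median)
```

Compounding `avg_growth` over three periods reproduces the overall growth of the three individual factors, which is why the geometric mean suits rates of change.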
6. Measures of Central Tendency (Contd.)
When data are arranged in ascending or descending order, they can be divided into various parts such as
quartiles, deciles and percentiles.
• Quartiles: Quartile simply means ‘one fourth of something’. When the data are arranged in ascending order,
that is, from the lowest value to the highest value, and divided into four equal groups, the three values
that separate the groups are called quartiles.
• Deciles: Deciles are the nine values that divide the ordered data into ten equal groups. They are often used
as a simple way of measuring inequality.
• Percentiles: A percentile is one of the values of a variable that divide the distribution of the variable
into 100 groups having equal frequencies.
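As a rough illustration, quartiles, deciles and percentiles can all be obtained from `statistics.quantiles` (Python 3.8+), which returns the cut points rather than the groups. The data and the choice of `method="inclusive"` are assumptions made for this sketch; other interpolation conventions give slightly different cut points.

```python
import statistics

data = [7, 1, 9, 3, 5, 2, 8, 4, 6]     # illustrative, unordered on purpose

# n equal groups are separated by n - 1 cut points.
# quantiles() sorts the data internally.
quartiles = statistics.quantiles(data, n=4, method="inclusive")
deciles = statistics.quantiles(data, n=10, method="inclusive")
percentiles = statistics.quantiles(data, n=100, method="inclusive")

print(quartiles)                        # three cut points
print(len(deciles), len(percentiles))   # 9 and 99 cut points

# The second quartile coincides with the median.
print(quartiles[1] == statistics.median(data))
```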
7. Measures of Central Tendency (Contd.)
• Mode: The mode of a distribution is the value at the point around which the items tend to be most heavily
concentrated. It may be regarded as the most typical of a series of values.
• Merits of mode: It is simple and popular, is little affected by extreme values, can be located graphically,
and does not require knowledge of all the items or frequencies.
• Demerits of mode: It can be uncertain and vague, is not capable of algebraic treatment, can be difficult to
determine, and ignores extreme values and their frequencies.
• Which measure to choose?: The mode should be used as the measure of centre for a qualitative variable.
When the variable is quantitative with a symmetric distribution, the mean is the proper measure of centre.
For a quantitative variable with a skewed distribution, the median is a good choice for the measure of
centre.
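The sketch below uses `statistics.mode` and `statistics.multimode` (Python 3.8+) to show the mode on a qualitative variable, and why it can be "uncertain" when several values tie. The sample data are invented for illustration.

```python
from statistics import mode, multimode

# Qualitative variable: the mode is the only usable measure of centre.
colours = ["red", "blue", "red", "green", "blue", "red"]
most_common = mode(colours)             # "red" occurs three times

# A distribution may have more than one mode.
ratings = [1, 1, 2, 2, 3]
all_modes = multimode(ratings)          # both 1 and 2 occur twice

print(most_common, all_modes)
```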
8. Measures of Dispersion
• Dispersion: In statistics, the term ‘dispersion’ refers to the variation of a random variable or its probability
distribution. It shows how far away the data points lie from the central value.
• Dispersion graph: A dispersion graph depicts the relationship between two variables. The graph gives a
simple illustration of how one variable can influence the other.
• Range: The range is the simplest of all the methods for measuring dispersion. It is the difference between
the highest value and the lowest value in the data set. While simple to compute, the range is often an
unreliable measure of dispersion since it is based on only two values in the set.
• Interquartile range: The interquartile range is the difference between the upper and lower quartiles. It
measures the dispersion (spread) of the middle half of the data set and, unlike the range, is not affected
by extreme values.
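A small Python sketch comparing the two measures follows. The helper names, the data and the `method="inclusive"` quartile convention are assumptions for illustration only.

```python
import statistics


def value_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Difference between the upper and lower quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]   # one extreme value

# The range reacts strongly to the single outlier; the IQR does not.
print(value_range(data), interquartile_range(data))
print(value_range(with_outlier), interquartile_range(with_outlier))
```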
9. Measures of Dispersion (Contd.)
• Mean absolute deviation from the mean: Mean absolute deviation from the mean is the average of
absolute differences (differences expressed without plus or minus sign) between each value in a set of
values and the average of all values of that set.
• Standard deviation: The SD is a numerical value used to indicate how widely individuals in a group vary. It
is the square root of the variance and is expressed in the same units as the data. If individual
observations vary greatly from the group mean, the SD is large, and vice versa.
• Variance: The variance is the average of the squared deviations of the observations from the group mean. If
individual observations vary greatly from the group mean, the variance is large, and vice versa.
• Skewness: Skewness describes the degree and direction of departure from symmetry. Measures of asymmetry
are called ‘measures of skewness’ and are classified as absolute or relative. Absolute measures are known
simply as measures of skewness, and relative measures are known as coefficients of skewness.
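The deviation-based measures above can be sketched in a few lines of Python. This sketch assumes the population form (dividing by n); a sample variance would divide by n - 1 instead. The function names and data are illustrative.

```python
import math


def mean_absolute_deviation(values):
    """Average absolute difference between each value and the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


def population_variance(values):
    """Average squared deviation from the mean (divides by n)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def population_sd(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(population_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]         # mean is 5
print(mean_absolute_deviation(data))    # 12 / 8
print(population_variance(data))        # 32 / 8
print(population_sd(data))
```

The standard library offers `statistics.pvariance` and `statistics.pstdev` for the same population quantities.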
10. Measures of Dispersion (Contd.)
• Moments: Moments can be defined as the arithmetic mean of various powers of deviations taken from the
mean of a distribution. These moments are known as central moments.
• Kurtosis: Kurtosis is a measure which determines whether the data are peaked or flat relative to a normal
distribution. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. A
uniform distribution would be the extreme case.
• Lorenz curve: The Lorenz curve is a statistical tool that shows dispersion graphically. It describes
income distribution on a two-dimensional graph by plotting the cumulative share of income against the
cumulative share of the population.
• Importance of measures of dispersion: Measures of dispersion occupy a significant place in any research
endeavour. They determine the reliability of an average, serve as a basis for the control of variability,
facilitate comparisons between two or more series with regard to their variability and facilitate the use of
other statistical measures.
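The sketch below illustrates central moments, the moment-based coefficients of skewness and kurtosis built from them, and the points of a Lorenz curve. The moment-ratio formulas are one common convention among several, and all names and data are assumptions made for illustration.

```python
def central_moment(values, k):
    """Arithmetic mean of the k-th powers of deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)


def skewness_coefficient(values):
    """Third central moment scaled by the SD cubed (0 if symmetric)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis_coefficient(values):
    """Fourth central moment scaled by the variance squared.
    A normal distribution gives 3; flatter distributions give less."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def lorenz_points(incomes):
    """(cumulative population share, cumulative income share) pairs."""
    ordered = sorted(incomes)
    total, n = sum(ordered), len(ordered)
    points, running = [(0.0, 0.0)], 0
    for i, income in enumerate(ordered, start=1):
        running += income
        points.append((i / n, running / total))
    return points


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(central_moment(data, 1))          # first central moment is zero
print(skewness_coefficient(data))       # positive: longer right tail
print(kurtosis_coefficient(data))

curve = lorenz_points([40, 10, 30, 20]) # hypothetical incomes
print(curve)
```

The further the Lorenz points fall below the diagonal from (0, 0) to (1, 1), the more unequal the distribution.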
11. Index Numbers
• Researchers construct index numbers to study changes in the effects of factors that cannot
be measured directly.
• In education research, index numbers are commonly used for (a) analysing and framing suitable
policies, (b) revealing trends and tendencies, and (c) deflating the quantitative variables.
• Construction of index numbers: Indexes are divided into two main categories, which are as follows:
Simple indexes: A simple aggregate index expresses the aggregate price of all commodities in the
current year as a percentage of the aggregate price in the base year.
Weighted indexes: These are indexes in which rational weights are explicitly assigned to the various
items. Two methods of computing a weighted price index are the Laspeyres method and
the Paasche method.
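The simple aggregate index described above can be sketched as follows. The function name and the prices are hypothetical.

```python
def simple_aggregate_index(base_prices, current_prices):
    """Aggregate current-year price as a percentage of the
    aggregate base-year price."""
    if len(base_prices) != len(current_prices):
        raise ValueError("both years must cover the same commodities")
    return 100 * sum(current_prices) / sum(base_prices)


base_year = [10, 20, 30]        # hypothetical prices, base year
current_year = [12, 25, 35]     # same commodities, current year

index = simple_aggregate_index(base_year, current_year)
print(index)                    # 100 * 72 / 60
```

An index of 120 would mean the aggregate price is 20% above the base year. No commodity is weighted, so an expensive item dominates the result regardless of how much of it is bought.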
12. Index Numbers (Contd.)
• Laspeyres’ price index: This method uses base-period weights for computing a weighted index.
• Paasche’s price index: This method is an alternative to Laspeyres’ method. It uses the weights of
the current period.
• Fisher’s ideal index: This index (Fisher 1922) attempts to offset the shortcomings of both the indexes
discussed above. It is the geometric mean of the Laspeyres and Paasche indexes and so balances the effects
of the two.
• Consumer price index: The CPI is one of the most widely used inflation indexes in the world. There
are several different versions of the CPI, but they are all built upon the idea of tracking prices for a
basket of goods and comparing them to a baseline year.
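A minimal Python sketch of the three weighted indexes follows. It assumes the usual textbook form in which quantities serve as the weights (base-period quantities for Laspeyres, current-period quantities for Paasche). The function names and the price and quantity data are hypothetical.

```python
import math


def laspeyres_index(p0, p1, q0):
    """Current prices vs base prices, weighted by BASE-period quantities."""
    return 100 * sum(p * q for p, q in zip(p1, q0)) / sum(
        p * q for p, q in zip(p0, q0))


def paasche_index(p0, p1, q1):
    """Current prices vs base prices, weighted by CURRENT-period quantities."""
    return 100 * sum(p * q for p, q in zip(p1, q1)) / sum(
        p * q for p, q in zip(p0, q1))


def fisher_index(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indexes."""
    return math.sqrt(laspeyres_index(p0, p1, q0) * paasche_index(p0, p1, q1))


# Hypothetical two-commodity basket.
base_prices, current_prices = [10, 20], [12, 22]
base_qty, current_qty = [5, 2], [4, 3]

L = laspeyres_index(base_prices, current_prices, base_qty)    # 100 * 104 / 90
P = paasche_index(base_prices, current_prices, current_qty)   # 100 * 114 / 100
F = fisher_index(base_prices, current_prices, base_qty, current_qty)
print(L, P, F)
```

Being a geometric mean, the Fisher value always lies between the Laspeyres and Paasche values.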