Overview of Advanced Marketing Research
Chapter - 15
Chapter Name: Frequency distribution, Cross-tabulation & Hypothesis testing
Frequency distribution:
A frequency distribution is a mathematical distribution whose objective is to obtain a count
of the number of responses associated with different values of one variable and to express
these counts in percentage terms. A frequency distribution for a variable produces a table of
frequency counts, percentages and cumulative percentages for all the values associated with
that variable. In a frequency distribution one variable is considered at a time.
Statistics associated with frequency distribution:
The most commonly used statistics associated with frequencies are-
Measures of Location:
It is a statistic that describes a location within a data set. Measures of central tendency
describe the center of the distribution.
1. Mean: The average; the value obtained by summing all the elements in a set and dividing by the number of elements. The mean, X̄, is given by

X̄ = (Σ Xi) / n, with the sum running over i = 1, ..., n

Where, Xi = observed values of the variable X
n = number of observations (sample size)
2. Mode: A measure of central tendency given as the value that occurs the most in a sample distribution. The mode is symbolized by Mo.
3. Median: The median of a sample is the middle value when the data are arranged in
ascending or descending order. If the number of data points is even, the median is
usually estimated as the midpoint between the two middle values – by adding the
two middle values and dividing their sum by 2. The median is the 50th percentile.
Measures of Variability:
The measures of variability, which are calculated on interval or ratio data, include the range, interquartile range, variance or standard deviation, and coefficient of variation.
1. Range: The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample. As such, the range is directly affected by outliers.
Range = X largest – X smallest
2. Interquartile Range: The interquartile range is the difference between the 75th and 25th percentiles. For a set of data points arranged in order of magnitude, the pth percentile is the value that has p% of the data points below it and (100 - p)% above it.
Measures of Shape:
In addition to measures of variability, measures of shape are also useful in understanding the nature of the distribution. The shape of a distribution is assessed by examining skewness and kurtosis.
1. Skewness: A characteristic of a distribution that assesses its symmetry about the mean.
2. Kurtosis: Kurtosis is a measure of the relative peakedness or flatness of the curve
defined by the frequency distribution. The kurtosis of a normal distribution is zero. If
the kurtosis is positive, then the distribution is more peaked than a normal
distribution. A negative value means that the distribution is flatter than a normal
distribution.
A general procedure for hypothesis testing:
1. Formulate the hypothesis: A null hypothesis is a statement of the status quo, one
of no difference or no effect. If the null hypothesis is not rejected, no changes will be
made. An alternative hypothesis is one in which some difference or effect is
expected. Accepting the alternative hypothesis will lead to changes in opinions or
actions. The null hypothesis refers to a specified value of the population parameter (e.g., µ, σ, π), not a sample statistic (e.g., X̄). A null hypothesis may be rejected, but it can never be accepted based on a single test. In classical hypothesis testing, there is
no way to determine whether the null hypothesis is true. In marketing research, the
null hypothesis is formulated in such a way that its rejection leads to the acceptance
of the desired conclusion. The alternative hypothesis represents the conclusion for
which evidence is sought.
For example, consider the following hypotheses about a population proportion:
H0: π ≤ 0.40
H1: π > 0.40
The test of this null hypothesis is a one-tailed test, because the alternative hypothesis is expressed directionally. If that is not the case, then a two-tailed test would be required, and the hypotheses would be expressed as:
H0: π = 0.40
H1: π ≠ 0.40
2. Select an appropriate test: The test statistic measures how close the sample has
come to the null hypothesis. The test statistic often follows a well-known
distribution, such as the normal, t, or chi-square distribution. In our example, the z
statistic, which follows the standard normal distribution, would be appropriate.
3. Choose a level of significance, α: A Type I error occurs when the sample results lead to the rejection of the null hypothesis when it is in fact true. The probability of Type I error is also called the level of significance. A Type II error occurs when, based on the sample results, the null hypothesis is not rejected when it is in fact false. The probability of Type II error is denoted by β. Unlike α, which is specified by the researcher, the magnitude of β depends on the actual value of the population parameter (proportion). The power of a test is the probability (1 - β) of rejecting the null hypothesis when it is false and should be rejected. Although β is unknown, it is related to α. An extremely low value of α (e.g., 0.001) will result in intolerably high β errors. So it is necessary to balance the two types of errors.
4. Collect data and calculate test statistic: The required data are collected and the
value of the test statistic computed.
5. Determine the probability (critical value): Using standard normal tables, the probability of obtaining a z value of 1.88 can be calculated. The area between -∞ and 1.88 is 0.9699. Therefore, the area to the right of z = 1.88 is 1.0000 - 0.9699 = 0.0301. Alternatively, the critical value of z, which will give an area to the right of the critical value of 0.05, is between 1.64 and 1.65 and equals 1.645. Note that in determining the critical value of the test statistic, the area to the right of the critical value is either α or α/2: it is α for a one-tailed test and α/2 for a two-tailed test.
6. Compare the probability (critical value) and make the decision: The probability associated with the calculated or observed value of the test statistic is 0.0301. This is the probability of obtaining a sample proportion of 0.567 when π = 0.40. This is less than the level of significance of 0.05. Hence, the null hypothesis is rejected. Alternatively, if the calculated value of the test statistic is greater than the critical value of the test statistic (TSCR), the null hypothesis is rejected. (A computational sketch of these steps follows this procedure.)
7. Marketing research conclusion: The conclusion reached by hypothesis testing must
be expressed in terms of the marketing research problem.
Cross-Tabulation:
A statistical technique that describes two or more variables simultaneously and results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values.
Two Variables Cross-Tabulation:
Since two variables have been cross classified, percentages could be computed either
column wise, based on column totals, or row wise, based on row totals. The general rule is
to compute the percentages in the direction of the independent variable, across the
dependent variable.
Three variable cross-Tabulation:
Often the introduction of a third variable clarifies the initial association observed between
two variables. The introduction of a third variable can result in four possibilities.
1. It can refine the association observed between the original variables.
2. It can indicate no association between the two variables, although an association was initially observed.
3. It can reveal some association between the two variables, although no association was initially observed.
4. It can indicate no change in the initial association.
Statistics associated with Cross-Tabulation:
1. Chi-Square: The chi-square statistic (χ2) is used to test the statistical significance of the observed association in a cross-tabulation. The expected frequency for each cell can be calculated by using a simple formula:

fe = (nr × nc) / n

Where
nr = total number in the row
nc = total number in the column
n = total sample size
2. Phi Coefficient: The phi coefficient (φ) is used as a measure of the strength of association in the special case of a table with two rows and two columns (a 2 x 2 table). The phi coefficient is proportional to the square root of the chi-square statistic: φ = √(χ2/n). It takes the value of 0 when there is no association, which would be indicated by a chi-square value of 0 as well. When the variables are perfectly associated, phi assumes the value of 1 and all the observations fall just on the main or minor diagonal.
3. Contingency Coefficient: Whereas the phi coefficient is specific to a 2 x 2 table,
the contingency coefficient (C) can be used to assess the strength of association
in a table of any size. The contingency coefficient varies between 0 and 1. The
maximum value of the contingency coefficient depends on the size of the table
(number of rows and number of columns). For this reason, it should be used only
to compare tables of the same size.
4. Cramer's V: Cramer's V is a modified version of the phi correlation coefficient, φ, and is used in tables larger than 2 x 2. (The sketch following this list computes these association measures.)
5. Lambda Coefficient: Asymmetric lambda measures the percentage
improvement in predicting the value of the dependent variable, given the value of
the independent variable. Lambda also varies between 0 and 1. A value of 0
means no improvement in prediction. A value of 1 indicates that the prediction
can be made without error. This happens when each independent variable
category is associated with a single category of the dependent variable.
Asymmetric lambda is computed for each of the variables (treating it as the
dependent variable). A symmetric lambda is also computed, which is a kind of
average of the two asymmetric values. The symmetric lambda does not make an
assumption about which variable is dependent. It measures the overall
improvement when prediction is done in both directions.
Cross-Tabulation in Practice:
When conducting cross-tabulation analysis in practice, it is useful to proceed along the
following steps.
1. Test the null hypothesis that there is no association between the variables using the
chi-square statistic. If you fail to reject the null hypothesis, then there is no
relationship.
2. If H0 is rejected, then determine the strength of the association using an appropriate
statistic.
3. If H0 is rejected, interpret the pattern of the relationship by computing the
percentages in the direction of the independent variable, across the dependent
variable.
4. If the variables are treated as ordinal rather than nominal, use tau b, tau c, or gamma
as the test statistic. If H0 is rejected, then determine the strength of the association
using the magnitude, and the direction of the relationship using the sign of the test
statistic.
5. Translate the results of hypothesis testing, strength of association, and pattern of association into managerial implications and recommendations where meaningful.
Hypothesis Testing Related to Differences:
Parametric tests assume that the variables of interest are measured on at least an interval
scale. Nonparametric tests assume that the variables are measured on a nominal or ordinal
scale. These tests can be further classified based on whether one or two or more samples are
involved. The samples are independent if they are drawn randomly from different
populations. For the purpose of analysis, data pertaining to different groups of respondents, e.g., males and females, are generally treated as independent samples. The samples are paired when the data for the two samples relate to the same group of respondents.
Parametric Tests: The t statistic assumes that the variable is normally distributed, the mean is known (or assumed to be known), and the population variance is estimated from the sample. Assume that the random variable X is normally distributed, with mean µ and unknown population variance σ2, which is estimated by the sample variance s2. Then

t = (X̄ - µ) / sX̄

is t distributed with n - 1 degrees of freedom, where sX̄ is the standard error of the mean. The t distribution is similar to the normal distribution in appearance. Both distributions are bell-shaped and symmetric. As the number of degrees of freedom increases, the t distribution approaches the normal distribution.
One sample:
One sample test of means compares the mean of a sample to a pre-specified value and tests
for a deviation from that value.
Two Independent samples:
Two samples that are not experimentally related. The measurement of one sample has no
effect on the values of the second sample.
F test: A statistical test of the equality of the variances of two populations.
Paired samples:
In hypothesis testing, the observations are paired so that the sets of observations relate to the
same respondents. The difference in these cases is examined by a paired samples t test. To
compute t for paired samples, the paired difference variable, denoted by D, is formed and its
mean and variance calculated. Then the t statistic is computed. The degrees of freedom are n
- 1, where n is the number of pairs. The relevant formulas are:
H0: µD = 0
H1: µD ≠ 0
Chapter 16:
Chapter Name: Analysis of variance and covariance
ANOVA:
Analysis of variance (ANOVA) is used as a test of means for two or more populations. The
null hypothesis, typically, is that all means are equal. Analysis of variance must have a
dependent variable that is metric (measured using an interval or ratio scale). There must also
be one or more independent variables that are all categorical (nonmetric).
Factors:
Categorical independent variables. The independent variables must be all categorical
(nonmetric) to use ANOVA.
Treatment:
In ANOVA, a particular combination of factor levels, or categories, is called a treatment.
One-way analysis of variance:
One-way analysis of variance involves only one categorical variable, or a single factor. In
one-way analysis of variance, a treatment is the same as a factor level.
N-way analysis of variance:
If two or more factors are involved, the analysis is termed n-way analysis of variance.
One-Way Analysis of Variance:
Marketing researchers are often interested in examining the differences in the mean values
of the dependent variable for several categories of a single independent variable or factor.
For example:
Do the various segments differ in terms of their volume of product consumption?
Do the brand evaluations of groups exposed to different commercials vary?
Do retailers, wholesalers, and agents differ in their attitudes towards the firm's distribution policies?
How do consumers' intentions to buy the brand vary with different price levels?
What is the effect of consumers' familiarity with the store (measured as high, medium, and low) on preference for the store?
Statistics Associated with One-Way Analysis of Variance:
eta2 (η2): The strength of the effects of X (independent variable or factor) on Y (dependent variable) is measured by η2. The value of η2 varies between 0 and 1.
F statistic: The null hypothesis that the category means are equal in the population is
tested by an F statistic based on the ratio of mean square related to X and mean
square related to error.
Mean square: This is the sum of squares divided by the appropriate degrees of
freedom.
SSbetween: Also denoted as SSx, this is the variation in Y related to the variation in the
means of the categories of X. This represents variation between the categories of X,
or the portion of the sum of squares in Y related to X.
SSwithin: Also referred to as SSerror, this is the variation in Y due to the variation
within each of the categories of X. This variation is not accounted for by X.
SSy: This is the total variation in Y.
Conducting One-Way ANOVA:
1. Identify the dependent and independent variables
2. Decompose the total variation
3. Measure the effects
4. Test the significance
5. Interpret the results
Identify the Dependent and Independent Variables:
The dependent variable is denoted by Y and the independent variable is denoted by X. X is a categorical variable having c categories. There are n observations on Y for each category of X. The sample size in each category of X is n, and the total sample size N = n × c. Although the sample sizes in the categories of X are assumed to be equal for the sake of simplicity, this is not a requirement.
Decompose the total variation:
The total variation in Y, denoted by SSy, can be decomposed into two components:
SSy = SSbetween + SSwithin
Where the subscripts between and within refer to the categories of X. SSbetween is the
variation in Y related to the variation in the means of the categories of X. For this reason,
SSbetween is also denoted as SSx. SSwithin is the variation in Y related to the variation within
each category of X. SSwithin is not accounted for by X. Therefore it is referred to as SSerror.
The total variation in Y may be decomposed as:
SSy = SSx + SSerror
Measure the effects: In analysis of variance, we estimate two measures of variation: within
groups (SSwithin) and between groups (SSbetween). Thus, by comparing the Y variance
estimates based on between-group and within-group variation, we can test the null
hypothesis. The strength of the effects of X on Y is measured as follows:

η2 = SSx/SSy = (SSy - SSerror)/SSy

The value of η2 varies between 0 and 1.
The test of significance:
In one-way analysis of variance, the interest lies in testing the null hypothesis that the
category means are equal in the population.
H0: µ1 = µ2 = µ3 = ........... = µc
Under the null hypothesis, SSx and SSerror come from the same source of variation, so the population variance of Y can be estimated either from the between-category variation or from the within-category variation:

Sy2 = SSx / (c - 1) = mean square due to X = MSx

Sy2 = SSerror / (N - c) = mean square due to error = MSerror

The null hypothesis is tested by the F statistic, the ratio of the two estimates:

F = MSx / MSerror

which follows the F distribution with (c - 1) and (N - c) degrees of freedom.
Interpret the Results: If the null hypothesis of equal category means is not rejected, then
the independent variable does not have a significant effect on the dependent variable. On
the other hand, if the null hypothesis is rejected, then the effect of the independent variable
is significant. A comparison of the category mean values will indicate the nature of the
effect of the independent variable.
N-Way Analysis of Variance:
In marketing research, one is often concerned with the effect of more than one factor
simultaneously. For example:
How do the consumers’ intentions to buy a brand vary with different levels of price
and different levels of distributions?
How do advertising levels (high, medium, and low) interact with price levels (high, medium, and low) to influence a brand's sales?
Do educational levels (less than high school, high school graduate, some college, and
college graduate) and age (less than 35, 35-55, more than 55) affect consumption of
a brand?
What is the effect of consumers' familiarity with a department store (high, medium,
and low) and store image (positive, neutral, and negative) on preference for the
store?
Nonmetric analysis of variance:
Nonmetric analysis of variance examines the difference in the central tendencies of more
than two groups when the dependent variable is measured on an ordinal scale. One such
procedure is the k-sample median test.
Chapter 17:
Chapter name: Correlation and Regression
Product moment correlation: Product moment correlation is a statistic used to summarize the strength of association between two metric (interval or ratio) variables, say X and Y. It is also known as the Pearson correlation coefficient, simple correlation, bivariate correlation, or simply the correlation coefficient. It was proposed by Karl Pearson. In marketing research, we are often interested in summarizing the strength of association between two metric variables.
Formula:

r = Σ (Xi - X̄)(Yi - Ȳ) / √[ Σ (Xi - X̄)2 × Σ (Yi - Ȳ)2 ], with all sums running over i = 1, ..., n

The value of r varies between -1 and +1:
1. r = 0 means there is no linear relationship between X and Y.
2. r = +1 means there is a perfect positive linear relationship between X and Y.
3. r = -1 means there is a perfect negative linear relationship between X and Y.
Partial correlation:
Whereas the product moment or simple correlation is a measure of association describing the linear association between two variables, a partial correlation coefficient measures the association between two variables after controlling for, or adjusting for, the effects of one or more additional variables. The statistic is used to answer the following questions:
How strongly are sales related to advertising expenditures when the effect of price is controlled?
Is there an association between market share and size of the sales force after
adjusting for the effect of sales promotion?
Are consumers' perceptions of quality related to their perceptions of prices when the effect of brand image is controlled?
Nonmetric correlation:
A correlation measure for two nonmetric variables that relies on rankings to compute the correlation. If the nonmetric variables are ordinal and numeric, Spearman's rho, ρs, and Kendall's tau, τ, are two measures of nonmetric correlation that can be used to examine the correlation between them. Both of these measures use rankings rather than the absolute values of the variables, and the basic concepts underlying them are quite similar. Both vary from -1.0 to 1.0.
Regression Analysis:
Regression analysis is a powerful and flexible procedure for analyzing associative
relationships between a metric dependent variable and one or more independent variables. It
is concerned with the nature and degree of association between variables and does not imply
or assume any causality. It is used in the following ways:
1. Determine whether the independent variables explain a significant variation
in the dependent variable: Whether a relationship exists
2. Determine how much of the variation in the dependent variable can be
explained by the independent variables: Strength of the relationship
3. Determine the structure or form of the relationship: The mathematical
equation relating the independent and dependent variables
4. Predict the values of the dependent variable
5. Control for other independent variables when evaluating the contributions of a specific variable or set of variables.
Bivariate Regression:
Bivariate regression is a procedure for deriving a mathematical relationship in the form of an
equation between a single metric dependent or criterion variable and a single metric
independent or predictor variable. The analysis is similar in many ways to determining the
simple correlation between variables. However, because an equation has to be derived, one
variable must be identified as the dependent and the other as the independent variable. The
examples given earlier in the context of simple correlation can be translated into the
regression context.
Can variation in sales be explained in terms of variation in advertising expenditures? What is the structure and form of this relationship, and can it be modeled mathematically by an equation describing a straight line?
Can the variation in market share be accounted for by the size of the sales force?
Are consumers’ perceptions of quality determined by their perceptions of price?
Statistics Associated with Bivariate Regression Analysis:
Bivariate regression analysis: The basic regression equation is

Y = β0 + β1X + ei

Where,
Y = dependent or criterion variable
X = independent or predictor variable
β0 = intercept of the line
β1 = slope of the line
ei = the error term associated with the ith observation
Coefficient of determination: The strength of association is measured by the coefficient of determination, r2. It varies between 0 and 1 and signifies the proportion of the total variation in Y that is accounted for by the variation in X.
Multiple Regression:
Multiple regression involves a single dependent variable and two or more independent
variables. The questions raised in the context of bivariate regression can also be answered via multiple regression by considering additional independent variables.
Can variation in sales be explained in terms of variation in advertising expenditures,
prices, and level of distribution?
Can the variation in market share be accounted for by the size of the sales force, advertising expenditures, and sales promotion budgets?
Are consumers’ perceptions of quality determined by their perceptions of price,
brand image and brand attributes?
Additional questions can also be answered by multiple regression
How much of the variation in sales can be explained by advertising expenditures,
prices, and level of distribution?
What is the contribution of advertising expenditures in explaining the variation in
sales when the levels of price and distribution are controlled?
What levels of sales may be expected, given the levels of advertising expenditures, prices, and level of distribution?
Statistics associated with Multiple Regression:
Adjusted R2: R2, the coefficient of multiple determination, is adjusted for the number of independent variables and the sample size to account for diminishing returns. After the first few variables, additional independent variables do not make much contribution.
Coefficient of multiple determination: The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, R2, which is also called the coefficient of multiple determination.
Strength of Association:
The strength of the relationship stipulated by the regression equation can be determined by using appropriate measures of association. The total variation is decomposed as in the bivariate case:

SSy = SSreg + SSres

Where
SSy = Σ (Yi - Ȳ)2 = total variation in Y
SSreg = Σ (Ŷi - Ȳ)2 = variation explained by the regression
SSres = Σ (Yi - Ŷi)2 = residual variation

The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, R2, which is also called the coefficient of multiple determination:

R2 = SSreg/SSy = (SSy - SSres)/SSy

The multiple correlation coefficient, R, can also be viewed as the simple correlation coefficient, r, between Y and Ŷ. Several points about the characteristics of R2 are worth noting:
The coefficient of multiple determination, R2, cannot be less than the highest bivariate r2 of any individual independent variable with the dependent variable.
R2 will be larger when the correlations between the independent variables are low.
If the independent variables are statistically independent (uncorrelated), then R2 will be the sum of the bivariate r2 of each independent variable with the dependent variable.
R2 cannot decrease as more independent variables are added to the regression equation.
Significance Testing:
Significance testing involves testing the significance of the overall regression equation as well as specific partial regression coefficients. The null hypothesis for the overall test is that the coefficient of multiple determination in the population, R2pop, is zero:

H0: R2pop = 0

This is equivalent to the following null hypothesis:

H0: β1 = β2 = β3 = ... = βk = 0

The overall test can be conducted by using an F statistic:

F = (SSreg/k) / (SSres/(n - k - 1)) = (R2/k) / ((1 - R2)/(n - k - 1))

which has an F distribution with k and (n - k - 1) degrees of freedom.
Examination of residuals:
A residual is the difference between the observed value of Yi and the value predicted by the regression equation, Ŷi. Plotting the residuals against the independent variables provides evidence of the appropriateness or inappropriateness of using a linear model. The plot should result in a random pattern: the residuals should fall randomly, with relatively equal dispersion about 0, and they should not display any tendency to be either positive or negative.
Multicollinearity:
A state of very high intercorrelations among independent variables. Multicollinearity can result in several problems, including:
The partial regression coefficients may not be estimated precisely. The standard
errors are likely to be high.
The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.
It becomes difficult to assess the relative importance of the independent variables in
explaining the variation in the dependent variable.
Predictor variables may be incorrectly included or removed in stepwise regression.
Chapter - 19:
Chapter Name: Factor Analysis
Factor Analysis:
Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization. In marketing research, there may be a large number of variables, most of which are correlated and which must be reduced to a manageable level. Relationships among sets of many interrelated variables are examined and represented in terms of a few underlying factors.
Factor analysis is used in the following circumstances:
1. To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
2. To identify a new, smaller set of uncorrelated variables to replace the original set of
correlated variables in subsequent multivariate analysis.
3. To identify a smaller set of salient variables from a larger set for use in subsequent
multivariate analysis.
Application of factor analysis:
1. It can be used in market segmentation for identifying the underlying variables on
which to group the customers.
2. In product research, factor analysis can be employed to determine the brand
attributes that influence customer choice.
3. In advertising studies, factor analysis can be used to understand the media consumption habits of the target market.
4. In pricing studies, it can be used to identify the characteristics of price sensitive
consumers.
Conducting factor analysis:
1. Formulate the problem
2. Construct the correlation matrix
3. Determine the method of factor analysis
4. Determine the number of factors
5. Rotate the factors
6. Interpret the factors
7. Calculate the factor scores
8. Select the surrogate variables
9. Determine the model fit
Formulate the problem:
Problem formulation includes several tasks. First, the objectives of factor analysis should be
identified. The variables to be included in the factor analysis should be specified based on
past research, theory and judgment of the researcher. It is important that the variables be
appropriately measured on an interval or ratio scale.
Construct the correlation matrix:
The analytical process is based on a matrix of correlations between the variables. Valuable insights can be gained from an examination of this matrix. For factor analysis to be appropriate, the variables must be correlated.
Determine the method of factor analysis:
Once it has been determined that factor analysis is suitable for analyzing the data, an appropriate method must be selected. The approach used to derive the weights or factor score coefficients differentiates the various methods of factor analysis. The two basic approaches are:
1. Principal component analysis: An approach to factor analysis that considers the total variance in the data.
2. Common factor analysis: An approach to factor analysis that estimates the factors based only on the common variance.
Determine the number of factors:
Several procedures have been suggested for determining the number of factors. These include a priori determination and approaches based on eigenvalues, scree plot, percentage of variance accounted for, split-half reliability, and significance tests.
1. A priori determination: Sometimes, because of prior knowledge, the researcher knows how many factors to expect and thus can specify the number of factors to be extracted beforehand. The extraction of factors ceases when the desired number of factors has been extracted. Most computer programs allow the user to specify the number of factors, allowing for an easy implementation of this approach.
2. Determination based on eigenvalues: In this approach, only factors with eigenvalues greater than 1.0 are retained; the other factors are not included in the model. An eigenvalue represents the amount of variance associated with the factor. Hence, only factors with a variance greater than 1.0 are included. Factors with variance less than 1.0 are no better than a single variable, because, due to standardization, each individual variable has a variance of 1.0. If the number of variables is less than 20, this approach will result in a conservative number of factors. (This criterion is illustrated in the sketch after this list.)
3. Determination based on scree plot: A scree plot is a plot of the eigenvalues against the number of factors in order of extraction. The shape of the plot is used to determine the number of factors.
4. Determination based on percentage of variance: In this approach, the number of factors extracted is determined so that the cumulative percentage of variance extracted by the factors reaches a satisfactory level. What level of variance is satisfactory depends upon the problem. However, it is recommended that the factors extracted should account for at least 60 percent of the variance.
5. Determination based on split half reliability: The sample is split in half and factor
analysis is performed on each half. Only factors with high correspondence of factor
loadings across the two subsamples are retained.
6. Determination based on significance tests: It is possible to determine the statistical significance of the separate eigenvalues and retain only those factors that are statistically significant.
Rotate factors:
An important output from factor analysis is the factor matrix, also called the factor pattern matrix. The factor matrix contains the coefficients used to express the standardized variables in terms of the factors.
Orthogonal rotation: Rotation of factors in which the axes are maintained at right angles.
Varimax procedure: An orthogonal method of factor rotation that minimizes the
number of variables with high loadings on a factor, thereby enhancing the
interpretability of the factors.
Oblique rotation: Rotation of factors when the axes are not maintained at right
angles.
Interpret factors:
Interpretation is facilitated by identifying the variables that have large loadings on the same
factors. That factor can then be interpreted in terms of the variables that load high on it.
Factor scores: Composite scores estimated for each respondent on the derived factors. The
factor score on the ith factor may be estimated as follows:
Fi = Wi1X1 + Wi2X2 + Wi3X3 + …..+ WikXk
Select surrogate variables:
Selection of substitute or surrogate variables involves singling out some of the original variables for use in the subsequent analysis.
Determine the model fit:
The final step of factor analysis involves the determination of model fit. A basic assumption underlying factor analysis is that the observed correlations between variables can be attributed to common factors. Hence, the correlations between the variables can be deduced or reproduced from the estimated correlations between the variables and the factors. The differences between the observed correlations and the reproduced correlations can be examined to determine model fit.