WORLD UNIVERSITY OF BANGLADESH
SUBMITTED BY | MOIN SARKER | ID- 2534 | BATCH 42C|
MAJOR IN MARKETING
ASSIGNMENT MARKETING RESEARCH
SUBMITTED TO:
Mrinmoy Mitra
Lecturer | Department of Business
Describe the Q sort methodology?
Q Methodology is a research method used in psychology and in social sciences to study people's
"subjectivity"—that is, their viewpoint. Q was developed by psychologist William Stephenson. It
has been used both in clinical settings for assessing a patient's progress over time (intra-rater
comparison), as well as in research settings to examine how people think about a topic (inter-
rater comparisons).
The name "Q" comes from the form of factor analysis that is used to analyze the data. Normal
factor analysis, called "R method," involves finding correlations between variables (say, height
and age) across a sample of subjects. Q, on the other hand, looks for correlations between
subjects across a sample of variables. Q factor analysis reduces the many individual viewpoints
of the subjects down to a few "factors," which are claimed to represent shared ways of thinking.
It is sometimes said that Q factor analysis is R factor analysis with the data table turned
sideways. While helpful as a heuristic for understanding Q, this explanation may be misleading,
as most Q methodologists argue that for mathematical reasons no one data matrix would be
suitable for analysis with both Q and R.
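As a rough illustration of the "sideways" intuition, the following minimal Python sketch (with hypothetical subjects and rankings) correlates subjects with each other across statements, which is the Q orientation, rather than correlating variables across subjects:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of rankings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: each row is one subject's Q sort -- a ranking of the
# same five statements under one condition of instruction.
sorts = {
    "subject_A": [4, 3, 5, 1, 2],
    "subject_B": [4, 2, 5, 1, 3],
    "subject_C": [1, 4, 2, 5, 3],
}

# Q orientation: correlate SUBJECTS with each other across the statements.
r_ab = pearson(sorts["subject_A"], sorts["subject_B"])  # high: shared viewpoint
r_ac = pearson(sorts["subject_A"], sorts["subject_C"])  # negative: opposed viewpoint
```

Subjects A and B rank the statements similarly and so correlate highly; a Q factor analysis would group them on one factor. This is only the correlation step - extracting factors from the subject-by-subject correlation matrix requires a factor-analysis routine.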
The data for Q factor analysis come from a series of "Q sorts" performed by one or more
subjects. A Q sort is a ranking of variables—typically presented as statements printed on small
cards—according to some "condition of instruction." For example, in a Q study of people's views
of a celebrity, a subject might be given statements like "He is a deeply religious man" and "He is
a liar," and asked to sort them from "most like how I think about this celebrity" to "least like how
I think about this celebrity." The use of ranking, rather than asking subjects to rate their
agreement with statements individually, is meant to capture the idea that people think about ideas
in relation to other ideas, rather than in isolation.
The sample of statements for a Q sort is drawn from and claimed to be representative of a
"concourse"—the sum of all things people say or think about the issue being investigated.
Q methodologists commonly use a structured sampling approach to try to represent
the full breadth of the concourse.
One salient difference between Q and other social science research methodologies, such as
surveys, is that it typically uses many fewer subjects. This can be a strength, as Q is sometimes
used with a single subject, and it makes research far less expensive. In such cases, a person will
rank the same set of statements under different conditions of instruction. For example, someone
might be given a set of statements about personality traits and then asked to rank them according
to how well they describe herself, her ideal self, her father, her mother, etc. Working with a
single individual is particularly relevant in the study of how an individual's rankings change over
time and this was the first use of Q-methodology. As Q-methodology works with a small non-
representative sample, conclusions are limited to those who participated in the study.
Short note
Measurement: Measurement is the process of observing and recording the observations that are
collected as part of a research effort. There are two major issues that will be considered here.
First, you have to understand the fundamental ideas involved in measuring. Here we consider
two of the major measurement concepts. In Levels of Measurement, I explain the meaning of the
four major levels of measurement: nominal, ordinal, interval and ratio. Then we move on to the
reliability of measurement, including consideration of true score theory and a variety of
reliability estimators.
Second, you have to understand the different types of measures that you might use in social
research. We consider four broad categories of measurements. Survey research includes the
design and implementation of interviews and questionnaires. Scaling involves consideration of
the major methods of developing and implementing a scale. Qualitative research provides an
overview of the broad range of non-numerical measurement approaches. And unobtrusive
measures present a variety of measurement methods that don't intrude on or interfere with the
context of the research.
Scaling: Scaling is the branch of measurement that involves the construction of an instrument
that associates qualitative constructs with quantitative metric units. Scaling evolved out of efforts
in psychology and education to measure "unmeasurable" constructs like authoritarianism and self
esteem. In many ways, scaling remains one of the most arcane and misunderstood aspects of
social research measurement. And, it attempts to do one of the most difficult of research tasks --
measure abstract concepts.
Nominal Scale: Nominal scales are used for labeling variables, without any quantitative value.
“Nominal” scales could simply be called “labels.” Here are some examples, below. Notice that
all of these scales are mutually exclusive (no overlap) and none of them have any numerical
significance. A good way to remember all of this is that “nominal” sounds a lot like “name” and
nominal scales are kind of like “names” or labels.
Illustration of primary scales of measurement
What is a Likert scale?
As in all scaling methods, the first step is to define what it is you are trying to measure. Because
this is a one-dimensional scaling method, it is assumed that the concept you want to measure is
one-dimensional in nature. You might operationalize the definition as an instruction to the people
who are going to create or generate the initial set of candidate items for your scale.
Next, you have to create the set of potential scale items. These should be items that can be rated
on a 1-to-5 or 1-to-7 Disagree-Agree response scale. Sometimes you can create the items by
yourself based on your intimate understanding of the subject matter. But, more often than not, it's
helpful to engage a number of people in the item creation step. For instance, you might use some
form of brainstorming to create the items. It's desirable to have as large a set of potential items as
possible at this stage, about 80-100 would be best.
Rating the Items. The next step is to have a group of judges rate the items. Usually you would
use a 1-to-5 rating scale where:
1 = strongly unfavorable to the concept
2 = somewhat unfavorable to the concept
3 = undecided
4 = somewhat favorable to the concept
5 = strongly favorable to the concept
Notice that, as in other scaling methods, the judges are not telling you what they believe -- they
are judging how favorable each item is with respect to the construct of interest.
Selecting the Items. The next step is to compute the intercorrelations between all pairs of items,
based on the ratings of the judges. In making judgments about which items to retain for the final
scale there are several analyses you can do:
Throw out any items that have a low correlation with the total (summed) score across all items
In most statistics packages it is relatively easy to compute this type of Item-Total correlation.
First, you create a new variable which is the sum of all of the individual items for each
respondent. Then, you include this variable in the correlation matrix computation (if you include
it as the last variable in the list, the resulting Item-Total correlations will all be the last line of the
correlation matrix and will be easy to spot). How low should the correlation be for you to throw
out the item? There is no fixed rule here -- you might eliminate all items with a correlation with
the total score less than .6, for example.
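The item-total screen described above can be sketched in plain Python. The ratings below are hypothetical judge ratings, and the .6 cutoff is just the example threshold from the text:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: each row is one judge's 1-to-5 ratings of three items.
ratings = [
    [5, 4, 2],
    [4, 4, 3],
    [2, 1, 5],
    [1, 2, 4],
]

# Sum the items for each judge, then correlate each item with that total.
totals = [sum(row) for row in ratings]
item_total = [pearson([row[i] for row in ratings], totals)
              for i in range(len(ratings[0]))]

# Keep items whose item-total correlation clears the (example) .6 cutoff.
keep = [i for i, r in enumerate(item_total) if r >= 0.6]
```

Here the third item correlates negatively with the total, so it would be thrown out; in a statistics package the same numbers would appear in the last line of the correlation matrix as described above.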
For each item, get the average rating for the top quarter of judges and the bottom quarter. Then,
do a t-test of the differences between the mean value for the item for the top and bottom quarter
judges.
Higher t-values mean that there is a greater difference between the highest and lowest judges. In
more practical terms, items with higher t-values are better discriminators, so you want to keep
these items. In the end, you will have to use your judgment about which items are most sensibly
retained. You want a relatively small number of items on your final scale (e.g., 10-15) and you
want them to have high Item-Total correlations and high discrimination (e.g., high t-values).
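The top-versus-bottom-quarter discrimination check can likewise be sketched with a hand-rolled Welch t statistic (the judge ratings here are hypothetical):

```python
from statistics import mean, variance

def discrimination_t(top, bottom):
    """Welch's t statistic for the gap between the mean ratings given by
    the top-quarter and bottom-quarter judges for one item."""
    se = (variance(top) / len(top) + variance(bottom) / len(bottom)) ** 0.5
    return (mean(top) - mean(bottom)) / se

# Hypothetical ratings of two items by the top and bottom quarters of judges.
sharp_item = discrimination_t(top=[5, 5, 4, 5], bottom=[1, 2, 1, 1])
dull_item = discrimination_t(top=[3, 4, 3, 3], bottom=[3, 2, 3, 3])
# sharp_item is far larger, so the first item is the better discriminator.
```

A statistics package's independent-samples t-test would give the same ordering; the point is simply that the item with the larger t separates high and low judges more cleanly and is the one to retain.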
Administering the Scale. You're now ready to use your Likert scale. Each respondent is asked to
rate each item on some response scale. For instance, they could rate each item on a 1-to-5
response scale where:
1 = strongly disagree
2 = disagree
3 = undecided
4 = agree
5 = strongly agree
There are a variety of possible response scales (1-to-7, 1-to-9, 0-to-4). All of these odd-numbered
scales have a middle value that is often labeled Neutral or Undecided. It is also possible to use a
forced-choice response scale with an even number of responses and no middle neutral or
undecided choice. In this situation, the respondent is forced to decide whether they lean more
towards the agree or disagree end of the scale for each item.
The final score for the respondent on the scale is the sum of their ratings for all of the items (this
is why this is sometimes called a "summated" scale). On some scales, you will have items that
are reversed in meaning from the overall direction of the scale. These are called reversal items.
You will need to reverse the response value for each of these items before summing for the total.
That is, if the respondent gave a 1, you make it a 5; if they gave a 2 you make it a 4; 3 = 3; 4 = 2;
and, 5 = 1.
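The reverse-and-sum rule above amounts to replacing a response r on a 1-to-5 item with 6 - r before summing. A minimal sketch (the responses and reversal positions are hypothetical):

```python
def score_likert(responses, reversal_items, scale_max=5):
    """Summated Likert score: flip reversal items (1<->5, 2<->4, 3 stays 3)
    before adding everything up."""
    total = 0
    for i, r in enumerate(responses):
        total += (scale_max + 1 - r) if i in reversal_items else r
    return total

# Five hypothetical responses where the third item (index 2) is reverse-worded.
score = score_likert([5, 4, 1, 3, 5], reversal_items={2})  # 5+4+5+3+5 = 22
```

Passing `scale_max=7` would handle a 1-to-7 scale the same way, since the flip is just (scale_max + 1) - r.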
Example: The Employment Self Esteem Scale
Here's an example of a ten-item Likert Scale that attempts to estimate the level of self esteem a
person has on the job. Notice that this instrument has no center or neutral point -- the respondent
has to declare whether he/she is in agreement or disagreement with the item.
INSTRUCTIONS: Please rate how strongly you agree or disagree with each of the following
statements by placing a check mark in the appropriate box.
Response options for every item: Strongly Disagree | Somewhat Disagree | Somewhat Agree | Strongly Agree

1. I feel good about my work on the job.
2. On the whole, I get along well with others at work.
3. I am proud of my ability to cope with difficulties at work.
4. When I feel uncomfortable at work, I know how to handle it.
5. I can tell that other people at work are glad to have me there.
6. I know I'll be able to cope with work for as long as I want.
7. I am proud of my relationship with my supervisor at work.
8. I am confident that I can handle my job without constant assistance.
9. I feel like I make a useful contribution at work.
10. I can tell that my coworkers respect me.
What is reliability?
When we examine a construct in a study, we choose one of a number of possible ways to
measure that construct [see the section on Constructs in quantitative research, if you are unsure
what constructs are, or the difference between constructs and variables]. For example, we may
choose to use questionnaire items, interview questions, and so forth. These questionnaire items
or interview questions are part of the measurement procedure. This measurement procedure
should provide an accurate representation of the construct it is measuring if it is to be considered
valid. For example, if we want to measure the construct, intelligence, we need to have a
measurement procedure that accurately measures a person's intelligence. Since there are many
ways of thinking about intelligence (e.g., IQ, emotional intelligence, etc.), this can make it
difficult to come up with a measurement procedure that has strong validity [see the article:
Construct validity].
In quantitative research, the measurement procedure consists of variables; whether a single
variable or a number of variables that may make up a construct [see the section on Constructs in
quantitative research]. When we think about the reliability of these variables, we want to know
how stable or constant they are. This assumption, that the variable you are measuring is stable or
constant, is central to the concept of reliability. In principle, a measurement procedure that is
stable or constant should produce the same (or nearly the same) results if the same individuals
and conditions are used.
What is validity?
Validity is the extent to which a concept, conclusion or measurement is well-founded and
corresponds accurately to the real world. The word "valid" is derived from the Latin validus,
meaning strong. The validity of a measurement tool (for example, a test in education) is
considered to be the degree to which the tool measures what it claims to measure; in this case,
the validity is an equivalent to accuracy.
In psychometrics, validity has a particular application known as test validity: "the degree to
which evidence and theory support the interpretations of test scores" ("as entailed by proposed
uses of tests").
It is generally accepted that the concept of scientific validity addresses the nature of reality and
as such is an epistemological and philosophical issue as well as a question of measurement. The
use of the term in logic is narrower, relating to the truth of inferences made from premises.
Validity is important because it can help determine what types of tests to use, and help to make
sure researchers are using methods that are not only ethical, and cost-effective, but also a method
that truly measures the idea or construct in question.
Relationship between reliability and validity?
Reliability and validity are important concepts within psychometrics.
Reliability is generally thought to be necessary for validity, but it does not guarantee validity.
Reliability and validity are, conceptually, quite distinct and there need not be any necessary
relationship between the two. Be wary of statements which imply that a valid test or measure has
to be reliable. Where the measurement emphasis is on relatively stable and enduring
characteristics of people (e.g. their creativity), a measure should be consistent over time
(reliable). It also ought to distinguish between inventors and the rest of us if it is a valid measure
of creativity. A measure of a characteristic which varies quite rapidly over time will not be
reliable over time - if it is then we might doubt its validity. For example, a valid measure of
suicide intention may not be particularly stable (reliable) over time though good at identifying
those at risk of suicide.
Validity is often expressed as a correlation between the measure and some criterion. This validity
coefficient will be limited or attenuated by the reliability of the test or measure. Thus, the
maximum correlation of the test or measure with any other variable has an upper limit
determined by the internal reliability.
Within classical test theory, predictive or concurrent validity (correlation between the predictor
and the predicted) cannot exceed the square root of the correlation between two versions of the
same measure — that is, reliability limits validity.
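That ceiling can be written as r_xy <= sqrt(r_xx): a measure's validity coefficient is bounded by the square root of its own reliability. A one-line sketch of the bound:

```python
from math import sqrt

def max_validity(reliability):
    """Classical-test-theory upper bound: a measure's correlation with any
    criterion cannot exceed the square root of its own reliability."""
    return sqrt(reliability)

# A measure with reliability .81 can correlate at most about .90 with any
# criterion, no matter how well the criterion is chosen.
ceiling = max_validity(0.81)
```

This is why a highly unreliable test cannot show strong predictive validity even against a perfect criterion.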
With this in mind, it can be helpful to conceptualize the following four basic scenarios for the
relation between reliability and validity:
1. Reliable (consistent) and valid (measures what it's meant to measure, i.e., a stable construct)
2. Reliable (consistent) and not valid (measures something consistently, but it doesn't measure
what it's meant to measure)
3. Unreliable (not consistent) and not valid (an inconsistent measure which doesn't measure what
it's meant to measure)
4. Unreliable (not consistent) and valid (measures what it's meant to measure, i.e., an unstable
construct)
It is important to distinguish between internal reliability and test-retest reliability. A measure of a
fluctuating phenomenon such as suicide intention may be valid but have low test-retest reliability
(depending on how much the phenomenon fluctuates and how far apart the test and retest are), but
the measure should exhibit good internal consistency on each occasion.
Systematic Errors
Systematic errors in experimental observations usually come from the measuring instruments.
They may occur because:
- there is something wrong with the instrument or its data handling system, or
- the instrument is wrongly used by the experimenter.
Two types of systematic error can occur with instruments having a linear response:
1. Offset or zero setting error in which the instrument does not read zero when the
quantity to be measured is zero.
2. Multiplier or scale factor error in which the instrument consistently reads changes in
the quantity to be measured greater or less than the actual changes.
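Both error types can be captured in a toy linear-instrument model: the ideal instrument returns the true value, an offset error shifts every reading, and a scale-factor error stretches the changes. A minimal sketch with made-up numbers:

```python
def instrument_reading(true_value, offset=0.0, scale_factor=1.0):
    """Toy linear instrument: reading = offset + scale_factor * true_value.
    A nonzero offset models a zero-setting error; a scale_factor other
    than 1 models a multiplier (scale-factor) error."""
    return offset + scale_factor * true_value

ideal = instrument_reading(10.0)                        # reads the true value
shifted = instrument_reading(10.0, offset=0.5)          # every reading shifted up
stretched = instrument_reading(10.0, scale_factor=1.1)  # changes exaggerated

# The offset instrument fails to read zero when the quantity is zero:
at_zero = instrument_reading(0.0, offset=0.5)
```

Repeating a measurement does nothing to reveal either error here - every repeat returns the same wrong value, which is why systematic errors are hard to detect.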
These errors are shown in Fig. 1. Systematic errors also occur with non-linear instruments when
the calibration of the instrument is not known correctly.
Fig. 1. Systematic errors in a linear instrument (full line).
Broken line shows response of an ideal instrument without error.
Examples of systematic errors caused by the wrong use of instruments are:
- errors in measurements of temperature due to poor thermal contact between the
thermometer and the substance whose temperature is to be found,
- errors in measurements of solar radiation because trees or buildings shade the
radiometer.
The accuracy of a measurement is how close the measurement is to the true value of the quantity
being measured. The accuracy of measurements is often reduced by systematic errors, which are
difficult to detect even for experienced research workers.
Random Errors:
Random errors in experimental measurements are caused by unknown and unpredictable
changes in the experiment. These changes may occur in the measuring instruments or in the
environmental conditions.
Examples of causes of random errors are:
- electronic noise in the circuit of an electrical instrument,
- irregular changes in the heat loss rate from a solar collector due to changes in the wind.
Random errors often have a Gaussian normal distribution (see Fig. 2). In such cases statistical
methods may be used to analyze the data. The mean m of a number of measurements of the same
quantity is the best estimate of that quantity, and the standard deviation s of the measurements
shows the accuracy of the estimate. The standard error of the estimate m is s/sqrt(n), where n is
the number of measurements.
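Those three statistics - the mean m, the standard deviation s, and the standard error s/sqrt(n) - are a line each with Python's standard library (the measurements below are hypothetical):

```python
from statistics import mean, stdev

# Eight hypothetical repeated measurements of the same quantity.
measurements = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 19.7, 20.0]

m = mean(measurements)                         # best estimate of the quantity
s = stdev(measurements)                        # spread of the measurements
standard_error = s / len(measurements) ** 0.5  # s / sqrt(n)
```

Taking more measurements shrinks the standard error by the square root of n, which is why averaging repeated readings improves the estimate of a quantity subject to random error.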
Fig. 2. The Gaussian normal distribution. m = mean of measurements. s = standard deviation of
measurements. 68% of the measurements lie in the interval m - s < x < m + s; 95% lie within m -
2s < x < m + 2s; and 99.7% lie within m - 3s < x < m + 3s.
The precision of a measurement is how close a number of measurements of the same quantity
agree with each other. The precision is limited by the random errors. It may usually be
determined by repeating the measurements.
What is double barreled question?
A double-barreled question (sometimes, double-direct question) is an informal fallacy. It is
committed when someone asks a question that touches upon more than one issue, yet allows only
for one answer. This may result in inaccuracies in the attitudes being measured for the question,
as the respondent can answer only one of the two questions, and cannot indicate which one is
being answered. Many double-barreled questions can be detected by the existence of the
grammatical conjunction "and" in them. This is not a foolproof test, as the word "and" can exist
in properly constructed questions. A question asking about three items is known as "triple-barreled"
(also "treble-barreled"). In legal proceedings, a double-barreled question is called a compound question.
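The "and" test mentioned above is easy to automate as a first-pass screen; as the text warns, it is not foolproof, so flagged questions still need a human read. A minimal sketch:

```python
def flag_double_barreled(question):
    """Crude screen: flag a question if it contains the conjunction 'and'.
    Expect false positives ('Do you like rock and roll?') and false
    negatives ('Is the product cheap yet durable?')."""
    words = question.lower().rstrip("?").split()
    return "and" in words

flag_double_barreled("Do you find the product useful and affordable?")  # True
flag_double_barreled("Do you find the product affordable?")             # False
```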
What are the Advantages and disadvantages of unstructured question?
The advantages of an unstructured approach are said to be:
- The respondent can answer in any way that they wish, and can therefore lead the interview.
- Unstructured questions are a better means of finding out 'true' opinion and identifying how
strongly attitudes are held (e.g. Ann Oakley's 178 interviews).
- For positivists, the statistical patterns revealed can be used to develop new theories, and
questionnaires can be devised to test existing theories; this can be known as triangulation.
- Feminist methodology is less male-dominated/exploitative than conventional research methods.
The disadvantages of an unstructured approach are said to be:
- Coding open-question data distorts the distinct answers given by individuals.
- Open questions require more thought and time on the part of the respondent, which reduces the
number of questions that can realistically be asked.
- The cost will be fairly high due to coding.
- It is more difficult to even out opinion across a sample of questionnaires using open questions.
- Respondents may answer in unhelpful ways, as they may not take the exercise seriously (as can
happen with the census).
- Respondents may have trouble expressing themselves accurately.
- The 'Halo Effect': a common bias in the impression people form of others, by which attributes
are often generalized. Implicitly 'nice' people, for example, are assumed to have all nice
attributes. This can lead to misleading judgments: clever people may falsely be assumed to be
knowledgeable about everything. This is a disadvantage for open questionnaires because
respondents make such assumptions.
What is the importance of questionnaires?
Questionnaires can be thought of as a kind of written interview. They can be carried out face to
face, by telephone or post.
Questionnaires have been used to research Type A personality (e.g. Friedman & Rosenman, 1974),
and also to assess life events which may cause stress (Holmes & Rahe, 1967).
Questionnaires provide a relatively cheap, quick and efficient way of obtaining large amounts of
information from a large sample of people. Data can be collected relatively quickly because the
researcher would not need to be present when the questionnaires were completed. This is useful
for large populations when interviews would be impractical.
However, a problem with questionnaires is that respondents may lie due to social desirability.
Most people want to present a positive image of themselves and so may lie or bend the truth to look
good; e.g. pupils might exaggerate revision duration.
Also the language of a questionnaire should be appropriate to the vocabulary of the group of
people being studied. For example, the researcher must change the language of questions to
match the social background of respondents' age / educational level / social class / ethnicity etc.
Questionnaires can be an effective means of measuring the behavior, attitudes, preferences,
opinions and intentions of relatively large numbers of subjects more cheaply and quickly than
other methods. An important distinction is between open-ended and closed questions.
Often a questionnaire uses both open and closed questions to collect data. This is beneficial as it
means both quantitative and qualitative data can be obtained.
Why is a sample preferred to a census?
A census is the procedure of systematically acquiring and recording information about all the
members of a given population, while a sample is a group drawn from that population. A census is
more thorough and gives more accurate information about a population, but it is more expensive
and time-consuming than a sample.
A sample could be more accurate than an (attempted) census if the fact of the exercise being a
census increases the bias from non-sampling error. This could come about, for example, if the
census generates an adverse political campaign advocating non-response (something less likely
to happen to a sample). Unless this happens, there is no reason to expect a sample to have
less non-sampling error than a census; and by definition it will have more sampling error. So,
apart from quite unusual circumstances, a census is going to be more accurate than a
sample.
Consider a common source of non-sampling error - systematic non-response, e.g. by a particular
socio-demographic group. If people from group X are likely to refuse the census, they are just as
likely to refuse the sample. Even with post-stratification sampling to weight up the responses of
those people from group X who you do persuade to answer your questions, you still have a
problem because those might be the very segment of X that are pro-surveys. There is no real way
around this problem other than to be as careful as possible with your design of instrument and
delivery method.
In passing, this does draw attention to one possible issue that could make an attempted census
less accurate than a sample. Samples routinely have post stratification weighting to population,
which mitigates bias problems from issues such as that in my paragraph above. An attempted
census that doesn't get 100% return is just a large sample, and should in principle be subject to
the same processing; but because it is seen as a "census" (rather than an attempted census) this
may be neglected. So that census might be less accurate than the appropriately weighted sample.
But in this case the problem is the analytical processing technique (or the omission of it), not
something intrinsic to it being an attempted census.
Efficiency is another matter - a well-conducted sample will be more efficient than a census, and
it may well have sufficient accuracy for practical purposes.
How should the target population be defined?
Target population refers to the ENTIRE group of individuals or objects to which researchers are
interested in generalizing the conclusions. The target population usually has varying
characteristics and it is also known as the theoretical population.
All research questions address issues that are of great relevance to important groups of
individuals known as a research population.
A research population is generally a large collection of individuals or objects that is the main
focus of a scientific query. It is for the benefit of the population that research is done.
However, due to the large sizes of populations, researchers often cannot test every individual in
the population because it is too expensive and time-consuming. This is the reason why
researchers rely on sampling techniques. A research population is also known as a well-defined
collection of individuals or objects known to have similar characteristics. All individuals or
objects within a certain population usually have a common, binding characteristic or trait.
Usually, the description of the population and the common binding characteristic of its members
are the same. "Government officials" is a well-defined group of individuals which can be
considered as a population and all the members of this population are indeed officials of the
government.
How do probability sampling techniques differ from non-probability sampling techniques?
The difference between nonprobability and probability sampling is that nonprobability sampling
does not involve random selection and probability sampling does. Does that mean that
nonprobability samples aren't representative of the population? Not necessarily. But it does mean
that nonprobability samples cannot depend upon the rationale of probability theory. At least with
a probabilistic sample, we know the odds or probability that we have represented the population
well. We are able to estimate confidence intervals for the statistic. With nonprobability samples,
we may or may not represent the population well, and it will often be hard for us to know how
well we've done so. In general, researchers prefer probabilistic or random sampling methods over
no probabilistic ones, and consider them to be more accurate and rigorous. However, in applied
social research there may be circumstances where it is not feasible, practical or theoretically
sensible to do random sampling. Here, we consider a wide range of no probabilistic alternatives.
We can divide nonprobability sampling methods into two broad types: accidental or purposive.
Most sampling methods are purposive in nature because we usually approach the sampling
problem with a specific plan in mind. The most important distinctions among these types of
sampling methods are the ones between the different types of purposive sampling approaches.
What is the relationship between quota sampling and judgmental sampling?
Judgmental sampling is a non-probability sampling technique where the researcher selects units
to be sampled based on their knowledge and professional judgment. This type of sampling
technique is also known as purposive sampling and authoritative sampling. On the other hand,
Quota sampling is a non-probability sampling technique wherein the assembled sample has the
same proportions of individuals as the entire population with respect to known characteristics,
traits or focused phenomenon. In addition to this, the researcher must make sure that the
composition of the final sample to be used in the study meets the study's quota criteria.
Judgmental sampling design is usually used when a limited number of individuals possess the
trait of interest. It is the only viable sampling technique in obtaining information from a very
specific group of people. It is also possible to use judgmental sampling if the researcher knows a
reliable professional or authority that he thinks is capable of assembling a representative sample.
In a study wherein the researcher wants to compare the academic performance of the different
high school class levels, and its relationship with gender and socioeconomic status, the researcher
first identifies the subgroups. Usually, the subgroups are the characteristics or variables of the
study. The researcher divides the entire population into class levels, intersected with gender and
socioeconomic status. Then, he takes note of the proportions of these subgroups in the entire
population and then samples each subgroup accordingly.
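The last step of that quota example - sampling each subgroup in proportion to its population share - can be sketched as follows (the class-level shares and sample size are hypothetical):

```python
def quota_sizes(population_shares, sample_size):
    """Allocate a per-subgroup quota proportional to its population share.
    Plain rounding is used here; a real design would reconcile any
    rounding leftovers so the quotas sum exactly to the sample size."""
    return {group: round(share * sample_size)
            for group, share in population_shares.items()}

# Hypothetical shares of each class level in the school population.
quotas = quota_sizes({"freshman": 0.30, "sophomore": 0.26,
                      "junior": 0.24, "senior": 0.20}, sample_size=200)
# quotas -> {"freshman": 60, "sophomore": 52, "junior": 48, "senior": 40}
```

In a full quota design, each share would itself be the product of the class-level, gender, and socioeconomic proportions for the intersected subgroup.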
What are the steps in the sampling design process?
The following are some of the important steps that one needs to keep in mind when developing a
sample design:-
Defining the universe or population of interest is the first step in any sample design. The
accuracy of the results in any study depends on how clearly the universe or population of interest
is defined. The universe can be finite or infinite, depending on the number of items it contains.
Defining the sampling unit within the population of interest is the second step in the sample
design process. The sampling unit can be anything that exists within the population of interest.
For example, the sampling unit may be a geographical unit, a construction unit, or an
individual unit.
Preparing the list of all the items within the population of interest is the next step in the sample
design process. It is from this list, also called the source list or sampling frame, that we
draw our sample. It is important to note that the sampling frame should be highly representative
of the population of interest.
Determination of sample size is the next step to follow. This is the most critical stage of the
sample design process, because the sample size should be neither excessively large nor too small.
The sample size should be optimal: representative of the population and able to give reliable
results. Population variance, population size, parameters
of interest, and budgetary constraints are some of the factors that impact the sample size.
Deciding on the technique of sampling is the next step in sample design. There are many
sampling techniques, out of which the researcher has to choose the one that gives the lowest
sampling error, given the sample size and budgetary constraints.
Describe the classification of sampling design techniques.
A probability sample is a sample in which every unit in the population has a chance (greater than
zero) of being selected in the sample, and this probability can be accurately determined. The
combination of these traits makes it possible to produce unbiased estimates of population totals,
by weighting sampled units according to their probability of selection.
Nonprobability sampling is any sampling method where some elements of the population have
no chance of selection (these are sometimes referred to as 'out of coverage'/'undercovered'), or
where the probability of selection can't be accurately determined. It involves the selection of
elements based on assumptions regarding the population of interest, which forms the criteria for
selection. Hence, because the selection of elements is nonrandom, nonprobability sampling does
not allow the estimation of sampling errors. These conditions give rise to exclusion bias, placing
limits on how much information a sample can provide about the population. Information about
the relationship between sample and population is limited, making it difficult to extrapolate from
the sample to the population.
In a simple random sample (SRS) of a given size, all such subsets of the frame are given an equal
probability. Furthermore, any given pair of elements has the same chance of selection as any
other such pair (and similarly for triples, and so on). This minimises bias and simplifies analysis
of results. In particular, the variance between individual results within the sample is a good
indicator of variance in the overall population, which makes it relatively easy to estimate the
accuracy of results.
SRS can be vulnerable to sampling error because the randomness of the selection may result in a
sample that doesn't reflect the makeup of the population. For instance, a simple random sample
of ten people from a given country will on average produce five men and five women, but any
given trial is likely to overrepresent one sex and underrepresent the other. Systematic and
stratified techniques attempt to overcome this problem by "using information about the
population" to choose a more "representative" sample.
Systematic sampling (also known as interval sampling) relies on arranging the study population
according to some ordering scheme and then selecting elements at regular intervals through that
ordered list. Systematic sampling involves a random start and then proceeds with the selection of
every kth element from then onwards. In this case, k=(population size/sample size). It is
important that the starting point is not automatically the first in the list, but is instead randomly
chosen from within the first to the kth element in the list. A simple example would be to select
every 10th name from the telephone directory (an 'every 10th' sample, also referred to as
'sampling with a skip of 10').
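The interval-selection logic described above can be sketched in a few lines. The telephone-directory frame below is hypothetical:

```python
import random

def systematic_sample(frame, n):
    """Select every kth element from a randomly chosen start,
    where k = population size / sample size."""
    k = len(frame) // n          # sampling interval (the 'skip')
    start = random.randrange(k)  # random start within the first k elements
    return frame[start::k][:n]

# Hypothetical telephone directory of 1,000 names.
directory = [f"Name {i:03d}" for i in range(1000)]
sample = systematic_sample(directory, 100)  # an 'every 10th' sample
```

Note that the start is drawn at random from the first k entries rather than always taken as the first name in the list, exactly as the text requires for this to count as probability sampling.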
As long as the starting point is randomized, systematic sampling is a type of probability
sampling. It is easy to implement and the stratification induced can make it efficient, if the
variable by which the list is ordered is correlated with the variable of interest. 'Every 10th'
sampling is especially useful for efficient sampling from databases.
Where the population embraces a number of distinct categories, the frame can be organized by
these categories into separate "strata." Each stratum is then sampled as an independent sub-
population, out of which individual elements can be randomly selected. There are several
potential benefits to stratified sampling.
First, dividing the population into distinct, independent strata can enable researchers to draw
inferences about specific subgroups that may be lost in a more generalized random sample.
Sometimes it is more cost-effective to select respondents in groups ('clusters'). Sampling is often
clustered by geography, or by time periods. (Nearly all samples are in some sense 'clustered' in
time – although this is rarely taken into account in the analysis.) For instance, if surveying
households within a city, we might choose to select 100 city blocks and then interview every
household within the selected blocks.
In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as
in stratified sampling. Then judgment is used to select the subjects or units from each segment
based on a specified proportion. For example, an interviewer may be told to sample 200 females
and 300 males between the age of 45 and 60.
It is this second step which makes the technique one of non-probability sampling. In quota
sampling the selection of the sample is non-random. For example interviewers might be tempted
to interview those who look most helpful. The problem is that these samples may be biased
because not everyone gets a chance of selection. This non-random element is its greatest
weakness, and quota versus probability sampling has been a matter of controversy for several years.
Snowball sampling involves finding a small group of initial respondents and using them to
recruit more respondents. It is particularly useful in cases where the population is hidden or
difficult to enumerate.
What are pre-coding and post-coding?
Coding means assigning a code, usually a number, to each possible response to each question.
The code includes an indication of the column position (field) and data record it will occupy. For
example, the sex of respondents may be coded as 1 for females and 2 for males. A field
represents a single item of data, such as the sex of the respondent. A record consists of related
fields, such as sex, marital status, age, household size, occupation, and so on. All the
demographic and personality characteristics of a respondent may be contained in a single record.
The respondent code and the record number should appear on each record in the data. If possible,
standard codes should be used for missing data. For example, a code of 9 (or –9) could be used
for a single-digit variable (responses coded on a scale of 1 to 7), 99 for a double-digit variable
(responses coded on a scale of 1 to 11), and so forth. The missing value codes should be distinct
from the codes assigned to the legitimate responses. If the questionnaire contains only structured
questions or very few unstructured questions, it is pre-coded. This means that codes are assigned
before fieldwork is conducted. If the questionnaire contains unstructured questions, codes are
assigned after the questionnaires have been returned from the field (post-coding). We provide
some guidelines on the coding of structured questions, followed by the coding of unstructured
questions.
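A tiny illustration of pre-coding: the 1/2 codes for sex and the single-digit missing-value code of 9 follow the text, while the codebook structure and function name are assumptions made for the sketch:

```python
# Hypothetical pre-coded codebook for two structured questions.
codebook = {
    "sex":     {"female": 1, "male": 2, "missing": 9},
    "marital": {"single": 1, "married": 2, "divorced": 3,
                "widowed": 4, "missing": 9},
}

def code_response(question, answer):
    """Return the numeric code for an answer, falling back to the
    missing-data code when the answer is absent or unrecognized."""
    codes = codebook[question]
    return codes.get(answer, codes["missing"])

# One respondent's record: related fields coded into numbers.
record = [code_response("sex", "female"),
          code_response("marital", None)]   # no answer -> missing code 9
```

The missing-value code (9) is deliberately outside the range of legitimate response codes, as the text recommends.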
What options are available for the treatment of missing data?
There is a large literature of statistical methods for dealing with missing data. Here we briefly
review some key concepts and make some general recommendations for Cochrane review
authors. It is important to think why data may be missing. Statisticians often use the terms
‘missing at random’ and ‘not missing at random’ to represent different scenarios.
Data are said to be ‘missing at random’ if the fact that they are missing is unrelated to actual
values of the missing data. For instance, if some quality-of-life questionnaires were lost in the
postal system, this would be unlikely to be related to the quality of life of the trial participants
who completed the forms. In some circumstances, statisticians distinguish between data ‘missing
at random’ and data ‘missing completely at random’; although in the context of a systematic
review the distinction is unlikely to be important. Data that are missing at random may not be
important. Analyses based on the available data will tend to be unbiased, although based on a
smaller sample size than the original data set.
Data are said to be ‘not missing at random’ if the fact that they are missing is related to the actual
missing data. For instance, in a depression trial, participants who had a relapse of depression
might be less likely to attend the final follow-up interview, and more likely to have missing
outcome data. Such data are ‘non-ignorable’ in the sense that an analysis of the available data
alone will typically be biased. Publication bias and selective reporting bias lead by definition to
data that are 'not missing at random', and attrition and exclusions of individuals within studies
often do as well.
The principal options for dealing with missing data are:
1. Analyzing only the available data (i.e. ignoring the missing data);
2. Imputing the missing data with replacement values, and treating these as if they were observed
(e.g. last observation carried forward, imputing an assumed outcome such as assuming all were
poor outcomes, imputing the mean, imputing based on predicted values from a regression
analysis);
3. Imputing the missing data and accounting for the fact that these were imputed with uncertainty
(e.g. multiple imputation, simple imputation methods (as point 2) with adjustment to the standard
error);
4. Using statistical models to allow for missing data, making assumptions about their
relationships with the available data.
Option 1 may be appropriate when data can be assumed to be missing at random. Options 2 to 4
are attempts to address data not missing at random. Option 2 is practical in most circumstances
and very commonly used in systematic reviews. However, it fails to acknowledge uncertainty in
the imputed values and results, typically, in confidence intervals that are too narrow. Options 3
and 4 would require involvement of a knowledgeable statistician.
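Options 1 and 2 can be sketched in a few lines; the scores list and function names below are hypothetical:

```python
def available_case(values):
    """Option 1: ignore the missing data and analyze only what was observed."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)

def mean_imputed(values):
    """Option 2: impute the mean for each missing value, then analyze.
    The imputed values are treated as if observed, which understates
    uncertainty (confidence intervals come out too narrow)."""
    m = available_case(values)
    return [m if v is None else v for v in values]

scores = [4, 7, None, 6, None, 3]  # hypothetical outcome with two dropouts
print(available_case(scores))      # 5.0
print(mean_imputed(scores))        # [4, 7, 5.0, 6, 5.0, 3]
```

A sensitivity analysis, as recommended below, would repeat the analysis under different imputation assumptions (e.g. imputing a poor outcome instead of the mean) and compare the conclusions.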
Four general recommendations for dealing with missing data in Cochrane reviews are as
follows.
Whenever possible, contact the original investigators to request missing data.
Make explicit the assumptions of any methods used to cope with missing data: for example, that
the data are assumed missing at random, or that missing values were assumed to have a
particular value such as a poor outcome.
Perform sensitivity analyses to assess how sensitive results are to reasonable changes in the
assumptions that are made. Address the potential impact of missing data on the findings of the
review in the Discussion section.
What is weighting? Reason behind for using weighting?
Weighting is the process of assigning numbers to certain groups of respondents in a study so that
their numbers reflect the actual proportions within the real world. For instance, let’s say a study
were undertaken about local businesses in an area known to contain 20% retail businesses, but
only 10% of the respondents in the study were retail businesses. This would produce a set of results that did
not accurately reflect the real world. Therefore, the results might be weighted to reflect the
higher proportion of retail businesses in the area and as such it would more accurately reflect
reality.
Before considering various situations where data weighting is appropriate, please consider a brief
review of the weighting concept. If we assume our data set is representative, then analysis
proceeds under the concept that the respondents in the sample represent the members of the
population in proper proportion (for example, the percentage of males, females, customers, non-
customers, etc. are nearly equivalent in the sample and the population). Having achieved
proportional representation in our sample, respondents are grouped according to various
characteristics and attitudes and tabulated accordingly; with each respondent counting as one
person.
If a data set contains specific groups of respondents that are either over-represented or under-
represented, then the sample is not indicative of the population and analyzing the data as
collected is not appropriate. Instead, the data should be redistributed (or weighted) so that we
achieve proportional representation in our sample. Specifically, each data point will carry a
weight and rather than each respondent counting equally as one sample member, will thereafter
represent either more or less than one sample member when results are tabulated.
This process will be illustrated more clearly by example in the following section. The important
concept to remember is that the goal of any study is to obtain a representative sample. If that is
not achieved naturally, redistribution of the data is required to yield one.
One of the most common circumstances where weighting is necessary is when quota sampling is
employed. Typically, studies employ quota sampling in order to obtain readable sample sizes for various
sub-segments of interest from the population. This ensures adequate margin of error for segment-to-
segment comparisons. However, this stratified sampling approach does not typically yield a viable total
sample. Should the researcher desire analysis of the total sample as well as segment comparisons,
redistribution of the data is required.
Let us consider a simple example of a study where stratified sampling is utilized.
In Example 1, the weighted data must be used when analyzing results that combine the two quota cells.
When tabulating the weighted data each Economy respondent will count as 1.8 persons and each
First/Business respondent will count as .2 persons. While this example is very simple, the processes
employed are replicated in more complicated weighting schemes. This technique can be applied to
multiple cells and across intersections of various quota designations (gender by region, age by race). In
almost all instances where quota sampling is utilized weighting is required. Hence, prior to tabulation the
researcher needs to consider the appropriate actions.
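Assuming, for illustration, that the airline study's quotas were 250 Economy and 250 First/Business interviews against a true 90%/10% Economy split (an assumption chosen to reproduce the 1.8 and 0.2 weights quoted above), the weighting arithmetic might look like this:

```python
# Hypothetical quota cells versus the assumed population shares.
sample_counts = {"Economy": 250, "First/Business": 250}
population_share = {"Economy": 0.90, "First/Business": 0.10}

n = sum(sample_counts.values())
# Weight = population proportion / sample proportion for each quota cell.
weights = {cell: population_share[cell] / (sample_counts[cell] / n)
           for cell in sample_counts}
# weights -> Economy 1.8, First/Business 0.2, matching the example.

# Weighted tabulation: suppose 200 Economy and 100 First/Business
# respondents said they were satisfied.
satisfied = {"Economy": 200, "First/Business": 100}
weighted_pct = sum(satisfied[c] * weights[c] for c in satisfied) / n
```

Unweighted, 300 of 500 respondents (60%) are satisfied; weighted, the Economy majority dominates and the figure rises to 76%.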
Unintentional Bias
Non-response
In quota sampling, the researcher intentionally introduces bias into the data by establishing a certain
number of interviews for particular population segments. In this next section we look at
unintentional data bias. A common form of this is known as non-response bias. This occurs when
particular types of respondents are not reached during the study. Historical data tell us that there are
certain respondents that are more difficult to reach overall (younger, affluent) and also based on the type
of methodology used (no Internet access, call blocking). There are certain sampling techniques that can be
utilized to minimize such bias, though it often still exists.
Weighting can be used to help mitigate the effects of non-response bias. In such instances the researcher
compares the distribution of key classification variables in the sample to the actual population
distribution. If the distribution in the sample is not correct the data would be weighted using the
techniques described in the prior section. One drawback here is that when studying artificial populations
such as customers and prospects we often do not have reliable distributions for comparison.
Sample Balancing
Let us now consider data sets that are obtained from a particular geographic region – these can include
studies that are conducted on the national level, statewide or within a particular city or county. For studies
of this type there is a large amount of descriptive data available (via census figures) about the population
of interest. As such, when samples of these geographies are obtained it is imperative that the researcher
consider the sample distribution of respondent demographics.
When making comparisons to census data there are any number of characteristics that can be used. It is up
to the researcher to determine which are to be used in the weighting scheme. Obviously, any variable used
must be included in the survey data. Another consideration is missing values in the survey data. Since
these are difficult to account for, variables with a high proportion of missing values (such as income) are
often excluded. Once the weighting variables are identified the researcher needs to compute the actual
weights.
If only two characteristics are to be used in weighting the data the researcher might employ a technique
identical to the one portrayed in our airline study example. Let’s say we want to weight on gender (2
groups) and region (9 census designations). This would yield a total of 18 cells. The process would be the
same as described for two groups (in Example 1), where the researcher would determine the desired
number of respondents in each cell and weight accordingly.
However, with this type of data there are often more than two characteristics of interest – for example
gender (2 groups), region (9 groups), age (5 groups) and race (4 groups) – all of which yields 360
individual cells to populate. This poses a number of difficulties. First, while distributions for individual
variables might be available, the distribution for each combination might not. Second each individual cell
might not have survey respondents populating the cell, making weighting impossible. To combat this, we
can employ a technique called sample balancing.
In sample balancing the weighting variables are redistributed individually, rather than computed cross
variable cells (example males; 18-24; Northeast). The process is iterative with weights being applied on
variable #1, then that new data set is weighted on variable #2; then that new data set is weighted on
variable #3. This process is repeated again and again in order to achieve distributions on all the weighting
variables that are close to the population.
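The iterative redistribution described above is often implemented as raking (iterative proportional fitting). A toy sketch with hypothetical respondents and target marginals for two weighting variables:

```python
# Six hypothetical respondents with two weighting variables.
respondents = [
    {"gender": "M", "region": "North"}, {"gender": "M", "region": "North"},
    {"gender": "M", "region": "South"}, {"gender": "F", "region": "South"},
    {"gender": "F", "region": "South"}, {"gender": "F", "region": "North"},
]
targets = {  # desired share of the weighted sample for each level
    "gender": {"M": 0.5, "F": 0.5},
    "region": {"North": 0.6, "South": 0.4},
}

weights = [1.0] * len(respondents)
for _ in range(50):  # repeat until the marginals settle
    for var, shares in targets.items():
        total = sum(weights)
        for level, share in shares.items():
            idx = [i for i, r in enumerate(respondents) if r[var] == level]
            current = sum(weights[i] for i in idx)
            factor = (share * total) / current  # scale this level to its target
            for i in idx:
                weights[i] *= factor

north_share = sum(w for w, r in zip(weights, respondents)
                  if r["region"] == "North") / sum(weights)
```

Each pass fixes one variable's distribution and slightly disturbs the others, which is why the process must iterate; with every combination cell populated, as here, it converges quickly.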
Because non-response bias is difficult to measure, the researcher should apply some type of sample
balancing on the data regardless of how it looks unweighted. Whether to employ sample balancing or
simply compute weights for individual cells is dependent on the population information available, the
condition of the survey data and the number of respondents available in each cell.
Comparing Two Samples
Aside from making sure that data sets are representative, data weighting can also be a useful tool in
comparing two samples. This can occur in any number of instances; some of the most common being Test
vs. Control studies, Wave-to-Wave studies and studies that mix methodologies. In this section we will
examine how data weighting might be utilized.
A basic premise of Test vs. Control studies is that there are two (or more) comparable populations in
every respect but one – which is exposure to some type of stimuli. The goal of the study is to see if such
exposure changes attitudes and/or behaviors. Because of this, it is imperative that the researcher is
confident that differences seen between test and control groups be attributable to the stimuli and not
inherent differences in the group composition. As such, before making comparisons across test and
control cells, the researcher needs to compare the data sets on key demographic and behaviors that they
would expect to be similar.
The researcher can redistribute the data from one of the data sets to match the other. This would be
analogous to redistributing sample data to match the population. Here, we are less concerned with the
distribution of age in the population than we are in having two comparable data sets. In this instance the
researcher might take the Control Group distribution and weight the Test data to wind up with a
comparable 20% of respondents age 65+. This would then allow comparisons of key variables across the
cells.
This same process applies to wave studies. Again, the researcher is looking to compare groups, in this
instance across time. The underlying assumption is that in each wave comparable, representative samples
of some population are being drawn. Of course, it may occur that in one wave a bias is introduced. As
such, before analyzing wave-to-wave data the researcher should compare key demographic and
behavioral variables to ensure there is no change in the composition of the data sets. Should there be any
changes, the researcher should consider weighting the data prior to analysis in order to ensure that any
differences are real and not due to a sampling bias.
Emphasizing Key Characteristics
A slightly different take on weighting occurs when we want to emphasize a specific characteristic of the
respondent. This technique is often used when there are certain respondents we want to count more
heavily due to their importance. For example, a client may want to overemphasize the opinions and
behaviors of customers who spend the most money with them.
The approach in this scenario remains basically the same as has been described previously. The analysis
moves away from the idea that each respondent counts as one and utilizes redistribution of the data. In the
case where a client wanted to look at survey data based upon dollars spent, the researcher would assign
each respondent a weight based on this variable. So if a customer spends $100,000 with the client they
would get a weight of 100 while someone who spends $5,000 would get a weight of 5. When the data are
weighted and then analyzed, responses are distributed so that instead of looking at opinions based upon
people, the data would be showing responses based upon dollars spent.
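A small sketch of this dollars-based weighting; the respondent data and the $1,000-per-weight-unit scaling are hypothetical, following the $100,000 → 100 example in the text:

```python
# Hypothetical customers: each counts in proportion to dollars spent
# rather than as one person.
respondents = [
    {"spend": 100_000, "satisfied": True},
    {"spend": 5_000,   "satisfied": False},
    {"spend": 20_000,  "satisfied": False},
]
for r in respondents:
    r["weight"] = r["spend"] / 1_000  # every $1,000 spent = weight of 1

total_w = sum(r["weight"] for r in respondents)
satisfied_share = sum(r["weight"] for r in respondents
                      if r["satisfied"]) / total_w
```

By headcount only one of three customers is satisfied, but 80% of the dollars belong to a satisfied customer, so the dollars-based view tells a very different story.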
What are dummy variables? Why are such variables created?
A Dummy variable or Indicator Variable is an artificial variable created to represent an attribute
with two or more distinct categories/levels.
We will illustrate this with an example: Let’s say you want to find out whether the location of a
house in the East, Southeast or Northwest side of a development and whether the house was built
before or after 1990 affects its sale price. The Sale Price dataset records each house's sale
price along with dummy variables for year built (Y1990) and location (E, SE):
Sale Price is the numerical response variable. The dummy variable Y1990 represents the binary
Independent variable ‘Before/After 1990’. Thus, it takes two values: ‘1’ if a house was built after
1990 and ‘0’ if it was built before 1990. Thus, a single dummy variable is needed to represent a
variable with two levels.
Notice that only two dummy variables are used for location, East (E) and Southeast (SE). Together, they
represent the Location variable with three levels (E, SE, NW). They’re constructed so that
E = ‘1’ if the house falls on the East side and ‘0’ otherwise, and
SE = ‘1’ if the house falls on the Southeast side and ‘0’ otherwise
What happened to the third location, NW? Well, it turns out we don’t need a third dummy
variable to represent it. Setting both E and SE to ‘0’ indicates a house on the NW side. Notice
that this coding only works if the three levels are mutually exclusive (no overlap) and
exhaustive (no other levels exist for this variable).
The regression of Sale Price on these dummy variables yields the following model:
Sale Price = 258 + 33.9*Y1990 - 10.7*E + 21*SE
The constant intercept value 258 indicates that houses in this neighborhood start at $258 K
irrespective of location and year built. The coefficient of Y1990 indicates that other things being
equal, houses in this Neighborhood built after 1990 command a $33.9 K premium over those
built before 1990.
Similarly, houses on the East side cost $10.7 K lower (it has a negative sign) than houses on the
NW side and houses on the SE side cost $21 K higher than houses on the NW side. Thus, NW
serves as the baseline or reference level for E and SE.
We can estimate the sale price for a house built before 1990 and located on the East side from
this equation by substituting Y1990 = 0, E = 1 and SE = 0, giving Sale Price = $247.3 K.
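The fitted equation can be turned into a small prediction helper; the function name is ours, while the coefficients come from the model above:

```python
# Sketch of the fitted model from the text:
# Sale Price ($K) = 258 + 33.9*Y1990 - 10.7*E + 21*SE
def predicted_price(built_after_1990, location):
    y1990 = 1 if built_after_1990 else 0
    e  = 1 if location == "E"  else 0   # East dummy
    se = 1 if location == "SE" else 0   # Southeast dummy
    # NW is the baseline: both location dummies stay 0.
    return 258 + 33.9 * y1990 - 10.7 * e + 21 * se

pre_1990_east = predicted_price(False, "E")   # ~247.3, matching the text
post_1990_nw  = predicted_price(True, "NW")   # ~291.9
```

Substituting the dummy values rather than hand-calculating makes it easy to check every level combination against the baseline.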
Things to keep in mind about dummy variables
Dummy variables assign the numbers ‘0’ and ‘1’ to indicate membership in any mutually
exclusive and exhaustive category.
1. The number of dummy variables necessary to represent a single attribute variable is equal to
the number of levels (categories) in that variable minus one.
2. For a given attribute variable, none of the dummy variables constructed can be redundant.
That is, one dummy variable cannot be a constant multiple or a simple linear relation of another.
3. The interaction of two attribute variables (e.g. Gender and Marital Status) is represented by a
third dummy variable, which is simply the product of the two individual dummy variables.
What is Data Preparation? What is the data preparation process?
Data Preparation is the process of collecting, cleaning, and consolidating data into one file or
data table for use in analysis. The process of preparing data generally entails correcting any
errors (typically from human and/or machine input), filling in nulls and incomplete data, and
merging data from several sources or data formats.
Data preparation is most often used when:
Handling messy, inconsistent, or un-standardized data
Trying to combine data from multiple sources
Reporting on data that was entered manually
Dealing with data that was scraped from an unstructured source such as PDF documents
Data preparation can be used to harmonize, enrich, or to even standardize data in scenarios
where multiple values are used in a data set to represent the same value. An example of this is
seen with U.S. states – where multiple values are commonly used to represent the same state. A
state like California could be represented by ‘CA’, ‘Cal.’, ‘Cal’ or ‘California’ to name a few. A
data preparation tool could be used in this scenario to identify an incorrect number of unique
values (in the case of U.S. states, a unique count greater than 50 would raise a flag, as there are
only 50 states in the U.S.). These values would then need to be standardized to use only an
abbreviation or only full spelling in every row.
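The standardization step for the state example can be sketched as a lookup; the alias table below is hypothetical and deliberately tiny:

```python
# Hypothetical harmonization map: many raw spellings collapse to one
# standard abbreviation, as in the California example.
STATE_ALIASES = {
    "ca": "CA", "cal.": "CA", "cal": "CA", "california": "CA",
    "ny": "NY", "n.y.": "NY", "new york": "NY",
}

def standardize_state(raw):
    """Map a raw value to its standard abbreviation; leave unknowns as-is."""
    return STATE_ALIASES.get(raw.strip().lower(), raw)

column = ["CA", "Cal.", "california", "New York", "NY"]
cleaned = [standardize_state(v) for v in column]
# The raw column has 5 distinct values; after standardization only 2 remain,
# which is exactly the collapse a unique-count audit would be checking for.
```

In practice the audit step would flag the column first (a unique count greater than 50 for U.S. states) and the alias table would be built from the distinct values found.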
The process of data preparation typically involves:
1. Data analysis – The data is audited for errors and anomalies to be corrected. For large
datasets, data preparation applications prove helpful in producing metadata and uncovering
problems.
2. Creating an Intuitive workflow – A workflow consisting of a sequence of data prep operations
for addressing the data errors is then formulated.
3. Validation– The correctness of the workflow is next evaluated against a representative sample
of the dataset. This process may call for adjustments to the workflow as previously undetected
errors are found.
4. Transformation – Once convinced of the effectiveness of the workflow, transformation may
now be carried out, and the actual data prep process takes place.
5. Backflow of cleaned data – Finally, steps must also be taken for the clean data to replace the
original dirty data sources.
What is a skewed distribution? What does it mean?
In probability theory and statistics, skewness is a measure of the asymmetry of the probability
distribution of a real-valued random variable about its mean. The skewness value can be positive
or negative, or even undefined.
A distribution is said to be skewed when the data points cluster more toward one side of the scale
than the other, creating a curve that is not symmetrical. In other words, the right and the left side
of the distribution are shaped differently from each other. There are two types of skewed
distributions.
A distribution is positively skewed if the scores fall toward the lower side of the scale and there
are very few higher scores. Positively skewed data is also referred to as skewed to the right
because that is the direction of the 'long tail end' of the chart. Let's create a chart using the yearly
income data that we collected from the MBA graduates. You can see that most of the graduates
reported annual income between $31,000 and $70,000. You can see that there are very few
graduates that make more than $70,000. The yearly income for MBA graduates is positively
skewed, and the 'long tail end' of the chart points to the right.
A distribution is negatively skewed if the scores fall toward the higher side of the scale and there
are very few low scores. Let's take a look at the chart of the number of applications each
graduate completed before they found their current job. We can see that most of the graduates
completed between 9 and 13 applications. Only 56 out of the 400 graduates completed fewer than
9 applications. The number of applications completed for MBA graduates is negatively skewed,
and the 'long tail end' points to the left. Negatively skewed data is also referred to as 'skewed to
the left' because that is the direction of the 'long tail end.'
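A quick numeric check of the direction of skew is the third standardized moment; the income figures below are hypothetical, chosen to mimic the right-skewed MBA income example:

```python
def skewness(xs):
    """Sample skewness: the average cubed standardized deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

incomes = [31, 35, 38, 40, 42, 45, 48, 52, 55, 120]  # $K, one high earner
print(skewness(incomes) > 0)  # True: the long tail points to the right
```

A positive value indicates positive (right) skew, a negative value indicates negative (left) skew, and a symmetric distribution gives a value near zero.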
What is the major difference between cross tabulation and frequency distribution?
Frequency Distribution
A good first step in your analysis is to conveniently summarize the data by counting the
responses for each level of a given variable. These counts, or frequencies, are called the
frequency distribution and are commonly accompanied by the percentages and cumulative
percentages as well.
A frequency distribution can quickly reveal:
The number of non-responses or missing values
Outliers and extreme values
The central tendency, variability and shape of the distribution.
Suppose a pet adoption and rescue agency wants to find out whether dogs or cats are more
popular in a certain location. To answer this question, we survey a random sample of 100 local
pet owners to find out if dogs are more popular than cats, or vice versa.
Cross Tabulation
A frequency distribution can tell you about a single variable, but it does not provide information
about how two or more variables relate to one another. To understand the association between
multiple variables, we can use cross tabulation.
Let’s say we want to see if a gender preference exists for dogs versus cats. Are men more likely
to want a dog than a cat compared to women, or vice versa?
To summarize data from both variables at the same time, we need to construct a cross-tabulation
table, also known as a contingency table. This table lets us evaluate the counts and percents, just
like a frequency distribution. But while a frequency distribution provides information for each
level of one variable, cross tabulation shows results for all level combinations of both variables.
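Both summaries can be computed from the same survey data; the 100 hypothetical responses below follow the pet-adoption example:

```python
from collections import Counter

# Hypothetical survey of 100 pet owners: (gender, preferred pet).
responses = ([("M", "dog")] * 35 + [("M", "cat")] * 15
             + [("F", "dog")] * 20 + [("F", "cat")] * 30)

# Frequency distribution: counts for one variable at a time.
pet_freq = Counter(pet for _, pet in responses)
# dog: 55, cat: 45 -> dogs are more popular overall.

# Cross tabulation: counts for every combination of both variables.
crosstab = Counter(responses)
# (M, dog): 35  (M, cat): 15  (F, dog): 20  (F, cat): 30
```

The frequency distribution answers "are dogs more popular?", while the cross tabulation reveals the gender pattern: men prefer dogs, women prefer cats.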
What is a Null Hypothesis?
A null hypothesis is a type of hypothesis used in statistics that proposes that no statistical
significance exists in a set of given observations. The null hypothesis attempts to show that no
variation exists between variables, or that a single variable is no different than zero. It is
presumed to be true until statistical evidence nullifies it for an alternative hypothesis. The null
hypothesis assumes that any kind of difference or significance you see in a set of data is due to
chance. For example, Chuck sees that his investment strategy produces higher average returns
than simply buying and holding a stock. The null hypothesis claims that there is no difference
between the two average returns, and Chuck has to believe this until he proves otherwise.
Refuting the null hypothesis would require showing statistical significance, which can be found
using a variety of tests. If Chuck conducts one of these tests and proves that the difference
between his returns and the buy-and-hold returns is significant, he can then refute the null
hypothesis.
What is an alternative Hypothesis?
An alternative hypothesis states that there is statistical significance between two variables. In the
earlier example, the two variables are Mentos and Diet Coke. The alternative hypothesis is the
hypothesis that the researcher is trying to prove. In the Mentos and Diet Coke experiment,
Arnold was trying to prove that the Diet Coke would explode if he put Mentos in the bottle.
Therefore, he proved his alternative hypothesis was correct. If we continue with the example, the
alternative hypothesis would be that there IS indeed a statistically significant relationship
between Mentos and Diet Coke. Arnold could write it as: If I put half a pack of Mentos into a 2-
Liter Diet Coke bottle, there will be a big reaction/explosion.
What is a t-test?
A t-test is a statistical examination of two population means. A two-sample t-test examines
whether two samples are different; it is commonly used when the variances of two normal
distributions are unknown and when an experiment uses a small sample size. For example, a t-
test could be used to compare the average floor routine score of the U.S. women's Olympic
gymnastics team to the average floor routine score of China's women's team. The test statistic in
the t-test is known as the t-statistic. The t-test uses the t-statistic, the t-distribution and the
degrees of freedom to determine a p-value (probability) that can be used to decide whether the
population means differ. The t-test is one of a number of hypothesis tests. To compare three or
more means, statisticians use analysis of variance (ANOVA). If the sample size is large, they use
a z-test. Other hypothesis tests include the chi-square test and the F-test.
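A sketch of such a comparison, using simulated scores rather than real Olympic data (the means, spreads, and sample sizes are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical floor-routine scores for two teams of eight gymnasts.
usa = rng.normal(loc=14.5, scale=0.4, size=8)
china = rng.normal(loc=14.1, scale=0.4, size=8)

# Welch's two-sample t-test: does not assume equal variances, which suits
# the situation where the population variances are unknown.
t_stat, p_value = stats.ttest_ind(usa, china, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

A small p-value (conventionally below 0.05) would lead us to conclude that the two population mean scores differ.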
What are the concept and conditions for causality?
Statistics and economics usually employ pre-existing data or experimental data to infer causality
by regression methods. The body of statistical techniques involves substantial use of regression
analysis. Typically a linear relationship such as

y_i = b_0 + b_1 x_1,i + b_2 x_2,i + ... + b_k x_k,i + e_i

is postulated, in which y_i is the ith observation of the dependent variable (hypothesized to
be the caused variable), x_j,i for j = 1, ..., k is the ith observation on the jth independent variable
(hypothesized to be a causative variable), and e_i is the error term for the ith observation
(containing the combined effects of all other causative variables, which must be uncorrelated
with the included independent variables). If there is reason to believe that none of the x_j is
caused by y, then estimates of the coefficients b_j are obtained. If the null hypothesis
that b_j = 0 is rejected, then the alternative hypothesis that b_j ≠ 0, and equivalently
that x_j causes y, cannot be rejected. On the other hand, if the null hypothesis that b_j = 0
cannot be rejected, then equivalently the hypothesis of no causal effect of x_j on y cannot be
rejected. Here the notion of causality is one of contributory causality, as discussed above: if
the true value b_j ≠ 0, then a change in x_j will result in a change in y unless some other
causative variable(s), either included in the regression or implicit in the error term, change in
such a way as to exactly offset its effect; thus a change in x_j is not sufficient to change y.
Likewise, a change in x_j is not necessary to change y, because a change in y could be
caused by something implicit in the error term (or by some other causative explanatory
variable included in the model).
The above way of testing for causality requires belief that there is no reverse causation, in
which y would cause x_j. This belief can be established in one of several ways. First, the
variable x_j may be a non-economic variable: for example, if rainfall amount x_j is
hypothesized to affect the futures price y of some agricultural commodity, it is impossible
that in fact the futures price affects rainfall amount (provided that cloud seeding is never
attempted). Second, the instrumental-variables technique may be employed to remove any
reverse causation by introducing a role for other variables (instruments) that are known to be
unaffected by the dependent variable. Third, the principle that effects cannot precede causes
can be invoked, by including on the right side of the regression only variables that precede
the dependent variable in time; this principle is invoked, for example, in testing for Granger
causality and in its multivariate analog, vector autoregression, both of which control for
lagged values of the dependent variable while testing for causal effects of lagged
independent variables.
Regression analysis controls for other relevant variables by including them as regressors
(explanatory variables). This helps to avoid false inferences of causality due to the presence
of a third, underlying, variable that influences both the potentially causative variable and the
potentially caused variable: its effect on the potentially caused variable is captured by
directly including it in the regression, so that effect will not be picked up as an indirect effect
through the potentially causative variable of interest.
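The coefficient test described above can be sketched numerically. The data below are simulated: x1 is constructed as a genuine cause of y while x2 is irrelevant, and ordinary least squares with t-statistics recovers that distinction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated data: y depends on x1 (coefficient 1.5) but not on x2.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(scale=1.0, size=n)

# OLS by least squares; X includes a constant column for the intercept.
X = np.column_stack([np.ones(n), x1, x2])
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Standard errors from the residual variance and the diagonal of (X'X)^-1.
resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# t-statistics and two-sided p-values for H0: b_j = 0.
t_stats = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)
print(p_values)
```

Here the p-value for x1's coefficient is tiny (reject b_1 = 0, so a causal effect of x1 cannot be ruled out), while the p-value for x2's coefficient is large (the hypothesis of no causal effect of x2 on y cannot be rejected), matching the logic of the passage above.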

Mais conteúdo relacionado

Mais procurados

Attitude Measurement Scales - Likert‘s Scale, Semantic Differential Scale, Th...
Attitude Measurement Scales - Likert‘s Scale, Semantic Differential Scale, Th...Attitude Measurement Scales - Likert‘s Scale, Semantic Differential Scale, Th...
Attitude Measurement Scales - Likert‘s Scale, Semantic Differential Scale, Th...Sundar B N
 
attitude mesurement and scaling
attitude mesurement and scalingattitude mesurement and scaling
attitude mesurement and scalingNancy Dawar
 
Research methodology measurement
Research methodology measurement Research methodology measurement
Research methodology measurement 49bhu
 
4. research process (part 2)
4. research process (part 2)4. research process (part 2)
4. research process (part 2)Muneer Hussain
 
Attitude scales ppt
Attitude scales pptAttitude scales ppt
Attitude scales pptpranveer123
 
Assessment Tool: Semantic Differential Scales
Assessment Tool: Semantic Differential ScalesAssessment Tool: Semantic Differential Scales
Assessment Tool: Semantic Differential ScalesJill Frances Salinas
 
Measurement in social science research
Measurement in social science research Measurement in social science research
Measurement in social science research Yagnesh sondarva
 
Conceptualization, operationalization and measurement
Conceptualization, operationalization and measurementConceptualization, operationalization and measurement
Conceptualization, operationalization and measurementKatherine Bautista
 
Inu peer group ppt
Inu peer group pptInu peer group ppt
Inu peer group pptinuarya
 
TYPE AND USES OF ATTITUDE SCALES
TYPE AND USES OF ATTITUDE SCALESTYPE AND USES OF ATTITUDE SCALES
TYPE AND USES OF ATTITUDE SCALESSajan Ks
 
Scales of measurement (1)
Scales of measurement (1)Scales of measurement (1)
Scales of measurement (1)Anju Gautam
 
Research methodology; Scaling Methods - What Is the Best Response Scale for S...
Research methodology; Scaling Methods - What Is the Best Response Scale for S...Research methodology; Scaling Methods - What Is the Best Response Scale for S...
Research methodology; Scaling Methods - What Is the Best Response Scale for S...Hamed Taherdoost
 
Measurement of attitude
Measurement of attitudeMeasurement of attitude
Measurement of attitudeMukut Deori
 

Mais procurados (20)

Attitude Measurement Scales - Likert‘s Scale, Semantic Differential Scale, Th...
Attitude Measurement Scales - Likert‘s Scale, Semantic Differential Scale, Th...Attitude Measurement Scales - Likert‘s Scale, Semantic Differential Scale, Th...
Attitude Measurement Scales - Likert‘s Scale, Semantic Differential Scale, Th...
 
Measurement and evaluation
Measurement and evaluationMeasurement and evaluation
Measurement and evaluation
 
attitude mesurement and scaling
attitude mesurement and scalingattitude mesurement and scaling
attitude mesurement and scaling
 
Scaling
ScalingScaling
Scaling
 
Research methodology measurement
Research methodology measurement Research methodology measurement
Research methodology measurement
 
tools of research
tools of researchtools of research
tools of research
 
Attitude scaling
Attitude scalingAttitude scaling
Attitude scaling
 
4. research process (part 2)
4. research process (part 2)4. research process (part 2)
4. research process (part 2)
 
Attitude scales ppt
Attitude scales pptAttitude scales ppt
Attitude scales ppt
 
Assessment Tool: Semantic Differential Scales
Assessment Tool: Semantic Differential ScalesAssessment Tool: Semantic Differential Scales
Assessment Tool: Semantic Differential Scales
 
Measurement in social science research
Measurement in social science research Measurement in social science research
Measurement in social science research
 
Conceptualization, operationalization and measurement
Conceptualization, operationalization and measurementConceptualization, operationalization and measurement
Conceptualization, operationalization and measurement
 
Inu peer group ppt
Inu peer group pptInu peer group ppt
Inu peer group ppt
 
2 attitude
2 attitude2 attitude
2 attitude
 
TYPE AND USES OF ATTITUDE SCALES
TYPE AND USES OF ATTITUDE SCALESTYPE AND USES OF ATTITUDE SCALES
TYPE AND USES OF ATTITUDE SCALES
 
Scales of measurement (1)
Scales of measurement (1)Scales of measurement (1)
Scales of measurement (1)
 
Scaling techniques
Scaling techniquesScaling techniques
Scaling techniques
 
Research methodology; Scaling Methods - What Is the Best Response Scale for S...
Research methodology; Scaling Methods - What Is the Best Response Scale for S...Research methodology; Scaling Methods - What Is the Best Response Scale for S...
Research methodology; Scaling Methods - What Is the Best Response Scale for S...
 
Ch14 attitude measurement
Ch14 attitude measurementCh14 attitude measurement
Ch14 attitude measurement
 
Measurement of attitude
Measurement of attitudeMeasurement of attitude
Measurement of attitude
 

Destaque

Research quiz -Dr. Khaled Khader
Research quiz -Dr. Khaled KhaderResearch quiz -Dr. Khaled Khader
Research quiz -Dr. Khaled Khaderkhaledkhader
 
How to answer a media exam question... kind of
How to answer a media exam question... kind ofHow to answer a media exam question... kind of
How to answer a media exam question... kind ofJack Wentworth-Weedon
 
Ccna 1 practice final exam answer v5
Ccna 1 practice final exam answer v5Ccna 1 practice final exam answer v5
Ccna 1 practice final exam answer v5friv4schoolgames
 
Research Method.ppt
Research Method.pptResearch Method.ppt
Research Method.pptShama
 
Acc 400 final exam answer
Acc 400 final exam answerAcc 400 final exam answer
Acc 400 final exam answerRenea Barrera
 
research-methodology-ppt
 research-methodology-ppt research-methodology-ppt
research-methodology-pptsheetal321
 
Research Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and MethodsResearch Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and MethodsAhmed-Refat Refat
 
Definition and types of research
Definition and types of researchDefinition and types of research
Definition and types of researchfadifm
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShareSlideShare
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShareSlideShare
 
Getting Started With SlideShare
Getting Started With SlideShareGetting Started With SlideShare
Getting Started With SlideShareSlideShare
 

Destaque (15)

Research quiz -Dr. Khaled Khader
Research quiz -Dr. Khaled KhaderResearch quiz -Dr. Khaled Khader
Research quiz -Dr. Khaled Khader
 
How to answer a media exam question... kind of
How to answer a media exam question... kind ofHow to answer a media exam question... kind of
How to answer a media exam question... kind of
 
Ccna 1 practice final exam answer v5
Ccna 1 practice final exam answer v5Ccna 1 practice final exam answer v5
Ccna 1 practice final exam answer v5
 
krishnan's quiz
krishnan's quizkrishnan's quiz
krishnan's quiz
 
Research Method.ppt
Research Method.pptResearch Method.ppt
Research Method.ppt
 
Acc 400 final exam answer
Acc 400 final exam answerAcc 400 final exam answer
Acc 400 final exam answer
 
Research Methodology Lecture for Master & Phd Students
Research Methodology  Lecture for Master & Phd StudentsResearch Methodology  Lecture for Master & Phd Students
Research Methodology Lecture for Master & Phd Students
 
research-methodology-ppt
 research-methodology-ppt research-methodology-ppt
research-methodology-ppt
 
Research methodology notes
Research methodology notesResearch methodology notes
Research methodology notes
 
Research Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and MethodsResearch Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and Methods
 
Types of Research
Types of ResearchTypes of Research
Types of Research
 
Definition and types of research
Definition and types of researchDefinition and types of research
Definition and types of research
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShare
 
Getting Started With SlideShare
Getting Started With SlideShareGetting Started With SlideShare
Getting Started With SlideShare
 

Semelhante a Some Research ques & ans ( Assignment)

Scaling 120121081027-phpapp01
Scaling 120121081027-phpapp01Scaling 120121081027-phpapp01
Scaling 120121081027-phpapp01Surabhi Prajapati
 
Business Research Methods Unit 3 notes
Business Research Methods Unit 3 notesBusiness Research Methods Unit 3 notes
Business Research Methods Unit 3 notesSUJEET TAMBE
 
Creating Items and Response Scales Essay.docx
Creating Items and Response Scales Essay.docxCreating Items and Response Scales Essay.docx
Creating Items and Response Scales Essay.docxstudywriters
 
Attitude measurement.pptx
Attitude measurement.pptxAttitude measurement.pptx
Attitude measurement.pptxTheMusicFever
 
SAMPLE_AND_OTHER.ppt
SAMPLE_AND_OTHER.pptSAMPLE_AND_OTHER.ppt
SAMPLE_AND_OTHER.pptnarman1402
 
Differences between qualitative
Differences between qualitativeDifferences between qualitative
Differences between qualitativeShakeel Ahmad
 
What is research &amp; process
What is research &amp; processWhat is research &amp; process
What is research &amp; processAbaid Manj
 
Strict Standards Only variables should be passed by reference.docx
Strict Standards Only variables should be passed by reference.docxStrict Standards Only variables should be passed by reference.docx
Strict Standards Only variables should be passed by reference.docxflorriezhamphrey3065
 
Understanding the Scaling in Research
Understanding the Scaling in ResearchUnderstanding the Scaling in Research
Understanding the Scaling in ResearchDrShalooSaini
 
Attitude Measurement Scales
 Attitude Measurement Scales Attitude Measurement Scales
Attitude Measurement ScalesQuratulaintahir1
 
What is research
What is researchWhat is research
What is researchanam795
 
Scales: Semantic Differential Scale Summated Rating Scale
Scales: Semantic Differential Scale Summated Rating ScaleScales: Semantic Differential Scale Summated Rating Scale
Scales: Semantic Differential Scale Summated Rating ScaleDrSindhuAlmas
 
validity of scale.pdf
validity of scale.pdfvalidity of scale.pdf
validity of scale.pdfSrijoniChaki
 

Semelhante a Some Research ques & ans ( Assignment) (20)

Indexes scales and typologies
Indexes scales and typologiesIndexes scales and typologies
Indexes scales and typologies
 
arStarting1.ppt
arStarting1.pptarStarting1.ppt
arStarting1.ppt
 
BRM Unit II
BRM Unit IIBRM Unit II
BRM Unit II
 
Scaling 120121081027-phpapp01
Scaling 120121081027-phpapp01Scaling 120121081027-phpapp01
Scaling 120121081027-phpapp01
 
Business Research Methods Unit 3 notes
Business Research Methods Unit 3 notesBusiness Research Methods Unit 3 notes
Business Research Methods Unit 3 notes
 
Campus Session 2
Campus Session 2Campus Session 2
Campus Session 2
 
What is research
What is researchWhat is research
What is research
 
Creating Items and Response Scales Essay.docx
Creating Items and Response Scales Essay.docxCreating Items and Response Scales Essay.docx
Creating Items and Response Scales Essay.docx
 
SCALES.pptx
SCALES.pptxSCALES.pptx
SCALES.pptx
 
Attitude measurement.pptx
Attitude measurement.pptxAttitude measurement.pptx
Attitude measurement.pptx
 
SAMPLE_AND_OTHER.ppt
SAMPLE_AND_OTHER.pptSAMPLE_AND_OTHER.ppt
SAMPLE_AND_OTHER.ppt
 
Differences between qualitative
Differences between qualitativeDifferences between qualitative
Differences between qualitative
 
What is research &amp; process
What is research &amp; processWhat is research &amp; process
What is research &amp; process
 
Strict Standards Only variables should be passed by reference.docx
Strict Standards Only variables should be passed by reference.docxStrict Standards Only variables should be passed by reference.docx
Strict Standards Only variables should be passed by reference.docx
 
Understanding the Scaling in Research
Understanding the Scaling in ResearchUnderstanding the Scaling in Research
Understanding the Scaling in Research
 
Attitude Measurement Scales
 Attitude Measurement Scales Attitude Measurement Scales
Attitude Measurement Scales
 
What is research
What is researchWhat is research
What is research
 
Scales: Semantic Differential Scale Summated Rating Scale
Scales: Semantic Differential Scale Summated Rating ScaleScales: Semantic Differential Scale Summated Rating Scale
Scales: Semantic Differential Scale Summated Rating Scale
 
unit 2.4.ppt
unit 2.4.pptunit 2.4.ppt
unit 2.4.ppt
 
validity of scale.pdf
validity of scale.pdfvalidity of scale.pdf
validity of scale.pdf
 

Mais de Moin Sarker

Bus.com 507-presentation- Grapevine & Email Communications
Bus.com 507-presentation- Grapevine & Email Communications Bus.com 507-presentation- Grapevine & Email Communications
Bus.com 507-presentation- Grapevine & Email Communications Moin Sarker
 
Function, Performance & significant role of Islami Bank Bangladesh Limited
Function, Performance &  significant role of Islami Bank Bangladesh LimitedFunction, Performance &  significant role of Islami Bank Bangladesh Limited
Function, Performance & significant role of Islami Bank Bangladesh LimitedMoin Sarker
 
effects of Celebrity endorsement on cunsumer buying decision Presentation
effects of Celebrity endorsement on cunsumer buying decision Presentationeffects of Celebrity endorsement on cunsumer buying decision Presentation
effects of Celebrity endorsement on cunsumer buying decision PresentationMoin Sarker
 
A study on Celebrity endorsement of Consumer buying decision Report
A study on Celebrity endorsement of Consumer buying decision ReportA study on Celebrity endorsement of Consumer buying decision Report
A study on Celebrity endorsement of Consumer buying decision ReportMoin Sarker
 
Money banking assignment 1.2
Money  banking assignment 1.2Money  banking assignment 1.2
Money banking assignment 1.2Moin Sarker
 
Money banking assignment
Money  banking assignment Money  banking assignment
Money banking assignment Moin Sarker
 
Business Plan: Photography Business (slides)
Business Plan: Photography Business (slides)Business Plan: Photography Business (slides)
Business Plan: Photography Business (slides)Moin Sarker
 
Business Plan: Photography Business
Business Plan: Photography Business Business Plan: Photography Business
Business Plan: Photography Business Moin Sarker
 
Ratio Analysis of Apex Adelchi Footwear Ltd (Top Page )
Ratio Analysis of Apex Adelchi Footwear Ltd (Top Page ) Ratio Analysis of Apex Adelchi Footwear Ltd (Top Page )
Ratio Analysis of Apex Adelchi Footwear Ltd (Top Page ) Moin Sarker
 
Ratio Analysis of Apex Adelchi Footwear Ltd
Ratio Analysis of Apex Adelchi Footwear LtdRatio Analysis of Apex Adelchi Footwear Ltd
Ratio Analysis of Apex Adelchi Footwear LtdMoin Sarker
 
Apex Adelchi Footwear Limited Ratio Analysis
Apex Adelchi Footwear Limited Ratio Analysis Apex Adelchi Footwear Limited Ratio Analysis
Apex Adelchi Footwear Limited Ratio Analysis Moin Sarker
 
A presentation Tarek masud
A presentation Tarek masudA presentation Tarek masud
A presentation Tarek masudMoin Sarker
 
Tale of a frontiersman
Tale of a frontiersman Tale of a frontiersman
Tale of a frontiersman Moin Sarker
 
Solar business presentation( plan)
Solar business presentation( plan)Solar business presentation( plan)
Solar business presentation( plan)Moin Sarker
 
Social stratification
Social stratificationSocial stratification
Social stratificationMoin Sarker
 
cisco case solve
cisco case solvecisco case solve
cisco case solveMoin Sarker
 
A Study About Pran Products
A Study About Pran  Products A Study About Pran  Products
A Study About Pran Products Moin Sarker
 
Case Solve of Falcon computer
Case Solve of Falcon computerCase Solve of Falcon computer
Case Solve of Falcon computerMoin Sarker
 
Audit Report's To Share holder
Audit Report's To Share holder Audit Report's To Share holder
Audit Report's To Share holder Moin Sarker
 
Managerial Accounting & It;s Overview
Managerial Accounting & It;s OverviewManagerial Accounting & It;s Overview
Managerial Accounting & It;s OverviewMoin Sarker
 

Mais de Moin Sarker (20)

Bus.com 507-presentation- Grapevine & Email Communications
Bus.com 507-presentation- Grapevine & Email Communications Bus.com 507-presentation- Grapevine & Email Communications
Bus.com 507-presentation- Grapevine & Email Communications
 
Function, Performance & significant role of Islami Bank Bangladesh Limited
Function, Performance &  significant role of Islami Bank Bangladesh LimitedFunction, Performance &  significant role of Islami Bank Bangladesh Limited
Function, Performance & significant role of Islami Bank Bangladesh Limited
 
effects of Celebrity endorsement on cunsumer buying decision Presentation
effects of Celebrity endorsement on cunsumer buying decision Presentationeffects of Celebrity endorsement on cunsumer buying decision Presentation
effects of Celebrity endorsement on cunsumer buying decision Presentation
 
A study on Celebrity endorsement of Consumer buying decision Report
A study on Celebrity endorsement of Consumer buying decision ReportA study on Celebrity endorsement of Consumer buying decision Report
A study on Celebrity endorsement of Consumer buying decision Report
 
Money banking assignment 1.2
Money  banking assignment 1.2Money  banking assignment 1.2
Money banking assignment 1.2
 
Money banking assignment
Money  banking assignment Money  banking assignment
Money banking assignment
 
Business Plan: Photography Business (slides)
Business Plan: Photography Business (slides)Business Plan: Photography Business (slides)
Business Plan: Photography Business (slides)
 
Business Plan: Photography Business
Business Plan: Photography Business Business Plan: Photography Business
Business Plan: Photography Business
 
Ratio Analysis of Apex Adelchi Footwear Ltd (Top Page )
Ratio Analysis of Apex Adelchi Footwear Ltd (Top Page ) Ratio Analysis of Apex Adelchi Footwear Ltd (Top Page )
Ratio Analysis of Apex Adelchi Footwear Ltd (Top Page )
 
Ratio Analysis of Apex Adelchi Footwear Ltd
Ratio Analysis of Apex Adelchi Footwear LtdRatio Analysis of Apex Adelchi Footwear Ltd
Ratio Analysis of Apex Adelchi Footwear Ltd
 
Apex Adelchi Footwear Limited Ratio Analysis
Apex Adelchi Footwear Limited Ratio Analysis Apex Adelchi Footwear Limited Ratio Analysis
Apex Adelchi Footwear Limited Ratio Analysis
 
A presentation Tarek masud
A presentation Tarek masudA presentation Tarek masud
A presentation Tarek masud
 
Tale of a frontiersman
Tale of a frontiersman Tale of a frontiersman
Tale of a frontiersman
 
Solar business presentation( plan)
Solar business presentation( plan)Solar business presentation( plan)
Solar business presentation( plan)
 
Social stratification
Social stratificationSocial stratification
Social stratification
 
cisco case solve
cisco case solvecisco case solve
cisco case solve
 
A Study About Pran Products
A Study About Pran  Products A Study About Pran  Products
A Study About Pran Products
 
Case Solve of Falcon computer
Case Solve of Falcon computerCase Solve of Falcon computer
Case Solve of Falcon computer
 
Audit Report's To Share holder
Audit Report's To Share holder Audit Report's To Share holder
Audit Report's To Share holder
 
Managerial Accounting & It;s Overview
Managerial Accounting & It;s OverviewManagerial Accounting & It;s Overview
Managerial Accounting & It;s Overview
 

Último

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 

Último (20)

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Software Engineering Methodologies (overview)
Some Research ques & ans ( Assignment)

  • 1. WORLD UNIVERSITY OF BANGLADESH SUBMITTED BY | MOIN SARKER | ID- 2534 | BATCH 42C| MAJOR IN MARKETING ASSIGNMENT MARKETING RESEARCH SUBMITTED TO: Mrinmoy Mitra Lecturer | Department of Business
  • 2. Describe the Q sort methodology? Q Methodology is a research method used in psychology and in social sciences to study people's "subjectivity"—that is, their viewpoint. Q was developed by psychologist William Stephenson. It has been used both in clinical settings for assessing a patient's progress over time (intra-rater comparison), as well as in research settings to examine how people think about a topic (inter-rater comparisons).
The name "Q" comes from the form of factor analysis that is used to analyze the data. Normal factor analysis, called "R method," involves finding correlations between variables (say, height and age) across a sample of subjects. Q, on the other hand, looks for correlations between subjects across a sample of variables. Q factor analysis reduces the many individual viewpoints of the subjects down to a few "factors," which are claimed to represent shared ways of thinking. It is sometimes said that Q factor analysis is R factor analysis with the data table turned sideways. While helpful as a heuristic for understanding Q, this explanation may be misleading, as most Q methodologists argue that for mathematical reasons no one data matrix would be suitable for analysis with both Q and R.
The data for Q factor analysis come from a series of "Q sorts" performed by one or more subjects. A Q sort is a ranking of variables—typically presented as statements printed on small cards—according to some "condition of instruction." For example, in a Q study of people's views of a celebrity, a subject might be given statements like "He is a deeply religious man" and "He is a liar," and asked to sort them from "most like how I think about this celebrity" to "least like how I think about this celebrity." The use of ranking, rather than asking subjects to rate their agreement with statements individually, is meant to capture the idea that people think about ideas in relation to other ideas, rather than in isolation.
The sample of statements for a Q sort is drawn from and claimed to be representative of a "concourse"—the sum of all things people say or think about the issue being investigated. Commonly Q methodologists use a structured sampling approach in order to try and represent the full breadth of the concourse.
One salient difference between Q and other social science research methodologies, such as surveys, is that it typically uses many fewer subjects. This can be a strength, as Q is sometimes used with a single subject, and it makes research far less expensive. In such cases, a person will rank the same set of statements under different conditions of instruction. For example, someone might be given a set of statements about personality traits and then asked to rank them according to how well they describe herself, her ideal self, her father, her mother, etc. Working with a single individual is particularly relevant in the study of how an individual's rankings change over time, and this was the first use of Q methodology. As Q methodology works with a small non-representative sample, conclusions are limited to those who participated in the study.
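The "correlating subjects, not variables" idea above can be illustrated with a minimal Python sketch. The subjects and rankings here are hypothetical, and a real Q study would go on to factor-analyze the correlation matrix rather than stop at pairwise correlations:

```python
# Minimal sketch of the Q idea: correlate SUBJECTS (not variables) across
# their rankings of the same statements. All data below are hypothetical.

def pearson(xs, ys):
    # Plain Pearson correlation between two equal-length ranking lists
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Each row: one subject's Q sort of six statements (e.g. -2 = "least like
# how I think" up to +2 = "most like how I think")
sorts = {
    "subject_A": [2, 1, 0, -1, -2, 0],
    "subject_B": [2, 0, 1, -1, -2, 0],   # sorts much like A
    "subject_C": [-2, -1, 0, 1, 2, 0],   # nearly the mirror image of A
}

r_ab = pearson(sorts["subject_A"], sorts["subject_B"])
r_ac = pearson(sorts["subject_A"], sorts["subject_C"])
print(round(r_ab, 2), round(r_ac, 2))    # prints: 0.9 -1.0
```

Subjects A and B correlate highly (a shared viewpoint that Q factor analysis would collapse into one factor), while C loads in the opposite direction.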
  • 3. Short note
Measurement: Measurement is the process of observing and recording the observations that are collected as part of a research effort. There are two major issues that will be considered here. First, you have to understand the fundamental ideas involved in measuring. Here we consider two major measurement concepts. In Levels of Measurement, I explain the meaning of the four major levels of measurement: nominal, ordinal, interval and ratio. Then we move on to the reliability of measurement, including consideration of true score theory and a variety of reliability estimators. Second, you have to understand the different types of measures that you might use in social research. We consider four broad categories of measurements. Survey research includes the design and implementation of interviews and questionnaires. Scaling involves consideration of the major methods of developing and implementing a scale. Qualitative research provides an overview of the broad range of non-numerical measurement approaches. And unobtrusive measures present a variety of measurement methods that don't intrude on or interfere with the context of the research.
Scaling: Scaling is the branch of measurement that involves the construction of an instrument that associates qualitative constructs with quantitative metric units. Scaling evolved out of efforts in psychology and education to measure "unmeasurable" constructs like authoritarianism and self-esteem. In many ways, scaling remains one of the most arcane and misunderstood aspects of social research measurement. And it attempts to do one of the most difficult of research tasks -- measure abstract concepts.
Nominal Scale: Nominal scales are used for labeling variables, without any quantitative value. "Nominal" scales could simply be called "labels." Here are some examples, below. Notice that all of these scales are mutually exclusive (no overlap) and none of them have any numerical significance.
A good way to remember all of this is that “nominal” sounds a lot like “name” and nominal scales are kind of like “names” or labels.
  • 4. Illustration of primary scales of measurement
What is a Likert scale? As in all scaling methods, the first step is to define what it is you are trying to measure. Because this is a one-dimensional scaling method, it is assumed that the concept you want to measure is one-dimensional in nature. You might operationalize the definition as an instruction to the people who are going to create or generate the initial set of candidate items for your scale. Next, you have to create the set of potential scale items. These should be items that can be rated on a 1-to-5 or 1-to-7 Disagree-Agree response scale. Sometimes you can create the items by yourself based on your intimate understanding of the subject matter. But, more often than not, it's helpful to engage a number of people in the item creation step. For instance, you might use some form of brainstorming to create the items. It's desirable to have as large a set of potential items as possible at this stage; about 80-100 would be best.
Rating the Items. The next step is to have a group of judges rate the items. Usually you would use a 1-to-5 rating scale where:
  • 5. 1 = strongly unfavorable to the concept; 2 = somewhat unfavorable to the concept; 3 = undecided; 4 = somewhat favorable to the concept; 5 = strongly favorable to the concept.
Notice that, as in other scaling methods, the judges are not telling you what they believe -- they are judging how favorable each item is with respect to the construct of interest.
Selecting the Items. The next step is to compute the intercorrelations between all pairs of items, based on the ratings of the judges. In making judgments about which items to retain for the final scale there are several analyses you can do: throw out any items that have a low correlation with the total (summed) score across all items. In most statistics packages it is relatively easy to compute this type of Item-Total correlation. First, you create a new variable which is the sum of all of the individual items for each respondent. Then, you include this variable in the correlation matrix computation (if you include it as the last variable in the list, the resulting Item-Total correlations will all be in the last line of the correlation matrix and will be easy to spot). How low should the correlation be for you to throw out the item? There is no fixed rule here -- you might eliminate all items with a correlation with the total score less than .6, for example. For each item, get the average rating for the top quarter of judges and the bottom quarter. Then, do a t-test of the differences between the mean value for the item for the top and bottom quarter judges. Higher t-values mean that there is a greater difference between the highest and lowest judges. In more practical terms, items with higher t-values are better discriminators, so you want to keep these items. In the end, you will have to use your judgment about which items are most sensibly retained.
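The Item-Total correlation step above can be sketched in a few lines of Python. The judges' ratings and the .6 cutoff are hypothetical, and a real analysis would be done in a statistics package as the text describes:

```python
# Sketch of the item-selection step: compute each item's correlation with
# the total (summed) score across judges. Ratings below are hypothetical.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# ratings[j][i] = judge j's 1-to-5 favorability rating of item i
ratings = [
    [5, 4, 1],
    [4, 5, 2],
    [2, 1, 5],
    [1, 2, 4],
]
totals = [sum(row) for row in ratings]   # summed score per judge

item_total = [
    pearson([row[i] for row in ratings], totals)
    for i in range(len(ratings[0]))
]
# Keep items whose correlation with the total clears a chosen cutoff (.6 here)
kept = [i for i, r in enumerate(item_total) if r > 0.6]
print(kept)   # prints: [0, 1]
```

Item 2 correlates negatively with the total, so it would be dropped (or treated as a reversal item, as discussed below).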
You want a relatively small number of items on your final scale (e.g., 10-15) and you want them to have high Item-Total correlations and high discrimination (e.g., high t-values).
Administering the Scale. You're now ready to use your Likert scale. Each respondent is asked to rate each item on some response scale. For instance, they could rate each item on a 1-to-5 response scale where: 1 = strongly disagree; 2 = disagree; 3 = undecided; 4 = agree
  • 6. 5 = strongly agree. There are a variety of possible response scales (1-to-7, 1-to-9, 0-to-4). All of these odd-numbered scales have a middle value that is often labeled Neutral or Undecided. It is also possible to use a forced-choice response scale with an even number of responses and no middle neutral or undecided choice. In this situation, the respondent is forced to decide whether they lean more towards the agree or disagree end of the scale for each item. The final score for the respondent on the scale is the sum of their ratings for all of the items (this is why this is sometimes called a "summated" scale). On some scales, you will have items that are reversed in meaning from the overall direction of the scale. These are called reversal items. You will need to reverse the response value for each of these items before summing for the total. That is, if the respondent gave a 1, you make it a 5; if they gave a 2, you make it a 4; 3 = 3; 4 = 2; and 5 = 1.
Example: The Employment Self Esteem Scale. Here's an example of a ten-item Likert scale that attempts to estimate the level of self-esteem a person has on the job. Notice that this instrument has no center or neutral point -- the respondent has to declare whether he/she is in agreement or disagreement with the item.
INSTRUCTIONS: Please rate how strongly you agree or disagree with each of the following statements by placing a check mark in the appropriate box (Strongly Disagree / Somewhat Disagree / Somewhat Agree / Strongly Agree).
1. I feel good about my work on the job.
2. On the whole, I get along well with others at work.
3. I am proud of my ability to cope with difficulties at work.
4. When I feel uncomfortable at work, I know how to handle it.
5. I can tell that other people at work are glad to have me there.
6. I know I'll be able to cope with work for as long as I want.
7. I am proud of my relationship with my supervisor at work.
8. I am confident that I can handle my job without constant assistance.
9. I feel like I make a useful contribution at work.
10. I can tell that my coworkers respect me.
  • 7. What is reliability? When we examine a construct in a study, we choose one of a number of possible ways to measure that construct [see the section on Constructs in quantitative research, if you are unsure what constructs are, or the difference between constructs and variables]. For example, we may choose to use questionnaire items, interview questions, and so forth. These questionnaire items or interview questions are part of the measurement procedure. This measurement procedure should provide an accurate representation of the construct it is measuring if it is to be considered valid. For example, if we want to measure the construct, intelligence, we need to have a measurement procedure that accurately measures a person's intelligence. Since there are many ways of thinking about intelligence (e.g., IQ, emotional intelligence, etc.), this can make it difficult to come up with a measurement procedure that has strong validity [see the article: Construct validity]. In quantitative research, the measurement procedure consists of variables, whether a single variable or a number of variables that may make up a construct [see the section on Constructs in quantitative research]. When we think about the reliability of these variables, we want to know how stable or constant they are. This assumption, that the variable you are measuring is stable or constant, is central to the concept of reliability. In principle, a measurement procedure that is stable or constant should produce the same (or nearly the same) results if the same individuals and conditions are used.
What is validity?
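The summated scoring with reversal items described in the Likert section can be sketched as follows. The responses and the choice of which item is reversed are hypothetical:

```python
# Sketch of summated-scale scoring with reversal items: reverse-scored
# responses are flipped (1<->5, 2<->4, 3 stays 3) before summing.

REVERSED = {2}  # indices of reversal items in the scale (hypothetical)

def score(responses, reversed_items=REVERSED, low=1, high=5):
    # On a low-to-high scale, reversing a response r gives (low + high - r)
    total = 0
    for i, r in enumerate(responses):
        total += (low + high - r) if i in reversed_items else r
    return total

resp = [4, 5, 2, 3]        # item 2 is a reversal item: its 2 becomes a 4
print(score(resp))         # prints: 16  (4 + 5 + 4 + 3)
```

The same `low + high - r` flip works for any response range, e.g. 0-to-4 or 1-to-7 scales.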
Validity is the extent to which a concept, conclusion or measurement is well-founded and corresponds accurately to the real world. The word "valid" is derived from the Latin validus, meaning strong. The validity of a measurement tool (for example, a test in education) is considered to be the degree to which the tool measures what it claims to measure; in this case, validity is equivalent to accuracy. In psychometrics, validity has a particular application known as test validity: "the degree to which evidence and theory support the interpretations of test scores" ("as entailed by proposed uses of tests"). It is generally accepted that the concept of scientific validity addresses the nature of reality and as such is an epistemological and philosophical issue as well as a question of measurement. The use of the term in logic is narrower, relating to the truth of inferences made from premises.
  • 8. Validity is important because it can help determine what types of tests to use, and help to make sure researchers are using methods that are not only ethical and cost-effective, but also ones that truly measure the idea or construct in question.
Relationship between reliability and validity? Reliability and validity are important concepts within psychometrics. Reliability is generally thought to be necessary for validity, but it does not guarantee validity. Reliability and validity are, conceptually, quite distinct and there need not be any necessary relationship between the two. Be wary of statements which imply that a valid test or measure has to be reliable. Where the measurement emphasis is on relatively stable and enduring characteristics of people (e.g. their creativity), a measure should be consistent over time (reliable). It also ought to distinguish between inventors and the rest of us if it is a valid measure of creativity. A measure of a characteristic which varies quite rapidly over time will not be reliable over time - if it is, then we might doubt its validity. For example, a valid measure of suicide intention may not be particularly stable (reliable) over time though good at identifying those at risk of suicide. Validity is often expressed as a correlation between the measure and some criterion. This validity coefficient will be limited or attenuated by the reliability of the test or measure. Thus, the maximum correlation of the test or measure with any other variable has an upper limit determined by the internal reliability. Within classical test theory, predictive or concurrent validity (correlation between the predictor and the predicted) cannot exceed the square root of the correlation between two versions of the same measure — that is, reliability limits validity.
With this in mind, it can be helpful to conceptualize the following four basic scenarios for the relation between reliability and validity:
1. Reliable (consistent) and valid (measures what it's meant to measure, i.e., a stable construct)
2. Reliable (consistent) and not valid (measures something consistently, but it doesn't measure what it's meant to measure)
3. Unreliable (not consistent) and not valid (an inconsistent measure which doesn't measure what it's meant to measure)
4. Unreliable (not consistent) and valid (measures what it's meant to measure, i.e., an unstable construct)
It is important to distinguish between internal reliability and test-retest reliability. A measure of a fluctuating phenomenon such as suicide intention may be valid but have low test-retest reliability (depending on how much the phenomenon fluctuates and how far apart the test and retest are), but the measure should exhibit good internal consistency on each occasion.
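The "reliability limits validity" bound from classical test theory can be illustrated numerically. The reliability value below is hypothetical:

```python
# Numeric illustration of "reliability limits validity": within classical
# test theory, a validity coefficient cannot exceed the square root of the
# measure's reliability. The reliability of 0.64 here is hypothetical.
import math

reliability = 0.64                     # correlation between two versions of the measure
max_validity = math.sqrt(reliability)  # upper bound on any validity coefficient
print(round(max_validity, 3))          # prints: 0.8
```

So even a perfectly chosen criterion could not correlate with this measure above .8; improving reliability raises the ceiling on attainable validity.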
  • 9. Systematic Errors
Systematic errors in experimental observations usually come from the measuring instruments. They may occur because:
 there is something wrong with the instrument or its data handling system, or
 the instrument is wrongly used by the experimenter.
Two types of systematic error can occur with instruments having a linear response:
1. Offset or zero setting error, in which the instrument does not read zero when the quantity to be measured is zero.
2. Multiplier or scale factor error, in which the instrument consistently reads changes in the quantity to be measured as greater or less than the actual changes.
These errors are shown in Fig. 1. Systematic errors also occur with non-linear instruments when the calibration of the instrument is not known correctly.
Fig. 1. Systematic errors in a linear instrument (full line). Broken line shows response of an ideal instrument without error.
Examples of systematic errors caused by the wrong use of instruments are:
 errors in measurements of temperature due to poor thermal contact between the thermometer and the substance whose temperature is to be found,
 errors in measurements of solar radiation because trees or buildings shade the radiometer.
The accuracy of a measurement is how close the measurement is to the true value of the quantity being measured. The accuracy of measurements is often reduced by systematic errors, which are difficult to detect even for experienced research workers.
Random Errors: Random errors in experimental measurements are caused by unknown and unpredictable changes in the experiment. These changes may occur in the measuring instruments or in the environmental conditions.
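The two linear systematic-error types described above (offset and scale factor) can be modeled in a couple of lines. The offset and gain values are hypothetical:

```python
# Sketch of the two linear systematic-error types: a reading is the true
# value distorted by an offset (zero-setting) error and a multiplier
# (scale-factor) error. The offset and gain below are hypothetical.

def reading(true_value, offset=0.5, gain=1.02):
    # An ideal instrument has offset = 0 and gain = 1
    return offset + gain * true_value

print(reading(0.0))    # prints: 0.5  -- does not read zero at zero (offset error)
print(reading(10.0))   # reads changes 2% larger than actual (scale-factor error)
```

Because these errors are systematic, repeating the measurement does not reduce them; they must be removed by calibration against a known standard.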
  • 10. Examples of causes of random errors are:
 electronic noise in the circuit of an electrical instrument,
 irregular changes in the heat loss rate from a solar collector due to changes in the wind.
Random errors often have a Gaussian normal distribution (see Fig. 2). In such cases statistical methods may be used to analyze the data. The mean m of a number of measurements of the same quantity is the best estimate of that quantity, and the standard deviation s of the measurements shows the accuracy of the estimate. The standard error of the estimate m is s/sqrt(n), where n is the number of measurements.
Fig. 2. The Gaussian normal distribution. m = mean of measurements. s = standard deviation of measurements. 68% of the measurements lie in the interval m - s < x < m + s; 95% lie within m - 2s < x < m + 2s; and 99.7% lie within m - 3s < x < m + 3s.
The precision of a measurement is how close a number of measurements of the same quantity agree with each other. The precision is limited by the random errors. It may usually be determined by repeating the measurements.
What is a double-barreled question? A double-barreled question (sometimes, double-direct question) is an informal fallacy. It is committed when someone asks a question that touches upon more than one issue, yet allows only for one answer. This may result in inaccuracies in the attitudes being measured for the question, as the respondent can answer only one of the two questions, and cannot indicate which one is being answered. Many double-barreled questions can be detected by the existence of the grammatical conjunction "and" in them. This is not a foolproof test, as the word "and" can exist in properly constructed questions. A question asking about three items is known as "trible (triple, treble)-barreled." In legal proceedings, a double-barreled question is called a compound question.
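The mean, standard deviation, and standard error s/sqrt(n) from the random-errors discussion above can be computed with Python's standard library. The five repeated measurements are hypothetical:

```python
# Sketch of the random-error statistics: best estimate (mean), spread
# (sample standard deviation), and standard error s/sqrt(n) of the mean.
# The repeated measurements below are hypothetical.
import math
import statistics

measurements = [9.8, 10.1, 10.0, 9.9, 10.2]
n = len(measurements)
m = statistics.mean(measurements)       # best estimate of the quantity
s = statistics.stdev(measurements)      # sample standard deviation
standard_error = s / math.sqrt(n)       # uncertainty of the estimate m

print(round(m, 2), round(standard_error, 3))
```

Averaging more repeated measurements shrinks the standard error as 1/sqrt(n), which is why repetition reduces random (but not systematic) error.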
  • 11. What are the advantages and disadvantages of unstructured questions?
The advantages of an unstructured approach are said to be: the respondent can answer in any way that they wish, and therefore can lead the interview; unstructured questions are a better means of finding out 'true' opinion and identifying how strongly attitudes are held (e.g. Ann Oakley's 178 interviews); for positivists, the statistical patterns revealed can be used to develop new theories, and questionnaires can be devised to test existing theories (this can be known as triangulation); and feminist methodology is less male-dominated/exploitative than conventional research methods.
The disadvantages of an unstructured approach are said to be: the coding of open-question data distorts the distinct answers given by individuals; open questions require more thought and time on the part of the respondent, which reduces the number of questions that can realistically be asked; the cost will be fairly expensive due to coding; it is more difficult to even out opinion across a sample of questionnaires using open questions; respondents may answer in unhelpful ways, as they may not take it seriously (like the census); respondents may have trouble expressing themselves accurately; and the Halo effect. The 'Halo Effect' refers to a common bias, in the impression people form of others, by which attributes are often generalized. Implicitly nice people, for example, are assumed to have all nice attributes. This can lead to misleading judgments: for example, clever people may falsely be assumed to be knowledgeable about everything. This can be a disadvantage when using open questionnaires due to making assumptions.
What is the importance of questionnaires? Questionnaires can be thought of as a kind of written interview. They can be carried out face to face, by telephone or by post. Questionnaires have been used to research Type A personality (e.g. Friedman & Rosenman, 1974), and also to assess life events which may cause stress (Holmes & Rahe, 1967).
Questionnaires provide a relatively cheap, quick and efficient way of obtaining large amounts of information from a large sample of people. Data can be collected relatively quickly because the researcher would not need to be present when the questionnaires were completed. This is useful for large populations when interviews would be impractical. However, a problem with questionnaires is that respondents may lie due to social desirability. Most people want to present a positive image of themselves and so may lie or bend the truth to look good; e.g. pupils would exaggerate revision duration. Also, the language of a questionnaire should be appropriate to the vocabulary of the group of people being studied. For example, the researcher must change the language of questions to match the social background of respondents' age / educational level / social class / ethnicity etc.
  • 12. Questionnaires can be an effective means of measuring the behavior, attitudes, preferences, opinions and intentions of relatively large numbers of subjects more cheaply and quickly than other methods. An important distinction is between open-ended and closed questions. Often a questionnaire uses both open and closed questions to collect data. This is beneficial as it means both quantitative and qualitative data can be obtained.
Why is a sample preferred to a census? A census is the procedure of systematically acquiring and recording information about all the members of a given population, while a sample is a group drawn from the population. A census is more thorough and gives accurate information about a population, but it is more expensive and time-consuming than a sample. A sample could be more accurate than an (attempted) census if the fact of the exercise being a census increases the bias from non-sampling error. This could come about, for example, if the census generates an adverse political campaign advocating non-response (something less likely to happen to a sample). Unless this happens, I can't see why a sample would be expected to have less non-sampling error than a census; and by definition it will have more sampling error. So apart from quite unusual circumstances I would say a census is going to be more accurate than a sample. Consider a common source of non-sampling error - systematic non-response, e.g. by a particular socio-demographic group. If people from group X are likely to refuse the census, they are just as likely to refuse the sample. Even with post-stratification weighting to adjust the responses of those people from group X who you do persuade to answer your questions, you still have a problem, because those might be the very segment of X that are pro-surveys. There is no real way around this problem other than to be as careful as possible with your design of instrument and delivery method.
In passing, this does draw attention to one possible issue that could make an attempted census less accurate than a sample. Samples routinely have post-stratification weighting to the population, which mitigates bias problems from issues such as that in the paragraph above. An attempted census that doesn't get a 100% return is just a large sample, and should in principle be subject to the same processing; but because it is seen as a "census" (rather than an attempted census) this may be neglected. So that census might be less accurate than the appropriately weighted sample. But in this case the problem is the analytical processing technique (or the omission of it), not something intrinsic to it being an attempted census. Efficiency is another matter - a well conducted sample will be more efficient than a census, and it may well have sufficient accuracy for practical purposes.
  • 13. How should the target population be defined? Target population refers to the ENTIRE group of individuals or objects to which researchers are interested in generalizing their conclusions. The target population usually has varying characteristics and it is also known as the theoretical population. All research questions address issues that are of great relevance to important groups of individuals known as a research population. A research population is generally a large collection of individuals or objects that is the main focus of a scientific query. It is for the benefit of the population that research is done. However, due to the large sizes of populations, researchers often cannot test every individual in the population because it is too expensive and time-consuming. This is the reason why researchers rely on sampling techniques. A research population is also known as a well-defined collection of individuals or objects known to have similar characteristics. All individuals or objects within a certain population usually have a common, binding characteristic or trait. Usually, the description of the population and the common binding characteristic of its members are the same. "Government officials" is a well-defined group of individuals which can be considered as a population, and all the members of this population are indeed officials of the government.
How do probability sampling techniques differ from non-probability sampling techniques? The difference between non-probability and probability sampling is that non-probability sampling does not involve random selection and probability sampling does. Does that mean that non-probability samples aren't representative of the population? Not necessarily. But it does mean that non-probability samples cannot depend upon the rationale of probability theory. At least with a probabilistic sample, we know the odds or probability that we have represented the population well.
We are able to estimate confidence intervals for the statistic. With non-probability samples, we may or may not represent the population well, and it will often be hard for us to know how well we've done so. In general, researchers prefer probabilistic or random sampling methods over non-probabilistic ones, and consider them to be more accurate and rigorous. However, in applied social research there may be circumstances where it is not feasible, practical or theoretically sensible to do random sampling. Here, we consider a wide range of non-probabilistic alternatives. We can divide non-probability sampling methods into two broad types: accidental or purposive. Most sampling methods are purposive in nature because we usually approach the sampling problem with a specific plan in mind. The most important distinctions among these types of sampling methods are the ones between the different types of purposive sampling approaches.
  • 14. What is the relationship between quota sampling and judgmental sampling? Judgmental sampling is a non-probability sampling technique where the researcher selects units to be sampled based on their knowledge and professional judgment. This type of sampling technique is also known as purposive sampling and authoritative sampling. On the other hand, quota sampling is a non-probability sampling technique wherein the assembled sample has the same proportions of individuals as the entire population with respect to known characteristics, traits or a focused phenomenon. In addition to this, the researcher must make sure that the composition of the final sample to be used in the study meets the study's quota criteria. Judgmental sampling design is usually used when a limited number of individuals possess the trait of interest. It is the only viable sampling technique in obtaining information from a very specific group of people. It is also possible to use judgmental sampling if the researcher knows a reliable professional or authority that he thinks is capable of assembling a representative sample. In a study wherein the researcher would like to compare the academic performance of the different high school class levels and its relationship with gender and socioeconomic status, the researcher first identifies the subgroups. Usually, the subgroups are the characteristics or variables of the study. The researcher divides the entire population into class levels, intersected with gender and socioeconomic status. Then, he takes note of the proportions of these subgroups in the entire population and then samples each subgroup accordingly.
What are the steps in the sampling design process? The following are some of the important steps that one needs to keep in mind when developing a sample design:
Defining the universe or population of interest is the first step in any sample design.
The accuracy of the results in any study depends on how clearly the universe or population of interest is defined. The universe can be finite or infinite, depending on the number of items it contains. Defining the sampling unit within the population of interest is the second step in the sample design process. The sampling unit can be anything that exists within the population of interest. For example, a sampling unit may be a geographical unit, a construction unit, or an individual unit. Preparing the list of all the items within the population of interest is the next step in the sample design process. It is from this list, also called the source list or sampling frame, that we draw our sample. It is important to note that our sampling frame should be highly representative of the population of interest. Determination of sample size is the next step. This is the most critical stage of the sample design process because the sample size should be neither excessively large nor too small. It is desired that the sample size should be optimum and representative of
the population and should give reliable results. Population variance, population size, parameters of interest, and budgetary constraints are some of the factors that affect the sample size. Deciding on the technique of sampling is the next step in sample design. There are many sampling techniques, out of which the researcher has to choose the one that gives the lowest sampling error, given the sample size and budgetary constraints. Describe the classification of sampling design techniques. A probability sample is a sample in which every unit in the population has a chance (greater than zero) of being selected, and this probability can be accurately determined. The combination of these traits makes it possible to produce unbiased estimates of population totals, by weighting sampled units according to their probability of selection. Nonprobability sampling is any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage' or 'undercovered'), or where the probability of selection can't be accurately determined. It involves the selection of elements based on assumptions regarding the population of interest, which form the criteria for selection. Because the selection of elements is nonrandom, nonprobability sampling does not allow the estimation of sampling errors. These conditions give rise to exclusion bias, placing limits on how much information a sample can provide about the population. Information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population. In a simple random sample (SRS) of a given size, all subsets of the frame of that size are given an equal probability of selection. Furthermore, any given pair of elements has the same chance of selection as any other such pair (and similarly for triples, and so on). This minimises bias and simplifies analysis of results.
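As a concrete illustration, a simple random sample can be drawn with Python's standard library; the frame and sample sizes here are made up for the sketch:

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of size n: every size-n subset of the frame is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame of 1,000 numbered population units.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 10, seed=42)
print(len(sample), len(set(sample)))  # 10 distinct units, drawn without replacement
```

Fixing the seed only makes the example reproducible; in a real study the draw would of course not be seeded by hand.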
In particular, the variance between individual results within the sample is a good indicator of variance in the overall population, which makes it relatively easy to estimate the accuracy of results. SRS can be vulnerable to sampling error because the randomness of the selection may result in a sample that doesn't reflect the makeup of the population. For instance, a simple random sample of ten people from a given country will on average produce five men and five women, but any given trial is likely to overrepresent one sex and underrepresent the other. Systematic and stratified techniques attempt to overcome this problem by "using information about the population" to choose a more "representative" sample. Systematic sampling (also known as interval sampling) relies on arranging the study population according to some ordering scheme and then selecting elements at regular intervals through that ordered list. Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards. In this case, k=(population size/sample size). It is important that the starting point is not automatically the first in the list, but is instead randomly chosen from within the first to the kth element in the list. A simple example would be to select
every 10th name from the telephone directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10'). As long as the starting point is randomized, systematic sampling is a type of probability sampling. It is easy to implement, and the stratification it induces can make it efficient if the variable by which the list is ordered is correlated with the variable of interest. 'Every 10th' sampling is especially useful for efficient sampling from databases. Where the population embraces a number of distinct categories, the frame can be organized by these categories into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected. There are several potential benefits to stratified sampling. First, dividing the population into distinct, independent strata can enable researchers to draw inferences about specific subgroups that may be lost in a more generalized random sample. Sometimes it is more cost-effective to select respondents in groups ('clusters'). Sampling is often clustered by geography, or by time periods. (Nearly all samples are in some sense 'clustered' in time – although this is rarely taken into account in the analysis.) For instance, if surveying households within a city, we might choose to select 100 city blocks and then interview every household within the selected blocks. In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion. For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60. It is this second step which makes the technique one of non-probability sampling. In quota sampling the selection of the sample is non-random. For example, interviewers might be tempted to interview those who look most helpful.
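The systematic (interval) sampling scheme described earlier — a random start within the first k elements, then every kth element — can be sketched as follows; the directory and sizes are illustrative:

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start in the first k positions, then every kth element, k = N // n."""
    k = len(frame) // n              # sampling interval ("skip")
    rng = random.Random(seed)
    start = rng.randrange(k)         # randomized starting point, 0 .. k-1
    return frame[start::k][:n]

# An 'every 10th' sample of 100 names from a directory of 1,000.
directory = [f"name_{i:04d}" for i in range(1000)]
sample = systematic_sample(directory, 100, seed=7)
print(len(sample))  # 100
```

Because the start is randomized over the first k positions, every unit retains a known, nonzero chance of selection, which is what keeps this a probability sampling method.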
The problem with quota samples is that they may be biased because not everyone gets a chance of selection. This nonrandom element is their greatest weakness, and quota versus probability sampling has been a matter of controversy for many years. Snowball sampling involves finding a small group of initial respondents and using them to recruit more respondents. It is particularly useful in cases where the population is hidden or difficult to enumerate. What are precoding and postcoding? Coding means assigning a code, usually a number, to each possible response to each question. The code includes an indication of the column position (field) and data record it will occupy. For example, the sex of respondents may be coded as 1 for females and 2 for males. A field represents a single item of data, such as the sex of the respondent. A record consists of related fields, such as sex, marital status, age, household size, occupation, and so on. All the
demographic and personality characteristics of a respondent may be contained in a single record. The respondent code and the record number should appear on each record in the data. If possible, standard codes should be used for missing data. For example, a code of 9 (or –9) could be used for a single-digit variable (responses coded on a scale of 1 to 7), 99 for a double-digit variable (responses coded on a scale of 1 to 11), and so forth. The missing value codes should be distinct from the codes assigned to the legitimate responses. If the questionnaire contains only structured questions or very few unstructured questions, it is precoded. This means that codes are assigned before fieldwork is conducted. If the questionnaire contains unstructured questions, codes are assigned after the questionnaires have been returned from the field (postcoding). We provide some guidelines on the coding of structured questions followed by the coding of unstructured questions. What options are available for the treatment of missing data? There is a large literature of statistical methods for dealing with missing data. Here we briefly review some key concepts and make some general recommendations for Cochrane review authors. It is important to think about why data may be missing. Statisticians often use the terms ‘missing at random’ and ‘not missing at random’ to represent different scenarios. Data are said to be ‘missing at random’ if the fact that they are missing is unrelated to the actual values of the missing data. For instance, if some quality-of-life questionnaires were lost in the postal system, this would be unlikely to be related to the quality of life of the trial participants who completed the forms. In some circumstances, statisticians distinguish between data ‘missing at random’ and data ‘missing completely at random’, although in the context of a systematic review the distinction is unlikely to be important. Data that are missing at random may not be important.
Analyses based on the available data will tend to be unbiased, although based on a smaller sample size than the original data set. Data are said to be ‘not missing at random’ if the fact that they are missing is related to the actual missing data. For instance, in a depression trial, participants who had a relapse of depression might be less likely to attend the final follow-up interview, and more likely to have missing outcome data. Such data are ‘non-ignorable’ in the sense that an analysis of the available data alone will typically be biased. Publication bias and selective reporting bias lead by definition to data that are 'not missing at random', and attrition and exclusions of individuals within studies often do as well. The principal options for dealing with missing data are: 1. Analyzing only the available data (i.e. ignoring the missing data); 2. Imputing the missing data with replacement values, and treating these as if they were observed (e.g. last observation carried forward, imputing an assumed outcome such as assuming all were
poor outcomes, imputing the mean, imputing based on predicted values from a regression analysis); 3. Imputing the missing data and accounting for the fact that these were imputed with uncertainty (e.g. multiple imputation, or simple imputation methods (as in point 2) with adjustment to the standard error); 4. Using statistical models to allow for missing data, making assumptions about their relationships with the available data. Option 1 may be appropriate when data can be assumed to be missing at random. Options 2 to 4 are attempts to address data not missing at random. Option 2 is practical in most circumstances and very commonly used in systematic reviews. However, it fails to acknowledge uncertainty in the imputed values and typically results in confidence intervals that are too narrow. Options 3 and 4 would require the involvement of a knowledgeable statistician. Four general recommendations for dealing with missing data in Cochrane reviews are as follows. Whenever possible, contact the original investigators to request missing data. Make explicit the assumptions of any methods used to cope with missing data: for example, that the data are assumed missing at random, or that missing values were assumed to have a particular value such as a poor outcome. Perform sensitivity analyses to assess how sensitive results are to reasonable changes in the assumptions that are made. Address the potential impact of missing data on the findings of the review in the Discussion section. What is weighting? What is the reason for using weighting? Weighting is the process of assigning numbers to certain groups of respondents in a study so that their numbers reflect the actual proportions within the real world. For instance, let’s say a study were undertaken about local businesses in an area known to contain 20% retail businesses, but only 10% of the respondents in the study were retail businesses. This would yield a set of results that did not accurately reflect the real world.
Therefore, the results might be weighted to reflect the higher proportion of retail businesses in the area, and as such they would more accurately reflect reality. Before considering various situations where data weighting is appropriate, please consider a brief review of the weighting concept. If we assume our data set is representative, then analysis proceeds under the concept that the respondents in the sample represent the members of the population in proper proportion (for example, the percentages of males, females, customers, non-customers, etc. are nearly equivalent in the sample and the population). Having achieved proportional representation in our sample, respondents are grouped according to various
characteristics and attitudes and tabulated accordingly, with each respondent counting as one person. If a data set contains specific groups of respondents that are either over-represented or under-represented, then the sample is not indicative of the population and analyzing the data as collected is not appropriate. Instead, the data should be redistributed (or weighted) so that we achieve proportional representation in our sample. Specifically, each data point will carry a weight, and rather than each respondent counting equally as one sample member, each will thereafter represent either more or less than one sample member when results are tabulated. This process will be illustrated more clearly by example in the following section. The important concept to remember is that the goal of any study is to obtain a representative sample. If that is not achieved naturally, redistribution of the data is required to yield one. One of the most common circumstances where weighting is necessary is when quota sampling is employed. Typically, studies employ quota sampling in order to obtain readable sample sizes for various sub-segments of interest from the population. This ensures an adequate margin of error for segment-to-segment comparisons. However, this stratified sampling approach does not typically yield a viable total sample. Should the researcher desire analysis of the total sample as well as segment comparisons, redistribution of the data is required. Let us consider a simple example of a study where stratified sampling is utilized. In Example 1, the weighted data must be used when analyzing results that combine the two quota cells. When tabulating the weighted data, each Economy respondent will count as 1.8 persons and each
First/Business respondent will count as 0.2 persons. While this example is very simple, the processes employed are replicated in more complicated weighting schemes. This technique can be applied to multiple cells and across intersections of various quota designations (gender by region, age by race). In almost all instances where quota sampling is utilized, weighting is required. Hence, prior to tabulation the researcher needs to consider the appropriate actions. Unintentional Bias: Non-response In quota sampling, the researcher intentionally introduces bias into the data by establishing a certain number of interviews for a particular population segment regardless of its actual share of the population. In this next section we look at unintentional data bias. A common form of this is known as non-response bias. This occurs when particular types of respondents are not reached during the study. Historical data tell us that there are certain respondents that are more difficult to reach overall (younger, affluent) and also based on the type of methodology used (no Internet access, call blocking). There are certain sampling techniques that can be utilized to minimize such bias, though it often still exists. Weighting can be used to help mitigate the effects of non-response bias. In such instances the researcher compares the distribution of key classification variables in the sample to the actual population distribution. If the distribution in the sample is not correct, the data would be weighted using the techniques described in the prior section. One drawback here is that when studying artificial populations such as customers and prospects we often do not have reliable distributions for comparison. Sample Balancing Let us now consider data sets that are obtained from a particular geographic region – these can include studies that are conducted on the national level, statewide or within a particular city or county.
For studies of this type there is a large amount of descriptive data available (via census figures) about the population of interest. As such, when samples of these geographies are obtained it is imperative that the researcher consider the sample distribution of respondent demographics. When making comparisons to census data there are any number of characteristics that can be used. It is up to the researcher to determine which are to be used in the weighting scheme. Obviously, any variable used must be included in the survey data. Another consideration is missing values in the survey data. Since these are difficult to account for, variables with a high proportion of missing values (such as income) are often excluded. Once the weighting variables are identified, the researcher needs to compute the actual weights. If only two characteristics are to be used in weighting the data, the researcher might employ a technique identical to the one portrayed in our airline study example. Let’s say we want to weight on gender (2 groups) and region (9 census designations). This would yield a total of 18 cells. The process would be the same as described for two groups (in Example 1), where the researcher would determine the desired number of respondents in each cell and weight accordingly.
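The cell-weighting arithmetic described above — weight = (population share × total sample size) / cell sample count — can be sketched as follows. The airline-style numbers are assumed for illustration: equal quotas of 50, with Economy flyers making up 90% of the real population.

```python
def cell_weights(sample_counts, population_props):
    """Weight per cell = desired count / actual count, where the desired
    count is the cell's population proportion times the total sample size."""
    total = sum(sample_counts.values())
    return {cell: population_props[cell] * total / sample_counts[cell]
            for cell in sample_counts}

weights = cell_weights({"Economy": 50, "First/Business": 50},
                       {"Economy": 0.90, "First/Business": 0.10})
# Each Economy respondent now counts as 1.8 persons, each First/Business as 0.2.
print({cell: round(w, 1) for cell, w in weights.items()})
```

The same function extends directly to more cells (e.g. gender-by-region) as long as the population proportion of each cell is known.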
However, with this type of data there are often more than two characteristics of interest – for example gender (2 groups), region (9 groups), age (5 groups) and race (4 groups) – all of which yields 360 individual cells to populate. This poses a number of difficulties. First, while distributions for individual variables might be available, the distribution for each combination might not be. Second, each individual cell might not have survey respondents populating it, making weighting impossible. To combat this, we can employ a technique called sample balancing. In sample balancing the weighting variables are redistributed individually, rather than computed as cross-variable cells (e.g., males, 18–24, Northeast). The process is iterative, with weights being applied on variable #1; then that new data set is weighted on variable #2; then that new data set is weighted on variable #3. This process is repeated again and again in order to achieve distributions on all the weighting variables that are close to the population. Because non-response bias is difficult to measure, the researcher should apply some type of sample balancing to the data regardless of how it looks unweighted. Whether to employ sample balancing or simply compute weights for individual cells depends on the population information available, the condition of the survey data and the number of respondents available in each cell. Comparing Two Samples Aside from making sure that data sets are representative, data weighting can also be a useful tool in comparing two samples. This can occur in any number of instances; some of the most common being Test vs. Control studies, Wave-to-Wave studies and studies that mix methodologies. In this section we will examine how data weighting might be utilized. A basic premise of Test vs. Control studies is that there are two (or more) populations comparable in every respect but one – exposure to some type of stimuli.
The goal of the study is to see if such exposure changes attitudes and/or behaviors. Because of this, it is imperative that the researcher be confident that differences seen between test and control groups are attributable to the stimuli and not to inherent differences in group composition. As such, before making comparisons across test and control cells, the researcher needs to compare the data sets on key demographics and behaviors that they would expect to be similar.
The researcher can redistribute the data from one of the data sets to match the other. This would be analogous to redistributing sample data to match the population. Here, we are less concerned with the distribution of age in the population than we are in having two comparable data sets. In this instance the researcher might take the Control Group distribution and weight the Test data to wind up with a comparable 20% of respondents age 65+. This would then allow comparisons of key variables across the cells. This same process applies to wave studies. Again, the researcher is looking to compare groups, in this instance across time. The underlying assumption is that in each wave comparable, representative samples of some population are being drawn. Of course, it may occur that in one wave a bias is introduced. As such, before analyzing wave-to-wave data the researcher should compare key demographic and behavioral variables to ensure there is no change in the composition of the data sets. Should there be any changes, the researcher should consider weighting the data prior to analysis in order to ensure that any differences are real and not due to a sampling bias. Emphasizing Key Characteristics A slightly different take on weighting occurs when we want to emphasize a specific characteristic of the respondent. This technique is often used when there are certain respondents we want to count more heavily due to their importance. For example, a client may want to overemphasize the opinions and behaviors of customers who spend the most money with them. The approach in this scenario remains basically the same as has been described previously. The analysis moves away from the idea that each respondent counts as one and utilizes redistribution of the data. In the case where a client wanted to look at survey data based upon dollars spent, the researcher would assign each respondent a weight based on this variable.
So if a customer spends $100,000 with the client they would get a weight of 100 while someone who spends $5,000 would get a weight of 5. When the data are
weighted and then analyzed, responses are distributed so that instead of looking at opinions based upon people, the data show responses based upon dollars spent. What are dummy variables? Why are such variables created? A dummy variable, or indicator variable, is an artificial variable created to represent an attribute with two or more distinct categories/levels. We will illustrate this with an example: Let’s say you want to find out whether the location of a house on the East, Southeast or Northwest side of a development, and whether the house was built before or after 1990, affects its sale price. The image below shows a portion of the Sale Price dataset: Sale Price is the numerical response variable. The dummy variable Y1990 represents the binary independent variable ‘Before/After 1990’. Thus, it takes two values: ‘1’ if a house was built after 1990 and ‘0’ if it was built before 1990. Thus, a single dummy variable is needed to represent a variable with two levels. Notice, there are only two dummy variables left, East (E) and Southeast (SE). Together, they represent the Location variable with three levels (E, SE, NW). They’re constructed so that E = ‘1’ if the house falls on the East side and ‘0’ otherwise, and SE = ‘1’ if the house falls on the Southeast side and ‘0’ otherwise. What happened to the third location, NW? Well, it turns out we don’t need a third dummy variable to represent it. Setting both E and SE to ‘0’ indicates a house on the NW side. Notice that this coding only works if the three levels are mutually exclusive (no overlap) and exhaustive (no other levels exist for this variable). The regression of Sale Price on these dummy variables yields the following model: Sale Price = 258 + 33.9*Y1990 - 10.7*E + 21*SE The constant intercept value 258 indicates that houses in this neighborhood start at $258 K irrespective of location and year built.
The coefficient of Y1990 indicates that, other things being equal, houses in this neighborhood built after 1990 command a $33.9 K premium over those built before 1990. Similarly, houses on the East side cost $10.7 K less (the coefficient has a negative sign) than houses on the NW side, and houses on the SE side cost $21 K more than houses on the NW side. Thus, NW serves as the baseline or reference level for E and SE. We can estimate the sale price for a house built before 1990 and located on the East side from this equation by substituting Y1990 = 0, E = 1 and SE = 0, giving Sale Price = $247.3 K.
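The fitted equation above can be turned into a small prediction helper that makes the substitution explicit (coefficients taken directly from the text, prices in $K):

```python
def predict_sale_price(y1990, east, southeast):
    """Fitted model from the text: Sale Price = 258 + 33.9*Y1990 - 10.7*E + 21*SE.
    Baseline (all dummies 0) is a pre-1990 house on the NW side."""
    return 258 + 33.9 * y1990 - 10.7 * east + 21 * southeast

# Pre-1990 house on the East side: Y1990 = 0, E = 1, SE = 0.
print(round(predict_sale_price(0, 1, 0), 1))  # 247.3
```

Substituting other dummy combinations gives the remaining five house types; for instance, a post-1990 SE house comes out at 258 + 33.9 + 21 = $312.9 K.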
Things to keep in mind about dummy variables Dummy variables assign the numbers ‘0’ and ‘1’ to indicate membership in any mutually exclusive and exhaustive category. 1. The number of dummy variables necessary to represent a single attribute variable is equal to the number of levels (categories) in that variable minus one. 2. For a given attribute variable, none of the dummy variables constructed can be redundant. That is, one dummy variable cannot be a constant multiple or a simple linear relation of another. 3. The interaction of two attribute variables (e.g. Gender and Marital Status) is represented by a third dummy variable which is simply the product of the two individual dummy variables. What is data preparation? What is the data preparation process? Data preparation is the process of collecting, cleaning, and consolidating data into one file or data table for use in analysis. The process of preparing data generally entails correcting any errors (typically from human and/or machine input), filling in nulls and incomplete data, and merging data from several sources or data formats. Data preparation is most often used when: handling messy, inconsistent, or unstandardized data; trying to combine data from multiple sources; reporting on data that was entered manually; or dealing with data that was scraped from an unstructured source such as PDF documents. Data preparation can be used to harmonize, enrich, or even standardize data in scenarios where multiple values are used in a data set to represent the same value. An example of this is seen with U.S. states – where multiple values are commonly used to represent the same state. A state like California could be represented by ‘CA’, ‘Cal.’, ‘Cal’ or ‘California’, to name a few. A data preparation tool could be used in this scenario to identify an incorrect number of unique values (in the case of U.S. states, a unique count greater than 50 would raise a flag, as there are only 50 states in the U.S.).
These values would then need to be standardized to use only an abbreviation or only full spelling in every row. The process of data preparation typically involves: 1. Data analysis – The data is audited for errors and anomalies to be corrected. For large datasets, data preparation applications prove helpful in producing metadata and uncovering problems.
2. Creating an intuitive workflow – A workflow consisting of a sequence of data prep operations for addressing the data errors is then formulated. 3. Validation – The correctness of the workflow is next evaluated against a representative sample of the dataset. This process may call for adjustments to the workflow as previously undetected errors are found. 4. Transformation – Once convinced of the effectiveness of the workflow, transformation may now be carried out, and the actual data prep process takes place. 5. Backflow of cleaned data – Finally, steps must also be taken for the clean data to replace the original dirty data sources. What is a skewed distribution? What does it mean? In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive or negative, or even undefined. A distribution is said to be skewed when the data points cluster more toward one side of the scale than the other, creating a curve that is not symmetrical. In other words, the right and the left sides of the distribution are shaped differently from each other. There are two types of skewed distributions. A distribution is positively skewed if the scores fall toward the lower side of the scale and there are very few higher scores. Positively skewed data are also referred to as skewed to the right because that is the direction of the 'long tail end' of the chart. Let's create a chart using the yearly income data that we collected from the MBA graduates. You can see that most of the graduates reported annual income between $31,000 and $70,000. You can see that there are very few graduates that make more than $70,000. The yearly income for MBA graduates is positively skewed, and the 'long tail end' of the chart points to the right. A distribution is negatively skewed if the scores fall toward the higher side of the scale and there are very few low scores.
Let's take a look at the chart of the number of applications each graduate completed before they found their current job. We can see that most of the graduates completed between 9 and 13 applications. Only 56 out of the 400 graduates completed fewer than 9 applications. The number of applications completed by MBA graduates is negatively skewed, and the 'long tail end' points to the left. Negatively skewed data are also referred to as 'skewed to the left' because that is the direction of the 'long tail end.' What is the major difference between cross tabulation and frequency distribution? Frequency Distribution A good first step in your analysis is to conveniently summarize the data by counting the responses for each level of a given variable. These counts, or frequencies, are called the frequency distribution and are commonly accompanied by the percentages and cumulative percentages as well.
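A frequency distribution of this kind can be tallied directly with Python's standard library; the pet-survey counts below are hypothetical:

```python
from collections import Counter

# Hypothetical responses from 100 local pet owners.
responses = ["dog"] * 58 + ["cat"] * 42

counts = Counter(responses)                 # frequencies per level
total = sum(counts.values())
cumulative = 0.0
for level, freq in counts.most_common():    # most frequent level first
    pct = 100 * freq / total
    cumulative += pct
    print(f"{level}: n={freq} ({pct:.0f}%), cumulative {cumulative:.0f}%")
```

A cross tabulation would extend this by counting each combination of two variables (e.g. gender by pet preference) rather than the levels of one variable alone.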
A frequency distribution can quickly reveal: the number of non-responses or missing values; outliers and extreme values; and the central tendency, variability and shape of the distribution. Suppose a pet adoption and rescue agency wants to find out whether dogs or cats are more popular in a certain location. To answer this question, we survey a random sample of 100 local pet owners to find out if dogs are more popular than cats, or vice versa. Cross Tabulation A frequency distribution can tell you about a single variable, but it does not provide information about how two or more variables relate to one another. To understand the association between multiple variables, we can use cross tabulation. Let’s say we want to see if a gender preference exists for dogs versus cats. Are men more likely to want a dog than a cat compared to women, or vice versa? To summarize data from both variables at the same time, we need to construct a cross-tabulation table, also known as a contingency table. This table lets us evaluate the counts and percentages, just like a frequency distribution. But while a frequency distribution provides information for each level of one variable, cross tabulation shows results for all level combinations of both variables. What is a null hypothesis? A null hypothesis is a type of hypothesis used in statistics that proposes that no statistical significance exists in a set of given observations. The null hypothesis attempts to show that no variation exists between variables, or that a single variable is no different than zero. It is presumed to be true until statistical evidence nullifies it in favor of an alternative hypothesis. The null hypothesis assumes that any kind of difference or significance you see in a set of data is due to chance. For example, Chuck sees that his investment strategy produces higher average returns than simply buying and holding a stock.
The null hypothesis claims that there is no difference between the two average returns, and Chuck has to assume this until he proves otherwise. Refuting the null hypothesis would require showing statistical significance, which can be found using a variety of tests. If Chuck conducts one of these tests and shows that the difference between his returns and the buy-and-hold returns is significant, he can then refute the null hypothesis. What is an Alternative Hypothesis? An alternative hypothesis states that there is statistical significance between two variables. Take the classic Mentos and Diet Coke experiment: the two variables are Mentos and Diet Coke. The alternative hypothesis is the hypothesis that the researcher is trying to prove. In the Mentos and Diet Coke experiment, Arnold was trying to prove that the Diet Coke would explode if he put Mentos in the bottle.
Therefore, he proved his alternative hypothesis was correct. If we continue with the example, the alternative hypothesis would be that there IS indeed a statistically significant relationship between Mentos and Diet Coke. Arnold could write it as: If I put half a pack of Mentos into a 2-liter Diet Coke bottle, there will be a big reaction/explosion. What is a T-test? A t-test is a statistical examination of two population means. A two-sample t-test examines whether two samples are different and is commonly used when the variances of two normal distributions are unknown and when an experiment uses a small sample size. For example, a t-test could be used to compare the average floor routine score of the U.S. women's Olympic gymnastics team to the average floor routine score of China's women's team. The test statistic in the t-test is known as the t-statistic. The t-test looks at the t-statistic, the t-distribution, and the degrees of freedom to determine a p-value (probability) that can be used to determine whether the population means differ. The t-test is one of a number of hypothesis tests. To compare three or more variables, statisticians use an analysis of variance. If the sample size is large, they use a z-test. Other hypothesis tests include the chi-square test and the F-test. What are the concept and conditions for causality? Statistics and economics usually employ pre-existing data or experimental data to infer causality by regression methods. The body of statistical techniques involves substantial use of regression analysis. Typically a linear relationship such as y_i = b_0 + b_1*x_{1,i} + ... + b_k*x_{k,i} + e_i is postulated, in which y_i is the ith observation of the dependent variable (hypothesized to be the caused variable), x_{j,i} for j = 1, ..., k is the ith observation on the jth independent variable (hypothesized to be a causative variable), and e_i is the error term for the ith observation (containing the combined effects of all other causative variables, which must be uncorrelated with the included independent variables).
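For two small samples with an assumed common variance, the pooled two-sample t-statistic can be computed by hand from the sample means and variances. The scores below are invented for illustration, not real Olympic results:

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical floor-routine scores for two small teams
team_a = [14.2, 14.8, 15.1, 14.5, 14.9]
team_b = [13.9, 14.1, 14.6, 14.0, 14.3]

n1, n2 = len(team_a), len(team_b)
m1, m2 = mean(team_a), mean(team_b)
v1, v2 = variance(team_a), variance(team_b)  # sample variances

# Pooled variance (assumes equal population variances)
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

# Two-sample t-statistic and its degrees of freedom
t_stat = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The t-statistic and degrees of freedom would then be looked up against the t-distribution to obtain the p-value; when the equal-variance assumption is doubtful, Welch's version of the test is used instead.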
If there is reason to believe that none of the x_j's is caused by y, then estimates of the coefficients b_j are obtained. If the null hypothesis that b_j = 0 is rejected, then the alternative hypothesis that b_j ≠ 0, and equivalently that x_j causes y, cannot be rejected. On the other hand, if the null hypothesis that b_j = 0 cannot be rejected, then equivalently the hypothesis of no causal effect of x_j on y cannot be rejected. Here the notion of causality is one of contributory causality as discussed above: if the true value b_j ≠ 0, then a change in x_j will result in a change in y unless some other causative variable(s), either included in the regression or implicit in the error term, change in such a way as to exactly offset its effect; thus a change in x_j is not sufficient to change y. Likewise, a change in x_j is not necessary to change y, because a change in y could be caused by something implicit in the error term (or by some other causative explanatory variable included in the model).
The above way of testing for causality requires belief that there is no reverse causation, in which y would cause x_j. This belief can be established in one of several ways. First, the variable x_j may be a non-economic variable: for example, if rainfall amount x_j is hypothesized to affect the futures price y of some agricultural commodity, it is impossible that in fact the futures price affects rainfall amount (provided that cloud seeding is never attempted). Second, the instrumental variables technique may be employed to remove any reverse causation by introducing a role for other variables (instruments) that are known to be unaffected by the dependent variable. Third, the principle that effects cannot precede causes can be invoked, by including on the right side of the regression only variables that precede the dependent variable in time; this principle is invoked, for example, in testing for Granger causality and in its multivariate analog, vector autoregression, both of which control for lagged values of the dependent variable while testing for causal effects of lagged independent variables. Regression analysis controls for other relevant variables by including them as regressors (explanatory variables). This helps to avoid false inferences of causality due to the presence of a third, underlying variable that influences both the potentially causative variable and the potentially caused variable: its effect on the potentially caused variable is captured by directly including it in the regression, so that effect will not be picked up as an indirect effect through the potentially causative variable of interest.
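Under the assumption that rainfall can affect the futures price but not the reverse, the coefficient b_1 in y = b_0 + b_1*x + e can be estimated by ordinary least squares. A minimal single-regressor sketch with invented data:

```python
from statistics import mean

# Hypothetical data: rainfall in inches (x, hypothesized cause)
# and a commodity futures price (y)
rainfall = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0]
price    = [11.8, 10.9, 12.5, 10.2, 11.4, 11.1]

# Ordinary least squares for y = b0 + b1*x + e
xbar, ybar = mean(rainfall), mean(price)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(rainfall, price))
sxx = sum((x - xbar) ** 2 for x in rainfall)

b1 = sxy / sxx           # estimated coefficient on the causative variable
b0 = ybar - b1 * xbar    # intercept

print(f"estimated relationship: y = {b0:.2f} + {b1:.2f}*x")
```

Testing the null hypothesis b_1 = 0 would additionally require the standard error of b1; the point of the sketch is only how the hypothesized causal coefficient is estimated. With several candidate causes, the other relevant variables would be included as additional regressors, as described above.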