2. Course Overview
1. What is Evaluation?
2. Measuring Impact
3. Why Randomize?
4. How to Randomize?
5. Sampling and Sample Size
6. Threats and Analysis
7. Cost-Effectiveness Analysis
8. Project from Start to Finish
4. What was the main purpose of the 73rd Amendment of India’s
constitution?
A. To reserve leadership positions
for women (and caste
minorities)
B. To formalize local institutions of
leadership
C. To give women the right to vote
in local elections
0% 0% 0%
To reserve leadership po...
To formalize local institu...
To give women the right ..
5. Theory of Change from Reservations to Final Outcomes
Public goods
show women’s
preferences
Women are
empowered
Pradhan’s
preferences
matter
Democracy
More female imperfect
Pradhans
Reservations
for women
Women have
different
preferences
6. Data Used Covers All Aspects of Theory of Change
Purpose of
Measurement
Sources of
Measurement
Indicators
Reservations
Policy and GP
Priorities
Administrative
Data
• Reservation
• Budgets
• Balance sheets
Preferences of
Women and Men
Transcript from
village meeting
• Who speaks and when (gender)
• Issues raised
Public Investments Village Leader
Interview
• Political experience
• Investments undertaken
Public Investments Village PRA • Village infrastructure + investments
• Perception of public good quality
• Participation of men and women
Household (HH)
Survey
• Declared HH preference
• HH perceptions of quality of public
goods and services
7. Results: Reservations Increase Female Leadership
West Bengal Rajasthan
Of Pradhans: Reserved Unreserved Reserved Unreserved
Total Number 54 107 40 60
Proportion female 100% 6.5% 100% 1.7%
8. Results: Women Raise Different Issues at GP Meetings
West Bengal Rajasthan
Of Pradhans: Women Men Women Men
Drinking water 31% 17% 54% 43%
Road improvement 31% 25% 13% 23%
9. Results: Public Investments Show Women’s Preferences
West Bengal Rajasthan
Issue Investment
Issue
Reserved
Investment
Issue
Reserved
W M W M Investment
Drinking Water # facilities 31% 17% 9.09 54% 43% 2.62
Road
Road
Improvement
Condition (0-1)
31% 25% 0.18 13% 23% -0.08
Irrigation # facilities 4% 20% -0.38 2% 4% -0.02
Education Informal
education
center
6% 12% -0.16 5% 13%
10. Why do You Need Data? Follow Theory of Change
Our theory and hypothesis helps us define the set of outcomes
Need to find indicators that map the outcomes well
Characteristics: Who are the people the program works with, and
what is their environment?
• Sub-groups, covariates, predictors of compliance
Channels: How does the program work, or fail to work?
Outcomes: What is the purpose of the program? Did it achieve that
purpose?
11. Lecture Overview
Theory of Change: What Do You Want to Measure?
Theory of Measurement: What Makes a Good Measure?
Practice of Measurement: Measuring the Immeasurable
Collecting Data
13. The Main Challenge in Measurement: Getting Accuracy and
Precision
More accurate
More precise
14. Terms “Biased” and “Unbiased” Used to Describe Accuracy
More accurate
“Biased” “Unbiased
” On average, we get
the wrong answer
On average, we get
the right answer
15. Terms “Noisy” and “Precise” Used to Describe Precision
More precise
“Noisy”
Random error
causes answer to
bounce around
“Precise”
Measures of the
same thing cluster
together
16. A Noisy and a Precise Measure Can Both Be Biased
More accurate
More precise
“Noisy”
Random error
causes answer to
bounce around
“Precise”
Measures of the
same thing cluster
together
17. Choices in Real Measurement Often Harder
More accurate
More precise
“Noisy” but
“Unbiased”
“Precise” but
“Biased”
Random error
causes answer to
bounce around
Measures of the
same thing cluster
together
18. The Main Challenge in Measurement: Getting Accuracy and
Precision
More accurate
More precise
19. Is this introducing noise or bias?
A. Noise / Random Error
B. Bias
A surveyor doesn´t follow the exact
phrase of the question:
Surveyor- “you feel unsecure in this
neighborhood, right?”
Respondent- “well, sometimes yes
and sometimes, well, no…”
Surveyor- “so, is more of a yes,
right?”
- “mmm…well”
Respondent - “ok, thanks. Next
question.” code 1=yes
0% 0%
Random Error
Systematic Error
20. Accuracy
In theory:
• How well does the indicator map to the outcome? (e.g. IQ test
intelligence)
In practice: Are you getting unbiased answers?
• Respondent bias
• Recall bias
• Anchoring bias
• Social desirability bias (response bias)
• Framing effect / Neutrality
21. Precision and Error
In theory: The measure is precise, but not necessarily accurate
In practice:
• Length, fatigue
• “How much did you spend on broccoli yesterday?” (as a measure of
annual broccoli spending)
• Ambiguous wording (definitions, relationships, recall period, units of
question)
Eg. Definition of terms – ‘household’, ‘income’
• Recall period /units of question
• Type of answer -Open/Closed
• Choice of options for closed questions
Likert (i.e. Strongly disagree, disagree, neither agree nor disagree, . . .)
Rankings
Quantitative scales. Numbers or bins.
22. Random Error
Mistakes in estimation or recall
Question wording
Surveyor training/quality
Data entry
Length, fatigue of survey
How do you generalize from certain questions?
23. Which is worse?
A. Bias (Low Accuracy)
B. Noise (Low Precision)
C. Equally bad
D. Depends
E. Don’t know/can’t say
0% 0% 0% 0% 0%
Noise (Low Precision)
Bias (Low Accuracy)
Equally bad
Depends
Don’t know/can’t say
24. A biased measure will bias our impact estimates
A. True
B. False
C. Depends
D. Don’t know
0% 0% 0% 0%
True
False
Depends
Don’t know
26. What is Hard to Measure?
(1) Things people do not know very well
(2) Things people do not want to talk about
(3) Abstract concepts
(4) Things that are not (always) directly observable
(5) Things that are best directly observed
27. How much juice did you consume last month?
A. <2 liters
B. 2-5 liters
C. 6-10 liters
D. >11 liters
0% 0% 0% 0%
<2 liters
2-5 liters
6-10 liters
>11 liters
28. 1. Things people do not know very well
What: Anything to estimate, particularly across time. Prone to
recall error and poor estimation
• Examples: distance to health center, profit, consumption,
income, plot size
Strategies:
• Frame the question in a way people are likely to know and
remember.
• Consistency checks – How much did you spend in the last
week on x? How much did you spend in the last 4 weeks on x?
• Multiple measurements of same indicator – How many minutes
does it take to walk to the health center? How many kilometers
away is the health center?
29. How many glasses of juice did you consume yesterday
A. 0
B. 1-3
C. 4-6
D. >6
0% 0% 0% 0%
0
01-Mar
4-6
>6
30. What is Hard to Measure?
(1) Things people do not know very well
(2) Things people do not want to talk about
(3) Abstract concepts
(4) Things that are not (always) directly observable
(5) Things that are best directly observed
31. How frequently do you yell at your spouse
A. Daily
B. Several times per week
C. Once per week
D. Once per month
E. Never
0% 0% 0% 0% 0%
Daily
Several times per week
Once per week
Once per month
Never
32. 2. Things people don’t want to talk about
What: Anything socially “risky” or something painful
• Examples: sexual activity, alcohol and drug use, domestic
violence, conduct during wartime, mental health
Strategies:
• Don’t start with the hard stuff!
• Consider asking question in third person
• Always ensure comfort and privacy of respondent
33. Asking Sensitive Questions: List Randomization
Randomly ask part of the sample the question with / without a
sensitive option
Response only a count, not the specific options
How many of these statements
are true for you?
This morning I took a
shower.
My nearest bank branch
office is within walking
distance.
I have tea every day.
I use my loan for non-business
expenses.
How many of these statements
are true for you?
This morning I took a
shower.
My nearest bank branch
office is within walking
distance.
I have tea every day.
34. Asking Sensitive Questions: List Randomization
2.8 – 2.1 = 0.7 full TRUE difference (on average)
70% used their loan for non-business purposes.
Average number of true
statements:
2.1
Average number of true
statements:
2.8
36. Asking Sensitive Questions: List Randomization
Some questions are sensitive and people are hesitant to answer
truthfully
List randomization is a way to get the answer on average without
knowing confidential information on any one person
37. What is Hard to Measure?
(1) Things people do not know very well
(2) Things people do not want to talk about
(3) Abstract concepts
(4) Things that are not (always) directly observable
(5) Things that are best directly observed
37
38. “I feel more empowered now than last year”
A. Strongly disagree
B. Disagree
C. Neither agree nor disagree
D. Agree
E. Strongly agree
0% 0% 0% 0% 0%
Disagree
Neither agree nor disagree
Strongly disagree
Agree
Strongly agree
39. 3. Abstract concepts
What: Potentially the most challenging and interesting type of
difficult-to-measure indicators
Examples: empowerment, bargaining power, social cohesion, risk
aversion
Strategies:
• Three key steps when measuring “abstract concepts”
• Define what you mean by your abstract concept
• Choose the outcome that you want to serve as the measurement of
your concept
• Design a good question to measure that outcome
Often choice between choosing a self-reported measure and a
behavioral measure – both can add value!
40. “I am involved in the decision to send my child to private vs
public school”
A. Strongly disagree
B. Disagree
C. Neither agree nor
disagree
D. Agree
E. Strongly agree
0% 0% 0% 0% 0%
Disagree
Neither agree nor disagree
Strongly disagree
Agree
Strongly agree
41. How “Socially Connected" do You Feel to the Other People in
this Room?
A
B
C
D
E
20% 20% 20% 20% 20%
A
B
C
D
E
You
Everyone
else in
this room
42. How likely are you to take a taxi with a driver you don’t know
after dark?
A. Very unlikely
B. Unlikely
C. Likely
D. Very likely
0% 0% 0% 0%
Very unlikely
Unlikely
Likely
Very likely
43. How likely are you to take a taxi with a driver you don’t know
after dark?
A. Very unlikely
B. Unlikely
C. Likely
D. Very likely
0% 0% 0% 0%
Very unlikely
Unlikely
Likely
Very likely
44. Perceptions and Attitudes
Ask directly
• “How effective is your leader?” (ineffective, somewhat effective,
effective, very…)
Indirect approaches often have better accuracy
• Listen to a Vignette (Male v. Female)
• Revealed preference – voting behavior
• Implicit Association tests
45. Implicit Association Test
People simplify the world for efficiency
• Use thumb rules to draw connections
• May not even be aware themselves
For some important outcomes, may be worth trying to measure
these indirectly
• Implicit association one technique
50. Implicit Association Test
People simplify the world for efficiency
• Use thumb rules to draw connections
• May not even be aware themselves
Actually based on response time, not accuracy
• Are respondents faster to select “Parliament” when associated
with a man than a woman?
51. What is Hard to Measure?
(1) Things people do not know very well
(2) Things people do not want to talk about
(3) Abstract concepts
(4) Things that are not (always) directly observable
(5) Things that are best directly observed
51
52. What proportion race X people are denied jobs due to racial
discrimination?
A. 0%
B. 1-20%
C. 21-40%
D. 41-60%
E. >60%
0% 0% 0% 0% 0%
0%
1-20%
21-40%
41-60%
>60%
53. 4. Things that aren’t Directly Observable
What: You may want to measure outcomes that you can’t ask
directly about or directly observe
• Examples: corruption, fraud, discrimination
Strategies:
• Audit studies, e.g. Rajasthan police experiment to register
cases, Delhi Driver’s Licenses, Delhi doctors
• Multiple sources of data, e.g. inputs of funds vs. outputs
received by recipients, pollution reports by different parties
• Don’t worry – there have already been lots of clever people
before you – so do literature reviews!
54. 5. Things that are Best Directly Observed
What: Behavioral preferences, anything that is more believable
when done than said
Strategies:
• Develop detailed protocols
• Ensure data collection of behavioral measures done under the
same circumstances for all individuals
• Example: how often do you wash your hands?
Strategy: collect the data directly while disrupting
participants’ lives as little as possible
E.g., put movement sensors in soap, measure the weight of
the soap
56. Question: After the last time you used the toilet, did you wash
your hands? (The problem with this indicator is….)
A. Accuracy
B. Precision
C. Both
D. Neither
0% 0% 0% 0%
Accuracy
Precision
Both
Neither
57. Outcome: Gender Bias
Question: How effective are women leaders? (ineffective, somewhat
effective, effective, very…)
A. Accuracy
B. Precision
C. Both
D. Neither
0% 0% 0% 0%
Accuracy
Precision
Both
Neither
58. Examples
Study: Double-Fortified Salt (Duflo et al. forthcoming)
• Where: Bihar, India
• Intervention: selling low-cost iron-fortified salt to at-risk-for-anemia
families.
• Indicators:
Physical fitness (directly observable; step test)
Physical development (directly observable; anthro measures)
Cognitive development (indirectly observable; puzzles, tests)
Health history (indirectly observable; surveying on
immunizations received, etc.)
58
60. When to Collect Data
[ Baseline ] : Before you start
During the intervention
End line : After you’re done
[ Scale-up, intervention ]
61. Methods of Data Collection: Not Just Surveys
Surveys- household/individual
Administrative data
Logs/diaries
Qualitative – eg. focus groups,
RRA
Games and choice problems
Observation
Health/Education tests and
measures
62. Where can we get Data?
The good . . . . and the bad
Administrative data
• Collected by a
government or similar
body as part of
operations
• May already be collected
and thus free
• Can be extremely
accurate (e.g. electricity
bills)
• May not exist or not
answer the question you
want
• May itself change
behavior (e.g. taxes)
Other secondary data
• Collected for
research or other
purposes not admin
• May already be collected
and thus free
• Can inform the larger
context of a project
• May not exist or not
answer the question you
want
• Dubious quality
Primary data
• Collected by
researchers for study
• Address the exact
question of interest
• Cover channels and
assumptions
• Very costly and time
consuming
• May be biased answers
63. Primary Data Collection
Surveys
• Questions
• Exams, tests, games, vignettes, etc.
Direct Observation
• Personal or technological (e.g. smoke sensor, vibration sensor)
Diaries/Logs
64. Common Survey Modules Can be Adapted for a Particular
Project
Demographics
Economic
• Income, consumption, expenditure, time use
• Yields, production, etc.
Beliefs
• Expectations or assumptions
• Bargaining power, patience, risk
Anthropometric
Cognitive, Learning
65. Primary Data Collection Considerations
Quality
• Reliability and validity of the data
Costs / Logistics
• Surveyor recruitment and training
• Field work and transport, interview time
• Electronic vs. paper
• Data entry, reconciliation, cleaning, etc.
Ethics
Human subjects, data security
66. Reliability of Data Collection
The process of collecting “good” data requires a lot of efforts and
thought
Need to make sure that the data collected is precise and accurate.
avoid false conclusions, for any research design
The survey process:
Design of questionnaire Survey printed on paper/electronic
filled in by enumerator interviewing the respondent data entry
electronic dataset
Where can this go wrong?
67. Things to Take-away :
Theory of change guides measurement
• Want to measure each step
Data collection all about trade-offs
• Quality and cost
• Bias (accuracy) and noise (precision)
Creative techniques can sometimes help
• Think about what outcomes are most important
Evaluation could mean many things
I’ll first try to narrow down the scope of evaluation that we’re discussing
And go through some terms and definitions
to make sure we’re all on the same page.
I won’t cover just the WHAT
But also the WHO and
the WHY
Sub-groups (how to randomize)
Covariates (sample size)
Predictors of compliance (analysis)
He who has a why can find any how
Social desirability: Respondent might feel badly to tell the truth about some socially undesirable subjects
How many drinks did you have last week?
Do you always use a condom when having sex?
Recal: Some variables are easier to remember than others e.g. birth of a child
Often have to collect data that involves recall where things are not so easy:
Time use, consumption, health, finances, exam scores…….how do you ensure reliability?
Easiest: shorten the recall period
One strategy is to do high frequency short surveys where you ask respondents to recall events from prior day.
Diaries: e.g. time-use or financial diaries
Can be difficult to implement- respondents don’t always fill them out (or fill them out when you ask for them)
Neutrality The act of measurement can actually influence behavior of respondents
Live in enumerator measuring each person’s food
Financial diaries
If treatment and control group respond differently to measurement then it would be problematic
Respondent might not be willing/able to give the “true” answer
Why?
Affects the accuracy of the data
Survey design should try to minimize this bias
I have broken down hard to measure indicators into five categories. Overlap to some extent.
Two strategies for attacking these types of indicators is choosing the right outcome to measure and designing your questions well
I have broken down hard to measure indicators into five categories. Overlap to some extent.
Two strategies for attacking these types of indicators is choosing the right outcome to measure and designing your questions well
First question of your survey shouldn’t be about sex – get people comfortable first
There are two reasons why we might not want to start with the hard stuff:
Build a rapport
If they choose to end the survey, you don’t want to lose any of your other questions
Asking in the third person: the typical, “so I have a friend…”
But in all seriousness…
I have broken down hard to measure indicators into five categories. Overlap to some extent.
Two strategies for attacking these types of indicators is choosing the right outcome to measure and designing your questions well
Are you a hyperbolic discounter?
I have broken down hard to measure indicators into five categories. Overlap to some extent.
Two strategies for attacking these types of indicators is choosing the right outcome to measure and designing your questions well
First question of your survey shouldn’t be about sex – get people comfortable first
Admin data may not cover your population of interest and all the variables you’re interested in . Surveys- you can ask exactly what you need, but then you get reported answers(easy to distort preferences). Moderately expensive . Issue of surveyor and respondent fatigue.
Qualitative – could be very good for defining hypothesis, working in a new context, a nice pre pilot focus group.
Games and choice problems to understand behavior and preferences. Need to be careful about what you’re trying to measure with the game, how to make sure people understand the game, and how to make sure that surveyors do not distort any randomization involved .
Observation- is a public works project in place, go to a school and take an attendance list, go to a village meeting and count participation of women.
Other objective health and education measures – Anthropometrics - objective but might be expensive to administer.
You could combine combinations of all this, for example at the end of the household survey, the interview can note some observations, can also play a game with the respondents in the middle of survey
The bread and butter, meet and potatos
Games: trust, marshmallow