This document provides an overview and outline of a four-day impact evaluation training curriculum. Day 1 covers introductions and the value of impact evaluation: why evaluate, monitoring versus evaluation, and what impact evaluation is. It then discusses how to implement an impact evaluation by estimating a counterfactual and addressing selection bias. Day 2 covers evaluation design, including causal inference, choosing a method, and the impact evaluation toolbox. The remaining days cover sample design, data collection, and indicator and questionnaire design. The document emphasizes that the choice of evaluation design depends on how the program is implemented and on its rules of operation.
Impact Evaluation Training Curriculum - Activity 267
1. Chris Nicoletti
Activity #267: Analysing the socio-economic
impact of the Water Hibah on beneficiary
households and communities (Stage 1)
Impact Evaluation
Training Curriculum
Session 1
April 16, 2013
2. This material constitutes supporting material for the "Impact Evaluation in Practice" book. This additional material is made freely available, but please acknowledge
its use as follows: Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B., and Vermeersch, C. M. J., 2010, Impact Evaluation in Practice: Ancillary
Material, The World Bank, Washington DC (www.worldbank.org/ieinpractice). The content of this presentation reflects the views of the authors and not
necessarily those of the World Bank.
MEASURING IMPACT
Impact Evaluation Methods for Policy
Makers
3. 3
• My name is Chris Nicoletti
• From NORC
• Senior Impact Evaluation Analyst.
• Worked in Zambia, Ghana, Cape Verde, Philippines,
Indonesia, Colombia, Burkina Faso, etc.
• Live in Colorado
– I like to ski, hike, climb, bike, etc.
– Married and do not have any children
• What is your name?
• Let’s go around the room and do introductions…
Introduction…
4. 4
Tuesday - Session 1
INTRODUCTION AND OVERVIEW
1) Introduction
2) Why is evaluation valuable?
3) What makes a good evaluation?
4) How to implement an evaluation?
Wednesday - Session 2
EVALUATION DESIGN
5) Causal Inference
6) Choosing your IE method/design
7) Impact Evaluation Toolbox
Thursday - Session 3
SAMPLE DESIGN AND DATA COLLECTION
9) Sample Designs
10) Types of Error and Biases
11) Data Collection Plans
12) Data Collection Management
Friday - Session 4
INDICATORS & QUESTIONNAIRE DESIGN
1) Results chain/logic models
2) SMART indicators
3) Questionnaire Design
Outline: topics being covered
5. 5
Today, we will answer these questions…
1) Why is evaluation valuable?
2) What makes a good impact evaluation?
3) How to implement an impact evaluation?
7. 7
Why Evaluate?
1) Need evidence on what works
– A limited budget and bad policies could hurt
– Results agenda and aid effectiveness
2) Information is key to sustainability
– Budget negotiations
– Informing beliefs and the press
3) Improve program/policy implementation
– Design (eligibility, benefits)
– Operations (efficiency & targeting)
8. 8
Results-Based Management
is a global trend
Establishing links between monitoring and
evaluation, policy formulation, and budgets
Managers are judged by their programs’
performance, not their control of inputs:
A shift in focus from inputs to outcomes.
Critical to effective public sector management
What is new about results?
9. 9
Monitoring vs. Evaluation
• Frequency: monitoring is regular and continuous; evaluation is periodic.
• Coverage: monitoring covers all programs; evaluation covers selected programs and aspects.
• Data: monitoring uses universal data; evaluation is sample-based.
• Depth of information: monitoring tracks implementation (the WHAT); evaluation is tailored, often to performance and impact (the WHY).
• Cost: monitoring costs are spread out; evaluation costs can be high.
• Utility: monitoring supports continuous program improvement and management; evaluation informs major program decisions.
10. 10
Monitoring
A continuous process of collecting and analyzing
information,
to compare how well a project, program or policy is
performing against expected results, and
to inform implementation and program management.
11. 11
Impact Evaluation Answers
What was the effect of the program on
outcomes?
How much better off are the beneficiaries
because of the program/policy?
How would outcomes change if the
program design changes?
Is the program cost-effective?
12. 12
Evaluation
A systematic, objective assessment of an on-going
or completed project, program, or policy, its design,
implementation and/or results,
to determine the relevance and fulfillment of objectives,
development efficiency, effectiveness, impact and
sustainability, and
to generate lessons learned to inform the decision making
process,
tailored to key questions.
13. 13
Impact Evaluation
An assessment of the causal effect of a project, program, or policy on beneficiaries. It uses a counterfactual…
to estimate what the state of the beneficiaries would have been in the absence of the program (the control or comparison group), compared to the observed state of beneficiaries (the treatment group), and
to determine intermediate or final outcomes attributable to the intervention.
14. 14
Impact Evaluation Answers
What is the effect of a household (hh) water connection on hh water expenditure?
Does contracting out primary health care
lead to an increase in access?
Does replacing dirt floors with cement
reduce parasites & improve child health?
Do improved roads increase access to
labor markets & raise income?
15. 15
Answer these questions
1) Why is evaluation valuable?
2) What makes a good impact evaluation?
3) How to implement an impact evaluation?
16. 16
How to assess impact
e.g. How much does an education program improve test scores (learning)?
What is a beneficiary's test score with the program compared to without the program?
Ideally, we would compare the same individual with and without the program at the same point in time.
Formally, program impact is:
α = (Y | P=1) - (Y | P=0)
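In potential-outcomes notation (the notation below is added for clarity, not taken from the slide), the same formula and the problem it raises can be written as:

```latex
% Potential-outcomes restatement of the slide's formula (assumed notation).
% Y_i(1) is unit i's outcome with the program, Y_i(0) its outcome without it.
\alpha_i = Y_i(1) - Y_i(0)
% Only one of the two is ever observed for a given unit, so in practice we target an
% average effect and estimate the missing term with a comparison group:
\alpha_{\mathrm{ATT}} = E\left[\,Y(1)\mid P=1\,\right] - E\left[\,Y(0)\mid P=1\,\right]
```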
17. 17
Solving the evaluation problem
We never observe the same individual with and without the program at the same point in time.
Counterfactual: what would have happened without the program.
The estimated impact is the difference between the treated observation and the counterfactual.
We therefore need to estimate the counterfactual; the counterfactual is key to impact evaluation.
18. 18
Counterfactual Criteria
The treated and counterfactual groups:
(1) have identical characteristics,
(2) except for benefiting from the intervention.
There should be no other reason for differences in the outcomes of the treated and counterfactual groups; the only reason for the difference in outcomes is the intervention.
19. 19
2 Counterfeit Counterfactuals
1) Before and after: the same individual before the treatment.
2) Those not enrolled: those who choose not to enroll in the program, or those who were not offered the program.
20. 20
1. Before and After: Example
A before-and-after comparison does not take into account things that are changing over the intervention period.
Agricultural assistance program: financial assistance to purchase inputs.
Compare rice yields before and after.
Before is a year of normal rainfall, but after is a drought.
We find a fall in rice yield. Did the program fail?
We could not separate (identify) the effect of the financial assistance program from the effect of rainfall.
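This pitfall can be made concrete with a tiny simulation. The numbers below are invented for illustration; they are not from the Water Hibah or any real program:

```python
# Hypothetical simulation: why a before/after comparison misleads when
# conditions (here, rainfall) change over the intervention period.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

yield_before = 4.0 + rng.normal(0, 0.5, n)   # tons/ha in a normal-rainfall year
program_effect = 0.3                          # true gain from the input subsidy
drought_shock = -0.8                          # rainfall shock in the "after" year
yield_after = yield_before + program_effect + drought_shock + rng.normal(0, 0.5, n)

before_after_estimate = yield_after.mean() - yield_before.mean()
print(f"True program effect:     {program_effect:+.2f}")
print(f"Before/after 'estimate': {before_after_estimate:+.2f}")  # confounded with the drought
```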
21. 21
2. Those not enrolled: Example 1
A job training program is offered. Compare the employment and earnings of those who sign up to those who did not.
Who signs up? Those who are most likely to benefit (i.e., those with more ability), who would have had higher earnings than non-participants even without the job training.
This is a poor estimate of the counterfactual.
22. 22
What's wrong?
Selection bias:
1) People choose to participate for specific reasons.
2) Many times these reasons are related to the outcome of interest.
– Job training: ability and earnings
– Health insurance: health status and medical expenditures
3) We cannot separately identify the impact of the program from these other factors/reasons.
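A similar illustrative simulation (invented numbers, hypothetical variable names) shows how selection on ability inflates a naive enrolled-vs-not-enrolled comparison:

```python
# Hypothetical simulation: selection bias when comparing participants to
# non-participants who self-selected out of a job training program.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

ability = rng.normal(0, 1, n)                    # unobserved; drives enrollment and earnings
enrolled = (ability + rng.normal(0, 1, n)) > 0   # higher-ability people sign up more often
true_effect = 2.0                                 # true earnings gain from training
earnings = 20 + 3 * ability + true_effect * enrolled + rng.normal(0, 2, n)

naive = earnings[enrolled].mean() - earnings[~enrolled].mean()
print(f"True effect:           {true_effect:.2f}")
print(f"Naive enrolled-vs-not: {naive:.2f}")      # overstated, via selection on ability
```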
23. 23
Possible Solutions???
We need to guarantee the comparability of the treatment and control groups, so that the ONLY remaining difference is the intervention.
In this training we will consider:
• Experimental designs
• Quasi-experiments (regression discontinuity, double differences)
• Non-experimental designs (instrumental variables)
EXPERIMENTAL DESIGN!!!
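Continuing the illustrative simulation above, a sketch of why random assignment yields a valid counterfactual: assignment by lottery breaks the link between ability and participation, so a simple difference in means recovers the true effect. Again, these are assumed numbers, not a real design:

```python
# Illustrative sketch: under random assignment, treatment is independent of
# ability, so the difference in means is an unbiased estimate of the effect.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

ability = rng.normal(0, 1, n)
treated = rng.random(n) < 0.5            # lottery assignment, independent of ability
true_effect = 2.0
earnings = 20 + 3 * ability + true_effect * treated + rng.normal(0, 2, n)

estimate = earnings[treated].mean() - earnings[~treated].mean()
print(f"True effect: {true_effect:.2f}, randomized estimate: {estimate:.2f}")
```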
24. 2424
Answer these questions
Impact Evaluation Training Curriculum - Activity 267
Why is evaluation valuable?
How to implement an impact
evaluation?
What makes a good impact
evaluation?
1
2
3
25. 25
When to use Impact
Evaluation?
Evaluate impact when project is:
Innovative
Replicable/scalable
Strategically relevant for reducing
poverty
Evaluation will fill knowledge gap
Substantial policy impact
Use evaluation within a program to test
alternatives and improve programs
26. 26
Choosing what to evaluate
Criteria
Large budget share
Affects many people
Little existing evidence of impact for
target population (IndII Examples?)
No need to evaluate everything
Spend evaluation resources wisely
27. 27
IE for ongoing program
Development
Are there potential program
adjustments that would benefit from a
causal impact evaluation?
Implementing parties have specific
questions they are concerned with.
Are there parts of a program that may
not be working?
28. 28
How to make evaluation
impact policy focused
Address policy-relevant questions:
• What policy questions need to be answered?
• What outcomes answer those questions?
• What indicators measure those outcomes?
• How much of a change in the outcomes would determine success?
Example: Scale up the pilot? (i.e., the Water Hibah)
Criteria: need at least an X% average increase in the beneficiary outcome over a given period.
29. 29
Policy impact of evaluation
What is the policy purpose?
Provide evidence for pressing decisions
Design evaluation with policy makers
IndII Examples???
30. 30
Policy impact of evaluation
Cultural shift:
• From retrospective evaluation: look back and judge.
• To prospective evaluation: decide what we need to learn, experiment with alternatives, measure and inform, and adopt better alternatives over time.
Change in incentives:
• Rewards for changing programs.
• Rewards for generating knowledge.
• Separating job performance from knowledge generation.
31. 31
• Choosing what to evaluate is something that
should take time and careful consideration.
• Impact evaluation is more expensive and often
requires third party consultation.
• The questions that require an IE to answer should
be evident in your logic models and M&E plans
from the beginning.
• Remember, IE is an assessment of the causal effect of
a project, program or policy on beneficiaries.
Choice should come from existing
logic models and M&E plans.
33. 33
Retrospective Analysis
Retrospective Analysis is necessary when we
have to work with a pre-assigned program
(expanding an existing program) and existing data
(baseline?)
Examples:
Regression Discontinuity: Education Project (Ghana)
Difference in Differences: RPI (Zambia)
Instrumental variables: Piso firme (México)
34. 34
• Use whatever is available – the data was not collected for
the purposes at hand.
• The researcher gets to choose what variables to test, based on
previous knowledge and theory.
• Subject to misspecification bias.
• Theory is used instrumentally, as a way to provide a
structure justifying the identifying assumptions.
• Less money on data collection (sometimes), more money
on analysis.
• Does not really require “buy in” from implementers or field
staff.
Retrospective Designs
35. 35
Prospective Analysis
In Prospective Analysis, the evaluation is
designed in parallel with the assignment of the
program, and the baseline data can be gathered.
Example: Progresa/Oportunidades (México)
CDSG (Colombia)
36. 36
• Intentionally collect data for the purposes of the impact
evaluation.
• The variables collected in a prospective evaluation are
collected because they were considered potential
outcome variables.
• You should report on all of your outcome variables.
• The evaluation itself may be a form of treatment.
• It is the experimental design that is instrumental - gives
more power both to test the theory and to challenge it.
• More money on data collection, less money on analysis.
• Requires “buy in” from implementers and field staff.
Prospective Designs
37. 37
Prospective Designs
Use opportunities to generate good controls
The majority of programs cannot assign benefits to the entire eligible population;
not all of the eligible receive the program.
Budget limitations:
Eligible beneficiaries that receive benefits are potential treatments
Eligible beneficiaries that do not receive benefits are potential
controls
Logistical limitations:
Those that go first are potential treatments
Those that go later are potential controls
38. 38
• The decision to conduct an impact evaluation was
made after the program began, and ex post
control households were identified.
• We are now trying to use health data from
Puskesmas to “fill in the gaps” of the baseline.
• This would be a retrospective design, because
there was not an experimental design in place for
the roll out of the program.
An example: Socio-econ
impact of Endline Water Hibah
41. 41
How to choose?
Identification strategy depends on
the implementation of the program
Evaluation strategy depends on the
rules of operations
42. 42
Who gets the program?
Eligibility criteria
Are benefits targeted?
How are they targeted?
Can we rank eligibles by priority?
Are measures good enough for fine rankings?
Roll out
Equal chance to go first, second, third?
43. 43
Ethical Considerations
Rollout based on budget/administrative constraints:
• Equally deserving beneficiaries deserve an equal chance of going first.
• Give everyone eligible an equal chance.
• If ranking is based on some criteria, the criteria should be quantitative and public.
• Equity.
• Transparent & accountable method.
• Do not delay benefits.
44. 44
The method depends on the rules of operation
• In stages, without cut-off: Targeted → Randomization; Universal → Randomized Rollout
• In stages, with cut-off: Targeted → RD/DiD or Matching/DiD; Universal → RD/DiD or Matching/DiD
• Immediately, without cut-off: Targeted → Randomized Promotion; Universal → Randomized Promotion
• Immediately, with cut-off: Targeted → RD/DiD or Matching/DiD; Universal → Randomized Promotion
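For reference, the same decision table can be expressed as a small lookup helper. The method labels are the slide's; the function itself is only an illustrative mnemonic, not a substitute for checking each design's assumptions:

```python
# Mnemonic lookup for the table above: (targeted, rollout, cut-off) -> candidate methods.
def candidate_methods(targeted: bool, in_stages: bool, has_cutoff: bool) -> list[str]:
    if in_stages and not has_cutoff:
        return ["Randomization"] if targeted else ["Randomized Rollout"]
    if in_stages and has_cutoff:
        return ["Regression Discontinuity / DiD", "Matching / DiD"]
    if not in_stages and not has_cutoff:
        return ["Randomized Promotion"]
    # Immediate rollout with a cut-off:
    return (["Regression Discontinuity / DiD", "Matching / DiD"] if targeted
            else ["Randomized Promotion"])

print(candidate_methods(targeted=True, in_stages=True, has_cutoff=False))  # ['Randomization']
```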
45. 45
• Provision of services to villages and households under the Water
Hibah is not determined by randomization, but by assessment and
WTP.
• The dataset design exhibits some characteristics of a controlled
experiment with connected and unconnected, but connection decision
is not determined by randomization.
• Household matching is not an efficient method with the potential
discrepancies we identified in the pilot test, and does not work very
well with the sample design that was chosen.
• Village-level matching is not feasible because there are usually
connected and unconnected in a single village (locality).
The design we have chosen is a pretest-posttest, nonequivalent control group quasi-experimental design that will use regression-adjusted difference-in-differences impact estimators.
An example: Socio-econ
impact of Endline Water Hibah
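The deck does not include estimation code, but a minimal sketch of a regression-adjusted difference-in-differences estimator of the kind described above might look as follows. All column names, the clustering variable, and the file name are assumptions for illustration, not the project's actual data structure:

```python
# Sketch of a regression-adjusted DiD estimate on a household panel (assumed schema).
import pandas as pd
import statsmodels.formula.api as smf

# Assumed columns: water_exp (outcome), connected (1 = treatment group),
# post (1 = endline round), hh_size and floor_material (baseline covariates),
# village_id (cluster identifier).
df = pd.read_csv("hibah_panel.csv")  # hypothetical file name

model = smf.ols("water_exp ~ connected * post + hh_size + C(floor_material)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["village_id"]})

# The coefficient on the interaction term is the DiD impact estimate.
print(result.params["connected:post"])
```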
47. 47
Types of Sample Designs
• Random sampling
• Multi-stage sampling
• Systematic sampling
• Stratified sampling
• Convenience sampling
• Snowball sampling
Plus any combination of them!
48. 48
• It is important to note that sample design can be
extremely complex.
• A good summary is provided by Duflo (2006):
• The power of the design is the probability that, for a given effect size and a given
statistical significance level, we will be able to reject the hypothesis of zero effect.
Sample sizes, as well as other (evaluation & sample) design choices, will affect
the power of an experiment.
• There are lots of things to consider, such as:
• The impact estimator to be used; The test parameters (power level, significance
level); The minimum detectable effect; Characteristics of the sampled (target)
population (population sizes for potential levels of sampling, means, standard
deviations, intra-unit correlation coefficients (if multistage sampling is used)); and
the sample design to be used for the sample survey
A good sample design requires
expert knowledge
49. 49
The basic process is this…
• Level of power
• Level of the hypothesis tests (significance level)
• Correlations in outcomes within groups (ICCs)
• Mean and variance of outcomes & MDES
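As a rough illustration of that process, the sketch below backs out a minimum detectable effect size (MDES) from an assumed power level, significance level, ICC, and cluster sizes. All numbers are placeholders, not the values used for the Water Hibah design:

```python
# Sketch: choose power and significance, assume an ICC and cluster sizes,
# apply a design effect, and solve for the minimum detectable effect size.
from statsmodels.stats.power import TTestIndPower

power = 0.80                     # desired level of power
alpha = 0.05                     # significance level of the hypothesis test
icc = 0.05                       # assumed intra-cluster correlation
households_per_village = 10      # assumed second-stage sample size
n_villages_per_arm = 250         # assumed first-stage sample size per group

# Design effect for clustered (multi-stage) sampling.
deff = 1 + (households_per_village - 1) * icc
effective_n = n_villages_per_arm * households_per_village / deff

# Smallest standardized effect detectable with this design (in standard deviations).
mdes = TTestIndPower().solve_power(nobs1=effective_n, alpha=alpha, power=power,
                                   ratio=1.0, alternative="two-sided")
print(f"Design effect {deff:.2f}, effective n per arm {effective_n:.0f}, MDES ~ {mdes:.3f} SD")
```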
50. 50
The reality is…
• Most times, you do not have all of this information.
– Use existing studies, other data sources, and assumptions.
• You may be working backwards to fit a certain power level.
• You may be working backwards from the expected level of impact that you want to test for.
• You are often working backwards to fit a certain budget!
– Build in marginal costs for each stage of sampling.
• Decide whether or not to pursue the project.
51. 51
An example: Socio-econ impact of Endline Water Hibah
• Outcome indicators: we have simplified versions of them in the baseline, but they have been modified for the endline → use the baseline dataset to calculate ICCs.
• The highest variation in outcome indicators was identified across villages (localities) → the primary sampling unit is the village.
• The number of households in the village was found to improve the efficiency of the design → stratify villages based on the number of households.
• Marginal costs of a village visit vs. a household visit were included.
• The final sample design that was identified is a stratified multi-stage sample with 250 villages and 7-14 households per experimental group = 7,000 hhs.
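A sketch of how such a stratified, multi-stage draw could be implemented is shown below. The file names, column names, and strata definitions are hypothetical; the actual Water Hibah sample was drawn from its own frame and procedures:

```python
# Illustrative two-stage draw: stratify villages by number of households,
# sample villages within strata, then sample households within each village.
import pandas as pd

frame = pd.read_csv("village_frame.csv")      # hypothetical frame: village_id, n_households
frame["stratum"] = pd.qcut(frame["n_households"], q=4, labels=False)  # size-based strata

# Stage 1: sample villages proportionally within each stratum (~250 in total).
villages = (frame.groupby("stratum", group_keys=False)
                 .apply(lambda g: g.sample(frac=250 / len(frame), random_state=0)))

# Stage 2: sample up to 14 households per selected village from a household listing.
households = pd.read_csv("household_listing.csv")  # hypothetical: village_id, household_id
sample = (households[households["village_id"].isin(villages["village_id"])]
          .groupby("village_id", group_keys=False)
          .apply(lambda g: g.sample(n=min(len(g), 14), random_state=0)))
print(len(villages), "villages,", len(sample), "households")
```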
52. What can IndII Do?
Ensure your M&E systems are relevant
and reliable…
53. 53
Data: Coordinate IE & Monitoring Systems
Projects/programs regularly collect data for management purposes.
Typical content:
• Lists of beneficiaries
• Distribution of benefits
• Expenditures
• Outcomes
• Ongoing process evaluation
This information is needed for the impact evaluation.
54. 54
Manage M&E for results
Prospective evaluations are easier and better with reliable M&E:
• Tailor policy questions
• Precise, unbiased estimates
• Use your resources wisely
• Better methods
• Cheaper data
• Timely feedback and program changes
• Improve results on the ground
55. 55
Evaluation uses this information to verify:
• who is a beneficiary,
• when they started, and
• what benefits were actually delivered.
A necessary condition for the program to have an impact: benefits need to get to the targeted beneficiaries.
56. 56
Overall Messages
Impact evaluation is useful for:
• Validating program design
• Adjusting program structure
• Communicating to the finance ministry & civil society
A good evaluation design requires estimating the counterfactual:
• What would have happened to beneficiaries if they had not received the program
• We need to know all the reasons why beneficiaries got the program & others did not
57. 57
Other messages
Good M&E is crucial not only to effective project management but can also be a driver of reform
Monitoring and evaluation are separate, complementary functions, but both are
key to results-based management
Have a good M&E plan before you roll out your project and use it to inform the
journey!
Design the timing and content of M&E results to further evidence-based
dialogue
Good monitoring systems & administrative data can improve IE.
Easiest to use prospective designs.
Stakeholder buy-in is very important