What goes up must come down: challenges of getting evidence back to the ground
Evaluating impact of humanitarian action: a science or an art (Jo Puri, 3iE)
1. Evaluating impact of Humanitarian Action: A science and an art?
Jo (Jyotsna) Puri
Head of Evaluation
Deputy Executive Director, 3ie
www.3ieimpact.org
3. What is impact evaluation?
Impact evaluations answer the question about the extent to which the intervention being evaluated altered the state of the world
Impact = the (outcome) indicator with the intervention compared to what it would have been in the absence of the intervention
= Yt(1) – Yt(0)
• Yt(1), the outcome with the intervention: we can see this
• Yt(0), the outcome in the absence of the intervention: we can’t see this, so we use a comparison group
5. Starting with a theory of Change
[Diagram: causal chain for a cash transfer programme]
Cash transfers designed → Households are targeted → Money is sent → Increased purchasing power → Improved livelihood indicators → People record high levels of satisfaction
Assumptions shown around the chain:
• Households correctly id-ed
• No leakage
• Behavioral attributes ensure correct spending
• Access to markets
6. Group exercise question II
• What is the theory of change/causal chain for
this project that you are interested in?
• Write one outcome that is important?
• What were the assumptions and risks in
various stages?
7. The counterfactual
Outcome monitoring
Group           Before   After
Intervention    40       92
Control                  84
Before vs. after (single difference) = 92-40 = 52 (outcome monitoring)
Post-treatment comparison = 92-84 = 8
8. The counterfactual
Outcome monitoring
Group           Before   After
Intervention    40       92
Control         26       84
Before vs. after (single difference) = 92-40 = 52 (outcome monitoring)
Post-treatment comparison = 92-84 = 8
Double difference = (92-40)-(84-26) = -6
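The three comparisons on this slide can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative and not part of the original deck, and the numbers are the ones in the matrix above.

```python
def single_difference(before: float, after: float) -> float:
    """Before vs. after change for one group (outcome monitoring)."""
    return after - before


def post_treatment_comparison(treated_after: float, control_after: float) -> float:
    """Ex post difference between intervention and control groups."""
    return treated_after - control_after


def double_difference(treated_before: float, treated_after: float,
                      control_before: float, control_after: float) -> float:
    """Change in the intervention group minus change in the control group."""
    return (single_difference(treated_before, treated_after)
            - single_difference(control_before, control_after))


if __name__ == "__main__":
    # Values from the slide: intervention 40 -> 92, control 26 -> 84.
    print(single_difference(40, 92))              # 52
    print(post_treatment_comparison(92, 84))      # 8
    print(double_difference(40, 92, 26, 84))      # -6
```

The same functions can be reused for the hypothetical matrix in the group exercise that follows.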
9. Group Exercise III
• Write a matrix for the outcome you are
interested in examining.
• Write (hypothetical) numbers in the matrix.
• Calculate the following:
– Single difference
– Single ex post difference
– Double difference
10. Overall Aim: Improve lives
• Evidence on what works
and why, how;
• Improve awareness and
accountability on impact
and process;
• Effective allocation of
funds;
• Increase likelihood that
humanitarian
interventions are effective
and efficient;
12. The essence of large n design
Before After
Project
Comparison
13. Large n
• n is the number of units of assignment, e.g.
schools, villages, sub-districts (the unit of
assignment can be different from the unit of
analysis)
• If n is large then we create treatment (project)
and comparison groups which are identical
prior to the intervention…
– And use statistical analysis to assess post-
intervention differences between treatment and
comparison: we say these differences are caused
by the intervention
15. [Diagram: three steps to a randomized evaluation]
Step I: Eligible units (population divided into ineligible and eligible)
Step II: Evaluation sample (external validity)
Step III: Random assignment to treatment and control (internal validity)
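The three steps on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the village records, the "affected" eligibility rule, the sample size and the seed are all hypothetical and do not come from the deck.

```python
import random


def draw_evaluation_sample(units, is_eligible, sample_size, rng):
    """Steps I and II: keep eligible units, then draw a random evaluation sample."""
    eligible = [u for u in units if is_eligible(u)]
    return rng.sample(eligible, sample_size)


def randomly_assign(sample, rng):
    """Step III: shuffle the sample and split it into treatment and control halves."""
    shuffled = list(sample)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical units of assignment: 60 villages, two thirds flagged as affected.
rng = random.Random(2012)  # fixed seed so the assignment can be reproduced
villages = [{"id": i, "affected": i % 3 != 0} for i in range(60)]

sample = draw_evaluation_sample(villages, lambda v: v["affected"], 20, rng)
treatment, control = randomly_assign(sample, rng)

print(len(treatment), len(control))
```

Drawing the sample at random from the eligible units is what supports external validity; assigning at random within the sample is what supports internal validity.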
17. Impact evaluation and policies for development assistance
• Efficacy: Does it work in laboratory conditions?
• Would it have happened anyway?
• Did the program cause the change?
• If the program caused the effect, how much was the effect?
• Are there other ways, that are cheaper, to get the same impact?
Theory of change
Counterfactual
Mixed methods
Outcome variables
Internal validity: power, sample size, spillovers, John Henry effects
External validity: heterogeneity, representativeness, context
18. Working definition: Humanitarian
action
Response to an emergency,
to protect human life,
health and subsistence.
The emergency can be the
consequence of a natural
disaster or a conflict.
-Slow onset disasters
- Short term or longer term
19. Impact evaluation and policies for humanitarian assistance
• Who lost most? Who recovered best?
• Unintended consequences.
• Was it timely? How/was adequate coverage ensured?
• Did the program increase the resilience of populations?
• Did the affected population recover?
• Are there other ways, that are cheaper, to get the same impact? (cost-effectiveness)
20. Real-time Evaluations vs Impact Evaluations
Real-time Evaluations
• Evaluates processes
• Focuses on development and implementation of the program
• Examines whether targets were met
• Is cheap (?)
• Controversial
Impact Evaluations
• Includes real time evaluations
• Measures net change in welfare levels; measurement biases.
• Expensive, but low cost evaluations too.
• Robust evidence (relief, recovery, resilience) unintended
• Vulnerable populations
• Long term policy
21. Humanitarian Crisis:
Heterogeneity of impacts
NATURAL DISASTER ARMED CONFLICT
POVERTY
SOCIAL INEQUALITY
POOR GOVERNANCE
STATE FRAGILITY
FOOD INSECURITY
WELL BEING OUTCOMES
22. CLEAR AND PRESENT NEED FOR IMPACT
EVALUATIONS IN HUMANITARIAN
ASSISTANCE
23. Some facts
• In 2011, 62 million people
were affected by crises
across the world
• Natural disasters, alone,
killed almost 26,000
people.
24. Humanitarian Window – a need.
• “Understanding the impact
of humanitarian assistance
is another area where
much work is
needed….Linking impact
measurement and
accountability better to the
funds agencies receive is a
key recommendation of
this review.”
25. Need for impact evaluations
• There is a big gap between the
requirement and availability of
funds.
– In 2011, shortage of funds
amounted to $3.4 billion
Critical that we know the efficiency and
effectiveness of interventions
27. Humanitarian vs. Development IEs
Humanitarian interventions are more complex to evaluate
than development interventions
Development Evaluations Humanitarian Evaluations
Selection bias
All development evaluations; plus
Fragile states and vulnerable populations
Rapid onset
Multiple concurrent interventions
High covariance
Inadequate data
Disrupted communities
Difference in resources and need
Absence of baseline data
Difficulty in counterfactual selection
29. Single difference
[Timeline figure] Baseline (t-1), Emergency (t0), Relief (t1), Recovery (t2)
Differences marked on the timeline:
• Disaster related losses
• Recovery from disaster
• Persistence of recovery
• Restoration to baseline
• Sustained recovery
• Sustained restoration to baseline
31. Proportionate changes
Why is it important? Heterogeneity!!
Case 1 Case 2
Baseline= 3 buffalos Baseline = 6 buffalos
100% loss of large livestock 50% loss of large livestock
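A short sketch of why proportionate changes matter. It assumes, as the speaker notes for this slide suggest, that both households lose 3 buffalo; the function name is illustrative only.

```python
def proportionate_loss(baseline: float, lost: float) -> float:
    """Share of the pre-disaster holding that was lost."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return lost / baseline


# (baseline holding, number lost) for the two cases on the slide.
cases = {"Case 1": (3, 3), "Case 2": (6, 3)}

for name, (baseline, lost) in cases.items():
    share = proportionate_loss(baseline, lost)
    print(f"{name}: absolute loss = {lost}, proportionate loss = {share:.0%}")
```

The absolute loss is identical in both cases, but the proportionate loss differs by a factor of two, which is the heterogeneity the slide points to.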
32. Pakistan Earthquake 2005: Background
• Struck on 8th October,
2005
• 7.6 on the Richter scale
• Immediate toll –
• 73,000 deaths
• 128,000 injured
• 600,000 houses
destroyed
• Estimated damages were
US $5.8 billion
33. Pakistan Earthquake 2005:
ERRA evaluation
• The ERRA was set up:
primary responsibility for
allocating reconstruction
funds;
• ERRA undertook a “social
impact assessment”
• Conducted a pre-/post
assessment (no
counterfactual)
• PROBLEMS?
• Selection Bias
• Information Bias
• Contamination Bias
34. Pakistan Earthquake 2005:
World Bank Evaluation
• They compare recovery in
villages that were more vs.
less affected (use as
counterfactual)
• The evaluation focused on
• Recovery for
households and
educational facilities
• Access/quality of
schooling
• Effects of grants
• Has limitations, should
complement ERRA study
35. Suggested steps – Looking back
Immediately after rescue efforts -
1. Identify the long term
household-level
outcomes of interest
• Clarify what questions
an IE is designed to
answer
• Create a focused list
of outcomes to guide
the evaluation. For
example (next slide)
36. Impact Evaluation: Outcome indicators for hypothetical
evaluation design
Education:
· Net and gross enrollment rates at primary, middle and matric levels
Health:
· Infant/child immunization coverage
· Diarrheal prevalence, last 30 days, children under 5
· Provider consultation and treatment rates for recent illness/injury
· % of women with recent birth receiving tetanus toxoid injection
· Skilled attendance and location of childbirth
Housing, water supply and sanitation:
· Roof and wall materials
· Number of rooms
· Source of drinking water
· Type of toilet
Household perception of economic situation and satisfaction with
facilities and service use:
· Perception of economic situation of household compared to one year ago
· Perception of economic situation of community compared to one year ago
· Satisfaction with local services: basic health unit, family planning services, school,
veterinary hospital, agricultural extension, and police
37. Suggested steps – Looking back
Immediately after rescue efforts -
2. Obtain a pre-earthquake area-representative
household sample
• In Pakistan, a good measure would have been the 2004-05 PSLM
• Importance of baseline data
38. Suggested steps – Looking back
Immediately after rescue efforts -
3. Collect data on the pre-
earthquake sample
immediately post-
earthquake
• Use the PSLM and re-
interview households
• Expand sample if
necessary
• Post-disaster surveying
includes unaffected
households as well.
39. Suggested steps – Looking back
Immediately after rescue efforts -
4. Design interventions for
staged roll out or other
variations
• Provides a
counterfactual
• Allows comparing two
or more interventions,
and understanding best
practices
• Ethical
40. Some variations
• Theory of change (without contamination bias)
• Basic care package
– Factorial design = A vs. A+B vs. A+C
• Cluster randomized designs can help determine
the effectiveness of packages
• Examples
– Rwanda – messaging
– Sierra Leone – Paired matching
– Aceh – Documentation documentation!
– Burundi – Phased roll out amongst ex-combatants.
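As an illustration of the cluster randomized variation, the sketch below assigns whole villages (clusters) to the three packages named above, A, A+B and A+C, so that every household in a village receives the same package. The village and household identifiers, the number of clusters and the seed are hypothetical; none of them come from the studies listed.

```python
import random
from collections import Counter

ARMS = ("A", "A+B", "A+C")  # basic package alone, or with add-on B or C


def assign_clusters(cluster_ids, arms=ARMS, seed=0):
    """Shuffle clusters, then deal them out to arms in turn for balanced arm sizes."""
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    return {cluster: arms[i % len(arms)] for i, cluster in enumerate(shuffled)}


villages = [f"village_{i:02d}" for i in range(12)]
assignment = assign_clusters(villages, seed=42)

# Households inherit the arm of their village (the unit of assignment),
# even though outcomes are analysed at household level (the unit of analysis).
households = [("hh_001", "village_03"), ("hh_002", "village_03"), ("hh_003", "village_07")]
household_arms = {hh: assignment[village] for hh, village in households}

print(Counter(assignment.values()))
```

Comparing A+B and A+C against A then shows what each add-on contributes on top of the basic care package.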
41. Conclusions
IEs
• Provide insight regarding the losses resulting
from an emergency and compare them with
a baseline;
• Test innovative programmes in real-life
situations;
• See what difference assistance has made;
• Examine whether recovery is sustained;
• Examine the cost-effectiveness of
interventions.
Better evidence on what works and why, under what circumstances and at what cost.
Improve awareness about the kind of policies that are effective, and accountability.
Increase likelihood that humanitarian interventions are able to contribute to reduced vulnerability and increased resilience.
Effective allocation of funds.
The working paper had a definition for natural disasters, not humanitarian crises. Do you like this def?
It also keeps with the GHA definition, which is: aid and action designed to save lives, alleviate suffering and protect human dignity in the aftermath of emergencies. It is usually short term, but in practice it is hard to say where ‘during and in the immediate aftermath of the emergency’ ends and when other assistance begins, especially in situations of prolonged vulnerability.
Definition from the School for a Culture of Peace (Escola de Cultura de Pau) in Spain: A humanitarian crisis is a situation in which there is an exceptional and generalized threat to human life, health or subsistence. These crises usually appear within the context of an existing situation of a lack of protection where a series of pre-existent factors (poverty, inequality, lack of access to basic services), exacerbated by a natural disaster or armed conflict, multiply the destructive effects.
Source: http://escolapau.uab.cat/img/programas/alerta/alerta/10/cap04i.pdf
By July 2012, 61 million people had already been affected.
Global Humanitarian Assistance. GHA Report 2012. Rep., 2012. <http://www.globalhumanitarianassistance.org/reports>
Ibid.
The Year That Shook the Rich: A review of natural disasters in 2011. The Brookings Institution – London School of Economics Project on Internal Displacement, March 2012.
1. OCHA report 2011
Cumulative economic cost was US$380 billion, making 2011 the most expensive year in history for natural disasters.
The Humanitarian Emergency Response Review (2011) noted the lack of impact evaluation of humanitarian initiatives, and recommended rigorous IEs.
Cite Alison Buttenheim’s working paper
This might not be relevant for this presentation
This might not be relevant for this presentation
Proportionate disaster losses
Proportionate recovery of losses
Proportionate restoration to baseline
For example, two households both lose 3 water buffalo in a disaster.
- Relief efforts started immediately with the establishment of the Federal Relief Commission. A needs assessment was completed by the Asian Development Bank and the World Bank at the request of the Government of Pakistan.
Data were collected and compiled from several sources, including sector-specific field assessments, desk reviews, aerial reconnaissance, site visits, and interviews.
The needs assessment generated both a very detailed pre-earthquake profile of the affected areas and damage estimates.
Conversion from PKR to USD on 31st Oct, 2005 (1 USD = 59.7 PKR):
• Rs. 135,146 million ($2263.69 mil) in direct damage: 2 bn in direct damage
• Rs. 34,187 million ($572.63 mil) in indirect losses: 0.5 bn in indirect losses
• Rs. 208,091 million ($3485.5244 mil) in reconstruction costs: 3.5 bn in reconstruction
Conversion from PKR to USD at today's rate (1 USD = 96.65 PKR):
• Rs. 135,146 million ($1398.3 mil) in direct damage
• Rs. 34,187 million ($353.7 mil) in indirect losses
• Rs. 208,091 million ($2153.03 mil) in reconstruction costs
Earthquake Reconstruction and Rehabilitation Authority (ERRA)
The official mission of ERRA was to coordinate reconstruction and recovery efforts in the affected areas and across the multitude of local, national and international government agencies and nongovernment organisations (NGOs) that were operating in the areas. ‘Build Back Better’
Evaluation Design
The unit of analysis is the household, and an initial sample size of 1350 was chosen in order to detect changes at the 90 per cent confidence level. A two-stage cluster sample was drawn by first sampling 30 rural villages with probability proportionate to size from each of nine heavily affected districts in NWFP and AJK. Within each sampled village, five households were chosen at random from one randomly selected sub-division or neighbourhood.
The M&E wing of ERRA, with UK Department for International Development support, devised a survey instrument covering all the ERRA sector priorities. ‘Baseline’ or Round 1 data were collected between April and September 2008, almost three years after the earthquake.
The second round was collected in Aug-Sept 2009, with the sample expanded to 16 households per village and to urban blocks. But it used a different survey, the Pakistan Social and Living Standards Measurement Survey (PSLM), which has been conducted annually since 2005.
Problems
1. Selection bias: This sampling strategy cannot account for households that were in the affected area prior to and during the earthquake, but then were not available for sampling during the post-intervention periods. These households may not have been available either because the entire household died or because they had left the region. In either case, this missing group is likely to be different from households that were sampled. In addition, the design is not explicit about how households were selected for interventions. For example, housing programmes that target the worst-hit households might have less impressive results compared to programmes that implicitly targeted less-vulnerable elites.
2. Information bias: Respondents may not accurately remember details about their housing, livelihoods, or schooling prior to the earthquake. If interviewed by officials associated with the recovery effort, respondents may report more or less favourable conditions for either time period depending on perceived interviewer expectations or future benefits. The extent of misreporting may also be correlated with the severity of earthquake exposure or damages incurred.
3. Contamination bias: The ERRA Social Impact Assessment study assumes that changes experienced by earthquake-affected households from baseline to post-intervention follow-up are attributable to the ERRA interventions. Without a comparison group, the assumption is a very strong one. What was the role of remittances and the role of other unaffected households, for example?
Study conducted by the WB in 2009
The evaluation sample consists of 126 villages randomly drawn from the 1998 population census list of villages in four earthquake-affected districts. Outcomes of interest at the household level include employment, consumption, nutrition, education status of children, mental health, and asset recovery. The study also hopes to link household data to administrative and bank records of cash transfers. At the school level, post-earthquake staffing, infrastructure, enrolment and test-scores will be evaluated in both private and public schools.
The World Bank undertook a detailed census of 28,000 households in sampled villages in spring 2009, with a more extensive questionnaire administered to 25 per cent of households. A second round of data, including a detailed household survey of 2500 randomly selected households, was fielded in fall 2009. School-based survey modules collect data on school facilities, enrolment, and child outcomes including cognitive and achievement testing.
Preliminary results on education indicate that school interruption was significant: almost four months for young children and more than five months for older children. The proportion of schools destroyed by the earthquake increased sharply as distance from the fault line decreased, but when distance to the fault line is controlled, private schools appeared to sustain fewer damages than public schools. Consequently, public schools witnessed a decrease in school enrolment from pre-earthquake to post-earthquake periods, while private schools increased their enrolments.
Problems
The study is not without limitations: it will offer little insight into the design of optimal reconstruction programmes. It is not clear how relevant or generalisable the estimation strategy will be beyond the Pakistan situation, and therefore how applicable the findings will be to other post-disaster settings.
Facing these problems, Alison Buttenheim sets up a hypothetical design for the evaluation of the Pakistan earthquake.
Because it’s hypothetical, she assumes that the immediate post-disaster period (2-3) was devoted solely to … Design picks up after rescue efforts are completed; relief provisions are well underway and recovery programs are being planned.
1. Identify the long-term household-level outcomes of interest. It is important to clarify exactly what questions an impact evaluation is designed to answer. It was evident immediately after the earthquake that housing, health facilities, school facilities, government buildings, and infrastructure for WATSAN, power, telecom, and transit were all seriously compromised. ERRA’s sectoral approach to the design, delivery and evaluation of recovery programs reflects this, as does the list of household and community outcomes developed by ERRA and reproduced above. While ERRA’s list served process outcomes and monitoring well, a focused list of household-level outcomes and indicators will guide the impact evaluation process.
Source: Based on Pakistan Social and Living Standards Measurement Survey (2004-05), 2005 and ERRA Social Impact Assessment, 2009.
2. Obtain a pre-earthquake area-representative household sample. An obvious shortcoming of both the ERRA Social Impact Assessment and the World Bank evaluation is the lack of a pre-disaster, population-representative sample. An evaluation design that compares post-intervention welfare to a pre-disaster point in time requires a pre-disaster observation, preferably collected pre-disaster rather than retroactively.
In the case of Pakistan, the 2004-05 Pakistan Social and Living Standards Measurement Survey (PSLM) is a good candidate for such a sample. Interviews took place between September 2004 and March 2005, or 7-13 months prior to the earthquake. The sample includes 1080 households in Mansehra and Abbottabad districts, some of which were affected by the earthquake and some of which were not. (The sample also includes 1,322 households in AJ&K, but it is not clear how many, if any, of these households were located in earthquake-affected areas.) In our hypothetical study, this sample becomes the baseline observation for affected and unaffected areas.
3. Collect data on the pre-earthquake sample immediately post-earthquake. Needs assessments of affected populations are often undertaken in the immediate aftermath of a disaster. In Pakistan, several needs assessments were done, including a wide-ranging village level needs assessment by the GoP and World Bank. Of course, the focus of these needs assessments was the correct targeting and provision of relief. For long-term impact evaluation purposes, observing the sample immediately after the disaster is also very helpful. In our hypothetical design, the PSLM sample identified in Step 2 is located and briefly re-interviewed in January-March 2006. If necessary, the sample is expanded to provide sufficient power for evaluation analyses. Because the sample includes households in affected and unaffected areas, this post-disaster surveying will also include unaffected areas.
This round of data collection can serve multiple purposes in addition to serving as a second “baseline” of sorts for impact evaluation of future recovery efforts: it can provide accurate estimates of disaster-related mortality and morbidity and post-disaster outmigration; assess the adequacy of relief efforts; identify priorities for recovery programs; and reveal intentional and natural variations in recovery interventions that can be exploited for evaluation purposes. For example, the World Bank study above leverages the household size eligibility requirement for livelihood grants, and the variation in the agency providing housing reconstruction grants.
4. Design interventions for staged roll out or other variations. Experimental designs are a controversial aspect of humanitarian aid provision, and may not be appropriate in the emergency or relief phase of post-disaster aid provision. However, recovery programs that unfold over many months or years are better suited to an experimental design. They are particularly useful when there is a lack of consensus about the best way to deliver an intervention (e.g., how large should livelihood cash grants be and when should they be distributed? How much sweat equity should be required of homeowners during housing reconstruction? Should school reconstruction prioritize the rebuilding of primary or secondary schools?). Testing competing interventions in an experimental design can provide strong evidence about best practices in humanitarian aid that can guide future post-disaster interventions in other settings. In the examples above, beneficiaries are not deprived of life-saving resources, but instead may receive a different form of a benefit than beneficiaries in a neighboring village or district. In our hypothetical study design, ERRA identifies a set of outstanding debates about intervention theory, design or delivery, and plans sectoral programming to include experimental conditions.
TEC
Rwanda – messaging: Different groups listened to different things (Paluck 2009). Reconciliation messaging in post-conflict 2004. NGO used it. 12 communities in matched pairs. New Dawn. But preceded by survey-based measures, focus groups, and observed behavioral measures.
Sierra Leone – paired matching from 236 communities (Go Bifo). CDD. The study found that the intervention had mixed impacts. While the intervention succeeded in delivering material benefits to the beneficiary communities, impacts on collective action and inclusion of marginalized groups were not significant. That is, from the surveys, focus groups, and activities, the treated communities did not exhibit significantly better collective action capacity or tendency to include marginalized groups in decision making or sharing of benefits.
Aceh – paired matches BUT with clear lack of balance, since selected towns had to have administrative capacity. Barron et al.
Burundi – ex-combatants, phased roll out. WB study. Reintegration program. 23,000 ex-combatants. Gilligan et al.