
2023-11-15
This course profited greatly from the course materials by Safner and Heiss.
Impact Evaluation is a systematic process that assesses the outcomes and impacts of a program, project, or policy intervention.
Impact evaluation can take various forms, depending on the context and objectives.

Example
If \(X\) is a light switch, and \(Y\) is a light:


Consider all the variables likely to be important to the data-generating process (including variables we can’t observe!)
For simplicity, combine some similar ones together or prune those that aren’t very important
Consider which variables are likely to affect others, and draw arrows connecting them
Test the model's testable implications (to check whether it is consistent with the data!)

Drawing an arrow requires a direction - making a statement about causality!
Omitting an arrow makes an equally important statement too!
If two variables are correlated, but neither causes the other, likely they are both caused by another (perhaps unobserved) variable - add it!
There should be no cycles or loops (if so, there’s probably another missing variable, such as time)

Example
What is the effect of education on wages?
Education \(X\), “treatment” or “exposure”
Wages \(Y\), “outcome” or “response”


In social science and complex systems, 1000s of variables could plausibly be in the DAG!
So simplify:

Background, year of birth, location, and compulsory schooling all cause education
Background, year of birth, location, and job networks probably cause wages

Job networks are in fact probably caused by education!
Location and background are probably both caused by an unobserved factor (u1)

This is messy, but we have a causal model!
Makes our assumptions explicit, and many of them are testable
The DAG implies that certain relationships will not exist:
laws and netw are connected only through educ, so once we control for educ, cor(laws, netw) should be zero!
The examples above come from Econometrics by Ryan Safner.
Do it yourself: DAGitty
A very nice read by the “father of DAGs”: Pearl and Mackenzie (2019).
See also: The Effect by Huntington-Klein (2021)
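The conditional-independence claim about laws and netw can be checked with a small simulation. The sketch below keeps only the chain laws → educ → netw from the DAG and assumes made-up linear effects with normal noise (the coefficients are hypothetical, not from Safner's example). It compares the raw correlation of laws and netw with their partial correlation, i.e. the correlation of what is left of both after regressing each on educ.

```python
import random
from statistics import fmean

def residuals(y, x):
    """Residuals from a simple OLS regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def corr(x, y):
    """Pearson correlation, computed by hand."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
n = 20_000
# Hypothetical data-generating process: laws -> educ -> netw (no direct arrow laws -> netw)
laws = [random.gauss(0, 1) for _ in range(n)]
educ = [0.8 * l + random.gauss(0, 1) for l in laws]
netw = [0.6 * e + random.gauss(0, 1) for e in educ]

raw = corr(laws, netw)                                         # clearly non-zero
partial = corr(residuals(laws, educ), residuals(netw, educ))   # close to zero
print(f"cor(laws, netw)        = {raw:.3f}")
print(f"cor(laws, netw | educ) = {partial:.3f}")
```

With real data, a clearly non-zero partial correlation would suggest that the DAG is missing an arrow or a variable.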
| Unit \(i\) | Control \(Y_i^C\) | Treatment \(Y_i^T\) | Effect \(\delta_i\) |
|---|---|---|---|
| 1 | 8 | 9 | 1 |
| 2 | 5 | 3 | -2 |
| 3 | 6 | 4 | -2 |
| 4 | 6 | 2 | -4 |
| 5 | 15 | 18 | 3 |
| 6 | 13 | 16 | 3 |
| 7 | 8 | 9 | 1 |
| 8 | 2 | 0 | -2 |
| 9 | 4 | 3 | -1 |
| 10 | 2 | 0 | -2 |
| Average | 6.9 | 6.4 | -0.5 |
\(D = T\): Treatment
\(D = C\): Control
\(Y = Y^T\) if \(D = T\)
\(Y = Y^C\) if \(D = C\)
Average causal effect:
\(\bar{\delta} = 6.4 - 6.9 = -0.5\)
| Unit \(i\) | Control \(Y_i^C\) | Treatment \(Y_i^T\) | \(D\) | Observed \(Y_i\) |
|---|---|---|---|---|
| 1 | ? | 9 | T | 9 |
| 2 | 5 | ? | C | 5 |
| 3 | 6 | ? | C | 6 |
| 4 | 6 | ? | C | 6 |
| 5 | ? | 18 | T | 18 |
| 6 | ? | 16 | T | 16 |
| 7 | ? | 9 | T | 9 |
| 8 | 2 | ? | C | 2 |
| 9 | 4 | ? | C | 4 |
| 10 | 2 | ? | C | 2 |
| Average | 4.2 | 13 | | |
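Observed effect:
\(13 - 4.2 = 8.8\) !!
Wrong conclusion: the naive comparison suggests a large positive effect, while the true average causal effect is \(-0.5\).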
For subject \(i\), the causal effect of the treatment is the difference between two potential outcomes: \(\delta_i = Y_i^T - Y_i^C\).
But only one of the two potential outcomes is realized/observed!
| Group | \(Y_i^T\) | \(Y_i^C\) |
|---|---|---|
| Treatment (\(D=T\)) | Observable | Counterfactual |
| Control (\(D=C\)) | Counterfactual | Observable |
Naive comparison of treated vs untreated is often biased
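The numbers in the two potential-outcome tables can be reproduced in a few lines. This sketch takes the potential outcomes and the assignment \(D\) directly from the tables; computing the true effect is only possible because this is a toy example in which both potential outcomes are known.

```python
from statistics import fmean

# Potential outcomes from the table (units 1..10)
y_c = [8, 5, 6, 6, 15, 13, 8, 2, 4, 2]
y_t = [9, 3, 4, 2, 18, 16, 9, 0, 3, 0]
# Treatment status D from the second table
d = ["T", "C", "C", "C", "T", "T", "T", "C", "C", "C"]

# Unit-level effects delta_i = Y_i^T - Y_i^C (unobservable in real data)
delta = [t - c for t, c in zip(y_t, y_c)]
true_ate = fmean(delta)

# In reality, only one potential outcome per unit is observed
y_obs = [t if di == "T" else c for t, c, di in zip(y_t, y_c, d)]
treated_mean = fmean(y for y, di in zip(y_obs, d) if di == "T")
control_mean = fmean(y for y, di in zip(y_obs, d) if di == "C")
naive = treated_mean - control_mean

print(f"true average effect: {true_ate:.2f}")  # -0.50
print(f"naive comparison:    {naive:.2f}")     # 8.83
```

The naive difference is 8.83 rather than 8.8 because the slide rounds the control mean \(25/6\) to 4.2.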

John Snow and Cholera: YouTube
A randomized control trial (RCT) / experimental design randomly assigns individuals to different levels of treatment.
Example: a drug trial with individuals randomly assigned to either receiving the drug (treatment) or a placebo (control or untreated).
Given that “a coin toss” determines treatment, we rule out that any (observed or unobserved) characteristics systematically determine who receives treatment.

Law of large numbers
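Why does randomization help? The sketch below reuses the potential outcomes from the toy table and, as an assumption, mimics complete randomization: it enumerates every way of assigning 4 of the 10 units to treatment (4, as in the observed example) and computes the difference in means for each assignment. A single experiment can be far off, but averaged over all possible assignments the estimate equals the true average effect of \(-0.5\).

```python
from itertools import combinations
from statistics import fmean

# Potential outcomes from the toy table (units 1..10)
y_c = [8, 5, 6, 6, 15, 13, 8, 2, 4, 2]
y_t = [9, 3, 4, 2, 18, 16, 9, 0, 3, 0]
units = range(len(y_c))
n_treated = 4  # assumption: same group size as in the observed example

estimates = []
for treated_idx in combinations(units, n_treated):  # every possible random assignment
    treated = set(treated_idx)
    mean_t = fmean(y_t[i] for i in treated)
    mean_c = fmean(y_c[i] for i in units if i not in treated)
    estimates.append(mean_t - mean_c)

print(len(estimates))               # 210 possible assignments
print(round(fmean(estimates), 10))  # -0.5: unbiased on average
print(min(estimates), max(estimates))  # but single experiments can miss badly
```

With more units, the spread of the individual estimates shrinks (law of large numbers), so a single randomized experiment becomes increasingly reliable.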
New experiment of policy instrument

“These results illustrate how policies that manipulate peer groups for a desired social outcome can be confounded by changes in the endogenous patterns of social interactions within the group.”
In impact evaluation (as in the social sciences in general), we are interested in causal research questions
A potential compromise: Compare alike with alike
The Effect (Huntington-Klein 2021) is a great online resource.
\[\color{#FF851B}{\text{Earnings}_i} = \beta_0 + \beta_1 \color{#0074D9}{\text{Education}_i} + \varepsilon_i\]
If we ran this regression, would \(\beta_1\) give us the causal effect of education?
No!
Exogenous variables
Endogenous variables

Part of education is endogenous, part of it is exogenous.
Can we isolate the exogenous part?
“Instead of randomizing the variable ourselves, we hope that something has already randomized it for us. We look in the real world for a source of randomization of our treatment” (Huntington-Klein 2021)
\[ \begin{align} {\color{#c5c5c5}{\text{Endogenous model}}}& &\color{green}{\text{Outcome}_i} &= \beta_0 + \beta_1 \color{red}{\left( \text{Endog. var.} \right)_i} + u_i\\[0.5em] {\text{First stage}}& &\color{red}{\left( \text{Endog. var.} \right)_i} &= \pi_0 + \pi_1 \color{blue}{\text{Instrument}_i} + v_i\\[0.25em] {\text{Second stage}}& &\color{green}{\text{Outcome}_i} &= \delta_0 + \delta_1 \color{red}{\widehat{\left( \text{Endog. var.} \right)}_i} + \varepsilon_i\\[0.5em] {\color{#c5c5c5}{\text{Reduced form}}}& &\color{green}{\text{Outcome}_i} &= \gamma_0 + \gamma_1 \color{blue}{\text{Instrument}_i} + w_i\\[0.25em] \end{align} \]
where \(\color{red}{\widehat{\left( \text{Endog. var.} \right)}_i}\) are the predicted values (fitted values) from the first-stage regression. They only contain the variation in \(\color{red}{\left( \text{Endog. var.} \right)_i}\) that comes from \(\color{blue}{\text{Instrument}_i}\).
\[ \begin{align} {\color{#c5c5c5}{\text{Endogenous model}}}& &\color{green}{\text{Wage}_i} &= \beta_0 + \beta_1 \color{red}{\left( \text{Education} \right)_i} + u_i\\[0.5em] {\text{First stage}}& &\color{red}{\left( \text{Education} \right)_i} &= \pi_0 + \pi_1 \color{blue}{\text{Fathers Educ}_i} + v_i\\[0.25em] {\text{Second stage}}& &\color{green}{\text{Wage}_i} &= \delta_0 + \delta_1 \color{red}{\widehat{\left( \text{Education} \right)}_i} + \varepsilon_i\\[0.5em] {\color{#c5c5c5}{\text{Reduced form}}}& &\color{green}{\text{Wage}_i} &= \gamma_0 + \gamma_1 \color{blue}{\text{Fathers Educ}_i} + w_i\\[0.25em] \end{align} \]
where \(\color{red}{\widehat{\left( \text{Education} \right)}_i}\) are the predicted values (fitted values) from the first-stage regression. They only contain the variation in \(\color{red}{\left( \text{Education} \right)_i}\) that comes from \(\color{blue}{\text{Fathers Educ}_i}\).
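The two stages can be run by hand on simulated data. In this sketch, all coefficients are hypothetical: ability is an unobserved confounder that raises both education and wages, and father's education is assumed to be a valid instrument (it shifts education, is independent of ability, and has no direct effect on wages), which is a strong assumption with real data. Naive OLS overstates the true effect of 1.0, while the second-stage slope recovers it.

```python
import random
from statistics import fmean

def ols(y, x):
    """Intercept and slope from a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

random.seed(42)
n = 50_000
true_effect = 1.0  # hypothetical effect of one year of education on wage

ability = [random.gauss(0, 1) for _ in range(n)]       # unobserved confounder
father_educ = [random.gauss(12, 2) for _ in range(n)]  # instrument, assumed independent of ability
educ = [4 + 0.5 * f + 1.0 * a + random.gauss(0, 1) for f, a in zip(father_educ, ability)]
wage = [5 + true_effect * e + 2.0 * a + random.gauss(0, 1) for e, a in zip(educ, ability)]

# Naive OLS: biased, because ability drives both education and wages
_, beta_ols = ols(wage, educ)

# First stage: education on the instrument, keep the fitted values
pi0, pi1 = ols(educ, father_educ)
educ_hat = [pi0 + pi1 * f for f in father_educ]

# Second stage: wage on the fitted (instrument-driven) part of education
_, delta_2sls = ols(wage, educ_hat)

# Reduced form: wage directly on the instrument
_, gamma1 = ols(wage, father_educ)

print(f"OLS:  {beta_ols:.3f}")
print(f"2SLS: {delta_2sls:.3f}  (reduced form / first stage: {gamma1 / pi1:.3f})")
```

The slope from the manual second stage equals the ratio of the reduced-form slope to the first-stage slope. Standard errors from a manually run second stage are not correct, though; in practice, use a dedicated IV routine.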
Two possible instruments:
See also Angrist and Pischke (2015).
| Outcome | Policy | Unobserved stuff | Instrument |
|---|---|---|---|
| Income | Education | Ability | Father's education |
| Income | Education | Ability | Distance to college |
| Income | Education | Ability | Military draft |
| Health | Smoking cigarettes | Other negative health behaviors | Tobacco taxes |
| Crime rate | Patrol hours | # of criminals | Election cycles |
| Crime | Incarceration rate | Simultaneous causality | Overcrowding litigations |
| Labor market success | Americanization | Ability | Scrabble score of name |
| Conflicts | Economic growth | Simultaneous causality | Rainfall |

“Whenever some treatment is assigned discontinuously - people just on one side of a line get it, and people just on the other side of the line don’t, might be a little different, but not that different.” (Huntington-Klein 2021)
The following graphs come from Andrew Heiss (Program Evaluation).
Imagine: An entrance exam and those who score 70 or lower get additional training.
To assess the effect of the additional training:
Calculate the difference between treatment and control right at the cutoff.
In a regression framework:
\[ y_{i} = \alpha + \delta D_{i} + \beta_1 [\text{Running - Cutoff}]_{i} + \beta_2 (D_{i} \times [\text{Running - Cutoff}]_{i}) + \varepsilon_{i}, \]
\([\text{Running - Cutoff}]_{i}\) can be replaced by any flexible function \(f([\text{Running - Cutoff}]_{i})\).
See Huntington-Klein (2021).
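Because the regression interacts \(D_i\) with the centered running variable, it is equivalent to fitting one line on each side of the cutoff: \(\delta\) is the gap between the two lines at the cutoff. The sketch simulates the entrance-exam example with a hypothetical jump of 8 points, hypothetical slopes, and a bandwidth of 15 points chosen only for illustration.

```python
import random
from statistics import fmean

def ols(y, x):
    """Intercept and slope from a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

random.seed(7)
cutoff = 70
true_jump = 8.0  # hypothetical effect of the additional training

rows = []
for _ in range(5_000):
    score = random.uniform(40, 100)
    d = 1 if score <= cutoff else 0  # treated: scored 70 or lower
    x = score - cutoff               # centered running variable (Running - Cutoff)
    # Hypothetical outcome: different slopes on each side plus a jump for treated units
    y = 50 + 0.6 * x + d * (true_jump + 0.3 * x) + random.gauss(0, 3)
    rows.append((d, x, y))

# Fully interacted model == separate lines on each side; delta = gap at x = 0
bandwidth = 15  # only use observations close to the cutoff
treated = [(x, y) for d, x, y in rows if d == 1 and abs(x) <= bandwidth]
control = [(x, y) for d, x, y in rows if d == 0 and abs(x) <= bandwidth]
a_t, _ = ols([y for _, y in treated], [x for x, _ in treated])
a_c, _ = ols([y for _, y in control], [x for x, _ in control])
delta = a_t - a_c
print(f"estimated jump at the cutoff: {delta:.2f}")
```

The bandwidth and the functional form are choices the researcher has to make; it is worth checking that the estimate is stable across reasonable alternatives.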
The researcher’s degrees of freedom
It’s important: Lines should fit the data well!
Check higher order polynomials!
Non-parametric methods like LOESS (Locally estimated scatterplot smoothing)
Hainmueller, Hangartner, and Pietrantuono (2015):
Rüttenauer (2023)
Wagner and Petev (2019)
Comparing only treatment/control
Comparing only before/after


\(\Delta\) (post − pre) = within-unit difference
\(\Delta\) (treatment − control) = across-group difference
| | Pre mean | Post mean | ∆ (post − pre) |
|---|---|---|---|
| Control | A (never treated) | B (never treated) | B − A |
| Treatment | C (not yet treated) | D (treated) | D − C |
| ∆ (treatment − control) | C − A | D − B | (D − C) − (B − A) or (D − B) − (C − A) |
\(\Delta\) within units − \(\Delta\) within groups = Difference-in-differences = causal effect!
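With made-up group means (hypothetical, not from any study), the table translates into simple arithmetic; both ways of differencing give the same number.

```python
# Hypothetical group means, labelled as in the table
A, B = 10.0, 12.0  # control: pre, post (never treated)
C, D = 14.0, 19.0  # treatment: pre (not yet treated), post (treated)

within_control = B - A  # change over time in the control group
within_treated = D - C  # change over time in the treatment group
did = within_treated - within_control

# Same number from the across-group differences
did_alt = (D - B) - (C - A)
print(did, did_alt)  # 3.0 3.0
```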

Important:
Parallel trends assumption
“In the absence of treatment, the treatment group would have had the same trend over time as the control group”
No anticipation
“The treatment only affects the treatment group from the treatment period onwards”
The 2 \(\times\) 2 Diff-in-Diff as an interaction term:
\[ y_{it} = \alpha + \gamma D_{i} + \lambda Post_{t} + \delta_{DD} (D_{i} \times Post_{t}) + \upsilon_{it}, \]
\(\delta_{DD}\) gives the Diff-in-Diff estimator:
\[ \hat{\delta}_{DD} = \mathrm{E}(\Delta y_{T}) - \mathrm{E}(\Delta y_{C}) = [\mathrm{E}(y_{T}^{post}) - \mathrm{E}(y_{T}^{pre})] - [\mathrm{E}(y_{C}^{post}) - \mathrm{E}(y_{C}^{pre})]. \]
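The interaction coefficient and the difference of four means give the same estimate, because the interaction model is saturated (one parameter per group-period cell). The sketch checks this on simulated data with a hypothetical true \(\delta_{DD} = 3\); the small Gaussian-elimination OLS helper is written here only to stay dependency-free and stands in for a statistics package.

```python
import random
from statistics import fmean

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(y, X):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

random.seed(3)
data = []
for _ in range(2_000):
    d = random.randint(0, 1)     # D_i: treatment group
    post = random.randint(0, 1)  # Post_t: post-treatment period
    # Hypothetical outcome: alpha=10, gamma=4, lambda=2, delta_DD=3
    y_i = 10 + 4 * d + 2 * post + 3 * d * post + random.gauss(0, 1)
    data.append((d, post, y_i))

# y = alpha + gamma*D + lambda*Post + delta_DD*(D x Post)
X = [[1.0, d, post, d * post] for d, post, _ in data]
ys = [y_i for _, _, y_i in data]
alpha, gamma, lam, delta_dd = ols(ys, X)

def cell(d_val, post_val):
    """Mean outcome in one of the four group-period cells."""
    return fmean(y_i for d, post, y_i in data if d == d_val and post == post_val)

did = (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))
print(f"interaction coefficient: {delta_dd:.4f}")
print(f"difference of means:     {did:.4f}")
```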
In settings with multiple periods, we rely on the two-way fixed effects (TWFE) estimator:
\[ y_{it} = \beta_{TWFE} D_{it} + \alpha_i + \zeta_t + \epsilon_{it}. \]
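With only two periods, a binary treatment, and all observations untreated in \(t=1\), the TWFE estimator \(\beta_{TWFE}\) is identical to the \(2 \times 2\) DiD estimator \(\hat{\delta}_{DD}\) (in a balanced panel).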
With multiple treatment groups and periods, it is more complicated: the TWFE estimator can be decomposed into a weighted average of many \(2 \times 2\) DiD estimators (Goodman-Bacon 2021; Roth et al. 2023).
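A small noise-free example illustrates why staggered timing combined with dynamic effects can be a problem. The sketch assumes a hypothetical balanced panel (two units treated from period 3, two from period 6, two never treated, 8 periods) and computes the TWFE coefficient by two-way demeaning, which in a balanced panel is equivalent to including unit and time fixed effects.

```python
from statistics import fmean

T = 8
# Hypothetical staggered design: first treated period per unit (None = never treated)
first_treated = {"u1": 3, "u2": 3, "u3": 6, "u4": 6, "u5": None, "u6": None}

def panel(effect):
    """Rows (unit, t, D_it, y_it) with y = alpha_i + zeta_t + effect(t - g) if treated."""
    rows = []
    for k, (unit, g) in enumerate(first_treated.items()):
        for t in range(1, T + 1):
            d = 1 if g is not None and t >= g else 0
            y = k + 0.5 * t + (effect(t - g) if d else 0.0)  # no noise
            rows.append((unit, t, d, y))
    return rows

def twfe(rows):
    """TWFE coefficient on D via two-way demeaning (exact for a balanced panel)."""
    def demean(idx):
        unit_mean = {u: fmean(r[idx] for r in rows if r[0] == u) for u in first_treated}
        time_mean = {t: fmean(r[idx] for r in rows if r[1] == t) for t in range(1, T + 1)}
        grand = fmean(r[idx] for r in rows)
        return [r[idx] - unit_mean[r[0]] - time_mean[r[1]] + grand for r in rows]
    d_tilde, y_tilde = demean(2), demean(3)
    return sum(a * b for a, b in zip(d_tilde, y_tilde)) / sum(a * a for a in d_tilde)

def true_att(rows, effect):
    """Average effect over all treated unit-periods."""
    return fmean(effect(t - first_treated[u]) for u, t, d, _ in rows if d)

for name, effect in [("constant", lambda e: 2.0), ("dynamic", lambda e: 1.0 + e)]:
    rows = panel(effect)
    print(f"{name}: TWFE = {twfe(rows):.3f}, true ATT = {true_att(rows, effect):.3f}")
```

With a constant effect, TWFE returns exactly 2. With effects that grow with exposure (\(1 + e\)), it gives about 1.79 although the true average effect on the treated is 3: the late periods of the early-treated units enter with negative weight, because those units serve as controls for the late-treated group.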
Richardson and Troost (2009)
Parallel trends?
Figure from Goodman-Bacon (2021)
Treatment timing
Treatment dynamic
Treatment heterogeneity
This can (in some cases) be an issue for your estimate (Goodman-Bacon and Marcus 2020).
Several new “dynamic” DiD estimators explicitly address the issue (Roth et al. 2023).
\[ \delta_{g,t} = \mathrm{E}(\Delta y_{g}) - \mathrm{E}(\Delta y_{C}) = [\mathrm{E}(y_{g}^{t}) - \mathrm{E}(y_{g}^{g-1})] - [\mathrm{E}(y_{C}^{t}) - \mathrm{E}(y_{C}^{g-1})], \]
where the control group can either be the never-treated or the not-yet-treated.
Summary measure / average (Callaway and Sant’Anna 2020):
\[ \theta_D(e) := \sum_{g=1}^G \mathbf{1} \{ g + e \leq T \} \delta_{g,g+e} P(G=g | G+e \leq T), \]
where \(e\) specifies how long a unit has been exposed to the treatment.
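The group-time effects \(\delta_{g,t}\) and their event-time average \(\theta_D(e)\) can be computed directly. This sketch uses a hypothetical noise-free staggered panel, takes the never-treated units as the control group, and weights each group by its size among the groups observed \(e\) periods after treatment. It omits covariates and inference; for applied work, the authors' R package did implements the full estimator.

```python
from statistics import fmean

T = 8
# Hypothetical staggered panel: first treated period per unit (None = never treated)
first_treated = {"u1": 3, "u2": 3, "u3": 6, "u4": 6, "u5": None, "u6": None}

def effect(e):
    """Hypothetical dynamic treatment effect after e periods of exposure."""
    return 1.0 + e

# Noise-free outcomes: unit effect + common time trend + treatment effect
y = {}
for k, (unit, g) in enumerate(first_treated.items()):
    for t in range(1, T + 1):
        treated = g is not None and t >= g
        y[unit, t] = k + 0.5 * t + (effect(t - g) if treated else 0.0)

never = [u for u, g in first_treated.items() if g is None]
groups = sorted({g for g in first_treated.values() if g is not None})
members = {g: [u for u, gu in first_treated.items() if gu == g] for g in groups}

def att_gt(g, t):
    """delta_{g,t}: change from g-1 to t in group g minus change in the never-treated."""
    change_g = fmean(y[u, t] - y[u, g - 1] for u in members[g])
    change_c = fmean(y[u, t] - y[u, g - 1] for u in never)
    return change_g - change_c

def theta(e):
    """Event-time average: groups with g + e <= T, weighted by group size."""
    eligible = [g for g in groups if g + e <= T]
    total = sum(len(members[g]) for g in eligible)
    return sum(att_gt(g, g + e) * len(members[g]) / total for g in eligible)

for e in range(0, T - min(groups) + 1):
    print(f"e = {e}: theta = {theta(e):.2f}")
```

Because the simulated effect is \(1 + e\) for every group, each \(\theta_D(e)\) recovers exactly \(1 + e\).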



Rüttenauer and Best (2021)
Clark and Georgellis (2013)
Mills and Rüttenauer (2022)