
# Econ 474 - Econometrics of Policy Evaluation
## Synthetic Control Method
### Marcelino Guerra
### April 4-6, 2022

---

# Introduction

* We are often interested in the effects of interventions at an aggregate level, such as firms, schools, cities, countries, etc. In comparative case studies, researchers base their studies on meticulous description and analysis of the characteristics of a small number of selected cases

* Those studies compare the evolution of aggregate outcomes (e.g., average income, crime rate, etc.) of a unit affected by the intervention with the outcomes of selected unaffected units (Card and Krueger, 1994, for instance)

* The widespread availability of aggregate data (at the school, city, or country level) and the fact that many interventions take place at an aggregate level give comparative case studies broad potential. However, there is some degree of ambiguity about how comparison units are chosen

* Keep in mind that an inappropriate selection of comparison units may lead to erroneous conclusions

* The synthetic control method (SCM) is a data-driven procedure that reduces discretion in the choice of the comparison control units

* The idea behind the method is that a combination of units often provides a better comparison for the unit exposed to the intervention than any single unit alone (see Abadie & Gardeazabal, 2003; Abadie, Diamond & Hainmueller, 2010; Abadie, Diamond & Hainmueller, 2015)

---

# Example: Expansion to Castle Doctrine in Florida

---

# Castle Doctrine in Florida

* To study the effect of the Castle Doctrine law in Florida, one can use a combination of other U.S. states to approximate the levels of violence that Florida would have experienced in the absence of this new law

* Because a synthetic control is a weighted average of the available control units, the SCM makes explicit:
  * The relative contribution of each control unit to the counterfactual of interest
  
  * The similarities between treated and synthetic control units in terms of preintervention outcomes and other predictors
  
* That gives transparency and reduces subjective researcher bias


![](figs/florida.png)
.small[**Note:** The Synthetic Control Method (SCM) is often used to study "pioneers," i.e., the first unit to experience a new intervention. For instance, Florida was the first U.S. state to pass a "stand your ground" law (in 2005). **Source:** Cheng and Hoekstra (2013).]

---

# SCM - Estimation I

* Suppose there is a sample of `$J+1$` units indexed by `$j$`, where unit `$j=1$` is the case of interest/treated unit (Florida, for instance), and the others form the donor pool of potential comparisons (from `$j=2$` to `$j=J+1$`).

* Because comparison units are meant to approximate the counterfactual of the case of interest without the intervention, it is important to restrict the donor pool to units with outcomes that are thought to be driven by the same structural process as the treated unit but are not affected by the intervention during the period of study

* Assume a balanced panel with `$T_{0}$` preintervention periods and `$T_{1}$` postintervention periods (hence, `$T=T_{0}+T_{1}$`). Unit 1 is not under the intervention during periods `$1,2,\dots, T_{0}$` and receives the treatment during periods `$T_{0}+1, T_{0}+2, \dots, T$`

* We define a synthetic control as a weighted average of the units in the donor pool, i.e., a `$(J\times1)$` vector of weights `$W=(w_{2}, w_{3}, \dots, w_{J+1})'$` with `$0\leq w_{j} \leq 1$` for `$j=2,\dots,J+1$`, and `$w_{2}+w_{3}+\dots+w_{J+1}=1$`.

* **The weights `$W$` are selected so that the synthetic control best resembles the characteristics of the treated unit**
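
As a minimal sketch of this definition, the R snippet below checks that a candidate weight vector is admissible and forms the synthetic outcome as a weighted average of the donors. The three donor units, their outcome values, and the weights are made-up numbers for illustration, not data from the Florida exercise.

```r
# Hypothetical donor-pool outcomes: rows are periods, columns are donors j = 2, 3, 4
Y_donors <- matrix(c(5.0, 6.0, 4.0,
                     5.5, 6.5, 4.5),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("t1", "t2"), c("unit2", "unit3", "unit4")))

# Candidate weights W = (w_2, w_3, w_4)' (illustrative values)
W <- c(0.5, 0.3, 0.2)

# Admissibility: every weight lies in [0, 1] and the weights sum to one
is_valid <- all(W >= 0 & W <= 1) && isTRUE(all.equal(sum(W), 1))

# Synthetic control outcome in each period: weighted average of the donors
Y_synth <- as.vector(Y_donors %*% W)
```

Because the weights are nonnegative and sum to one, each synthetic value stays within the range of the donor outcomes for that period, so the synthetic control never extrapolates beyond the donor pool.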

---
# SCM - Estimation II

Let `$X_{1}$` be a `$(k \times 1)$` vector containing values of preintervention characteristics of the treated unit that we aim to match as closely as possible, and let `$X_{0}$` be the `$(k \times J)$` matrix collecting the same values of the same variables for the donor pool units.

The selected synthetic control `$W^{*}$` is the one that minimizes the size of the difference `$X_{1}-X_{0}W$`, i.e., the distance between the two vectors. In other words, the synthetic control is a weighted average of available control units that approximates the most relevant characteristics of the treated unit prior to the treatment.

Let `$Y_{jt}$` be the outcome of unit `$j$` at time `$t$`, and let `$Y_{1}$` be a `$(T_{1}\times1)$` vector collecting the post-intervention values of the outcome for the treated unit, that is, `$Y_{1}=(Y_{1, T_{0}+1}, \dots, Y_{1, T})'$`. Similarly, let `$Y_{0}$` be a `$(T_{1} \times J)$` matrix, where column `$j$` contains the post-intervention values of the outcome for unit `$j+1$`. The synthetic control estimator is the difference in postintervention outcomes between the treated unit and the synthetic control, `$Y_{1}-Y_{0}W^{*}$`. For a postintervention period `$t$` `$(t > T_{0})$`, the treatment effect is given by:

`$$Y_{1t}-\sum_{j=2}^{J+1}w^{*}_{j}Y_{jt}$$`
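
The estimation steps above can be sketched in base R on toy data. This is only an illustration under stated assumptions: the matrices are invented, the distance is the plain Euclidean norm (no predictor-importance weights), and the simplex constraint is handled with a softmax reparametrization plus `optim()`. It is not the algorithm used inside the `Synth` package.

```r
# Hypothetical toy data: k = 3 predictors (rows), J = 4 donor units (columns)
X0 <- matrix(c(1, 2, 3, 4,
               2, 1, 4, 3,
               3, 1, 2, 5),
             nrow = 3, byrow = TRUE)
W_true <- c(0.4, 0.3, 0.2, 0.1)     # weights used only to build the toy treated unit
X1 <- as.vector(X0 %*% W_true)      # treated unit's predictor values

# Map unconstrained parameters to weights that are nonnegative and sum to one
to_simplex <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Squared Euclidean distance between X1 and X0 W
loss <- function(theta) {
  W <- to_simplex(theta)
  sum((X1 - X0 %*% W)^2)
}

# Start from equal weights and minimize the distance
fit <- optim(par = rep(0, ncol(X0)), fn = loss, method = "BFGS")
W_star <- to_simplex(fit$par)

# Hypothetical post-intervention outcomes (rows: periods, columns: donors)
Y0_post <- rbind(c(9.0, 10.0, 11.0, 12.0),
                 c(9.5, 10.5, 11.5, 12.5))
Y1_post <- c(10.4, 11.0)

# Estimated effect in each post-intervention period: Y_1t - sum_j w*_j Y_jt
gap <- as.vector(Y1_post - Y0_post %*% W_star)
```

In this toy setup the treated unit is an exact weighted average of the donors, so the recovered `W_star` should be close to `W_true` and `gap` close to the difference built into the invented outcomes.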

---

# Castle Doctrine in Florida via SCM I

![](figs/avg.png)


![](figs/synth1.png)
.small[**The data used in the R exercise can be downloaded [here](https://github.com/guerramarcelino/PolicyEval/raw/main/Datasets/castle_FL.RDS).**]


```r
# Load the Synth package and the Florida panel
library(Synth)
florida <- readRDS("castle_FL.RDS")

# Build the predictor and outcome matrices used by synth()
dataprep.out <- dataprep(
  foo = florida,
  predictors = c("blackm_15_24", "whitem_15_24", "blackm_25_44", "whitem_25_44",
                 "l_exp_subsidy", "l_exp_pubwelfare", "l_police", "unemployrt",
                 "poverty", "l_income", "l_prisoner", "l_lagprisoner"),
  predictors.op = "mean",               # average each predictor over the pre-period
  time.predictors.prior = 2000:2004,    # pre-intervention years
  dependent = "l_homicide",
  unit.variable = "iden",
  unit.names.variable = "state",
  time.variable = "year",
  treatment.identifier = 1,             # treated unit
  controls.identifier = 2:30,           # the 29 donor states
  time.optimize.ssr = 2000:2004,
  time.plot = 2000:2010
)

# Estimate the synthetic control weights
synth.out <- synth(data.prep.obj = dataprep.out)

# Plot the outcome paths of Florida and synthetic Florida
path.plot(synth.res = synth.out, dataprep.res = dataprep.out,
          Ylab = "Homicides (log)", Xlab = "Year",
          Ylim = c(1.5, 2), Legend = c("Florida", "Synthetic Florida"),
          Legend.position = "bottomright")
```


---

# Castle Doctrine in Florida via SCM II

* After the initial drop in homicide rates in Florida, the difference between homicide rates in Florida and Synthetic Florida becomes positive and ranges from 0.05 to 0.1

* From 2000 to 2004 (pre-intervention period), the gap is very close to zero: the set of weights produced a nearly identical time path for Florida and synthetic Florida

![](figs/synth2.png)


```r
# Plot the gap between Florida and synthetic Florida over time
gaps.plot(synth.res = synth.out, dataprep.res = dataprep.out,
          Ylab = "Homicides (log)", Xlab = "Year",
          Ylim = c(-.3, .2), Main = NA)
```


---

# Castle Doctrine in Florida via SCM III

.pull-left[
* The table shows the weight of each U.S. state in the synthetic version of Florida. Synthetic Florida is a weighted average of many states, but North Carolina, Arkansas, and Delaware carry the largest weights

*  Vermont and New Hampshire received zero weight, while 21 other states got weights very close to zero (0.005 or less)

]

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:450px; overflow-x: scroll; width:100%; "><table class="table table-striped table-condensed" style="margin-left: auto; margin-right: auto;">
<caption>Synthetic Weights for Florida</caption>
 <thead>
  <tr>
   <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> W </th>
   <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> State </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 0.198 </td>
   <td style="text-align:left;"> Arkansas </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.001 </td>
   <td style="text-align:left;"> California </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.003 </td>
   <td style="text-align:left;"> Colorado </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.002 </td>
   <td style="text-align:left;"> Connecticut </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.123 </td>
   <td style="text-align:left;"> Delaware </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.024 </td>
   <td style="text-align:left;"> Wyoming </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.002 </td>
   <td style="text-align:left;"> Hawaii </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.001 </td>
   <td style="text-align:left;"> Idaho </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.005 </td>
   <td style="text-align:left;"> Illinois </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.002 </td>
   <td style="text-align:left;"> Iowa </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.001 </td>
   <td style="text-align:left;"> Maine </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.002 </td>
   <td style="text-align:left;"> Maryland </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.001 </td>
   <td style="text-align:left;"> Massachusetts </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.001 </td>
   <td style="text-align:left;"> Minnesota </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.002 </td>
   <td style="text-align:left;"> Nebraska </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.052 </td>
   <td style="text-align:left;"> Nevada </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.000 </td>
   <td style="text-align:left;"> New Hampshire </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.005 </td>
   <td style="text-align:left;"> New Jersey </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.001 </td>
   <td style="text-align:left;"> New Mexico </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.092 </td>
   <td style="text-align:left;"> New York </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.470 </td>
   <td style="text-align:left;"> North Carolina </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.001 </td>
   <td style="text-align:left;"> Oregon </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.001 </td>
   <td style="text-align:left;"> Pennsylvania </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.002 </td>
   <td style="text-align:left;"> Rhode Island </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.002 </td>
   <td style="text-align:left;"> Utah </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.000 </td>
   <td style="text-align:left;"> Vermont </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.002 </td>
   <td style="text-align:left;"> Virginia </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.001 </td>
   <td style="text-align:left;"> Washington </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.003 </td>
   <td style="text-align:left;"> Wisconsin </td>
  </tr>
</tbody>
</table></div>


```r
# Collect the result tables; tab.w holds the donor weights
synth.tables <- synth.tab(dataprep.res = dataprep.out, synth.res = synth.out)
synth.tables$tab.w
```


---

# Castle Doctrine in Florida via SCM IV

.pull-left2[
* The balance table compares pre-intervention characteristics of Florida, the whole donor pool (`Sample.Mean`), and synthetic Florida

* Overall, the results suggest that synthetic Florida provides a better comparison for Florida than the average of the entire sample (the other 29 states)

]

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:450px; overflow-x: scroll; width:100%; "><table class="table table-striped table-condensed" style="margin-left: auto; margin-right: auto;">
<caption>Balance Table</caption>
 <thead>
  <tr>
   <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> Variables </th>
   <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Treated </th>
   <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Synthetic </th>
   <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Sample.Mean </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> blackm_15_24 </td>
   <td style="text-align:right;"> 1.25 </td>
   <td style="text-align:right;"> 0.64 </td>
   <td style="text-align:right;"> 2.90 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> whitem_15_24 </td>
   <td style="text-align:right;"> 2.16 </td>
   <td style="text-align:right;"> 2.80 </td>
   <td style="text-align:right;"> 10.51 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> blackm_25_44 </td>
   <td style="text-align:right;"> 2.26 </td>
   <td style="text-align:right;"> 1.02 </td>
   <td style="text-align:right;"> 5.15 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> whitem_25_44 </td>
   <td style="text-align:right;"> 4.94 </td>
   <td style="text-align:right;"> 5.68 </td>
   <td style="text-align:right;"> 23.22 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> l_exp_subsidy </td>
   <td style="text-align:right;"> 4.49 </td>
   <td style="text-align:right;"> 4.50 </td>
   <td style="text-align:right;"> 4.73 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> l_exp_pubwelfare </td>
   <td style="text-align:right;"> 6.79 </td>
   <td style="text-align:right;"> 7.02 </td>
   <td style="text-align:right;"> 7.08 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> l_police </td>
   <td style="text-align:right;"> 6.03 </td>
   <td style="text-align:right;"> 5.86 </td>
   <td style="text-align:right;"> 5.71 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> unemployrt </td>
   <td style="text-align:right;"> 4.84 </td>
   <td style="text-align:right;"> 5.18 </td>
   <td style="text-align:right;"> 4.74 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> poverty </td>
   <td style="text-align:right;"> 12.70 </td>
   <td style="text-align:right;"> 12.95 </td>
   <td style="text-align:right;"> 10.59 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> l_income </td>
   <td style="text-align:right;"> 10.75 </td>
   <td style="text-align:right;"> 10.77 </td>
   <td style="text-align:right;"> 10.91 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> l_prisoner </td>
   <td style="text-align:right;"> 6.16 </td>
   <td style="text-align:right;"> 6.14 </td>
   <td style="text-align:right;"> 5.84 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> l_lagprisoner </td>
   <td style="text-align:right;"> 6.14 </td>
   <td style="text-align:right;"> 6.13 </td>
   <td style="text-align:right;"> 5.83 </td>
  </tr>
</tbody>
</table></div>


```r
synth.tables$tab.pred
```
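
To make the three columns of a balance table concrete, here is a toy sketch of how such a table can be assembled by hand. The predictor names, values, and weights below are hypothetical, and the construction (synthetic column as the weighted average of donor predictors, sample mean as the unweighted donor average) is an assumption about the layout rather than a copy of the package internals.

```r
# Hypothetical predictor means: rows are predictors, columns are donor units
X0 <- matrix(c( 4.0,  6.0,  8.0,
               10.0, 12.0, 20.0),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("unemployment", "poverty"), c("A", "B", "C")))
X1 <- c(unemployment = 5.0, poverty = 11.5)   # treated unit (illustrative)
W  <- c(A = 0.5, B = 0.5, C = 0.0)            # hypothetical synthetic weights

balance <- data.frame(
  Treated     = X1,
  Synthetic   = as.vector(X0 %*% W),   # weighted average of the donors
  Sample.Mean = rowMeans(X0)           # unweighted average of the donors
)

# Absolute distance of each comparison from the treated unit
dist_synth  <- abs(balance$Treated - balance$Synthetic)
dist_sample <- abs(balance$Treated - balance$Sample.Mean)
```

With these made-up numbers the synthetic column is closer to the treated unit than the sample mean on both predictors, which is the pattern one looks for when judging the quality of a synthetic control.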


---

# The Economic Cost of the 1990 German Reunification

---

# The 1990 German Reunification

.pull-left3[
* The 1990 German reunification is one of the most significant political events in postwar Europe. Almost a year after the fall of the Berlin Wall, the German Democratic Republic (East Germany) and the Federal Republic of Germany (West Germany) officially reunited

* Given the significant income disparity between East and West Germany, the integration of both countries called for political and economic adjustments. One can see the reunification as a case study to examine the economic consequences of political integration

* Abadie, Diamond, and Hainmueller (2015)<sup>1</sup> focus on the consequences of the reunification for the West German economy, specifically the magnitude of its adverse effects on per capita GDP
]

![](figs/wall.jpg)


.footnote[ [1] Alberto Abadie, Alexis Diamond, and Jens Hainmueller. "Comparative politics and the synthetic control method." *American Journal of Political Science*, 2015]

---

# Building a Synthetic West Germany I

.pull-left2[
* The panel data comprise West Germany and 16 other OECD member countries over the 1960-2003 period. Synthetic West Germany is constructed as a weighted average of those 16 countries in the donor pool

* The outcome of interest `$Y_{jt}$` is real per capita GDP, adjusted for Purchasing Power Parity and measured in 2002 U.S. dollars

* The pre-reunification characteristics in `$X_{1}$` and `$X_{0}$` are a standard set of economic growth predictors: per capita GDP, inflation rate, industry share of value-added, investment rate, schooling, and trade openness

]

The table shows the weights of countries that contributed to the synthetic version of West Germany:

<table class="table table-striped table-condensed" style="font-size: 21px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;"> Synthetic Weights </th>
   <th style="text-align:left;"> Country </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 0.42 </td>
   <td style="text-align:left;"> Austria </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.16 </td>
   <td style="text-align:left;"> Japan </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.09 </td>
   <td style="text-align:left;"> Netherlands </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.11 </td>
   <td style="text-align:left;"> Switzerland </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.22 </td>
   <td style="text-align:left;"> United States </td>
  </tr>
</tbody>
</table>
.small[**Source:** Abadie, Diamond and Hainmueller (2015). Australia, Belgium, Denmark, France, Greece, Italy, New Zealand, Norway, Portugal, Spain and United Kingdom received zero weight]


---

# Building a Synthetic West Germany II

* The table compares the pre-reunification characteristics of West Germany to those of synthetic West Germany and also to those of a population-weighted average of the 16 OECD countries in the donor pool

* Overall, Synthetic West Germany has characteristics very similar to those of West Germany and provides a much better comparison than the OECD sample average


<table class="table table-striped table-condensed" style="font-size: 21px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> West Germany </th>
   <th style="text-align:right;"> Synthetic West Germany </th>
   <th style="text-align:right;"> OECD Sample </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> GDP per capita </td>
   <td style="text-align:right;"> 15808.9 </td>
   <td style="text-align:right;"> 15802.2 </td>
   <td style="text-align:right;"> 8021.1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Trade openness </td>
   <td style="text-align:right;"> 56.8 </td>
   <td style="text-align:right;"> 56.9 </td>
   <td style="text-align:right;"> 31.9 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Inflation rate </td>
   <td style="text-align:right;"> 2.6 </td>
   <td style="text-align:right;"> 3.5 </td>
   <td style="text-align:right;"> 7.4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Industry share </td>
   <td style="text-align:right;"> 34.5 </td>
   <td style="text-align:right;"> 34.4 </td>
   <td style="text-align:right;"> 34.2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Schooling </td>
   <td style="text-align:right;"> 55.5 </td>
   <td style="text-align:right;"> 55.2 </td>
   <td style="text-align:right;"> 44.1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Investment rate </td>
   <td style="text-align:right;"> 27.0 </td>
   <td style="text-align:right;"> 27.0 </td>
   <td style="text-align:right;"> 25.9 </td>
  </tr>
</tbody>
</table>
.small[**Source:** Abadie, Diamond and Hainmueller (2015). Averages for the 1981-1990 period.]


---

# The Economic Cost of the 1990 Reunification I

![](figs/fig1.png)
.small[**Note:** Trends in per capita GDP: West Germany versus the rest of OECD countries. Abadie, Diamond and Hainmueller (2015).]


![](figs/fig2.png)
.small[**Note:** Trends in per capita GDP: West Germany versus Synthetic West Germany. Abadie, Diamond and Hainmueller (2015).]


---

# The Economic Cost of the 1990 Reunification II

* The estimate of the economic cost of the 1990 reunification is given by the difference between the per capita GDP of the real West Germany and that of its synthetic version
  * Right after the reunification, there is an initial positive gap that a demand boost might explain
  * After 1992, the gap between real West Germany and its synthetic version turns negative, and that difference grows until the end of the sample period

* The close pre-treatment fit and the balance in GDP predictors show that there exists a combination of other industrialized countries that reproduces the economic attributes of West Germany before the reunification


![](figs/fig3.png)
.small[**Note:** Per capita GDP gap between West Germany and Synthetic West Germany. Abadie, Diamond and Hainmueller (2015).]


---
# Placebo Studies I

* To evaluate the credibility of the results, the authors run a series of placebo studies. The first test is to check whether there is any effect when using a different time of treatment (placebo date)

* The authors reassign the treatment time to 1975, about 15 years earlier than the actual German reunification. A sizable placebo estimate would undermine confidence in the results. As one can see, the 1975 placebo reunification produces no perceptible gap between West Germany and Synthetic West Germany


![](figs/fig6.png)


---

# Placebo Studies II

* Another way to conduct placebo studies is to reassign the treatment in the data to a comparison unit. Hence, one can obtain synthetic control estimates for countries that did not experience the intervention (in-space placebos)

* Applying that to each country in the donor pool allows us to compare the estimated effect of the reunification on West Germany to the distribution of placebo effects obtained for other countries 
  
  * If the estimated treatment effect for West Germany is unusually large relative to the distribution of placebo effects, that is a sign that the gap between West Germany and Synthetic West Germany is not due to mere chance. On the other hand, if similar or larger treatment effect estimates appear when the intervention is artificially reassigned, confidence in the results is compromised

* For statistical inference, the authors recommend calculating the root mean squared prediction error (RMSPE) separately for the pre- and post-treatment periods. For the post-treatment period (the pre-treatment version averages over periods `$1,\dots,T_{0}$` instead):

`$$RMSPE=\left(\frac{1}{T-T_{0}}\sum_{t=T_{0}+1}^{T}\left(Y_{1t}-\sum_{j=2}^{J+1}w_{j}^{*}Y_{jt}\right)^{2}\right)^{\frac{1}{2}}$$`
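
A short R sketch of the RMSPE calculation follows. The outcome paths and the value of `T0` are invented for illustration; `rmspe()` is a hypothetical helper, not a function from the `Synth` package.

```r
# RMSPE of the gap between treated and synthetic outcomes over a set of periods
rmspe <- function(y_treated, y_synth) {
  sqrt(mean((y_treated - y_synth)^2))
}

# Hypothetical outcome paths: 4 pre-intervention and 3 post-intervention periods
y_treated <- c(10.0, 10.5, 11.0, 11.5, 13.0, 14.0, 15.0)
y_synth   <- c(10.1, 10.4, 11.1, 11.4, 12.0, 12.0, 12.0)
T0   <- 4
pre  <- seq_len(T0)
post <- (T0 + 1):length(y_treated)

rmspe_pre  <- rmspe(y_treated[pre],  y_synth[pre])    # quality of pre-treatment fit
rmspe_post <- rmspe(y_treated[post], y_synth[post])   # size of post-treatment gap
ratio      <- rmspe_post / rmspe_pre
```

A large ratio means the post-treatment gap is big relative to how well the synthetic control tracked the treated unit before the intervention.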

---

# Placebo Studies III

* After applying the synthetic control method to all countries in the donor pool, calculate the RMSPE for every unit before and after the treatment. Then, compute the ratio of the post to pre-treatment RMSPE

* Sort the ratios in descending order, from the highest to the lowest (the figure shows those ratios)

* Calculate the "exact p-value" by dividing the rank of the treated unit by the total number of units. In this case, West Germany ranks first among 17 units, so `$\frac{1}{17} \approx 0.059$`
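
The ranking step can be sketched as follows. The ratio values are hypothetical placeholders (one treated unit plus 16 placebo units), chosen only so that the treated unit ranks first; they are not the estimates from Abadie, Diamond and Hainmueller (2015).

```r
# Hypothetical post/pre RMSPE ratios: one treated unit plus 16 placebo units
ratios <- c(treated = 15.2,
            setNames(c(4.1, 3.8, 3.5, 3.1, 2.9, 2.6, 2.4, 2.2,
                       2.0, 1.9, 1.7, 1.5, 1.4, 1.2, 1.1, 0.9),
                     paste0("placebo", 1:16)))

# Rank 1 goes to the largest ratio; ties take the larger (more conservative) rank
ranks <- rank(-ratios, ties.method = "max")

# Share of units whose ratio is at least as large as the treated unit's
p_value <- unname(ranks["treated"]) / length(ratios)
```

With 17 units the smallest attainable p-value is 1/17, so the size of the donor pool bounds how strong this kind of evidence can be.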


![](figs/fig4.png)
