“Education production” is a topic much explored by economists. The terminology reflects the idea that school environment features are inputs that cost money, while student learning is the output that schools produce. A major question in the field is which inputs have the highest benefit/cost ratio, and one very costly input is class size: reducing class size most likely requires hiring more teachers. An important experiment conducted in Tennessee was designed to answer precisely the question “Does class size affect student performance?”
Krueger (1999) analyzed Project STAR, a longitudinal study that randomly assigned kindergarten students and their teachers to one of three groups beginning in the 1985–1986 school year. The three groups were small classes (13–17 students per teacher), regular-size classes (22–25 students), and regular/aide classes (22–25 students) that also included a full-time teacher’s aide. After their initial assignment, the design called for students to remain in the same class type for four years. Around 6,000–7,000 students were involved in the project each year. You can find the part of the sample related to students who entered STAR in kindergarten here, and use it to answer the following questions.
Create the dummy variables Free_lunch (equal to 1 if lunch is “free”), White_asian (equal to 1 if ethnicity is either “cauc” or “asian”), and Female (equal to 1 if gender is “female”). Also, define the variable age as 1986-birth, i.e., compute the age of the children in 1986.
The first question to ask about a randomized experiment is whether the randomization successfully balanced the subjects’ characteristics across the different groups. Although the STAR data do not include any pretreatment test scores, we can look at characteristics of students such as race, gender, age, and free lunch status, which is a good proxy for family income since only poor children qualify for free school lunch. Compare the values of Free_lunch, White_asian, Female, and age across the three groups small, regular, and regular+aide. Are those variables balanced?
One way to get the causal effect of interest is to run a regression of the outcome score (in percentage points) on the treatment classtype. Run a regression of score on classtype using robust standard errors and explain the results.
Hint: Use feols() from the fixest package, setting se="hetero".
setwd("D:/OneDrive - University of Illinois - Urbana/Causal Inference/ECON 474 Spring 2022/HW/HW2")
library(tidyverse)
<-readRDS("STARkinder.RDS")
star
## Dummy Free lunch
$Free_lunch<-ifelse(star$lunch=="free",1,0)
star$White_asian<-ifelse(star$ethnicity=="cauc"|star$ethnicity=="asian",1,0)
star$Female<-ifelse(star$gender=="female",1,0)
star## Got an error using 1986-star$birth
$age=-(star$birth-1986) star
%>%group_by(classtype)%>%summarize(`Free lunch`=mean(Free_lunch),
star`White/Asian`=mean(White_asian),
Female=mean(Female),
Age=mean(age))
## # A tibble: 3 x 5
## classtype `Free lunch` `White/Asian` Female Age
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 regular 0.479 0.678 0.487 5.88
## 2 regular+aide 0.499 0.663 0.486 5.89
## 3 small 0.473 0.685 0.486 5.89
One option is to split the dataset into two: one with regular and small classes, and the other with regular and regular+aide classes. Then it is straightforward to apply the function t.test().
<-star%>%filter(classtype=="regular"|classtype=="small")
reg_small
lapply(reg_small[,c('Free_lunch', 'White_asian', 'Female', 'age')],
function(x) t.test(x ~ reg_small$classtype))
## $Free_lunch
##
## Welch Two Sample t-test
##
## data: x by reg_small$classtype
## t = 0.36147, df = 3642.1, p-value = 0.7178
## alternative hypothesis: true difference in means between group regular and group small is not equal to 0
## 95 percent confidence interval:
## -0.02629397 0.03818111
## sample estimates:
## mean in group regular mean in group small
## 0.4785028 0.4725592
##
##
## $White_asian
##
## Welch Two Sample t-test
##
## data: x by reg_small$classtype
## t = -0.4469, df = 3646.9, p-value = 0.655
## alternative hypothesis: true difference in means between group regular and group small is not equal to 0
## 95 percent confidence interval:
## -0.03691611 0.02321084
## sample estimates:
## mean in group regular mean in group small
## 0.6783005 0.6851531
##
##
## $Female
##
## Welch Two Sample t-test
##
## data: x by reg_small$classtype
## t = 0.010441, df = 3641.5, p-value = 0.9917
## alternative hypothesis: true difference in means between group regular and group small is not equal to 0
## 95 percent confidence interval:
## -0.03209383 0.03243747
## sample estimates:
## mean in group regular mean in group small
## 0.4865959 0.4864240
##
##
## $age
##
## Welch Two Sample t-test
##
## data: x by reg_small$classtype
## t = -0.72547, df = 3637.4, p-value = 0.4682
## alternative hypothesis: true difference in means between group regular and group small is not equal to 0
## 95 percent confidence interval:
## -0.03062459 0.01408210
## sample estimates:
## mean in group regular mean in group small
## 5.884421 5.892692
<-star%>%filter(classtype=="regular"|classtype=="regular+aide")
reg_regaide
lapply(reg_regaide[,c('Free_lunch', 'White_asian', 'Female', 'age')],
function(x) t.test(x ~ reg_regaide$classtype))
## $Free_lunch
##
## Welch Two Sample t-test
##
## data: x by reg_regaide$classtype
## t = -1.3114, df = 3988.7, p-value = 0.1898
## alternative hypothesis: true difference in means between group regular and group regular+aide is not equal to 0
## 95 percent confidence interval:
## -0.05177769 0.01027209
## sample estimates:
## mean in group regular mean in group regular+aide
## 0.4785028 0.4992556
##
##
## $White_asian
##
## Welch Two Sample t-test
##
## data: x by reg_regaide$classtype
## t = 1.0597, df = 3989.8, p-value = 0.2893
## alternative hypothesis: true difference in means between group regular and group regular+aide is not equal to 0
## 95 percent confidence interval:
## -0.01340565 0.04494453
## sample estimates:
## mean in group regular mean in group regular+aide
## 0.6783005 0.6625310
##
##
## $Female
##
## Welch Two Sample t-test
##
## data: x by reg_regaide$classtype
## t = 0.015386, df = 3988.6, p-value = 0.9877
## alternative hypothesis: true difference in means between group regular and group regular+aide is not equal to 0
## 95 percent confidence interval:
## -0.03078453 0.03127152
## sample estimates:
## mean in group regular mean in group regular+aide
## 0.4865959 0.4863524
##
##
## $age
##
## Welch Two Sample t-test
##
## data: x by reg_regaide$classtype
## t = -0.4695, df = 3989.5, p-value = 0.6387
## alternative hypothesis: true difference in means between group regular and group regular+aide is not equal to 0
## 95 percent confidence interval:
## -0.02669355 0.01637890
## sample estimates:
## mean in group regular mean in group regular+aide
## 5.884421 5.889578
The other option is to perform an ANOVA and compare all three means at once:
lapply(star[,c('Free_lunch', 'White_asian', 'Female', 'age')], function(x) anova(lm(x ~ star$classtype)))
## $Free_lunch
## Analysis of Variance Table
##
## Response: x
## Df Sum Sq Mean Sq F value Pr(>F)
## star$classtype 2 0.76 0.37764 1.5121 0.2205
## Residuals 5720 1428.53 0.24974
##
## $White_asian
## Analysis of Variance Table
##
## Response: x
## Df Sum Sq Mean Sq F value Pr(>F)
## star$classtype 2 0.51 0.25654 1.1689 0.3108
## Residuals 5720 1255.33 0.21946
##
## $Female
## Analysis of Variance Table
##
## Response: x
## Df Sum Sq Mean Sq F value Pr(>F)
## star$classtype 2 0.0 0.000031 1e-04 0.9999
## Residuals 5720 1429.7 0.249948
##
## $age
## Analysis of Variance Table
##
## Response: x
## Df Sum Sq Mean Sq F value Pr(>F)
## star$classtype 2 0.07 0.032667 0.2712 0.7624
## Residuals 5720 688.90 0.120437
These tests lead to the same conclusion: student characteristics are balanced across the treatment and control groups.
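As a cross-check (not required by the question), the same balance comparison can be run as regressions of each covariate on classtype: if randomization worked, the class-type dummies should be small and statistically insignificant. A minimal sketch using the same data:

library(fixest)
## Sketch: regress each covariate on classtype with robust SEs;
## the class-type dummies estimate the group differences tested above
balance <- lapply(c("Free_lunch", "White_asian", "Female", "age"),
                  function(v) feols(as.formula(paste(v, "~ classtype")), se = "hetero", data = star))
do.call(etable, balance)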
library(fixest)
<-feols(score~classtype, se="hetero", data=star)
reg1etable(reg1)
## reg1
## Dependent Var.: score
##
## (Intercept) 51.44*** (0.6182)
## classtyperegular+aide -0.1900 (0.8491)
## classtypesmall 4.835*** (0.9046)
## _____________________ _________________
## S.E. type Heteroskeda.-rob.
## Observations 5,723
## R2 0.00699
## Adj. R2 0.00665
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The STAR experiment shows that attending a small class raises test scores by about 4.8 percentage points. Also, students assigned to regular+aide classes did not experience any improvement in scores.
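As a rough sense of magnitude (our own back-of-envelope, not part of the assignment), the small-class effect can be compared with the regular-class average, which equals the intercept since regular is the omitted category:

## 4.835-point gain relative to the regular-class mean of 51.44 points
4.835 / 51.44   # roughly 0.09, i.e., about a 9% improvement over the regular-class average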
Schools with at least three classes in each grade could choose to enroll in the experiment. Hence, students were randomized but schools were not, and that might be a source of bias. One way to address this is to control for schoolid. Run a regression of the outcome score (in percentage points) on the treatment classtype, controlling for schoolid and using robust standard errors. How do these results compare to what you got in (c)?
Run a regression of the outcome score (in percentage points) on the treatment classtype, controlling for schoolid, Free_lunch, White_asian, Female, and experience. Use robust standard errors. Did the estimates related to small and regular+aide change when you added covariates? Why/why not?
Do your new results represent the causal effect of class size on students’ scores? Why/Why not?
The coefficient on classtype small is now bigger, and its standard error is smaller.
<-feols(score~classtype, se="hetero", data=star)
reg1<-feols(score~classtype|schoolid, se="hetero", data=star)
reg2etable(reg1, reg2, signifCode = c("***"=0.01, "**"=0.05, "*"=0.10))
## reg1 reg2
## Dependent Var.: score score
##
## (Intercept) 51.44*** (0.6182)
## classtyperegular+aide -0.1900 (0.8491) 0.2939 (0.7444)
## classtypesmall 4.835*** (0.9046) 5.593*** (0.8046)
## Fixed-Effects: ----------------- -----------------
## schoolid No Yes
## _____________________ _________________ _________________
## S.E. type Heteroskeda.-rob. Heteroskeda.-rob.
## Observations 5,723 5,723
## R2 0.00699 0.25329
## Within R2 -- 0.01103
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The coefficient on classtype small changed just a bit, and that was expected since the treatment was randomly assigned.
reg3 <- feols(score ~ classtype + Free_lunch + White_asian + Female + experience | schoolid, se = "hetero", data = star)

etable(reg2, reg3, signifCode = c("***" = 0.01, "**" = 0.05, "*" = 0.10))
## reg2 reg3
## Dependent Var.: score score
##
## classtyperegular+aide 0.2939 (0.7444) 0.2069 (0.7104)
## classtypesmall 5.593*** (0.8046) 5.505*** (0.7674)
## Free_lunch -13.23*** (0.7411)
## White_asian 9.480*** (1.279)
## Female 4.732*** (0.6025)
## experience 0.2536*** (0.0588)
## Fixed-Effects: ----------------- ------------------
## schoolid Yes Yes
## _____________________ _________________ __________________
## S.E. type Heteroskeda.-rob. Heteroskedas.-rob.
## Observations 5,723 5,723
## R2 0.25329 0.31807
## Within R2 0.01103 0.09682
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The key assumption for a causal interpretation of the effect of class size on students’ scores in this framework is conditional independence, which is credible here. We know that the researchers randomly assigned students to classes of different sizes. Although the schools were not randomly selected, controlling for school id is enough to make the treatment “as good as random.”
The standard economic theory of criminal behavior frames engagement in illicit activities as the result of a rational choice, a decision that weighs costs and benefits. The straightforward prediction of this theory is that people respond to changes in the expected costs and benefits of committing a crime. For instance, more police on the streets or better police intelligence would increase the probability of arrest \((P_{A})\) faced by potential offenders. The increase in \(P_{A}\) generates a drop in crime rates since it raises the expected cost of committing a crime and lowers its expected utility. This behavioral response of individuals is known as the deterrent effect.
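One common way to formalize this deterrence logic is a Becker-style expected-utility comparison (a textbook sketch; the notation below is ours, not the paper’s): the potential offender commits the crime only if
\[E[U]=P_{A}\,U(W-F)+(1-P_{A})\,U(W+G)\]
exceeds the utility of staying honest, \(U(W)\), where \(W\) is current wealth, \(G\) is the gain from a successful crime, and \(F\) is the punishment if arrested. A higher \(P_{A}\) lowers \(E[U]\) and therefore the incentive to offend.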
In this question, we’ll use part of the dataset from Cornwell and Trumbull (1993). Download the .RDS file here. The authors have data on North Carolina counties from 1981 to 1987, and their goal was to estimate by how much crime would go down with an increase in the probability of arrest \((P_{A})\), i.e., the elasticity of crime with respect to \(P_{A}\).
Import CT1993.RDS and get the average crime rate crmrte and the average probability of arrest prbarr for each year from 1981 to 1987. Is there any clear pattern between those variables over time across North Carolina’s counties? Hint: you want to group_by(year) and then summarize() the data.
Filter the data by year==1987. After that, construct a scatterplot with lcrmrte (log crime per person) on the y-axis and prbarr (probability of arrest) on the x-axis. How do those two variables relate?
Estimate the following model by OLS with robust standard errors using the 1987 data: \[lcrmrte_{i}=\beta_{0}+\beta_{1}prbarr_{i}+v_{i}\] and interpret the results.
Now estimate \[lcrmrte_{i}=\gamma_{0}+\gamma_{1}prbarr_{i}+\gamma_{2}density_{i}+\eta_{i}\] where density is the population density (number of people per square mile). Why are the coefficients on prbarr different in the two regressions?
Finally, estimate \[density_{i}=\theta_{0}+\theta_{1}prbarr_{i}+\varepsilon_{i}\] and show how \(\theta_{1}\) relates to \(\beta_{1}\) and \(\gamma_{1}\).
Although higher values of Probability of Arrest appear to be followed by slightly lower Crime Rate values, there is no strong pattern between those variables over time.
<-readRDS("CT1993.RDS")
ct93
library(tidyverse)
<-ct93%>%
avggroup_by(year)%>%
summarise(`Crime Rate`=mean(crmrte),
`Probability of Arrest`=mean(prbarr))
library(kableExtra)
kbl(avg) %>%
kable_paper(full_width = F)%>%
kable_styling(latex_options = "striped", position = "center", font_size = 20)
year | Crime Rate | Probability of Arrest |
---|---|---|
81 | 0.0327502 | 0.2990574 |
82 | 0.0326482 | 0.3115709 |
83 | 0.0307077 | 0.3281419 |
84 | 0.0294517 | 0.3178034 |
85 | 0.0297594 | 0.3040093 |
86 | 0.0322861 | 0.2957571 |
87 | 0.0335099 | 0.2952375 |
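As a quick numerical complement to the eyeball comparison above (a sketch, not required by the question), one can correlate the seven yearly averages computed in avg:

## Only 7 year-level observations, so read this correlation loosely
with(avg, cor(`Crime Rate`, `Probability of Arrest`))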
Crime rates across North Carolina counties in 1987 appear to be negatively correlated with the probability of arrest.
crime87 <- ct93 %>% filter(year == 87)
library(ggthemes)
ggplot(crime87, aes(x=prbarr, y=lcrmrte)) + geom_point(color="#91b8bd") +
stat_smooth(method = "lm", formula =y~x, se=F, color="#336666") +
scale_x_continuous(name = "Probability of Arrest") +
scale_y_continuous(name = "Log Crime Rates") +
theme_economist(base_size = 17)+
theme(axis.text=element_text(size=12),
axis.title=element_text(size=12,face="bold"))
The interpretation here is tricky. Note that prbarr is a ratio, and the dependent variable is in logs. Hence, if the probability of arrest increases by 0.01, the crime rate per person decreases by approximately 1.9% (0.01 × 1.884 ≈ 0.019 in log points).

An increase of 0.01 in the probability of arrest does not mean a 1% increase in the probability of arrest. To interpret the result as an elasticity, take the log() of prbarr and regress lcrmrte on log(prbarr) (check the results of reg2).
<-feols(lcrmrte~prbarr,se="hetero", data=crime87)
reg1<-feols(lcrmrte~log(prbarr),se="hetero", data=crime87)
reg2etable(reg1,reg2, signifCode = c("***"=0.01, "**"=0.05, "*"=0.10))
## reg1 reg2
## Dependent Var.: lcrmrte lcrmrte
##
## (Intercept) -2.985*** (0.1200) -4.316*** (0.1855)
## prbarr -1.884*** (0.3015)
## log(prbarr) -0.5936*** (0.1460)
## _______________ __________________ ___________________
## S.E. type Heteroskedas.-rob. Heteroskedast.-rob.
## Observations 90 90
## R2 0.22351 0.18988
## Adj. R2 0.21468 0.18068
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
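A quick back-of-envelope check of the interpretations above, using the point estimates printed in the table (a sketch; the numbers are simply typed in by hand):

b_levellog <- -1.884                 # reg1: coefficient on prbarr (outcome in logs)
exp(b_levellog * 0.01) - 1           # about -0.019: a 0.01 increase in prbarr lowers crime by ~1.9%
b_loglog <- -0.5936                  # reg2: coefficient on log(prbarr), roughly the elasticity
b_loglog * 0.01                      # about -0.006: a 1% increase in prbarr lowers crime by ~0.6%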
The coefficients are different because the second regression adds another covariate: prbarr is correlated with population density, and population density affects crime rates.
<-feols(lcrmrte~prbarr+density,se="hetero", data=crime87)
reg2etable(reg1,reg2, signifCode = c("***"=0.01, "**"=0.05, "*"=0.10))
## reg1 reg2
## Dependent Var.: lcrmrte lcrmrte
##
## (Intercept) -2.985*** (0.1200) -3.455*** (0.1417)
## prbarr -1.884*** (0.3015) -1.239*** (0.3517)
## density 0.1944*** (0.0247)
## _______________ __________________ __________________
## S.E. type Heteroskedas.-rob. Heteroskedas.-rob.
## Observations 90 90
## R2 0.22351 0.48721
## Adj. R2 0.21468 0.47543
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(\beta_{1}=\gamma_{1}+\text{bias}=\gamma_{1}+\gamma_{2}\times\theta_{1}\;\Rightarrow\;-1.884\approx-1.239+0.1944\times(-3.317)\)
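To see where this decomposition comes from (standard omitted-variable-bias algebra in the section’s notation), substitute the auxiliary regression for density into the long regression:
\[lcrmrte_{i}=(\gamma_{0}+\gamma_{2}\theta_{0})+(\gamma_{1}+\gamma_{2}\theta_{1})\,prbarr_{i}+(\eta_{i}+\gamma_{2}\varepsilon_{i}),\]
so the slope of the short regression is \(\beta_{1}=\gamma_{1}+\gamma_{2}\theta_{1}\), which is exactly what the numbers above verify.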
<-feols(density~prbarr,se="hetero", data=crime87)
reg3etable(reg1,reg2,reg3, signifCode = c("***"=0.01, "**"=0.05, "*"=0.10))
## reg1 reg2 reg3
## Dependent Var.: lcrmrte lcrmrte density
##
## (Intercept) -2.985*** (0.1200) -3.455*** (0.1417) 2.417*** (0.4417)
## prbarr -1.884*** (0.3015) -1.239*** (0.3517) -3.317** (1.156)
## density 0.1944*** (0.0247)
## _______________ __________________ __________________ _________________
## S.E. type Heteroskedas.-rob. Heteroskedas.-rob. Heteroskeda.-rob.
## Observations 90 90 90
## R2 0.22351 0.48721 0.09032
## Adj. R2 0.21468 0.47543 0.07998
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
No! First, there are omitted variables in both regressions: covariates that are correlated with the probability of arrest and that also influence the outcome, crime rates. Second, one needs to deal with the fact that crime rates might also affect the probability of arrest (the simultaneity problem).
In Brazil, there is growing interest in the effect of studying in a Military School versus going to a regular public school. Attributing differences in student achievement to the type of school attended is complicated because students are not randomly allocated to Military or other public institutions. It is important to mention that there are two ways to enroll in a Military school: a student can either pass a highly competitive entrance exam or attend for having a parent in the Army.
The dataset here1 contains (simulated) information about 3,704 students who attended different schools in Fortaleza-CE. At the end of Middle school, students from various institutions take a standardized exam. Your job is to get the causal effect of Military school attendance on students’ scores.
The variables are described below:
Variable | Definition |
---|---|
id | Student’s Identification |
test_score | Standardized Test Scores |
military | 1 if attended a Military school, 0 otherwise |
white | 1 if student is white, 0 otherwise |
income | Student’s family income |
mothers_edu | 1 if mother’s education level is high school or below, 0 if some college or above |
mothers_age | Age of mother |
What is the difference in the average test_score between the treatment group (Military school attendance) and the control group (other public schools)? Is that difference statistically significant? Hint: use the function read_csv() from tidyverse to import the dataset.
Run t-tests to check for balance between treatment and control groups for all the covariates income, white, mothers_edu, and mothers_age. Are those variables balanced?
Based on your answer for b), can you guess the direction of the bias? i.e., do you think that the naive comparison using non-experimental data underestimates or overestimates the causal effect of Military school attendance? Explain.
The naive comparison gives an average treatment effect of 0.1584, i.e., the average score of students who went to military schools (0.299) is higher than the average test score of students who attended regular public schools (0.141). That difference is statistically significant (p-value<0.01).
<-read_csv("military.csv")
data
%>%group_by(military)%>%summarize(outcome=mean(test_score), income=mean(income), white=mean(white), mothers_educ=mean(mothers_edu),mothres_age=mean(mothers_age), obs=n()) data
## # A tibble: 2 x 7
## military outcome income white mothers_educ mothres_age obs
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 0 0.141 77790. 0.760 0.192 39.4 2920
## 2 1 0.299 89669. 0.796 0.153 40.2 784
t.test(test_score~military, data=data)
##
## Welch Two Sample t-test
##
## data: test_score by military
## t = -4.3489, df = 1304.9, p-value = 1.475e-05
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.22988578 -0.08695741
## sample estimates:
## mean in group 0 mean in group 1
## 0.1410667 0.2994883
The socioeconomic/demographic variables are not balanced between groups. On average, mothers in the treatment group are more educated than mothers in the control group (a smaller share have only a high school education or less). Mother’s age, the share of white students, and family income are also higher in the treatment group.
lapply(data[,c('income', 'white', 'mothers_edu', 'mothers_age')],
function(x) t.test(x ~ data$military))
## $income
##
## Welch Two Sample t-test
##
## data: x by data$military
## t = -6.814, df = 1221.9, p-value = 1.487e-11
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -15299.218 -8458.764
## sample estimates:
## mean in group 0 mean in group 1
## 77789.90 89668.89
##
##
## $white
##
## Welch Two Sample t-test
##
## data: x by data$military
## t = -2.1903, df = 1294.4, p-value = 0.02868
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.068219396 -0.003754325
## sample estimates:
## mean in group 0 mean in group 1
## 0.7599315 0.7959184
##
##
## $mothers_edu
##
## Welch Two Sample t-test
##
## data: x by data$military
## t = 2.6185, df = 1329.1, p-value = 0.008933
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.00971084 0.06772835
## sample estimates:
## mean in group 0 mean in group 1
## 0.1917808 0.1530612
##
##
## $mothers_age
##
## Welch Two Sample t-test
##
## data: x by data$military
## t = -3.9161, df = 1431, p-value = 9.424e-05
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -1.0972422 -0.3648559
## sample estimates:
## mean in group 0 mean in group 1
## 39.42329 40.15434
I would guess that the naive comparison overestimates the true causal effect of Military school attendance on test scores. Students who perform well on the entrance exam are the ones being treated in the first place, and they are more likely to do better in school anyway. Also, the treatment group has a much higher household income than the control group.
Run a regression of test_score on the covariates income, white, mothers_edu, and mothers_age besides the treatment variable military. Is the effect of Military school attendance different using regression and using a simple difference in means? Why?

Yes! The naive comparison does not account for important differences in the characteristics of students in the treatment and control groups. When you control for all the variables available, you are (most likely) one step closer to an apples-to-apples comparison: the treatment effect falls by more than 50%.
<-feols(test_score~military, se="hetero",data=data)
reg1<-feols(test_score~military+white+income+mothers_edu+mothers_age, se="hetero",data=data)
reg2
etable(reg1,reg2, signifCode = c("***"=0.01, "**"=0.05))
## reg1 reg2
## Dependent Var.: test_score test_score
##
## (Intercept) 0.1411*** (0.0177) -0.9398*** (0.1216)
## military 0.1584*** (0.0364) 0.0734* (0.0354)
## white 0.2980*** (0.0367)
## income 4.32e-6*** (3.58e-7)
## mothers_edu -0.3154*** (0.0382)
## mothers_age 0.0147*** (0.0030)
## _______________ __________________ ____________________
## S.E. type Heteroskedas.-rob. Heteroskedasti.-rob.
## Observations 3,704 3,704
## R2 0.00471 0.09276
## Adj. R2 0.00444 0.09154
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Run a logit model to get the predicted probabilities of being treated given the observed covariates. Plot the distributions of the propensity score in the treated and control groups. What do you see? Can we rely on the existence of common support?
Restrict the sample by finding the nearest neighbor of each treated student. Then run two regressions (one with and one without covariates) on the matched data. Contrast the results with those from the naive comparison and the regression with the full sample. What assumptions do you rely on to claim that this is the true causal effect of Military school attendance on scores?
There is reasonable overlap between the distributions of propensity scores in the control and treatment groups, so lack of common support is not an issue in this case.
logit_reg <- glm(military ~ income + white + mothers_edu + mothers_age, data = data,
                 family = binomial(link = 'logit'))

probs <- data.frame(prop_score = predict(logit_reg, type = "response"),
                    review = logit_reg$model$military)

library(patchwork)
hist1 <- probs %>%
  filter(review == 0) %>%
  ggplot() +
  geom_histogram(aes(x = prop_score)) +
  xlab("Prob. of going to Military School: control") +
  theme_bw()

hist2 <- probs %>%
  filter(review == 1) %>%
  ggplot() +
  geom_histogram(aes(x = prop_score)) +
  xlab("Prob. of going to Military School: treated") +
  theme_bw()

hist1 + hist2
Using MatchIt to find the nearest neighbor for each treated student, you get a dataset with 1,568 observations: one student in the control group for each of the 784 students in the treatment group.
library(MatchIt)
matching <- matchit(military ~ income + white + mothers_edu + mothers_age,
                    data = data, method = "nearest",
                    distance = "logit")

m_data <- match.data(matching)
dim(m_data)
## [1] 1568 10
The matched sample is now balanced on all variables:
m_data %>% group_by(military) %>% summarize(outcome = mean(test_score), income = mean(income), white = mean(white),
                                            mothers_educ = mean(mothers_edu), mothres_age = mean(mothers_age), obs = n())
## # A tibble: 2 x 7
## military outcome income white mothers_educ mothres_age obs
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 0 0.219 89615. 0.796 0.147 40.1 784
## 2 1 0.299 89669. 0.796 0.153 40.2 784
Regression results with and without controls are similar. To claim a causal effect of Military school attendance on scores, one needs to assume conditional independence: the treatment is “as good as random” after conditioning on the variables we observe (income, race, and mother’s education and age).
<-feols(test_score~military, se="hetero",data=m_data)
reg1_NE<-feols(test_score~military+white+income+mothers_edu+mothers_age, se="hetero",data=m_data)
reg2_NE
etable(reg1_NE,reg2_NE, signifCode = c("***"=0.01, "**"=0.05))
## reg1_NE reg2_NE
## Dependent Var.: test_score test_score
##
## (Intercept) 0.2188*** (0.0328) -1.084*** (0.2059)
## military 0.0806. (0.0457) 0.0814. (0.0440)
## white 0.2368*** (0.0563)
## income 3.67e-6*** (5.28e-7)
## mothers_edu -0.3327*** (0.0565)
## mothers_age 0.0208*** (0.0049)
## _______________ __________________ ____________________
## S.E. type Heteroskedas.-rob. Heteroskedasti.-rob.
## Observations 1,568 1,568
## R2 0.00198 0.07748
## Adj. R2 0.00134 0.07452
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Trim the data based on the propensity score, dropping observations with p-score<0.1 and p-score>0.9. Then run the weighted regression using the IPW as weights. Contrast the results with the ones from the nearest-neighbor matching.
Is any of those regressions likely to provide the true causal effect of Military school attendance on student scores? Why/Why not?
Results are similar using IPW with and without controls. Compared to the nearest-neighbor matching estimates, the treatment effect appears to be slightly larger when weighting by the inverse propensity score. On average, test scores in the treatment group are 0.08–0.09 higher than in the control group.
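For reference, the weights constructed in the code below are the standard inverse-probability weights, with \(D_{i}\) the military dummy and \(\hat{p}(X_{i})\) the fitted propensity score from the logit:
\[\hat{w}_{i}=\frac{D_{i}}{\hat{p}(X_{i})}+\frac{1-D_{i}}{1-\hat{p}(X_{i})},\]
so treated students with a low propensity score and control students with a high one receive more weight.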
data$prop_scores <- logit_reg$fitted.values
data <- data %>% filter(prop_scores >= .1 & prop_scores <= .9)  #### there are no prop_scores <0.1 and >0.9

## Generating the IPW
data <- data %>%
  mutate(ipw = (military / prop_scores) + ((1 - military) / (1 - prop_scores)))

reg1_IPW <- feols(test_score ~ military, data = data, se = "hetero", weights = data$ipw)
reg2_IPW <- feols(test_score ~ military + income + white + mothers_edu + mothers_age, data = data, se = "hetero", weights = data$ipw)
etable(reg2_NE, reg1_IPW, reg2_IPW, signifCode = c("***" = 0.01, "**" = 0.05, "*" = 0.10))
## reg2_NE reg1_IPW reg2_IPW
## Dependent Var.: test_score test_score test_score
##
## (Intercept) -1.084*** (0.2059) 0.1607*** (0.0177) -1.071*** (0.1496)
## military 0.0814. (0.0440) 0.0928* (0.0370) 0.0832* (0.0356)
## white 0.2368*** (0.0563) 0.2915*** (0.0447)
## income 3.67e-6*** (5.28e-7) 3.62e-6*** (4.06e-7)
## mothers_edu -0.3327*** (0.0565) -0.3339*** (0.0436)
## mothers_age 0.0208*** (0.0049) 0.0196*** (0.0036)
## _______________ ____________________ __________________ ____________________
## S.E. type Heteroskedasti.-rob. Heteroskedas.-rob. Heteroskedasti.-rob.
## Observations 1,568 3,704 3,704
## R2 0.07748 0.00253 0.08421
## Adj. R2 0.07452 0.00226 0.08297
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Probably not. To claim causality, one needs to defend conditional independence in this setting. However, it is hard to ignore important student characteristics that are not observed in this dataset, such as ability and motivation.
Disclaimer: this is a simulated dataset. The purpose of this question is to improve your understanding of selection on observables. We cannot draw any conclusions about the effect of Military Schools based on this example.↩︎