6 Fixed Effects

6.1 Preliminaries

These notes are based on Nick Huntington-Klein’s lecture on Within Variation and Fixed Effects; his video lecture is available here. In this exercise, we use data from Cornwell and Trumbull (1994). The dataset (download here) covers 90 North Carolina counties from 1981 to 1987. The table below describes the variables:

Table 6.1: Variables’ Description
Variable	Definition
county	County id
year	Time period (year)
crmrte	Crime rate (per person)
prbarr	Probability of arrest (arrests/offenses)
prbconv	Probability of conviction (convictions/arrests)
prbpris	Probability of prison (prison/convictions)
avgsen	Average sentence length
polpc	Police per capita
density	Population density
taxpc	Per capita tax revenue
west	Western counties dummy
central	Central counties dummy
urban	Urban counties dummy
pctmin80	Percentage minority, 1980
pctymle	Percentage young male
w…	All other variables starting with `w` represent wages in different sectors/industries

A column name starting with l denotes the logarithm of the corresponding variable (e.g., lcrmrte is the log of crmrte).

6.2 Between variation

Read the CT1993.RDS file and keep only counties 1, 3, 23, and 91:

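library(tidyverse)
# Set the working directory to the folder containing CT1993.RDS (adjust to your machine)
setwd("C:/Users/User/Desktop/474-Rlab/datasets")
ct93<-readRDS("CT1993.RDS")
# Keep a few variables and four counties so the example is easy to follow
ct93_sub<-ct93%>%
  select(county, year, crmrte, prbarr, polpc, density, urban)%>%
  filter(county %in% c(1,3,23,91))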

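A pooled OLS regression uses all county-year observations and ignores which county each observation belongs to. Its estimate is therefore dominated by differences between counties. It shows a positive relationship between the crime rate and the probability of arrest: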

library(fixest)
pooled<-feols(crmrte~prbarr, se="hetero", data=ct93_sub)
summary(pooled)

## OLS estimation, Dep. Var.: crmrte
## Observations: 28 
## Standard-errors: Heteroskedasticity-robust 
##             Estimate Std. Error  t value    Pr(>|t|)    
## (Intercept) 0.000776   0.004050 0.191625 8.49524e-01    
## prbarr      0.095990   0.012966 7.403400 7.33000e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.004859   Adj. R2: 0.640563
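A pure between estimator collapses the panel to one row per county and runs OLS on the county means. The sketch below does this in base R. It uses a small simulated panel, not the CT1993.RDS data, so all numbers are hypothetical. The panel is built so that counties with a higher average arrest probability also have higher average crime, while within a county more arrests go with less crime. The between slope comes out positive.

```r
# Hypothetical simulated panel: 4 counties x 7 years (not the real data)
set.seed(474)
panel <- expand.grid(year = 1981:1987, county = 1:4)
county_effect <- c(0.01, 0.02, 0.03, 0.04)   # baseline crime level of each county
# Counties with higher baseline crime also have higher arrest probabilities
panel$prbarr <- 0.1 + 2 * county_effect[panel$county] + rnorm(nrow(panel), sd = 0.02)
# Within a county, a higher arrest probability lowers crime (true slope -0.05)
panel$crmrte <- county_effect[panel$county] - 0.05 * panel$prbarr +
  rnorm(nrow(panel), sd = 0.0005)

# Between estimator: one row of means per county, then OLS on those means
county_means <- aggregate(cbind(crmrte, prbarr) ~ county, data = panel, FUN = mean)
between <- lm(crmrte ~ prbarr, data = county_means)
coef(between)["prbarr"]   # positive: driven only by cross-county differences
```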

6.3 Within variation

There are three equivalent ways to estimate the relationship using only within variation. Let’s explore each one in turn:

6.3.1 Regress the demeaned \(Y_{it}\) on the demeaned \(D_{it}\)

First, compute the group means (i.e., the average crime rate and the average probability of arrest for each county over 1981-1987). Then subtract the county mean from each observation:

within<-ct93_sub%>%
  group_by(county)%>%
  mutate(mean_crime=mean(crmrte),
         mean_prbarr=mean(prbarr))%>%
  mutate(within_crime=crmrte-mean_crime,
         within_prbarr=prbarr-mean_prbarr)
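The same demeaning can also be done without dplyr. Base R's ave() returns, for every row, the mean of that row's group. Below is a minimal sketch on simulated data. The values are hypothetical and were generated so that the true within slope is -0.05.

```r
# Hypothetical simulated panel: 4 counties x 7 years (not the real data)
set.seed(474)
panel <- expand.grid(year = 1981:1987, county = 1:4)
county_effect <- c(0.01, 0.02, 0.03, 0.04)
panel$prbarr <- 0.1 + 2 * county_effect[panel$county] + rnorm(nrow(panel), sd = 0.02)
panel$crmrte <- county_effect[panel$county] - 0.05 * panel$prbarr +
  rnorm(nrow(panel), sd = 0.0005)

# ave() gives each row its county mean, so subtracting it demeans within county
panel$within_crime  <- panel$crmrte - ave(panel$crmrte, panel$county)
panel$within_prbarr <- panel$prbarr - ave(panel$prbarr, panel$county)

# Each county's demeaned values now average to (numerically) zero
tapply(panel$within_prbarr, panel$county, mean)

within_demean <- lm(within_crime ~ within_prbarr, data = panel)
coef(within_demean)["within_prbarr"]   # close to the true slope of -0.05
```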

To get the estimates using the within variation, regress within_crime on within_prbarr:

within_demean<-lm(within_crime~within_prbarr,  data=within)
summary(within_demean)

## 
## Call:
## lm(formula = within_crime ~ within_prbarr, data = within)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0058380 -0.0015729 -0.0000249  0.0013558  0.0068621 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    1.130e-18  5.242e-04    0.00    1.000
## within_prbarr -1.392e-02  1.809e-02   -0.77    0.448
## 
## Residual standard error: 0.002774 on 26 degrees of freedom
## Multiple R-squared:  0.02227,    Adjusted R-squared:  -0.01533 
## F-statistic: 0.5923 on 1 and 26 DF,  p-value: 0.4485

Unlike the pooled regression, the coefficient on the probability of arrest is now negative (though not statistically significant).
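The slope from regressing demeaned crime on the demeaned arrest probability is the within estimator. Here \(T_i\) is the number of years observed for county \(i\):

\[
\hat\beta_{W} = \frac{\sum_{i}\sum_{t}\left(D_{it}-\bar D_i\right)\left(Y_{it}-\bar Y_i\right)}{\sum_{i}\sum_{t}\left(D_{it}-\bar D_i\right)^2},
\qquad
\bar Y_i = \frac{1}{T_i}\sum_{t} Y_{it},
\qquad
\bar D_i = \frac{1}{T_i}\sum_{t} D_{it}.
\]

Both demeaned variables average to zero, so the fitted intercept is zero up to rounding. This is why the output above reports an intercept of about \(10^{-18}\).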

6.3.2 Regress \(Y_{it}\) on \(D_{it}\) and county unit dummies

Demeaning by hand quickly becomes tedious, because it must be done not only for the dependent variable but also for every covariate. An easier way to obtain the same coefficient is to include county dummies, which allows a different intercept for each county:

# -1 drops the common intercept so that every county gets its own intercept (dummy)
lsdv<-lm(crmrte~-1+prbarr+factor(county), data=ct93_sub)
summary(lsdv)

## 
## Call:
## lm(formula = crmrte ~ -1 + prbarr + factor(county), data = ct93_sub)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0058380 -0.0015729 -0.0000249  0.0013558  0.0068621 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## prbarr           -0.013923   0.019235  -0.724    0.476    
## factor(county)1   0.040257   0.006338   6.352 1.76e-06 ***
## factor(county)3   0.017396   0.003576   4.864 6.54e-05 ***
## factor(county)23  0.032365   0.005546   5.835 6.03e-06 ***
## factor(county)91  0.036341   0.006594   5.511 1.33e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.002949 on 23 degrees of freedom
## Multiple R-squared:  0.9914, Adjusted R-squared:  0.9896 
## F-statistic: 532.8 on 5 and 23 DF,  p-value: < 2.2e-16

This is known as the least squares dummy variable (LSDV) regression.
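Partialling the county dummies out of both variables is the same as demeaning them by county (the Frisch–Waugh–Lovell theorem). As a result, LSDV and the demeaned regression give identical slopes. The base-R sketch below checks this on simulated, hypothetical data. It also shows that each county dummy equals that county's mean crime rate minus the slope times its mean arrest probability.

```r
# Hypothetical simulated panel: 4 counties x 7 years (not the real data)
set.seed(474)
panel <- expand.grid(year = 1981:1987, county = 1:4)
county_effect <- c(0.01, 0.02, 0.03, 0.04)
panel$prbarr <- 0.1 + 2 * county_effect[panel$county] + rnorm(nrow(panel), sd = 0.02)
panel$crmrte <- county_effect[panel$county] - 0.05 * panel$prbarr +
  rnorm(nrow(panel), sd = 0.0005)

# LSDV: one dummy (intercept) per county, no common intercept
lsdv <- lm(crmrte ~ -1 + prbarr + factor(county), data = panel)

# Within: demean both variables by county, then run OLS
dm <- function(x, g) x - ave(x, g)
within_demean <- lm(dm(crmrte, county) ~ dm(prbarr, county), data = panel)

# The two slopes agree up to floating-point error
c(lsdv = unname(coef(lsdv)["prbarr"]), within = unname(coef(within_demean)[2]))

# Each county dummy = mean(crmrte) - slope * mean(prbarr) for that county
b <- coef(lsdv)["prbarr"]
implied <- tapply(panel$crmrte, panel$county, mean) -
  b * tapply(panel$prbarr, panel$county, mean)
```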

6.3.3 Regress \(Y_{it}\) on \(D_{it}\) with canned fixed effects routine

When there are many fixed effects (sometimes thousands, sometimes millions), LSDV becomes slow and memory-intensive, because each fixed effect adds a column to the regression. R users can instead use the fixest package, which estimates models with multiple fixed effects very quickly. The fixed effects are listed after the vertical bar |:

fe<-feols(crmrte~prbarr|county, data=ct93_sub)
summary(fe)

## OLS estimation, Dep. Var.: crmrte
## Observations: 28 
## Fixed-effects: county: 4
## Standard-errors: Clustered (county) 
##         Estimate Std. Error   t value Pr(>|t|) 
## prbarr -0.013923   0.026278 -0.529824 0.632941 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.002673     Adj. R2: 0.877019
##                  Within R2: 0.022272

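The estimated coefficient, \(-0.013923\), is the same in all three regressions; only the standard errors differ. The demeaned regression ignores the four county means estimated in the first step, so lm() uses 26 instead of 23 residual degrees of freedom. Rescaling its standard error gives the LSDV value (\(0.01809 \times \sqrt{26/23} \approx 0.0192\)). The feols() standard error is larger still because, as its output states, it is clustered by county.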
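The degrees-of-freedom adjustment can be checked directly. The base-R sketch below uses simulated, hypothetical data with 4 counties and 7 years, as in the subsample. For the demeaned regression, lm() reports 26 residual degrees of freedom; for LSDV it reports 23. Rescaling the demeaned standard error by the square root of their ratio reproduces the LSDV standard error.

```r
# Hypothetical simulated panel: 4 counties x 7 years (not the real data)
set.seed(474)
panel <- expand.grid(year = 1981:1987, county = 1:4)
county_effect <- c(0.01, 0.02, 0.03, 0.04)
panel$prbarr <- 0.1 + 2 * county_effect[panel$county] + rnorm(nrow(panel), sd = 0.02)
panel$crmrte <- county_effect[panel$county] - 0.05 * panel$prbarr +
  rnorm(nrow(panel), sd = 0.0005)

lsdv <- lm(crmrte ~ -1 + prbarr + factor(county), data = panel)
dm <- function(x, g) x - ave(x, g)
within_demean <- lm(dm(crmrte, county) ~ dm(prbarr, county), data = panel)

se_within <- summary(within_demean)$coefficients[2, "Std. Error"]
se_lsdv   <- summary(lsdv)$coefficients["prbarr", "Std. Error"]

# Demeaned lm counts only 2 parameters (28 - 2 = 26);
# LSDV counts the slope plus 4 county intercepts (28 - 5 = 23)
df_within <- df.residual(within_demean)
df_lsdv   <- df.residual(lsdv)

se_within * sqrt(df_within / df_lsdv)   # equals se_lsdv
se_lsdv
```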

6.4 Two-way fixed effects

A common approach is to include both a fixed effect for the panel unit and a fixed effect for each time period. The time effects capture shocks or trends common to all units, i.e., anything that changes over time in the same way for every county. To do that in fixest, add the time variable next to the panel unit on the right side of the vertical bar |:

twfe<-feols(crmrte~prbarr|county+year, data=ct93_sub)
summary(twfe)

## OLS estimation, Dep. Var.: crmrte
## Observations: 28 
## Fixed-effects: county: 4,  year: 7
## Standard-errors: Clustered (county) 
##        Estimate Std. Error  t value Pr(>|t|) 
## prbarr 0.005353   0.025778 0.207665 0.848789 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.001947     Adj. R2: 0.911743
##                  Within R2: 0.004485
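This subsample is balanced: every county is observed in all seven years. For a balanced panel, the two-way within transformation has a closed form. Subtract the county mean and the year mean, then add back the overall mean. The base-R sketch below uses simulated, hypothetical data and checks that this reproduces the coefficient from a regression with county and year dummies. For unbalanced panels this formula is not exact, and iterative demeaning is needed instead.

```r
# Hypothetical simulated balanced panel: 4 counties x 7 years (not the real data)
set.seed(474)
panel <- expand.grid(year = 1981:1987, county = 1:4)
county_effect <- c(0.01, 0.02, 0.03, 0.04)
year_effect   <- seq(0, 0.006, length.out = 7)   # shock common to all counties
panel$prbarr <- 0.1 + 2 * county_effect[panel$county] + rnorm(nrow(panel), sd = 0.02)
panel$crmrte <- county_effect[panel$county] + year_effect[panel$year - 1980] -
  0.05 * panel$prbarr + rnorm(nrow(panel), sd = 0.0005)

# Two-way demeaning for a balanced panel: x - mean_i(x) - mean_t(x) + mean(x)
dm2 <- function(x, i, t) x - ave(x, i) - ave(x, t) + mean(x)
twfe_dm <- lm(dm2(crmrte, county, year) ~ dm2(prbarr, county, year), data = panel)

# Dummy-variable version with county and year intercepts
twfe_lsdv <- lm(crmrte ~ prbarr + factor(county) + factor(year), data = panel)

c(demeaned = unname(coef(twfe_dm)[2]), lsdv = unname(coef(twfe_lsdv)["prbarr"]))
```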

6.5 Panel data alone can’t deal with simultaneity

A two-way fixed effects regression of the crime rate on police per capita controls for common time shocks and for time-invariant unobserved differences across counties. However, TWFE does not address simultaneity:

pol_crime<-feols(crmrte~polpc|county+year, data=ct93)
summary(pol_crime)

## OLS estimation, Dep. Var.: crmrte
## Observations: 630 
## Fixed-effects: county: 90,  year: 7
## Standard-errors: Clustered (county) 
##       Estimate Std. Error t value Pr(>|t|)    
## polpc   1.7938   0.339849  5.2783 9.12e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.005577     Adj. R2: 0.88804 
##                  Within R2: 0.226753

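The coefficient on police per capita is positive and statistically significant. This is unlikely to mean that police cause crime. The positive association most likely reflects reverse causality: counties with higher crime rates hire more police officers per capita. County and year fixed effects remove time-invariant county factors and common year shocks, but not this feedback from crime to policing.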
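A small simulation illustrates why fixed effects do not fix simultaneity. The model below is hypothetical, with made-up coefficients, and uses base R only. Police reduce crime (true effect -0.5), but counties also hire police in response to crime. A TWFE regression of crime on police then returns a positive coefficient, because within-county crime shocks push police in the same direction.

```r
# Hypothetical simulated panel: 90 counties x 7 years (not the real data)
set.seed(474)
n_county <- 90
n_year   <- 7
panel <- expand.grid(year = 1:n_year, county = 1:n_county)
a <- rnorm(n_county)[panel$county]   # county effect in the crime equation
b <- rnorm(n_county)[panel$county]   # county effect in the police equation
g <- rnorm(n_year)[panel$year]       # common year shock
e <- rnorm(nrow(panel), sd = 2)      # crime shock
u <- rnorm(nrow(panel), sd = 1)      # police shock

# Structural model (hypothetical):
#   crime  = a + g - 0.5 * police + e   (true effect of police is -0.5)
#   police = b + 0.8 * crime + u        (police respond to crime)
# Solving both equations jointly gives the reduced form for crime:
panel$crime  <- (a + g - 0.5 * b - 0.5 * u + e) / 1.4
panel$police <- b + 0.8 * panel$crime + u

# TWFE via county and year dummies removes a, b and g, but not the feedback
twfe <- lm(crime ~ police + factor(county) + factor(year), data = panel)
coef(twfe)["police"]   # positive, even though the true effect is -0.5
```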

6.6 Interpreting Within Relationships

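The regressions below follow the specifications of the between and within columns of Table 3 in Cornwell and Trumbull (1994). Note that full_between is estimated by pooled OLS on all 630 county-year observations, not on county averages as a textbook between estimator would be. Apart from central, urban, and pctmin80, all variables are in logs, so the coefficients on the logged regressors can be interpreted as elasticities.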

full_between<-feols(lcrmrte~lprbarr+lprbconv+lprbpris+lavgsen+lpolpc+
                      ldensity+lpctymle+lwtuc+lwtrd+lwfir+lwser+lwmfg+
                      lwfed+lwsta+lwloc+central+urban+pctmin80,
                    se="standard", data=ct93)

full_within<-feols(lcrmrte~lprbarr+lprbconv+lprbpris+lavgsen+lpolpc+
                     ldensity+lpctymle+lwtuc+lwtrd+lwfir+lwser+lwmfg+
                     lwfed+lwsta+lwloc|county+year,
                   se="standard", data=ct93)

etable(full_between, full_within, signifCode = c("***"=0.01, "**"=0.05, "*"=0.10))

##                        full_between         full_within
## Dependent Var.:             lcrmrte             lcrmrte
##                                                        
## (Intercept)      -1.711*** (0.5940)                    
## lprbarr         -0.6426*** (0.0312) -0.3556*** (0.0322)
## lprbconv        -0.4750*** (0.0223) -0.2815*** (0.0211)
## lprbpris        -0.1678*** (0.0502) -0.1710*** (0.0322)
## lavgsen            -0.0626 (0.0395)    -0.0034 (0.0261)
## lpolpc           0.3543*** (0.0230)  0.4136*** (0.0266)
## ldensity         0.3633*** (0.0289)     0.4168 (0.2825)
## lpctymle        -0.1962*** (0.0658)    0.6086* (0.3631)
## lwtuc               0.0148 (0.0300)   0.0464** (0.0190)
## lwtrd               0.0319 (0.0641)    -0.0195 (0.0405)
## lwfir              -0.0024 (0.0463)    -0.0040 (0.0283)
## lwser             -0.0538* (0.0320)     0.0083 (0.0191)
## lwmfg             -0.1029* (0.0543) -0.3601*** (0.1118)
## lwfed              -0.0337 (0.1176)   -0.3074* (0.1761)
## lwsta           -0.3136*** (0.0839)     0.0592 (0.1133)
## lwloc              0.1870* (0.1117)     0.1821 (0.1176)
## central           -0.0483* (0.0248)                    
## urban           -0.2132*** (0.0546)                    
## pctmin80         0.0147*** (0.0007)                    
## Fixed-Effects:  ------------------- -------------------
## county                           No                 Yes
## year                             No                 Yes
## _______________ ___________________ ___________________
## S.E. type                  Standard            Standard
## Observations                    630                 630
## R2                          0.79498             0.95314
## Within R2                        --             0.40013

Since the estimates use within variation, the interpretation is within-county, net of common year shocks. Hence, a 10% increase in the probability of arrest in a given county is associated with a crime rate roughly 3.56% lower (0.3556 × 10) in that county, on average.
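In a log-log model, multiplying the coefficient by the percent change is only a first-order approximation. A quick check in R, using the within coefficient on lprbarr from the table above, shows how close that approximation is for a 10% change:

```r
b <- -0.3556   # within coefficient on lprbarr (from the table above)

# Linear approximation: % change in crime ~ b * % change in prbarr
approx_pct <- b * 10

# Exact implication of the log-log model for a 10% rise in prbarr
exact_pct <- 100 * (1.10^b - 1)

round(c(approx = approx_pct, exact = exact_pct), 2)   # approx -3.56, exact -3.33
```

For small changes the two are close. For larger changes, the exact formula is the safer one to report.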

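Finally, the errors are unlikely to be homoskedastic, so heteroskedasticity-robust standard errors are preferable. With panel data, the errors of the same unit are also likely to be correlated over time. We therefore cluster the standard errors at the level of the main fixed effect (county), which allows the error term to be correlated within each county. Most standard errors increase. Even so, in the within regression the coefficients on the probabilities of arrest, conviction, and prison and on police per capita remain significant at the 1% level.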

full_between2<-feols(lcrmrte~lprbarr+lprbconv+lprbpris+lavgsen+lpolpc+
                      ldensity+lpctymle+lwtuc+lwtrd+lwfir+lwser+lwmfg+
                      lwfed+lwsta+lwloc+central+urban+pctmin80,
                    cluster=~county, data=ct93)

full_within2<-feols(lcrmrte~lprbarr+lprbconv+lprbpris+lavgsen+lpolpc+
                     ldensity+lpctymle+lwtuc+lwtrd+lwfir+lwser+lwmfg+
                     lwfed+lwsta+lwloc|county+year,
                   cluster=~county, data=ct93)

etable(full_between2, full_within2, signifCode = c("***"=0.01, "**"=0.05, "*"=0.10))

##                       full_between2        full_within2
## Dependent Var.:             lcrmrte             lcrmrte
##                                                        
## (Intercept)          -1.711 (1.130)                    
## lprbarr         -0.6426*** (0.0951) -0.3556*** (0.0600)
## lprbconv        -0.4750*** (0.0644) -0.2815*** (0.0494)
## lprbpris         -0.1678** (0.0678) -0.1710*** (0.0456)
## lavgsen            -0.0626 (0.0600)    -0.0034 (0.0322)
## lpolpc           0.3543*** (0.1079)  0.4136*** (0.0839)
## ldensity         0.3633*** (0.0620)     0.4168 (0.3690)
## lpctymle           -0.1962 (0.1196)     0.6086 (0.5625)
## lwtuc               0.0148 (0.0248)  0.0464*** (0.0167)
## lwtrd               0.0319 (0.1006)    -0.0195 (0.0313)
## lwfir              -0.0024 (0.0404)    -0.0040 (0.0124)
## lwser              -0.0538 (0.0392)     0.0083 (0.0220)
## lwmfg              -0.1029 (0.1220) -0.3601*** (0.1132)
## lwfed              -0.0337 (0.2434)    -0.3074 (0.2224)
## lwsta            -0.3136** (0.1385)     0.0592 (0.1133)
## lwloc               0.1870 (0.2341)     0.1821 (0.1624)
## central            -0.0483 (0.0454)                    
## urban            -0.2132** (0.0837)                    
## pctmin80         0.0147*** (0.0016)                    
## Fixed-Effects:  ------------------- -------------------
## county                           No                 Yes
## year                             No                 Yes
## _______________ ___________________ ___________________
## S.E.: Clustered          by: county          by: county
## Observations                    630                 630
## R2                          0.79498             0.95314
## Within R2                        --             0.40013
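To make "clustering" concrete, the base-R sketch below computes a cluster-robust (sandwich) variance for a simple OLS regression by hand. The data are simulated and hypothetical, with 30 clusters of 7 observations each. It applies the common CR1 small-sample factor G/(G-1) × (n-1)/(n-k). fixest's own small-sample adjustments may differ in their details. As a sanity check, the code verifies that making every observation its own cluster, with no correction, reduces the formula to White's HC0 estimator.

```r
# Hypothetical simulated data: 30 clusters with 7 observations each
set.seed(474)
G <- 30
n_per <- 7
d <- data.frame(cl = rep(1:G, each = n_per))
d$x <- rnorm(G)[d$cl] + rnorm(nrow(d))                  # regressor correlated within cluster
d$y <- 1 + 0.5 * d$x + rnorm(G)[d$cl] + rnorm(nrow(d))  # error correlated within cluster

fit <- lm(y ~ x, data = d)
X <- model.matrix(fit)
u <- resid(fit)
bread <- solve(crossprod(X))                  # (X'X)^{-1}

# "Meat": sum over clusters g of (X_g' u_g)(X_g' u_g)'
cluster_meat <- function(X, u, cl) {
  s <- rowsum(X * u, cl)                      # one row of X_g' u_g per cluster
  crossprod(s)
}

n <- nrow(X)
k <- ncol(X)
cr1 <- G / (G - 1) * (n - 1) / (n - k)        # CR1 small-sample factor
V_cl <- cr1 * bread %*% cluster_meat(X, u, d$cl) %*% bread

# Sanity check: one observation per cluster, no correction, gives HC0
V_hc0        <- bread %*% cluster_meat(X, u, seq_len(n)) %*% bread
V_hc0_direct <- bread %*% crossprod(X * u) %*% bread

sqrt(diag(V_cl))        # clustered standard errors
sqrt(diag(vcov(fit)))   # conventional (homoskedastic) standard errors
```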

References

Cornwell, Christopher, and William N. Trumbull. 1994. “Estimating the Economic Model of Crime with Panel Data.” The Review of Economics and Statistics 76 (2): 360–66.