Econ 414 Final Project - SPRING 2021
Getting the data from tidycensus
You are going to use the tidycensus
package to get data from neighborhoods within a metro area (your choice!). The goal of the project is to identify some short-run changes in these neighborhoods. You are interested in the following variables:
Variable | Definition | Code |
---|---|---|
total_pop | Total population | B03002_001 |
white | Total White (alone and non-hispanic) | B03002_003 |
black | Total Black/African American (alone and non-hispanic) | B03002_004 |
asian | Total Asian (alone and non-hispanic) | B03002_006 |
hisp | Total Hispanic or Latino | B03002_012 |
foreign | Total foreign born | B05002_013 |
educ_tot | Total population (>25) | B15003_001 |
bach | Total bachelor’s degree | B15003_022 |
master | Total master’s degree | B15003_023 |
prof | Total professional school degree | B15003_024 |
doc | Total doctorate degree | B15003_025 |
med_inc | Median housheold income | B19013_001 |
rent_inc_share | Median gross rent as a percentage of income | B25071_001 |
med_home_value | Median home value | B25077_001 |
pov_tot | Total population (poverty status) | B17001_001 |
below_pov | Total population with income below the poverty line | B17001_002 |
tenure_tot | Total housing units (tenure) | B25003_001 |
house_renter | Total housing units renter-occupied | B25003_003 |
occupancy_tot | Total housing units (occupancy status) | B25002_001 |
house_vacant | Total vacant housing units | B25002_003 |
build_age | Median year structure built | B25035_001 |
Using the following codes, I will request the data using get_acs()
. Since we are working with metro areas, we might need to get data from multiple states. In this example, I want to get data from Washington-Arlington-Alexandria, DC-VA-MD-WV MSA. Hence, I have states=c("DC","VA", "MD", "WV")
. The geography
is “tract” because we want to look at neighborhoods. In case you are working with Chicago-Naperville-Elgin, IL-IN-WI, you would need states=c("IL","IN", "WI")
.
Also, I am calling get_acs()
twice: one for 2008-2012 and another for 2015-2019. To map the data, you need geography=TRUE
in one of your calls (no need to get geometry twice).
library(tidycensus)
library(tidyverse)
library(tigris)
options(tigris_class = "sf")
options(tigris_use_cache = TRUE)
## You can always load_variables to check name and concept
#var2012_acs5<-load_variables(2012, dataset="acs5")
#var2019_acs5<-load_variables(2019, dataset="acs5")
<- get_acs(geography = "tract",
Variables12 variables = c(total_pop12="B03002_001",
white12="B03002_003",
black12="B03002_004",
asian12="B03002_006",
hisp12="B03002_012",
foreign12="B05002_013",
educ_tot12="B15003_001",
bach12="B15003_022",
master12="B15003_023",
prof12="B15003_024",
doc12="B15003_025",
med_inc12="B19013_001",
rent_inc_share12="B25071_001",
med_home_value12="B25077_001",
pov_tot12="B17001_001",
below_pov12="B17001_002",
tenure_tot12="B25003_001",
house_renter12="B25003_003",
occupancy_tot12="B25002_001",
house_vacant12="B25002_003",
build_age12="B25035_001"),
state = c("DC","VA", "MD", "WV"),
geometry = F,
output="wide",
year=2012)
<- get_acs(geography = "tract",
Variables19 variables = c(total_pop19="B03002_001",
white19="B03002_003",
black19="B03002_004",
asian19="B03002_006",
hisp19="B03002_012",
foreign19="B05002_013",
educ_tot19="B15003_001",
bach19="B15003_022",
master19="B15003_023",
prof19="B15003_024",
doc19="B15003_025",
med_inc19="B19013_001",
rent_inc_share19="B25071_001",
med_home_value19="B25077_001",
pov_tot19="B17001_001",
below_pov19="B17001_002",
tenure_tot19="B25003_001",
house_renter19="B25003_003",
occupancy_tot19="B25002_001",
house_vacant19="B25002_003",
build_age19="B25035_001"),
state = c("DC","VA", "MD", "WV"),
geometry = T,
output="wide",
year=2019)
After getting all the data, let’s do some cleaning. There is no need to keep the columns ending with “M” since they represent the margin of error, and we are looking for the estimates (variables ending with E). So we drop them and merge those two files into total_census
.
<-Variables12 %>% select(-ends_with('M'))
Variables12<-Variables19 %>% select(-ends_with('M'))
Variables19
<-left_join(Variables19, Variables12) total_census
Subsetting for metro area
We will be using additional packages in this task. tigris
helps with the MSA boundaries, sf
with the spatial join, and mapview
is a quick way to map (I prefer tmap
though). I also show you how to create a static map with ggplot()
.
## If you don't have them, first install.packages()
library(sf)
library(mapview)
The function core_based_statistical_areas()
gives you all the CBAs in the US . When I search for Washington-Arlington-Alexandria
, I found the GEOID 47900. Then, to get the specific boundaries for Washington-Arlington-Alexandria, DC-VA-MD-WV MSA, filter the metros
by the specific GEOID of the MSA. For instance, if you are working with Chicago-Naperville-Elgin, you would filter()
by `GEOID %in% c(“16980”).
Finally, st_join
subsets all the census tracts within the MSA boundary (WA_MSA
).
<- core_based_statistical_areas(cb = TRUE)
metros #View(metros) to search the specific GEOID of your MSA
<- metros%>%
WA_MSAfilter(GEOID %in% c("47900")) %>%
select(metro_name = NAME)
<- st_join(total_census, WA_MSA, join = st_within,left = FALSE) dc_data
Mapping information
Before mapping, let’s create the desired shares and organize the data frame for 2012 and 2019. First, the Median year structure built
for the ACS 5-year 2015-2019 gives zero to the buildings constructed before 1939, and we need to change it to compare that with data from 2012. Then, calculate the shares and bring prices from 2012 to 2019.
$build_age19E<-ifelse(dc_data$build_age19E==0, 1939, dc_data$build_age19E)
dc_data<-dc_data%>%
dc_datamutate(white_share12=white12E/total_pop12E,
white_share19=white19E/total_pop19E,
black_share12=black12E/total_pop12E,
black_share19=black19E/total_pop19E,
hisp_share12=hisp12E/total_pop12E,
hisp_share19=hisp19E/total_pop19E,
asian_share12=asian12E/total_pop12E,
asian_share19=asian19E/total_pop19E,
foreign_share12=foreign12E/total_pop12E,
foreign_share19=foreign19E/total_pop19E,
bachelor_share12=(bach12E+master12E+prof12E+doc12E)/educ_tot12E,
bachelor_share19=(bach19E+master19E+prof19E+doc19E)/educ_tot19E,
med_inc12E=med_inc12E*1.1, ## Bring 2012 dollars to 2019 dollars
med_home_value12E=med_home_value12E*1.1, ## Bring 2012 dollars to 2019 dollars
poverty_share12=below_pov12E/pov_tot12E,
poverty_share19=below_pov19E/pov_tot19E,
share_renter12=house_renter12E/tenure_tot12E,
share_renter19=house_renter19E/tenure_tot19E,
vacant_share12=house_vacant12E/occupancy_tot12E,
vacant_share19=house_vacant19E/occupancy_tot19E,
med_age12=2012-dc_data$build_age12E,
med_age19=2019-dc_data$build_age19E)
With the data ready, it is straightforward to map some variables:
library(patchwork)
<-dc_data%>%ggplot(aes(fill = poverty_share12)) +
map_pov12geom_sf(color = NA) +
coord_sf(crs = 26911) +
scale_fill_viridis_c(option = "magma")
<-dc_data%>%ggplot(aes(fill = poverty_share19)) +
map_pov19geom_sf(color = NA) +
coord_sf(crs = 26911) +
scale_fill_viridis_c(option = "magma")
+map_pov19 map_pov12
If you want a quick dynamic map:
library(leafpop)
mapviewOptions(basemaps ="OpenStreetMap.DE")
mapview(
dc_data, zcol = "med_inc19E",
legend = T,
popup =
popupTable(dc_data,zcol=c("med_inc19E","poverty_share19", "poverty_share12","med_home_value19E")))
Making tables
To compare neighborhoods’ characteristics over time, create tables summarizing important aspects about demographics, education, and housing. Here I will separate areas with poverty levels above 40% and summarize information for 2012. In the Final Project, you also need to do that for 2019.
Identifying the high-poverty tracts in 2012:
$HP12<-ifelse(dc_data$poverty_share12>=.4, "High-Poverty tracts", "Other tracts")
dc_data
<-dc_data%>%select(contains("12")|contains("12E"))
dc_data12<-dc_data12%>%drop_na() dc_data12
Summarizing neighborhood characteristics:
options(scipen = 999, digits=1)
library(kableExtra)
<-dc_data12%>%
HP12filter(HP12=="High-Poverty tracts")
<-HP12%>%summarise(`Median household income (U$ 2019)` = weighted.mean(med_inc12E, total_pop12E),
table_HP12`Population`=sum(total_pop12E),
`Population below poverty line`=sum(below_pov12E),
`% White (non-hispanic)`=weighted.mean(white_share12, total_pop12E)*100,
`% Black`=weighted.mean(black_share12, total_pop12E)*100,
`% Hispanic`=weighted.mean(hisp_share12, total_pop12E)*100,
`% Asian`=weighted.mean(asian_share12, total_pop12E)*100,
`% Foreign born`=weighted.mean(foreign_share12, total_pop12E)*100,
`% Adults with bachelor's degree or higher`=weighted.mean(bachelor_share12, educ_tot12E)*100,
`Rent as a share of income (%)`=weighted.mean(rent_inc_share12E, total_pop12E),
`Median home value (U$ 2019)`=weighted.mean(med_home_value12E, total_pop12E),
`% of renter-occupied`=weighted.mean(share_renter12, tenure_tot12E)*100,
`Vacancy rate`=weighted.mean(vacant_share12, occupancy_tot12E)*100,
`Median year structure built`=median(build_age12E),
`Observations`=n())
<-dc_data12%>%
other12filter(HP12=="Other tracts")
<-other12%>%summarise(`Median household income (U$ 2019)` = weighted.mean(med_inc12E, total_pop12E),
table_other12`Population`=sum(total_pop12E),
`Population below poverty line`=sum(below_pov12E),
`% White (non-hispanic)`=weighted.mean(white_share12, total_pop12E)*100,
`% Black`=weighted.mean(black_share12, total_pop12E)*100,
`% Hispanic`=weighted.mean(hisp_share12, total_pop12E)*100,
`% Asian`=weighted.mean(asian_share12, total_pop12E)*100,
`% Foreign born`=weighted.mean(foreign_share12, total_pop12E)*100,
`% Adults with bachelor's degree or higher`=weighted.mean(bachelor_share12, educ_tot12E)*100,
`Rent as a share of income (%)`=weighted.mean(rent_inc_share12E, total_pop12E),
`Median home value (U$ 2019)`=weighted.mean(med_home_value12E, total_pop12E),
`% of renter-occupied`=weighted.mean(share_renter12, tenure_tot12E)*100,
`Vacancy rate`=weighted.mean(vacant_share12, occupancy_tot12E)*100,
`Median year structure built`=median(build_age12E),
`Observations`=n())
Making the 2012 table:
<-t(table_HP12)
table_HP12<-t(table_other12)
table_other12
<-data.frame(cbind(table_HP12, table_other12))
table2012names(table2012)[1]<-"High-poverty tracts"
names(table2012)[2]<-"Other tracts"
%>%
table2012 kbl(caption = "Washington-Arlington-Alexandria neighborhood characteristics - 2012") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
High-poverty tracts | Other tracts | |
---|---|---|
Median household income (U$ 2019) | 26912 | 107550 |
Population | 76843 | 5379244 |
Population below poverty line | 27228 | 401984 |
% White (non-hispanic) | 24 | 49 |
% Black | 68 | 25 |
% Hispanic | 3 | 14 |
% Asian | 4 | 9 |
% Foreign born | 5 | 22 |
% Adults with bachelor’s degree or higher | 16 | 47 |
Rent as a share of income (%) | 40 | 30 |
Median home value (U$ 2019) | 328998 | 450051 |
% of renter-occupied | 82 | 35 |
Vacancy rate | 17 | 7 |
Median year structure built | 1961 | 1976 |
Observations | 21 | 1273 |
Is there any tipping-like behavior?
Since 1990, differences in residential location between minorities and whites occur due to white flight. Card, Mas and Rothstein (2008) provide evidence that white population flows exhibited tipping-like behavior in most U.S. cities, from 1970 to 2000. With tract-level data in 2012 and 2019, it is possible to run a visual inspection on changes in the white population and the minority share seven years ago.
By no means we are replicating Card, Mas and Rothstein (2008), but one can wonder if there was “white flight” in the past years plotting the mean percentage changes in the white population of Census tracts from 2012 to 2019 against the minority share (non-whites) in 2012. First, prepare the data to get the change and the share and group tracts into cells of width 1% by the 2012 minority. I do that by rounding the shares to the nearest integer.
<-dc_data%>%select(GEOID, NAME, white12E, white19E, total_pop12E, total_pop19E)
tipping<-st_drop_geometry(tipping)
tipping<-tipping%>%mutate(minority_share12=(total_pop12E-white12E)/total_pop12E*100)
tipping$group<-round(tipping$minority_share12)
tipping
<-tipping%>%select(white12E, white19E, total_pop12E, total_pop19E, group)
tipping_plot<-tipping_plot%>%drop_na()
tipping_plot<-tipping_plot%>%group_by(group)%>%summarise_all(.funs = c(sum="sum"))
tipping_plot<-tipping_plot%>%mutate(minority_share12=(total_pop12E_sum-white12E_sum)/total_pop12E_sum*100,
tipping_plotwhite_change=(white19E_sum-white12E_sum)/total_pop12E_sum*100)
It is simple to plot that information. There is no visible tipping point for Washington MSA: you actually see white inflows at many minority share levels.
options(digits=2)
library(ggthemes)
library(plotly)
<-ggplot(tipping_plot, aes(group, white_change))+
g1theme_fivethirtyeight() +
geom_point(size=5, alpha=.5)+
geom_hline(yintercept = 0, linetype = 'dashed')+
labs(x = "Percent non-white in tract in 2008-2012",
y="Change in white population - 20012-2019")+
scale_x_continuous(breaks = seq(0,100,5))+
theme(axis.text=element_text(size=18),
axis.title=element_text(size=18,face="bold"),
legend.text=element_text(size=15),
panel.grid.major = element_line( size=.2))+
theme(legend.title=element_blank())
ggplotly(g1, tooltip = c("x", "y"))