Homework #23

One of the problems that appraisers of real estate face is the lack of sufficient data. However, over the last few years they can access enormous data banks from the Federal National Mortgage Association (Fannie Mae) and the Federal Home Loan Mortgage Corporation (Freddie Mac) that have substantially reduced the problem of data availability. The data on millions of transactions in which these secondary mortgage market organizations are involved are now stored and available online to lenders and appraisers. The models and procedures that will govern the application of these data sources are now largely developed and will undoubtedly lead to major changes both in the scope of the residential appraisal function and in the way that residential appraisals are conducted. Multiple regression analysis is very useful in estimating models for appraising rents for one- to four-bedroom family residences as well as other properties where substantial data on a large number of transactions are available. The independent variables that theory suggest include categorical variables like whether the apartment has a dishwasher (DSHWSHER=1), a tennis court in the complex (TENISCRT=1), a sauna in the complex (SAUNA=1), whether the apartment is for yearly lease only (YRLYLEAS=1), are there some extra utilities like Ethernet connections and laundry services (XTRAUTIL=1), whether the apartment is in a poor neighborhood (POORNBHD=1), whether utilities are included (UTLITIES=1), whether the apartment is furnished (FURNSHED=1), and the type of apartment (efficiency, one-, two, three-, four-bedroom). In the data, the most commonly occurring type of apartment was one-bedroom. Besides those characteristics, the apartment rate also depends on the number of baths per bedroom (BTHPERBD), average square feet per bedroom (BDRMSIZE), and a measure of location in number of miles from city center (LOCATION). You decide to use multiple regression analysis in estimating models for appraising rents for apartments (efficiencies and one- to four-bedroom apartments). The data on rent and the aforementioned characteristics for 300 of these apartments are stored in this Excel spreadsheet.

Answer

library(xlsx)
library(fastDummies)
data<-read.xlsx("rent_proj2_dat7.xls", sheetName ="Data", header=T, as.data.frame = T)
lapply(data, class)
## $RENT
## [1] "numeric"
## 
## $DSHWSHER
## [1] "factor"
## 
## $TENISCRT
## [1] "factor"
## 
## $SAUNA
## [1] "factor"
## 
## $YRLYLEAS
## [1] "factor"
## 
## $XTRAUTIL
## [1] "factor"
## 
## $POORNBHD
## [1] "factor"
## 
## $UTLITIES
## [1] "factor"
## 
## $FURNSHED
## [1] "factor"
## 
## $BTHPERBD
## [1] "numeric"
## 
## $BDRMSIZE
## [1] "numeric"
## 
## $LOCATION
## [1] "numeric"
## 
## $EFFICNCY
## [1] "factor"
## 
## $TWOBDRM
## [1] "factor"
## 
## $THREBDRM
## [1] "factor"
## 
## $FOURBDRM
## [1] "factor"
## Since independent variables with Yes/No are factors, there is no need to 
## create dummy variables for each case. Just run the regression
## with . and you will use all the Xs in your dataframe

## Full Model
## 
full<-lm(RENT~., data=data)
summary(full)
## 
## Call:
## lm(formula = RENT ~ ., data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -203.202  -51.120   -3.478   48.719  198.613 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 266.6188    23.2248  11.480  < 2e-16 ***
## DSHWSHERYes  19.3591    11.3219   1.710  0.08838 .  
## TENISCRTYes   1.1612    13.1639   0.088  0.92977    
## SAUNAYes      7.5378    11.4829   0.656  0.51207    
## YRLYLEASYes  24.4220    46.6911   0.523  0.60134    
## XTRAUTILYes  20.9381    15.5396   1.347  0.17892    
## POORNBHDYes -39.9552     9.4414  -4.232 3.13e-05 ***
## UTLITIESYes  24.7427    10.4972   2.357  0.01910 *  
## FURNSHEDYes  42.0003     9.3352   4.499 9.96e-06 ***
## BTHPERBD      8.2657     6.6296   1.247  0.21350    
## BDRMSIZE      0.4278     0.0554   7.722 1.98e-13 ***
## LOCATION    -12.4218     1.0479 -11.854  < 2e-16 ***
## EFFICNCYYes -43.0677    13.3683  -3.222  0.00142 ** 
## TWOBDRMYes   59.5707    11.6458   5.115 5.78e-07 ***
## THREBDRMYes  99.9038    15.7285   6.352 8.43e-10 ***
## FOURBDRMYes 148.4485    18.5817   7.989 3.43e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 78.42 on 284 degrees of freedom
## Multiple R-squared:  0.5965, Adjusted R-squared:  0.5752 
## F-statistic: 27.99 on 15 and 284 DF,  p-value: < 2.2e-16
## Drop all the not statistically significant variables and run the reduced model
reduc<-lm(RENT~.-TENISCRT-SAUNA-YRLYLEAS-XTRAUTIL-BTHPERBD-DSHWSHER, data=data)
summary(reduc)
## 
## Call:
## lm(formula = RENT ~ . - TENISCRT - SAUNA - YRLYLEAS - XTRAUTIL - 
##     BTHPERBD - DSHWSHER, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -197.685  -54.088   -4.217   51.406  214.275 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 292.74182   20.43674  14.324  < 2e-16 ***
## POORNBHDYes -41.11861    9.36937  -4.389 1.60e-05 ***
## UTLITIESYes  27.91858   10.43578   2.675  0.00789 ** 
## FURNSHEDYes  39.59889    9.28595   4.264 2.72e-05 ***
## BDRMSIZE      0.42977    0.05474   7.852 8.02e-14 ***
## LOCATION    -12.79089    1.04282 -12.266  < 2e-16 ***
## EFFICNCYYes -43.12174   13.27260  -3.249  0.00129 ** 
## TWOBDRMYes   57.89485   11.57784   5.000 9.92e-07 ***
## THREBDRMYes 101.84946   15.71738   6.480 3.93e-10 ***
## FOURBDRMYes 145.35713   18.48332   7.864 7.37e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 78.85 on 290 degrees of freedom
## Multiple R-squared:  0.5834, Adjusted R-squared:  0.5705 
## F-statistic: 45.13 on 9 and 290 DF,  p-value: < 2.2e-16
## Partial F-test
anova(full, reduc)
## Analysis of Variance Table
## 
## Model 1: RENT ~ DSHWSHER + TENISCRT + SAUNA + YRLYLEAS + XTRAUTIL + POORNBHD + 
##     UTLITIES + FURNSHED + BTHPERBD + BDRMSIZE + LOCATION + EFFICNCY + 
##     TWOBDRM + THREBDRM + FOURBDRM
## Model 2: RENT ~ (DSHWSHER + TENISCRT + SAUNA + YRLYLEAS + XTRAUTIL + POORNBHD + 
##     UTLITIES + FURNSHED + BTHPERBD + BDRMSIZE + LOCATION + EFFICNCY + 
##     TWOBDRM + THREBDRM + FOURBDRM) - TENISCRT - SAUNA - YRLYLEAS - 
##     XTRAUTIL - BTHPERBD - DSHWSHER
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)
## 1    284 1746359                           
## 2    290 1802929 -6    -56570 1.5333 0.1671
############### P-value>.05 and you do not reject the null. Therefore,
############### for prediction purposes, use the reduced model.

## Interaction dummy for DSHWSHER, SAUNA and TENISCRT
## Since those variables are factors, I will create dummy variables (numeric) for them.
## After that, I construct the interaction dummy  
dummies<-dummy_cols(data[,c(2,3,4)], select_columns = c("DSHWSHER","SAUNA","TENISCRT"))

dummies$interact<-dummies$DSHWSHER_Yes*dummies$SAUNA_Yes*dummies$TENISCRT_Yes
################################
## Bringing the new column to your data set
data$interact<-dummies$interact

full_2<-lm(RENT~., data=data)
summary(full_2)
## 
## Call:
## lm(formula = RENT ~ ., data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -204.198  -50.335   -3.744   48.523  197.931 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 266.1601    23.4386  11.356  < 2e-16 ***
## DSHWSHERYes  21.8702    19.2854   1.134  0.25774    
## TENISCRTYes   0.2771    14.2845   0.019  0.98454    
## SAUNAYes      9.0061    14.6797   0.614  0.54003    
## YRLYLEASYes  23.9018    46.8829   0.510  0.61058    
## XTRAUTILYes  21.0792    15.5909   1.352  0.17745    
## POORNBHDYes -40.0073     9.4632  -4.228 3.19e-05 ***
## UTLITIESYes  24.9458    10.5908   2.355  0.01918 *  
## FURNSHEDYes  42.0172     9.3518   4.493 1.02e-05 ***
## BTHPERBD      8.1916     6.6569   1.231  0.21952    
## BDRMSIZE      0.4277     0.0555   7.706 2.21e-13 ***
## LOCATION    -12.4212     1.0497 -11.833  < 2e-16 ***
## EFFICNCYYes -42.8418    13.4646  -3.182  0.00163 ** 
## TWOBDRMYes   59.7214    11.7034   5.103 6.14e-07 ***
## THREBDRMYes 100.1873    15.8537   6.319 1.02e-09 ***
## FOURBDRMYes 148.4632    18.6139   7.976 3.77e-14 ***
## interact     -3.5809    22.2435  -0.161  0.87222    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 78.55 on 283 degrees of freedom
## Multiple R-squared:  0.5965, Adjusted R-squared:  0.5737 
## F-statistic: 26.15 on 16 and 283 DF,  p-value: < 2.2e-16