Homework #20 Problem 2

You have decided to predict gasoline prices in different cities and towns in the United States for your modeling project. Your dependent variable is the price of gasoline per gallon, and your independent variables are per capita income (in dollars), number of firms manufacturing automobile parts in and around the city, number of new businesses started over the last year, population density of the city (in hundreds of persons per square mile), percentage of local taxes on gasoline (in %), number of power plants, and the number of people using public transportation per 100 people. You collected a sample of 27 cities and obtained \(SSR = 172.5213\) along with \(S_{e}^{2}=8.9245\).

The adjusted \(R^{2}\) is _______

## 7 independent variables (Xs), sample size 27
n=27
k=7
SSR=172.5213
Se2=8.9245

SSE=Se2*(n-k-1)
SST=SSR+SSE
SST
## [1] 342.0868
AdjR2=1-(SSE/(n-k-1))/(SST/(n-1))
AdjR2
## [1] 0.3217014
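
As a cross-check, the same quantity can be written in terms of \(R^{2}\): adjusted \(R^{2} = 1-(1-R^{2})\frac{n-1}{n-k-1}\), with \(R^{2}=SSR/SST\). The sketch below wraps the steps in a small helper; the function name `adj_r2_from_ss` is hypothetical and not part of the course materials.

```r
# Hypothetical helper (not from the course materials): adjusted R^2 from
# SSR, the residual variance estimate Se^2, the sample size n, and the
# number of independent variables k.
adj_r2_from_ss <- function(ssr, se2, n, k) {
  sse <- se2 * (n - k - 1)   # SSE = Se^2 * residual degrees of freedom
  sst <- ssr + sse           # SST = SSR + SSE
  r2  <- ssr / sst           # ordinary R^2
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

adj <- adj_r2_from_ss(ssr = 172.5213, se2 = 8.9245, n = 27, k = 7)
round(adj, 4)   # 0.3217, the same value as AdjR2 above
```

Both forms are algebraically identical, since \((1-R^{2})\,SST = SSE\).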

Homework #21 Problem 2

This question aims to review the La Quinta Inn model building exercise as discussed in lecture and described in the course packet. The data collected can be obtained in this Excel file. The coefficient of multiple determination for the regression of profit margin (MARGIN) on all the other variables is ______

The adjusted coefficient of multiple determination is ______

From the regression output, at a 10% level of significance, which of the following variables are insignificant individually?

Perform a partial F-test to see whether these variables are really useless as a group in explaining the behavior of profit margin. The F-test statistic for testing the significance of the subset variables is _______

The p-value of the test statistic for testing the significance of the subset variables is _______

Now, using this final model, predict the profit margin in each of the following locations:

Answer

library(xlsx)
data<-read.xlsx("innsbruckv3.xls", sheetName = "Data", as.data.frame = TRUE, header = TRUE)

### Dependent variable: margin

reg1<-lm(MARGIN~ROOMS+NEAREST+OFFICE+INCOME+DISTTWN+COLLEGE, data=data)
summary(reg1)
## 
## Call:
## lm(formula = MARGIN ~ ROOMS + NEAREST + OFFICE + INCOME + DISTTWN + 
##     COLLEGE, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.0566  -2.7083  -0.3153   4.0332  13.5122 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 74.084317   7.985766   9.277 6.93e-15 ***
## ROOMS       -0.007975   0.001249  -6.387 6.61e-09 ***
## NEAREST     -1.565521   0.632113  -2.477  0.01507 *  
## OFFICE       0.019463   0.003417   5.695 1.43e-07 ***
## INCOME      -0.423064   0.143796  -2.942  0.00411 ** 
## DISTTWN      0.229487   0.173906   1.320  0.19021    
## COLLEGE      0.199082   0.137898   1.444  0.15219    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.518 on 93 degrees of freedom
## Multiple R-squared:  0.5256, Adjusted R-squared:  0.4949 
## F-statistic: 17.17 on 6 and 93 DF,  p-value: 2.895e-13
r2<-summary(reg1)$r.squared
r2
## [1] 0.5255565
adjr2<-summary(reg1)$adj.r.squared
adjr2
## [1] 0.4949472
### DISTTWN and COLLEGE are individually insignificant at the 10% level (p-values 0.190 and 0.152, both > 0.10)

### For the partial F-test you need
# SSR and MSE from the full model
# SSR from the reduced model
# kd, the number of variables that you just dropped. In this case, two: DISTTWN and COLLEGE
# F = ((SSR_full - SSR_reduced) / kd) / MSE_full

## Using R you can use the anova() function
### running the reduced model

reg2<-lm(MARGIN~ROOMS+NEAREST+OFFICE+INCOME, data=data)
reg2
## 
## Call:
## lm(formula = MARGIN ~ ROOMS + NEAREST + OFFICE + INCOME, data = data)
## 
## Coefficients:
## (Intercept)        ROOMS      NEAREST       OFFICE       INCOME  
##   79.611346    -0.008109    -1.530088     0.018736    -0.419833
### Partial F-test
anova(reg1, reg2)
## Analysis of Variance Table
## 
## Model 1: MARGIN ~ ROOMS + NEAREST + OFFICE + INCOME + DISTTWN + COLLEGE
## Model 2: MARGIN ~ ROOMS + NEAREST + OFFICE + INCOME
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     93 2831.4                           
## 2     95 2944.2 -2   -112.75 1.8517 0.1627
## The F statistic is 1.8517 and its p-value is 0.1627. Since 0.1627 > 0.10, you do not reject the null
## hypothesis that the coefficients on DISTTWN and COLLEGE are both zero. So use the reduced model (reg2).
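
The F statistic and p-value reported by `anova()` can also be reproduced by hand from the numbers in the table. This is a minimal sketch that hard-codes the rounded values printed above, so it only matches to about four decimals. It uses the increase in the residual sum of squares, which equals the drop in SSR because both models share the same SST.

```r
# Values copied from the anova(reg1, reg2) table above (rounded as printed)
rss_full <- 2831.4    # residual sum of squares of the full model (reg1)
ss_drop  <- 112.75    # increase in RSS when DISTTWN and COLLEGE are dropped
kd       <- 2         # number of variables dropped
df_full  <- 93        # residual degrees of freedom of the full model

mse_full <- rss_full / df_full
f_stat   <- (ss_drop / kd) / mse_full

# Upper-tail probability of an F(kd, df_full) distribution
p_value  <- pf(f_stat, df1 = kd, df2 = df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 4)
```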

### Predicted values
## remember that you are using the reduced model (reg2)
AnnArbor<-predict.lm(reg2,data.frame(ROOMS=2672, NEAREST=1.3, OFFICE=952, INCOME=35))
AnnArbor
##        1 
## 59.09676
Bloom<-predict.lm(reg2,data.frame(ROOMS=2500, NEAREST=1.2, OFFICE=604, INCOME=37))
Bloom
##        1 
## 53.28483
Champaign<-predict.lm(reg2,data.frame(ROOMS=2300, NEAREST=0.5, OFFICE=1430, INCOME=33.5))
Champaign
##       1 
## 72.9229
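
Each prediction is just the intercept plus the sum of coefficient times input. The sketch below redoes the three predictions as a matrix product, assuming the rounded coefficients printed for `reg2`; because of that rounding, the results agree with `predict.lm()` only to within about 0.01. The names `b`, `x`, and `pred` are illustrative.

```r
# Rounded coefficients as printed for reg2 above
b <- c(Intercept = 79.611346, ROOMS = -0.008109, NEAREST = -1.530088,
       OFFICE = 0.018736, INCOME = -0.419833)

# One row per location; the leading 1 multiplies the intercept.
# Columns: 1, ROOMS, NEAREST, OFFICE, INCOME
x <- rbind(
  AnnArbor  = c(1, 2672, 1.3,  952, 35),
  Bloom     = c(1, 2500, 1.2,  604, 37),
  Champaign = c(1, 2300, 0.5, 1430, 33.5)
)

# Matrix product gives one fitted margin per row
pred <- drop(x %*% b)
pred
```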