Abstract
Multiple Linear Regression: \(R^{2}\), adjusted \(R^{2}\), and the partial F-test.

You have decided to predict gasoline prices in different cities and towns in the United States for your modeling project. Your dependent variable is the price of gasoline per gallon, and your independent variables are per capita income (in dollars), number of firms manufacturing automobile parts in and around the city, number of new businesses started over the last year, population density of the city (in hundreds of persons per square mile), percentage of local taxes on gasoline (in %), number of power plants, and the number of people using public transportation per 100 people. You collected a sample of 27 cities and obtained \(SSR = 172.5213\) along with \(S_{e}^{2}=8.9245\).
The adjusted \(R^{2}\) is _______
## 7 independent variables (Xs), sample size 27
n=27
k=7
SSR=172.5213
Se2=8.9245
SSE=Se2*(n-k-1)
SST=SSR+SSE
SST
## [1] 342.0868
AdjR2=1-(SSE/(n-k-1))/(SST/(n-1))
AdjR2
## [1] 0.3217014
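As a cross-check on the calculation above, the same inputs also give the unadjusted \(R^{2}\), and the adjusted value can be recovered from it through the identity \(1-(1-R^{2})\frac{n-1}{n-k-1}\). This is a minimal sketch; the names `R2` and `AdjR2_alt` are introduced here for illustration and are not part of the original solution.

```r
# Inputs from the problem statement
n   <- 27        # sample size
k   <- 7         # number of independent variables
SSR <- 172.5213  # regression sum of squares
Se2 <- 8.9245    # mean squared error, SSE / (n - k - 1)

SSE <- Se2 * (n - k - 1)  # error sum of squares
SST <- SSR + SSE          # total sum of squares

# Unadjusted R-squared (illustrative name, not in the original)
R2 <- SSR / SST

# Adjusted R-squared via the identity 1 - (1 - R2) * (n - 1) / (n - k - 1)
AdjR2_alt <- 1 - (1 - R2) * (n - 1) / (n - k - 1)

print(R2)
print(AdjR2_alt)
```

The unadjusted \(R^{2}\) comes out at roughly 0.50, noticeably higher than the adjusted value, because seven predictors use up a large share of the degrees of freedom in a sample of only 27.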
This question reviews the La Quinta Inn model-building exercise discussed in lecture and described in the course packet. The data can be obtained in this Excel file. The coefficient of multiple determination for the regression of profit margin (MARGIN) on all the other variables is ______
The adjusted coefficient of multiple determination is ______
From the regression output, at a 10% level of significance, which of the following variables are insignificant individually?
Perform a partial F-test to see whether these variables are really useless as a group in explaining the behavior of profit margin. The F-test statistic for testing the significance of the subset variables is _______
The p-value of the test statistic for testing the significance of the subset variables is _______
Now, using this final model, predict the profit margin in each of the following locations:
library(xlsx)
### Read the "Data" sheet; TRUE is spelled out because T can be reassigned
data<-read.xlsx("innsbruckv3.xls", sheetName = "Data", as.data.frame = TRUE, header = TRUE)
### Dependent variable: margin
reg1<-lm(MARGIN~ROOMS+NEAREST+OFFICE+INCOME+DISTTWN+COLLEGE, data=data)
summary(reg1)
##
## Call:
## lm(formula = MARGIN ~ ROOMS + NEAREST + OFFICE + INCOME + DISTTWN +
## COLLEGE, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.0566 -2.7083 -0.3153 4.0332 13.5122
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.084317 7.985766 9.277 6.93e-15 ***
## ROOMS -0.007975 0.001249 -6.387 6.61e-09 ***
## NEAREST -1.565521 0.632113 -2.477 0.01507 *
## OFFICE 0.019463 0.003417 5.695 1.43e-07 ***
## INCOME -0.423064 0.143796 -2.942 0.00411 **
## DISTTWN 0.229487 0.173906 1.320 0.19021
## COLLEGE 0.199082 0.137898 1.444 0.15219
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.518 on 93 degrees of freedom
## Multiple R-squared: 0.5256, Adjusted R-squared: 0.4949
## F-statistic: 17.17 on 6 and 93 DF, p-value: 2.895e-13
r2<-summary(reg1)$r.squared
r2
## [1] 0.5255565
adjr2<-summary(reg1)$adj.r.squared
adjr2
## [1] 0.4949472
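The adjusted \(R^{2}\) and the overall F-statistic in the summary can both be reproduced from the multiple \(R^{2}\) and the degrees of freedom alone. The sketch below assumes only the numbers printed in the output above; `adjr2_check` and `f_check` are illustrative names, not part of the original solution.

```r
# Values read off the summary(reg1) output
r2 <- 0.5255565   # Multiple R-squared
k  <- 6           # predictors in the full model
df <- 93          # residual degrees of freedom, n - k - 1
n  <- df + k + 1  # implied sample size

# Adjusted R-squared: penalizes R-squared for the number of predictors
adjr2_check <- 1 - (1 - r2) * (n - 1) / df

# Overall F-statistic: explained variation per predictor
# relative to unexplained variation per residual degree of freedom
f_check <- (r2 / k) / ((1 - r2) / df)

print(n)
print(adjr2_check)
print(f_check)
```

Both values should agree with the summary output up to rounding (about 0.4949 and 17.17).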
### DISTTWN and COLLEGE are individually insignificant at the 10% level (p-values 0.190 and 0.152, both > 0.10)
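The screening at the 10% level can also be done programmatically rather than by reading the table. The sketch below hard-codes the p-values printed in the coefficient table so that it runs without the data file; `pvals`, `alpha`, and `insignificant` are illustrative names.

```r
# p-values copied from the summary(reg1) coefficient table (intercept omitted)
pvals <- c(ROOMS   = 6.61e-09,
           NEAREST = 0.01507,
           OFFICE  = 1.43e-07,
           INCOME  = 0.00411,
           DISTTWN = 0.19021,
           COLLEGE = 0.15219)

alpha <- 0.10  # significance level from the question

# Keep the names of the variables whose p-value exceeds alpha
insignificant <- names(pvals)[pvals > alpha]
print(insignificant)
```

With the fitted model available, `coef(summary(reg1))[, "Pr(>|t|)"]` should supply the same p-values directly.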
### For the partial F-test you need:
# SSR and MSE from the full model
# SSR from the reduced model
# kd, the number of variables that you just dropped. In this case, two: DISTTWN and COLLEGE
## In R, the anova() function does the computation for you
### running the reduced model
reg2<-lm(MARGIN~ROOMS+NEAREST+OFFICE+INCOME, data=data)
reg2
##
## Call:
## lm(formula = MARGIN ~ ROOMS + NEAREST + OFFICE + INCOME, data = data)
##
## Coefficients:
## (Intercept) ROOMS NEAREST OFFICE INCOME
## 79.611346 -0.008109 -1.530088 0.018736 -0.419833
### Partial F-test
anova(reg1, reg2)
## Analysis of Variance Table
##
## Model 1: MARGIN ~ ROOMS + NEAREST + OFFICE + INCOME + DISTTWN + COLLEGE
## Model 2: MARGIN ~ ROOMS + NEAREST + OFFICE + INCOME
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 93 2831.4
## 2 95 2944.2 -2 -112.75 1.8517 0.1627
## The F statistic is 1.8517 and its p-value is 0.1627. Since 0.1627 > 0.10, you do not reject the null hypothesis that the coefficients on DISTTWN and COLLEGE are both zero
## What does that mean? Use the reduced model!
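The partial F-statistic can also be computed by hand from the residual sums of squares in the anova table. Because both models share the same SST, the gain in SSR from the full model equals the drop in RSS, so the RSS column is enough. This is a sketch using the rounded table values, so it matches the anova() output only approximately; all variable names are illustrative.

```r
# Values read off the anova(reg1, reg2) table
rss_full    <- 2831.4  # residual sum of squares, full model
df_full     <- 93      # residual degrees of freedom, full model
rss_reduced <- 2944.2  # residual sum of squares, reduced model
df_reduced  <- 95      # residual degrees of freedom, reduced model

kd <- df_reduced - df_full  # number of dropped variables (2)

# Partial F: extra unexplained variation per dropped variable,
# divided by the MSE of the full model
F_partial <- ((rss_reduced - rss_full) / kd) / (rss_full / df_full)

# Upper-tail probability under F(kd, df_full)
p_value <- pf(F_partial, kd, df_full, lower.tail = FALSE)

print(F_partial)
print(p_value)
```

The results should land close to the reported 1.8517 and 0.1627, with small differences caused by the rounding of the RSS values in the table.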
### Predicted values
## remember that you are using the reduced model (reg2)
AnnArbor<-predict.lm(reg2,data.frame(ROOMS=2672, NEAREST=1.3, OFFICE=952, INCOME=35))
AnnArbor
## 1
## 59.09676
Bloom<-predict.lm(reg2,data.frame(ROOMS=2500, NEAREST=1.2, OFFICE=604, INCOME=37))
Bloom
## 1
## 53.28483
Champaign<-predict.lm(reg2,data.frame(ROOMS=2300, NEAREST=0.5, OFFICE=1430, INCOME=33.5))
Champaign
## 1
## 72.9229
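Each prediction is simply the intercept plus the sum of coefficient times predictor value, so the three results can be checked without the fitted model object. The sketch below uses the rounded reg2 coefficients printed earlier; `b`, `X`, and `pred` are illustrative names, and the results agree with predict.lm() only up to that rounding.

```r
# Reduced-model coefficients as printed by reg2:
# (Intercept), ROOMS, NEAREST, OFFICE, INCOME
b <- c(79.611346, -0.008109, -1.530088, 0.018736, -0.419833)

# One row per location; the leading 1 multiplies the intercept
X <- rbind(
  AnnArbor  = c(1, 2672, 1.3,  952, 35),
  Bloom     = c(1, 2500, 1.2,  604, 37),
  Champaign = c(1, 2300, 0.5, 1430, 33.5)
)

# Matrix product gives all three predicted margins at once
pred <- drop(X %*% b)
print(round(pred, 2))
```

Stacking the locations in a matrix keeps the arithmetic for all three in a single line, mirroring what predict.lm() does internally with the design matrix.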