Abstract
Simple linear regression and model assumptions.
An economics student decided that her grades were not good enough, but she did not know how to improve them. She talked to one of her professors, who suggested that grades depend largely on how much one studies. So the student collected data on 100 economics students, asking each how much time they studied before an exam and what mark they got on that exam. The data are contained in this Excel file. Run a regression where Time is the dependent (Y) variable and Mark is the independent (X) variable. In the Residuals area of the regression window, check all four check boxes: Residuals, Standardized Residuals, Residual Plots, Line Fit Plots.
Based on the regression output, which of the following assumptions appear to be violated?
library(ggplot2)
library(xlsx)
## Read the sheet into a data frame (TRUE is safer than T, which can be reassigned)
study_grades <- read.xlsx("StudyGradesv3.xls", sheetName = "Sheet1", as.data.frame = TRUE, header = TRUE)
## Time is the dependent variable, Mark is the explanatory variable
model2 <- lm(Time ~ Mark, data = study_grades)
summary(model2)
##
## Call:
## lm(formula = Time ~ Mark, data = study_grades)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -37.161  -2.142   0.369   3.314  11.665
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.18921    2.46673  -0.887    0.377
## Mark         0.40153    0.03199  12.550   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.637 on 98 degrees of freedom
## Multiple R-squared: 0.6165, Adjusted R-squared: 0.6125
## F-statistic: 157.5 on 1 and 98 DF, p-value: < 2.2e-16
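Reading the coefficients off the summary above, the fitted line and two quick consistency checks can be written out as follows. The value Mark = 70 in the last line is an arbitrary illustration, not a figure taken from the data.

```latex
\widehat{\text{Time}} = -2.18921 + 0.40153 \cdot \text{Mark}

t_{\text{Mark}} = \frac{0.40153}{0.03199} \approx 12.55,
\qquad
F = t_{\text{Mark}}^{2} \approx 157.5 \quad \text{(one regressor, so } F = t^{2}\text{)}

\widehat{\text{Time}}\,\big|_{\text{Mark}=70} = -2.18921 + 0.40153 \times 70 \approx 25.92
```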
## Getting the residuals and predicted values
residuals <- resid(model2)
predicted <- fitted(model2)
## Attaching those new series to your dataset
study_grades$residuals <- residuals
study_grades$predicted <- predicted
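The exercise also asks for standardized residuals, which the code above does not compute. The summary reports a minimum residual of -37.161 against a residual standard error of 5.637, so at least one observation lies well over six standard errors below the fitted line and is worth flagging. A minimal sketch of how that could be done with `rstandard()` follows. It runs on synthetic stand-in data (the names `demo`, `demo_model` and `flagged` and the cutoff of 3 are assumptions for illustration, not part of the original analysis).

```r
# Synthetic stand-in data (NOT the StudyGradesv3.xls data), so the sketch runs on its own
set.seed(42)
demo <- data.frame(Mark = runif(100, 40, 100))
demo$Time <- -2 + 0.4 * demo$Mark + rnorm(100, sd = 5)

# Plant one large negative outlier, 40 units below the true line
demo$Time[1] <- -2 + 0.4 * demo$Mark[1] - 40

demo_model <- lm(Time ~ Mark, data = demo)

# Standardized residuals: each residual divided by its estimated standard deviation
demo$std_residuals <- rstandard(demo_model)

# Rows lying more than 3 standard deviations from the fitted line (cutoff is a common rule of thumb)
flagged <- demo[abs(demo$std_residuals) > 3, ]
print(flagged)
```

With the real data, the same two lines would be `study_grades$std_residuals <- rstandard(model2)` followed by the same filter on `study_grades`.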
## Can you assume Homoskedasticity?
homosk2 <- ggplot(study_grades, aes(x = predicted, y = residuals)) +
  geom_point() +
  labs(x = "Predicted Values", y = "Residuals") +
  ggtitle("Predicted Values Time x Residuals")
homosk2
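The scatter plot is a visual check only. If a numeric check is wanted, one option is a Breusch-Pagan-style test in its studentized (Koenker) form: regress the squared residuals on the explanatory variable and compare n times the R-squared of that auxiliary regression to a chi-squared distribution. The sketch below is one way to do this in base R, on synthetic stand-in data whose error spread is made to grow with Mark. All names in it (`demo`, `aux`, `lm_stat`) are hypothetical.

```r
# Synthetic stand-in data (NOT the exercise data); error spread grows with Mark by construction
set.seed(1)
demo <- data.frame(Mark = runif(200, 40, 100))
demo$Time <- -2 + 0.4 * demo$Mark + rnorm(200, sd = 0.1 * demo$Mark)

demo_model <- lm(Time ~ Mark, data = demo)

# Auxiliary regression: squared residuals on the explanatory variable
demo$sq_res <- resid(demo_model)^2
aux <- lm(sq_res ~ Mark, data = demo)

# LM statistic = n * R^2 of the auxiliary regression;
# under homoskedasticity it is approximately chi-squared with 1 degree of freedom
lm_stat <- nrow(demo) * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)
print(c(statistic = lm_stat, p.value = p_value))
```

A small p-value would be evidence against constant error variance. Applied to the exercise, `demo_model` and `demo` would be replaced by `model2` and `study_grades`.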
## Can you assume Normality?
## Histogram
hist(residuals)
## Density
dens <- ggplot(study_grades, aes(x = residuals)) +
  geom_density(kernel = "gaussian", position = "stack", size = 1.5, fill = "#ff4d4d", alpha = 0.5) +
  labs(x = "Residuals", y = "Density") +
  theme(axis.text = element_text(size = 18),
        axis.title = element_text(size = 18, face = "bold"),
        plot.title = element_text(size = 20, face = "bold"))
dens
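The histogram and density plot can be complemented by a normal Q-Q plot and a Shapiro-Wilk test, both available in base R. The following is a minimal sketch on synthetic stand-in data with normal errors. The names `demo`, `demo_res` and `sw` are assumptions for illustration.

```r
# Synthetic stand-in data (NOT the exercise data), generated with normal errors
set.seed(7)
demo <- data.frame(Mark = runif(100, 40, 100))
demo$Time <- -2 + 0.4 * demo$Mark + rnorm(100, sd = 5)

demo_model <- lm(Time ~ Mark, data = demo)
demo_res <- resid(demo_model)

# Q-Q plot: points close to the reference line suggest approximately normal residuals
qqnorm(demo_res, main = "Normal Q-Q Plot of Residuals")
qqline(demo_res)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normally distributed
sw <- shapiro.test(demo_res)
print(sw$p.value)
```

For the exercise itself, `resid(model2)` would take the place of `demo_res`. A heavy left tail, such as the one implied by the minimum residual of -37.161 in the summary, would show up as points falling well below the line at the lower end of the Q-Q plot.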