class: center, middle, inverse, title-slide

# Econ 203 Lab Session
## Research Project
### Marcelino Guerra
### April 28, 2020

---

# Announcements

<br/>
Submit your project __before May `\(3^{rd}\)` at 11:45 pm__.
You can find the original data for my project [here](https://github.com/guerramarcelino/Econ203/raw/master/data/dataPISA_orig.xlsx?raw=true). [Here](https://github.com/guerramarcelino/Econ203/raw/master/data/dataPISA.xlsx?raw=true) is the Excel file with all the results that I show in the videos. __Needless to say, do not use my data in your project__.
Use the right mouse button to play, pause or save the video.
Use Piazza! Student TAs are working 24/7 there.
Send me an email if you are in trouble (mguerra3@illinois.edu).
[Here](https://learn.illinois.edu/course/view.php?id=44712) you can find help with writing the research project. Try to emulate the analysis and structure of the sample reports available under "Project Resources".
If you are interested in programming, all the Econ 203 Excel work is translated to R [here](https://github.com/guerramarcelino/Econ203), including [my project](https://guerramarcelino.github.io/project/).

---
class: inverse, center, middle, clear

# Step 1

---

# Scatter Plots

.font90[You need to create scatter plots for each pair `\((Y,X_{1})\)`, `\((Y, X_{2})\)`, etc. of your data. In this very first step, you can spot outliers: points very far from the trend line, such as Qatar and Vietnam in my case. Then decide whether you will remove them.
]

<iframe src="media/PISA_Efficiency.html" style="width: 800px; height: 450px; border: 5px" alt=""></iframe>

---

# Scatter Plots

<br>

<video width=890px height=460px>
<source src="media/scatter.mp4" type="video/mp4">
</video>

---

# Descriptive Statistics

<br>

<video width=890px height=460px>
<source src="media/stat.mp4" type="video/mp4">
</video>

---

# Creating dummy variables

.font90[I need to transform continents (names) into numbers. Since I have five categories (Africa, Asia, Europe, South America, and North America), I am creating four dummy variables. I left out Europe, the most common case in my sample. Therefore, Europe is my baseline.
]

<video width=890px height=460px>
<source src="media/dummy.mp4" type="video/mp4">
</video>

---
class: inverse, center, middle, clear

# Step 2

---

# Run the Full Model

<br>

.font120[
__Run the Full Model__

+ Check significance: individual (t-test) and global (F-test)<br>

+ What is the `\(R^{2}\)`? Can you explain what that value means?<br>
]

---

# Run the Full Model

<br>

<video width=890px height=460px>
<source src="media/full.mp4" type="video/mp4">
</video>

---
class: inverse, center, middle, clear

# Step 3

---

# Full Model Assumptions

<br>

__Check the Full Model assumptions__

+ Normality of Errors <br> Create a histogram using __standardized residuals__ and bin values.

+ Homoskedasticity x Heteroskedasticity <br> Create a scatter plot of __residuals__ vs predicted values.
If you find that the model violates homoskedasticity, transform the dependent variable using `\(ln\)`.

+ Independence of Errors x Autocorrelation <br> If you are working with time series, run the Durbin-Watson test. If you find autocorrelation, add a trend variable to your model.

+ Outliers <br> You already checked for them in Step 1.

+ Multicollinearity<br> Construct the correlation matrix. Check whether any of your independent variables are highly correlated with each other.

---

# Normality

<br>

<video width=890px height=460px>
<source src="media/normal.mp4" type="video/mp4">
</video>

---

# Homoskedasticity

<br>

<video width=890px height=460px>
<source src="media/homosk.mp4" type="video/mp4">
</video>

---

# Multicollinearity

<br>

<video width=890px height=460px>
<source src="media/multicol.mp4" type="video/mp4">
</video>

---
class: inverse, center, middle, clear

# Step 4

---

# Run the Reduced Model

<br>

.font120[
__Run the Reduced Model__

+ Drop the variables that are not statistically significant in your full model (check for `\(\hat{\beta}s\)` with p-value>.05 or p-value>.1, you decide!)<br>

+ Check significance: individual (t-test) and global (F-test)<br>

+ What is the `\(R^{2}\)`? Can you explain what that value means?<br>
]

---

# Run the Reduced Model

<br>

<video width=890px height=460px>
<source src="media/reduced.mp4" type="video/mp4">
</video>

---
class: inverse, center, middle, clear

# Step 5

---

# Reduced Model Assumptions

<br>

.font120[
__Check the Reduced Model assumptions__

+ Normality of Errors <br>

+ Homoskedasticity x Heteroskedasticity <br>

+ Independence of Errors x Autocorrelation (if you are working with time series)<br>

<br>

Yes, you have to do it again.
]

---
class: inverse, center, middle, clear

# Step 6

---

# Partial F-test

<br>

.font120[
Reject `\(H_{0}\)` (the coefficients of all the variables you dropped are zero) if:

`$$\dfrac{\frac{SSR_{f} - SSR_{r}}{k_{d}}}{MSE_{f}}>F_{\alpha, k_{d}, (n-k-1)_{f}}$$`

where `\(SSR_{f}\)` is the Sum of Squares Regression from the full model, `\(SSR_{r}\)` is the Sum of Squares Regression from the reduced model, `\(k_{d}\)` is the number of variables that you eliminated, `\(MSE_{f}\)` is the Mean Square for Error from the full model, and `\((n-k-1)_{f}\)` is the error degrees of freedom of the full model.

__If you reject `\(H_{0}\)`, use the Full Model for the section "Empirical Results". Otherwise, use the Reduced Model.__
]

---

# Partial F-test

<br>

<video width=890px height=460px>
<source src="media/partial.mp4" type="video/mp4">
</video>

---
class: inverse, center, middle, clear

# Writing a report

---

# Writing a report

1. __Introduction__<br> Start with the motivation: why is your topic important or interesting? State your research question and briefly discuss your results.

2. __Data__<br> Describe the sources of your data. If you transformed your variables, explain what you did. Also, comment on the descriptive statistics and the correlation matrix.

3. __Regression Analysis__<br> First and foremost: we want to see equations! Also, describe every step you took, from outliers to the partial F-test. What is your final model? Is it robust? Does it satisfy the model assumptions?

4. __Empirical Results__<br> Time to interpret your results: `\(\hat{\beta}\)`'s and `\(R^{2}\)`. Does the result match your prior beliefs? Explain statistically significant results and mention those coefficients not statistically different from zero. Which one is your final model? Full or reduced?

5. __Summary and Discussion__<br> Restate your research question and highlight your main findings. Suggest improvements for future studies, etc.
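---

# Partial F-test: a worked sketch

The decision rule from Step 6 can be sketched in a few lines of Python. This is only an illustration, not part of the Excel workflow shown in the videos: the function names `partial_f_statistic` and `prefer_full_model` are hypothetical, and every number below is made up (none comes from the PISA data). The critical value is a placeholder that you would look up yourself for your own `\(\alpha\)` and degrees of freedom, for example in an F table or with Excel's `F.INV.RT`.

```python
def partial_f_statistic(ssr_full, ssr_reduced, k_dropped, mse_full):
    """Return ((SSR_f - SSR_r) / k_d) / MSE_f, the partial F statistic."""
    if k_dropped <= 0:
        raise ValueError("k_dropped must be a positive number of dropped variables")
    if mse_full <= 0:
        raise ValueError("mse_full must be positive")
    return ((ssr_full - ssr_reduced) / k_dropped) / mse_full


def prefer_full_model(f_stat, f_critical):
    """Reject H0 (keep the full model) only if the statistic exceeds the critical value."""
    return f_stat > f_critical


# Made-up regression output, for illustration only.
ssr_full = 5200.0     # Sum of Squares Regression, full model
ssr_reduced = 5000.0  # Sum of Squares Regression, reduced model
k_dropped = 2         # number of variables eliminated
mse_full = 40.0       # Mean Square for Error, full model

# Placeholder critical value: roughly the 5% value for (2, 60) degrees of freedom.
# Look up the value that matches your own alpha, k_d and (n - k - 1).
f_critical = 3.15

f_stat = partial_f_statistic(ssr_full, ssr_reduced, k_dropped, mse_full)
print(f_stat)  # 2.5

if prefer_full_model(f_stat, f_critical):
    print("Reject H0: use the Full Model")
else:
    print("Do not reject H0: use the Reduced Model")
```

With these made-up numbers the statistic (2.5) is below the placeholder critical value, so `\(H_{0}\)` is not rejected and the Reduced Model would go into "Empirical Results".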