class: center, middle, inverse, title-slide

# Econ 203 Lab Session
## Research Project
### Marcelino Guerra
### April 28, 2020

---

# Announcements

<br/>

<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M432 304c0 114.9-93.1 208-208 208S16 418.9 16 304c0-104 76.3-190.2 176-205.5V64h-28c-6.6 0-12-5.4-12-12V12c0-6.6 5.4-12 12-12h120c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-28v34.5c37.5 5.8 71.7 21.6 99.7 44.6l27.5-27.5c4.7-4.7 12.3-4.7 17 0l28.3 28.3c4.7 4.7 4.7 12.3 0 17l-29.4 29.4-.6.6C419.7 223.3 432 262.2 432 304zm-176 36V188.5c0-6.6-5.4-12-12-12h-40c-6.6 0-12 5.4-12 12V340c0 6.6 5.4 12 12 12h40c6.6 0 12-5.4 12-12z"/></svg> Submit your project __before May `\(3^{rd}\)` at 11:45 pm__.

<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M448 73.143v45.714C448 159.143 347.667 192 224 192S0 159.143 0 118.857V73.143C0 32.857 100.333 0 224 0s224 32.857 224 73.143zM448 176v102.857C448 319.143 347.667 352 224 352S0 319.143 0 278.857V176c48.125 33.143 136.208 48.572 224 48.572S399.874 209.143 448 176zm0 160v102.857C448 479.143 347.667 512 224 512S0 479.143 0 438.857V336c48.125 33.143 136.208 48.572 224 48.572S399.874 369.143 448 336z"/></svg> You can find the original data for my project [here](https://github.com/guerramarcelino/Econ203/raw/master/data/dataPISA_orig.xlsx?raw=true). [Here](https://github.com/guerramarcelino/Econ203/raw/master/data/dataPISA.xlsx?raw=true) is the Excel file with all the results that I show in the videos. __Needless to say, do not use my data in your project__.

<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 576 512"><path d="M336.2 64H47.8C21.4 64 0 85.4 0 111.8v288.4C0 426.6 21.4 448 47.8 448h288.4c26.4 0 47.8-21.4 47.8-47.8V111.8c0-26.4-21.4-47.8-47.8-47.8zm189.4 37.7L416 177.3v157.4l109.6 75.5c21.2 14.6 50.4-.3 50.4-25.8V127.5c0-25.4-29.1-40.4-50.4-25.8z"/></svg> Use the right mouse button to play, pause, or save the videos.

<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 384 512"><path d="M202.021 0C122.202 0 70.503 32.703 29.914 91.026c-7.363 10.58-5.093 25.086 5.178 32.874l43.138 32.709c10.373 7.865 25.132 6.026 33.253-4.148 25.049-31.381 43.63-49.449 82.757-49.449 30.764 0 68.816 19.799 68.816 49.631 0 22.552-18.617 34.134-48.993 51.164-35.423 19.86-82.299 44.576-82.299 106.405V320c0 13.255 10.745 24 24 24h72.471c13.255 0 24-10.745 24-24v-5.773c0-42.86 125.268-44.645 125.268-160.627C377.504 66.256 286.902 0 202.021 0zM192 373.459c-38.196 0-69.271 31.075-69.271 69.271 0 38.195 31.075 69.27 69.271 69.27s69.271-31.075 69.271-69.271-31.075-69.27-69.271-69.27z"/></svg> Use Piazza! Student TAs are working 24/7 there.

<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M400 32H48C21.49 32 0 53.49 0 80v352c0 26.51 21.49 48 48 48h352c26.51 0 48-21.49 48-48V80c0-26.51-21.49-48-48-48zM178.117 262.104C87.429 196.287 88.353 196.121 64 177.167V152c0-13.255 10.745-24 24-24h272c13.255 0 24 10.745 24 24v25.167c-24.371 18.969-23.434 19.124-114.117 84.938-10.5 7.655-31.392 26.12-45.883 25.894-14.503.218-35.367-18.227-45.883-25.895zM384 217.775V360c0 13.255-10.745 24-24 24H88c-13.255 0-24-10.745-24-24V217.775c13.958 10.794 33.329 25.236 95.303 70.214 14.162 10.341 37.975 32.145 64.694 32.01 26.887.134 51.037-22.041 64.72-32.025 61.958-44.965 81.325-59.406 95.283-70.199z"/></svg> Send me an email if you are in trouble (mguerra3@illinois.edu).

<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M320 448v40c0 13.255-10.745 24-24 24H24c-13.255 0-24-10.745-24-24V120c0-13.255 10.745-24 24-24h72v296c0 30.879 25.121 56 56 56h168zm0-344V0H152c-13.255 0-24 10.745-24 24v368c0 13.255 10.745 24 24 24h272c13.255 0 24-10.745 24-24V128H344c-13.2 0-24-10.8-24-24zm120.971-31.029L375.029 7.029A24 24 0 0 0 358.059 0H352v96h96v-6.059a24 24 0 0 0-7.029-16.97z"/></svg> [Here](https://learn.illinois.edu/course/view.php?id=44712) you will find help with writing the research project. Try to emulate the analysis and structure of the sample reports available under "Project Resources".

<svg style="height:0.8em;top:.04em;position:relative;fill:steelblue;" viewBox="0 0 581 512"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"/></svg> If you are interested in programming, all the Econ 203 Excel work is translated to R [here](https://github.com/guerramarcelino/Econ203), including [my project](https://guerramarcelino.github.io/project/).

---

class: inverse, center, middle, clear

# Step 1

---

# Scatter Plots

.font90[You need to create a scatter plot for each pair `\((Y,X_{1})\)`, `\((Y, X_{2})\)`, etc., in your data. In this very first step, you can spot outliers: points very far from the line, such as Qatar and Vietnam in my case. Then decide whether to drop them.
]

<iframe src="media/PISA_Efficiency.html" style="width: 800px; height: 450px; border: 5px" alt=""></iframe>

---

# Scatter Plots

<br>

<video width=890px height=460px>
<source src="media/scatter.mp4" type="video/mp4">
</video>

---

# Descriptive Statistics

<br>

<video width=890px height=460px>
<source src="media/stat.mp4" type="video/mp4">
</video>

---

# Creating dummy variables

.font90[I need to transform continents (names) into numbers. Since I have five categories (Africa, Asia, Europe, South America, and North America), I create four dummy variables. I leave out Europe, the most common category in my sample, so Europe is my baseline.
]

<video width=890px height=460px>
<source src="media/dummy.mp4" type="video/mp4">
</video>

---

class: inverse, center, middle, clear

# Step 2

---

# Run the Full Model

<br>

.font120[
__Run the Full Model__

+ Check significance: individual (t-test) and global (F-test)<br>
+ What is the `\(R^{2}\)`? Can you explain what that value means?<br>
]

---

# Run the Full Model

<br>

<video width=890px height=460px>
<source src="media/full.mp4" type="video/mp4">
</video>

---

class: inverse, center, middle, clear

# Step 3

---

# Full Model Assumptions

<br>

__Check the Full Model assumptions__

+ Normality of Errors <br> Create a histogram using __standardized residuals__ and bin values.
+ Homoskedasticity vs. Heteroskedasticity <br> Create a scatter plot of __residuals__ vs. predicted values. If you find that the model violates homoskedasticity, transform the dependent variable using `\(\ln\)`.
+ Independence of Errors vs. Autocorrelation <br> If you are working with time series, run the Durbin-Watson test. If you find autocorrelation, add a trend variable to your model.
+ Outliers <br> You already did this in Step 1.
+ Multicollinearity <br> Construct the correlation matrix. Check whether any of your independent variables are highly correlated with each other.
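
---

# Full Model Assumptions: a code sketch

The Durbin-Watson and correlation-matrix checks listed above can also be computed by hand. What follows is a minimal sketch in plain Python, not the Excel workflow shown in the videos. The function names (`durbin_watson`, `pearson`, `correlation_matrix`) and all numbers are made up for illustration.

```python
from math import sqrt


def durbin_watson(residuals):
    """Sum of squared successive residual differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping variable name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical residuals and regressors, invented for illustration only.
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, -0.1]
regressors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
}

print("Durbin-Watson:", round(durbin_watson(residuals), 3))
for name, row in correlation_matrix(regressors).items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

A Durbin-Watson value near 2 suggests no first-order autocorrelation; values toward 0 point to positive autocorrelation and values toward 4 to negative autocorrelation. In the correlation matrix, look at the off-diagonal entries between independent variables.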
---

# Normality

<br>

<video width=890px height=460px>
<source src="media/normal.mp4" type="video/mp4">
</video>

---

# Homoskedasticity

<br>

<video width=890px height=460px>
<source src="media/homosk.mp4" type="video/mp4">
</video>

---

# Multicollinearity

<br>

<video width=890px height=460px>
<source src="media/multicol.mp4" type="video/mp4">
</video>

---

class: inverse, center, middle, clear

# Step 4

---

# Run the Reduced Model

<br>

.font120[
__Run the Reduced Model__

+ Drop the variables that are not statistically significant in your full model (check for `\(\hat{\beta}\)`s with p-value > .05 or p-value > .1; you decide!)<br>
+ Check significance: individual (t-test) and global (F-test)<br>
+ What is the `\(R^{2}\)`? Can you explain what that value means?<br>
]

---

# Run the Reduced Model

<br>

<video width=890px height=460px>
<source src="media/reduced.mp4" type="video/mp4">
</video>

---

class: inverse, center, middle, clear

# Step 5

---

# Reduced Model Assumptions

<br>

.font120[
__Check the Reduced Model assumptions__

+ Normality of Errors <br>
+ Homoskedasticity vs. Heteroskedasticity <br>
+ Independence of Errors vs. Autocorrelation (if you are working with time series)<br>

<br>
Yes, you have to do it again.
]

---

class: inverse, center, middle, clear

# Step 6

---

# Partial F-test

<br>

.font120[
Reject `\(H_{0}\)` (the coefficients on the variables you dropped are all zero) if:

`$$\dfrac{\frac{SSR_{f} - SSR_{r}}{k_{d}}}{MSE_{f}}>F_{\alpha, k_{d}, (n-k-1)_{f}}$$`

where `\(SSR_{f}\)` is the Sum of Squares Regression from the full model, `\(SSR_{r}\)` is the Sum of Squares Regression from the reduced model, `\(k_{d}\)` is the number of variables that you eliminated, and `\(MSE_{f}\)` is the Mean Square Error from the full model.

__If you reject `\(H_{0}\)`, use the Full Model for the section "Empirical Results".
Otherwise, use the Reduced Model.__
]

---

# Partial F-test

<br>

<video width=890px height=460px>
<source src="media/partial.mp4" type="video/mp4">
</video>

---

class: inverse, center, middle, clear

# Writing a report

---

# Writing a report

1. __Introduction__<br> Start with the motivation: why is your topic important or interesting? State your research question and briefly discuss your results.

2. __Data__<br> Describe the sources of your data. If you transformed your variables, explain what you did. Also, comment on the descriptive statistics and the correlation matrix.

3. __Regression Analysis__<br> First and foremost: we want to see equations! Also, state every step you took, from outliers to the partial F-test. What is your final model? Is it robust? Does it satisfy the model assumptions?

4. __Empirical Results__<br> Time to interpret your results: the `\(\hat{\beta}\)`s and `\(R^{2}\)`. Do the results match your prior beliefs? Explain the statistically significant results and mention the coefficients that are not statistically different from zero. Which one is your final model: full or reduced?

5. __Summary and Discussion__<br> Restate your research question and highlight your main findings. Suggest improvements for future studies, etc.
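
---

# Partial F-test: a code sketch

The rejection rule from Step 6 is a short calculation once you have the two ANOVA tables. Below is a minimal sketch in plain Python, assuming made-up values for `\(SSR_{f}\)`, `\(SSR_{r}\)`, `\(k_{d}\)`, and `\(MSE_{f}\)`, and assuming `\((n-k-1)_{f} = 40\)`. The function names `partial_f` and `choose_model` are hypothetical helpers, not part of the course material.

```python
def partial_f(ssr_full, ssr_reduced, k_dropped, mse_full):
    """Partial F statistic: ((SSR_f - SSR_r) / k_d) / MSE_f."""
    return ((ssr_full - ssr_reduced) / k_dropped) / mse_full


def choose_model(f_stat, f_critical):
    """Return 'full' if H0 is rejected (F above the critical value), else 'reduced'."""
    return "full" if f_stat > f_critical else "reduced"


# Hypothetical ANOVA numbers, invented for illustration only.
ssr_full = 500.0      # SSR from the full model
ssr_reduced = 470.0   # SSR from the reduced model
k_dropped = 3         # number of variables eliminated
mse_full = 5.0        # MSE from the full model

# Critical value F_{0.05, 3, 40}, about 2.84 in a standard F table;
# in Excel it can be obtained with F.INV.RT(0.05, 3, 40).
f_critical = 2.84

f_stat = partial_f(ssr_full, ssr_reduced, k_dropped, mse_full)
print("F =", f_stat, "->", choose_model(f_stat, f_critical), "model")
```

With these invented numbers the statistic falls below the critical value, so `\(H_{0}\)` is not rejected and the Reduced Model would go into "Empirical Results".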