---
title: "3.4 — Multivariate OLS Estimators: Bias, Precision, and Fit — R Practice"
author: "YOUR NAME"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
    #theme:
    toc: true
    toc_depth: 3
    toc_float: true
    code_folding: show
    highlight: tango
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Set Up
To minimize confusion, I suggest creating a new `R Project` (e.g. `regression_practice`) and storing any data in that folder on your computer.
## Question 1
Download and read in (`read_csv`) the data below.
- [`speeding-tickets.csv`](https://metricsf20.classes.ryansafner.com/data/speeding-tickets.csv)
This data comes from a paper by Makowsky and Stratmann (2009) that we will examine later. Even though state law sets a formula for tickets based on how fast a person was driving, police officers in practice often deviate from that formula. This dataset includes information on all traffic stops. An amount for the fine is given only for observations in which the police officer decided to assess a fine. There are a number of variables in this dataset, but the ones we'll look at are:
| Variable | Description |
|----------|-------------|
| `Amount` | Amount of fine (in dollars) assessed for speeding |
| `Age` | Age of speeding driver (in years) |
| `MPHover` | Miles per hour over the speed limit |
We want to explore who gets fines, and how much. We'll come back to the other variables (which are categorical) in this dataset in later lessons.
---
```{r}
# PUT CODE HERE
```
---
## Question 2
*How does the age of a driver affect the amount of the fine*? Make a scatterplot of the `Amount` of the fine and the driver's `Age`. Add a regression line with an additional layer of `geom_smooth(method="lm")`.
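As a minimal sketch of the layering syntax only, the block below builds a scatterplot with a fitted line on simulated data. The data frame `sim` and its columns `age` and `fine` are made up for illustration and are not the ticket data.

```r
library(ggplot2)

# Hypothetical simulated data, NOT the speeding-ticket data
set.seed(1)
sim <- data.frame(age = runif(200, 16, 80))
sim$fine <- 130 - 0.3 * sim$age + rnorm(200, sd = 20)

# Points first, then an OLS line (with its confidence band) layered on top
p <- ggplot(sim, aes(x = age, y = fine)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm")
p
```

The same pattern should carry over to the real data by swapping in your own data frame and variable names.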
---
```{r}
# PUT CODE HERE
```
---
## Question 3
Find the correlation between `Amount` and `Age`.^[Note there are a lot of `NA`s (missing data) for `Amount`. You can verify by `summary()` or `count()` if you wish. In order to get a correlation, you will need to put `use="pairwise.complete.obs"` inside the `cor()` command.]
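To see why the `use` argument matters, here is a small sketch on toy vectors (`x` and `y` are hypothetical, with one value deliberately missing):

```r
# Hypothetical toy vectors; y has a missing value on purpose
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, NA, 8, 10)

# By default a single NA makes the whole correlation NA
r_default <- cor(x, y)
r_default

# Dropping incomplete pairs first gives a usable number
r_complete <- cor(x, y, use = "pairwise.complete.obs")
r_complete
```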
---
```{r}
# PUT CODE HERE
```
---
## Question 4
We want to estimate the following model:
$$\widehat{\text{Amount}_i}= \hat{\beta_0}+\hat{\beta_1}\text{Age}_i$$
Run a regression, and save it as an object. Now get a `summary()` of it.
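A minimal sketch of the workflow, assuming simulated data (the names `sim`, `age`, `fine`, and `reg_sim` are hypothetical, not from the ticket data):

```r
# Hypothetical simulated data, NOT the speeding-ticket data
set.seed(1)
sim <- data.frame(age = runif(200, 16, 80))
sim$fine <- 130 - 0.3 * sim$age + rnorm(200, sd = 20)

# Formula syntax is outcome ~ regressor; by default lm() drops rows
# that have missing values in any variable used in the formula
reg_sim <- lm(fine ~ age, data = sim)

summary(reg_sim)  # estimates, standard errors, R-squared, and more
coef(reg_sim)     # just the two estimates: "(Intercept)" and "age"
```

Saving the regression as an object is what lets later steps (such as `summary()` or `coef()`) reuse it without re-estimating.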
---
```{r}
# PUT CODE HERE
```
---
### Part A
Write out the estimated regression equation.
---
```{r}
# PUT CODE HERE
```
---
### Part B
What is $\hat{\beta_0}$? What does it mean in the context of our question?
---
```{r}
# PUT CODE HERE
```
---
### Part C
What is $\hat{\beta_1}$? What does it mean in the context of our question?
---
```{r}
# PUT CODE HERE
```
---
### Part D
What is the marginal effect of age on amount?
---
```{r}
# PUT CODE HERE
```
---
## Question 5
Redo question 4 with the `broom` package. Try out `tidy()` and `glance()`. This is just to keep you versatile.
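For orientation, a short sketch of what the two `broom` functions return, again on simulated data (all object names here are hypothetical):

```r
library(broom)

# Hypothetical simulated data, NOT the speeding-ticket data
set.seed(1)
sim <- data.frame(age = runif(200, 16, 80))
sim$fine <- 130 - 0.3 * sim$age + rnorm(200, sd = 20)
reg_sim <- lm(fine ~ age, data = sim)

# tidy(): one row per coefficient (term, estimate, std.error, ...)
coefs <- tidy(reg_sim)
coefs

# glance(): a single row of model-level statistics (r.squared, adj.r.squared, ...)
fit <- glance(reg_sim)
fit
```

Because both results are data frames, individual numbers can be pulled out with the usual tools, for example `fit$adj.r.squared`.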
---
```{r}
# PUT CODE HERE
```
---
## Question 6
How big would the difference in expected fine be for two drivers, one 18 years old and one 40 years old?
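Because the model in Question 4 is linear in `Age`, the intercept cancels when two predictions are differenced, so only the slope and the age gap matter. A short derivation in that notation:

$$\begin{aligned}
\widehat{\text{Amount}_{40}}-\widehat{\text{Amount}_{18}} &= (\hat{\beta_0}+\hat{\beta_1}\cdot 40)-(\hat{\beta_0}+\hat{\beta_1}\cdot 18)\\
&= \hat{\beta_1}(40-18)\\
&= 22\,\hat{\beta_1}
\end{aligned}$$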
---
```{r}
# PUT CODE HERE
```
---
## Question 7
Now run the regression again, controlling for speed (`MPHover`).
```{r}
# PUT CODE HERE
```
---
### Part A
Write the new regression equation.
---
```{r}
# PUT CODE HERE
```
---
### Part B
What is the marginal effect of `Age` on `Amount`? What happened to it?
---
```{r}
# PUT CODE HERE
```
---
### Part C
What is the marginal effect of `MPHover` on `Amount`?
---
```{r}
# PUT CODE HERE
```
---
### Part D
What is $\hat{\beta_0}$, and what does it mean?
---
```{r}
# PUT CODE HERE
```
---
### Part E
What is the adjusted $\bar{R}^2$? What does it mean?
---
```{r}
# PUT CODE HERE
```
---
## Question 8
Now suppose both the 18 year old and the 40 year old each went 10 MPH over the speed limit. How big would the difference in expected fine be for the two drivers?
---
```{r}
# PUT CODE HERE
```
---
## Question 9
How about the difference in expected fine between two 18 year olds, one who went 10 MPH over, and one who went 30 MPH over?
---
```{r}
# PUT CODE HERE
```
---
## Question 10
Use the `huxtable` package's `huxreg()` command to make a regression table of your two regressions: the one from question 4, and the one from question 7.
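A minimal sketch of a two-column regression table, assuming simulated data (the data frame, the model objects, and the column labels "Simple" and "Multivariate" are all hypothetical choices):

```r
library(huxtable)

# Hypothetical simulated data, NOT the speeding-ticket data
set.seed(1)
sim <- data.frame(age = runif(200, 16, 80),
                  mph_over = runif(200, 1, 40))
sim$fine <- 10 + 0.1 * sim$age + 5 * sim$mph_over + rnorm(200, sd = 20)

reg_simple <- lm(fine ~ age, data = sim)
reg_multi  <- lm(fine ~ age + mph_over, data = sim)

# Named arguments become the column headers of the table
tbl <- huxreg("Simple" = reg_simple, "Multivariate" = reg_multi)
tbl
```

Note that `huxreg()` relies on the `broom` package being installed to extract the estimates.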
---
```{r}
# PUT CODE HERE
```
---
## Question 11
Are our two independent variables multicollinear? Do younger people tend to drive faster?
### Part A
Get the correlation between `Age` and `MPHover`.
---
```{r}
# PUT CODE HERE
```
---
### Part B
Make a scatterplot of `MPHover` on `Age`.
---
```{r}
# PUT CODE HERE
```
---
### Part C
Run an auxiliary regression of `MPHover` on `Age`.
---
```{r}
# PUT CODE HERE
```
---
### Part D
Interpret the coefficient on `Age`.
---
```{r}
# PUT CODE HERE
```
---
### Part E
Look at your regression table in question 10. What happened to the standard error on `Age`? Why? (Consider the formula for the variance of $\hat{\beta_1}$.)
---
```{r}
# PUT CODE HERE
```
---
### Part F
Calculate the Variance Inflation Factor (VIF) using the `car` package's `vif()` command.^[Run it on your multivariate regression object from Question 7.]
---
```{r}
# PUT CODE HERE
```
---
### Part G
Calculate the VIF manually, using what you learned in this question.
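As a sketch of the manual calculation, the block below uses the standard formula $VIF_j = \frac{1}{1-R_j^2}$, where $R_j^2$ comes from an auxiliary regression of regressor $j$ on the other regressors. The data are simulated and every object name is hypothetical.

```r
# Hypothetical simulated data, NOT the speeding-ticket data;
# the two regressors are correlated by construction
set.seed(1)
sim <- data.frame(age = runif(200, 16, 80))
sim$mph_over <- 30 - 0.2 * sim$age + rnorm(200, sd = 5)
sim$fine <- 10 + 0.1 * sim$age + 5 * sim$mph_over + rnorm(200, sd = 20)

# Auxiliary regression of one regressor on the other
aux <- lm(mph_over ~ age, data = sim)
r2_aux <- summary(aux)$r.squared

# VIF = 1 / (1 - R^2 of the auxiliary regression)
vif_manual <- 1 / (1 - r2_aux)
vif_manual

# Optional cross-check against car::vif(), only if car is installed
reg_multi <- lm(fine ~ age + mph_over, data = sim)
if (requireNamespace("car", quietly = TRUE)) {
  print(car::vif(reg_multi))
}
```

With only two regressors, both share the same VIF, since the auxiliary $R^2$ is the squared correlation between them in either direction. In data with missing values, the manual figure may differ slightly from `vif()` unless the auxiliary regression is restricted to the same observations used in the main regression.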
---
```{r}
# PUT CODE HERE
```
---
## Question 12
Let's now think about omitted variable bias. Suppose the "true" model is the one we ran in Question 7.
### Part A
Do you suppose that `MPHover` fits the two criteria for omitted variable bias?
---
```{r}
# PUT CODE HERE
```
---
### Part B
Look at the regression we ran in Question 4. Consider this the "omitted" regression, where we left out `MPHover`. Does our estimate of the marginal effect of `Age` on `Amount` overstate or understate the *true* marginal effect?
---
```{r}
# PUT CODE HERE
```
---
### Part C
Use the "true" model (Question 7), the "omitted" regression (Question 4), and our "auxiliary" regression (Question 11) to identify each of the following parameters that describe our biased estimate of the marginal effect of `Age` on `Amount`:^[See the notation from class 3.2.]
$$\alpha_1=\beta_1+\beta_2\delta_1$$
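The identity above can be checked numerically. The sketch below assumes the usual mapping (an assumption about the class notation): $\alpha_1$ is the slope from the omitted regression, $\beta_1$ and $\beta_2$ come from the true regression, and $\delta_1$ is the slope from the auxiliary regression. The data are simulated and all names are hypothetical.

```r
# Hypothetical simulated data, NOT the speeding-ticket data
set.seed(1)
sim <- data.frame(age = runif(200, 16, 80))
sim$mph_over <- 30 - 0.2 * sim$age + rnorm(200, sd = 5)
sim$fine <- 10 + 0.1 * sim$age + 5 * sim$mph_over + rnorm(200, sd = 20)

true_reg    <- lm(fine ~ age + mph_over, data = sim)  # "true" model
omitted_reg <- lm(fine ~ age, data = sim)             # leaves out mph_over
aux_reg     <- lm(mph_over ~ age, data = sim)         # omitted variable on included one

alpha_1 <- coef(omitted_reg)[["age"]]
beta_1  <- coef(true_reg)[["age"]]
beta_2  <- coef(true_reg)[["mph_over"]]
delta_1 <- coef(aux_reg)[["age"]]

# The two sides agree up to floating-point error
all.equal(alpha_1, beta_1 + beta_2 * delta_1)
```

The equality is exact only when all three regressions are estimated on the same observations; if one variable has missing values, the regressions may use different rows and the two sides will match only approximately.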
---
```{r}
# PUT CODE HERE
```
---
### Part D
From your answer in part C, how large is the omitted variable bias from leaving out `MPHover`?
---
```{r}
# PUT CODE HERE
```
---
## Question 13
Make a coefficient plot of your coefficients from the regression in Question 7. The package `modelsummary` (which you will need to install and load) has a great command `modelplot()` to do this on your regression object.
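A minimal sketch of the call on simulated data (object names are hypothetical):

```r
library(modelsummary)

# Hypothetical simulated data, NOT the speeding-ticket data
set.seed(1)
sim <- data.frame(age = runif(200, 16, 80),
                  mph_over = runif(200, 1, 40))
sim$fine <- 10 + 0.1 * sim$age + 5 * sim$mph_over + rnorm(200, sd = 20)
reg_multi <- lm(fine ~ age + mph_over, data = sim)

# Point estimates with confidence intervals, one row per coefficient
p <- modelplot(reg_multi)
p

# coef_omit takes a regular expression; here it hides the intercept
modelplot(reg_multi, coef_omit = "Intercept")
```

Dropping the intercept is a judgment call: it is often on a very different scale from the slopes and can squash them visually.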
```{r}
# PUT CODE HERE
```
---