2.7 — Inference for Regression - R Practice
Set Up
To minimize confusion, I suggest creating a new R Project (e.g. regression_practice) and storing any data in that folder on your computer.
Alternatively, I have made a project in RStudio Cloud that you can use (so you need not worry about trading room computer limitations), with the data already inside (you will still need to assign it to an object).
Question 1
Let’s use the diamonds data built into ggplot2. Simply load tidyverse and then, to be clear, save this as a tibble (feel free to rename it) with diamonds <- diamonds.
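A minimal sketch of this setup, assuming the tidyverse package is already installed:

```r
# Load the tidyverse (this attaches ggplot2, which ships the diamonds data)
library(tidyverse)

# Assign the built-in data to an object in our own environment
diamonds <- diamonds

# Quick look at the variables, including price and carat
glimpse(diamonds)
```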
Question 2
Suppose we want to estimate the following relationship:
\[\text{price}_i = \beta_0 + \beta_1 \text{carat}_i + u_i\]
Run a regression of price on carat using lm() and get a summary.
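One way this could look, as a sketch. The object names reg and beta_1_hat are placeholders chosen here, not names the exercise requires:

```r
library(tidyverse)

# Regress price on carat using ggplot2's built-in diamonds data
reg <- lm(price ~ carat, data = diamonds)

# Coefficients, standard errors, t-statistics, p-values, and fit measures
summary(reg)

# The slope estimate on its own (beta_1 hat), saved for later questions
beta_1_hat <- coef(reg)[["carat"]]
beta_1_hat
```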
Part A
What is \(\hat{\beta_1}\)? Interpret it in the context of our regression.
Part B
Use broom’s tidy() command, and calculate a confidence interval by including conf.int = TRUE inside tidy(). What is the 95% confidence interval for \(\beta_1\), and what does it mean? Save these endpoints as an object.
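A possible sketch, assuming broom is installed. The names reg_tidy and ci_endpoints are hypothetical; call yours whatever you like:

```r
library(tidyverse)
library(broom)

reg <- lm(price ~ carat, data = diamonds)

# Tidy coefficient table with confidence bounds (conf.level defaults to 0.95)
reg_tidy <- tidy(reg, conf.int = TRUE)
reg_tidy

# Keep just the two endpoints for the slope on carat, as a 1 x 2 tibble
ci_endpoints <- reg_tidy %>%
  filter(term == "carat") %>%
  select(conf.low, conf.high)
ci_endpoints
```

Storing the endpoints as a 1 x 2 data frame is a convenience: it is one of the shapes that infer's shade_ci() accepts later on.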
Question 3
Now let’s use infer. Install it if you don’t have it, then load it.
Part A
Let’s generate a confidence interval. First specify() the model relationship, then generate() reps = 1000 repetitions of the sample using type = "bootstrap", then have it calculate(stat = "slope"). Note that this will take a few minutes; it’s doing a lot of calculations!
What does it show you?
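The pipeline might be sketched as follows. Two things here are assumptions of the sketch rather than part of the exercise: the seed, and reps = 100, which is lowered only so the example runs quickly (set it to 1000 as asked). The name boot_slopes is a placeholder.

```r
library(tidyverse)
library(infer)

set.seed(1)  # arbitrary seed, only to make the resamples reproducible

boot_slopes <- diamonds %>%
  specify(price ~ carat) %>%                    # response ~ explanatory
  generate(reps = 100, type = "bootstrap") %>%  # the exercise asks for reps = 1000
  calculate(stat = "slope")                     # one slope per bootstrap resample

boot_slopes
```

The result has one row per resample, with the simulated slope stored in the stat column.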
Part B
Continue the pipeline from Part A: next, have it get_confidence_interval(). Set level = 0.95, type = "se", and point_estimate equal to our estimated \(\hat{\beta_1}\) from Question 2.
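As a sketch, with the same caveats as before (reps = 100 and the seed are choices made here for speed and reproducibility, and beta_1_hat and boot_ci are placeholder names):

```r
library(tidyverse)
library(infer)

# Slope estimate from the Question 2 regression
beta_1_hat <- coef(lm(price ~ carat, data = diamonds))[["carat"]]

set.seed(1)  # arbitrary seed for reproducibility

boot_ci <- diamonds %>%
  specify(price ~ carat) %>%
  generate(reps = 100, type = "bootstrap") %>%  # the exercise asks for reps = 1000
  calculate(stat = "slope") %>%
  get_confidence_interval(level = 0.95,
                          type = "se",
                          point_estimate = beta_1_hat)

boot_ci
```

With type = "se", the interval is centered on the point estimate and uses the standard deviation of the bootstrap slopes as the standard error.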
Part C
Now, instead of get_confidence_interval(), pipe into visualize() to see the distribution. If you saved the confidence interval endpoints from Question 2, Part B, you can finally add + shade_ci(endpoints = ...), setting the argument equal to whatever you called your object containing the confidence interval.
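A sketch of the plot, rebuilding the endpoints so the example stands alone. The names ci_endpoints and boot_plot are hypothetical, and reps = 100 is again used only for speed:

```r
library(tidyverse)
library(broom)
library(infer)

# 95% confidence interval endpoints for the slope, as in Question 2, Part B
ci_endpoints <- lm(price ~ carat, data = diamonds) %>%
  tidy(conf.int = TRUE) %>%
  filter(term == "carat") %>%
  select(conf.low, conf.high)

set.seed(1)  # arbitrary seed for reproducibility

boot_plot <- diamonds %>%
  specify(price ~ carat) %>%
  generate(reps = 100, type = "bootstrap") %>%  # the exercise asks for reps = 1000
  calculate(stat = "slope") %>%
  visualize() +                                 # histogram of bootstrap slopes
  shade_ci(endpoints = ci_endpoints)            # shade the interval on top

boot_plot
```

Note that visualize() returns a ggplot object, so layers such as shade_ci() are added with + rather than piped with %>%.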
Question 4
Now let’s test the following hypothesis:
\[\begin{align*} H_0: \beta_1 &= 0\\ H_1: \beta_1 &\neq 0\\ \end{align*}\]
Part A
What does the output of summary or of tidy from Question 2 tell you?
Part B
Let’s now do this with infer. First specify() the model relationship, then hypothesize(null = "independence") to declare \(H_0: \beta_1 = 0\), then generate() reps = 1000 repetitions of the sample using type = "permute", then have it calculate(stat = "slope"). What does it show you?
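A sketch of the null simulation. As in the bootstrap examples, the seed and reps = 100 are assumptions made here to keep the run short, and null_slopes is a placeholder name:

```r
library(tidyverse)
library(infer)

set.seed(1)  # arbitrary seed for reproducibility

null_slopes <- diamonds %>%
  specify(price ~ carat) %>%
  hypothesize(null = "independence") %>%      # H0: no relationship, so beta_1 = 0
  generate(reps = 100, type = "permute") %>%  # the exercise asks for reps = 1000
  calculate(stat = "slope")                   # one slope per permuted sample

null_slopes

# Range of slopes we would see if price and carat were unrelated
range(null_slopes$stat)
```

Permuting breaks any link between price and carat, so the simulated slopes should scatter around zero.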
Part C
Continue the pipeline from Part B: next, have it get_p_value(). Inside this function, set obs_stat equal to the \(\hat{\beta_1}\) we found, and set direction = "both" to run a two-sided test, since our alternative hypothesis is two-sided, \(H_1: \beta_1 \neq 0\).
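This step could be sketched as below (reps = 100 and the seed are choices made here, and beta_1_hat and p_val are placeholder names):

```r
library(tidyverse)
library(infer)

# Observed slope from the Question 2 regression
beta_1_hat <- coef(lm(price ~ carat, data = diamonds))[["carat"]]

set.seed(1)  # arbitrary seed for reproducibility

p_val <- diamonds %>%
  specify(price ~ carat) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%  # the exercise asks for reps = 1000
  calculate(stat = "slope") %>%
  get_p_value(obs_stat = beta_1_hat, direction = "both")

p_val
```

If none of the simulated slopes is as extreme as the observed one, the reported p-value is 0, and infer is expected to warn about this. Read such a result as "smaller than the simulation can resolve", not as literally zero.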
Part D
Instead of get_p_value(), pipe into visualize(obs_stat = ... , direction = "both"), where ... is our estimated \(\hat{\beta_1}\).
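A sketch of this plot. It deliberately swaps in a different spelling: to my knowledge, recent versions of infer deprecate the obs_stat and direction arguments of visualize() in favor of a separate shade_p_value() layer, so that form is used here. On an older version, the visualize(obs_stat = ..., direction = "both") call described above should do the same job. As before, reps = 100, the seed, and the object names are assumptions of the sketch.

```r
library(tidyverse)
library(infer)

# Observed slope from the Question 2 regression
beta_1_hat <- coef(lm(price ~ carat, data = diamonds))[["carat"]]

set.seed(1)  # arbitrary seed for reproducibility

null_plot <- diamonds %>%
  specify(price ~ carat) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%  # the exercise asks for reps = 1000
  calculate(stat = "slope") %>%
  visualize() +                               # histogram of slopes under H0
  shade_p_value(obs_stat = beta_1_hat, direction = "both")

null_plot
```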