2.7 — Inference for Regression - R Practice
Set Up
To minimize confusion, I suggest creating a new R Project (e.g. regression_practice) and storing any data in that folder on your computer.
Alternatively, I have made a project in R Studio Cloud that you can use (and not worry about trading room computer limitations), with the data already inside (you will still need to assign it to an object).
Question 1
Let’s use the diamonds data built into ggplot. Simply load tidyverse and then to be clear, save this as a tibble (feel free to rename it) with diamonds <- diamonds.
Question 2
Suppose we want to estimate the following relationship:
pricei=β0+β1carati+ui
Run a regression of price on carat using lm() and get a summary.
Part A
What is ^β1? Interpret it in the context of our regression.
Part B
Use broom’s tidy() command, and calculate a confidence interval by including conf.int = T inside tidy(). What is the 95% confidence interval for ^β1, and what does it mean? Save these endpoints as an object.
Question 3
Now let’s use infer. Install it if you don’t have it, then load it.
Part A
Let’s generate a confidence interval. First specify() the model relationship, then generate() reps = 1000 repetitions of the sample using a type = bootstrap, then have it calculate(stat = "slope").Note this will take a few minutes, its doing a lot of calculations!
What does it show you?
Part B
Continue the pipeline from part A, next have it get_confidence_interval(). Set level = 0.95, type = "se" and point_estimate equal to our estimated ^β1 from Question 2.
Part C
Now instead of get_confidence_interval(), pipe into visualize() to see the distribution. If you saved the confidence interval endpoints from part 1B, you can finally add +shade_ci(endpoints = ...) setting the argument equal to whatever you called your object containing the confidence interval.
Question 4
Now let’s test the following hypothesis:
H0:β1=0H1:β1≠0
Part A
What does the output of summary or of tidy from Question 2 tell you?
Part B
Let’s now do this with infer. First specify() the model relationship, then hypothesize(null = "independence") to declare H0:β1=0, then generate() reps = 1000 repetitions of the sample using a type = permute, then have it calculate(stat = "slope"). What does it show you?
Part C
Continue the pipeline from part B, next have it get_p_value(). Inside this function, set obs_stat equal to our ^β1 we found, and set direction = "both" to run a two-sided test, since our alternative hypothesis is two-sided, H1:β1≠0.
Part D
Instead of get_p_value(), pipe into visualize(obs_stat = ... , direction = "both"). where ... is our estimated ^β1.