2.7 — Inference for Regression - R Practice

Set Up

To minimize confusion, I suggest creating a new R Project (e.g. regression_practice) and storing any data in that folder on your computer.

Alternatively, I have made a project in R Studio Cloud that you can use (and not worry about trading room computer limitations), with the data already inside (you will still need to assign it to an object).

Question 1

Let’s use the diamonds data built into ggplot. Simply load tidyverse and then to be clear, save this as a tibble (feel free to rename it) with diamonds <- diamonds.

Question 2

Suppose we want to estimate the following relationship:

Run a regression of price on carat using lm() and get a summary.

Part A

What is ? Interpret it in the context of our regression.

Part B

Use broom’s tidy() command, and calculate a confidence interval by including conf.int = T inside tidy(). What is the 95% confidence interval for , and what does it mean? Save these endpoints as an object.

Question 3

Now let’s use infer. Install it if you don’t have it, then load it.

Part A

Let’s generate a confidence interval. First specify() the model relationship, then generate() reps = 1000 repetitions of the sample using a type = bootstrap, then have it calculate(stat = "slope").Note this will take a few minutes, its doing a lot of calculations!

What does it show you?

Part B

Continue the pipeline from part A, next have it get_confidence_interval(). Set level = 0.95, type = "se" and point_estimate equal to our estimated from Question 2.

Part C

Now instead of get_confidence_interval(), pipe into visualize() to see the distribution. If you saved the confidence interval endpoints from part 1B, you can finally add +shade_ci(endpoints = ...) setting the argument equal to whatever you called your object containing the confidence interval.

Question 4

Now let’s test the following hypothesis:

Part A

What does the output of summary or of tidy from Question 2 tell you?

Part B

Let’s now do this with infer. First specify() the model relationship, then hypothesize(null = "independence") to declare , then generate() reps = 1000 repetitions of the sample using a type = permute, then have it calculate(stat = "slope"). What does it show you?

Part C

Continue the pipeline from part B, next have it get_p_value(). Inside this function, set obs_stat equal to our we found, and set direction = "both" to run a two-sided test, since our alternative hypothesis is two-sided, .

Part D

Instead of get_p_value(), pipe into visualize(obs_stat = ... , direction = "both"). where ... is our estimated .