# 2.7 — Inference for Regression - R Practice

## Set Up

To minimize confusion, I suggest creating a new R Project (e.g. regression_practice) and storing any data in that folder on your computer.

Alternatively, I have made a project in RStudio Cloud that you can use (so you need not worry about trading room computer limitations), with the data already inside (you will still need to assign it to an object).

### Question 1

Let’s use the diamonds data built into ggplot2. Load tidyverse, then, to be clear about what we are working with, save the data as a tibble in your environment (feel free to rename it) with diamonds <- diamonds.
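One way this step might look, as a minimal sketch (the class() and glimpse() calls are optional checks I have added, not part of the question):

```r
library(tidyverse)  # attaches ggplot2, which ships the diamonds data

# Copy the built-in data into an object in your own environment
diamonds <- diamonds

# Optional checks: confirm it is a tibble and look at the columns
class(diamonds)
glimpse(diamonds)
```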

### Question 2

Suppose we want to estimate the following relationship:

$\text{price}_i = \beta_0 + \beta_1 \text{carat}_i + u_i$

Run a regression of price on carat using lm() and get a summary.
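A minimal sketch of the regression, assuming the data are still called diamonds; the object name reg is just a hypothetical choice:

```r
library(tidyverse)

# Regress price on carat; `reg` is a hypothetical object name
reg <- lm(price ~ carat, data = diamonds)

# Coefficient table, standard errors, t-statistics, R-squared, etc.
summary(reg)
```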

#### Part A

What is $\hat{\beta_1}$? Interpret it in the context of our regression.

#### Part B

Use broom’s tidy() command, and calculate a confidence interval by including conf.int = TRUE inside tidy(). What is the 95% confidence interval for $\beta_1$, and what does it mean? Save the endpoints as an object.
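One possible sketch of this step. The object names reg, reg_tidy and CI_endpoints are hypothetical; the filter() and select() steps are just one way to keep the two endpoints for carat.

```r
library(tidyverse)
library(broom)

reg <- lm(price ~ carat, data = diamonds)

# Tidy coefficient table with confidence bounds
# (tidy() uses conf.level = 0.95 unless told otherwise)
reg_tidy <- tidy(reg, conf.int = TRUE)
reg_tidy

# Keep only the interval endpoints for the carat slope
CI_endpoints <- reg_tidy %>%
  filter(term == "carat") %>%
  select(conf.low, conf.high)
CI_endpoints
```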

### Question 3

Now let’s use infer. Install it if you don’t have it, then load it.

#### Part A

Let’s generate a confidence interval. First specify() the model relationship, then generate() reps = 1000 repetitions of the sample using type = "bootstrap", then have it calculate(stat = "slope"). Note this may take a few minutes; it’s doing a lot of calculations!

What does it show you?
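A sketch of the bootstrap pipeline described above. The set.seed() call and the object name boot are my own additions (assumptions), included only so the resampling is reproducible and the result can be reused.

```r
library(tidyverse)
library(infer)

set.seed(42)  # assumption: any seed works, this only makes the draws repeatable

boot <- diamonds %>%
  specify(price ~ carat) %>%                    # response ~ explanatory
  generate(reps = 1000, type = "bootstrap") %>% # resample rows with replacement
  calculate(stat = "slope")                     # one slope per bootstrap sample

boot
```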

#### Part B

Continue the pipeline from Part A: next have it get_confidence_interval(). Set level = 0.95, type = "se", and point_estimate equal to our estimated $\hat{\beta_1}$ from Question 2.
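A sketch of the extended pipeline, with the slope pulled from the regression via coef(). The names beta_1_hat and boot_ci are hypothetical, and the seed is an assumption added for reproducibility.

```r
library(tidyverse)
library(infer)

# Estimated slope from the Question 2 regression
beta_1_hat <- coef(lm(price ~ carat, data = diamonds))[["carat"]]

set.seed(42)  # assumption: added only for reproducibility

boot_ci <- diamonds %>%
  specify(price ~ carat) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "slope") %>%
  # type = "se" centers the interval on point_estimate and uses the
  # standard deviation of the bootstrap slopes as the standard error
  get_confidence_interval(level = 0.95, type = "se",
                          point_estimate = beta_1_hat)

boot_ci
```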

#### Part C

Now instead of get_confidence_interval(), pipe into visualize() to see the distribution. If you saved the confidence interval endpoints from Question 2, Part B, you can finally add + shade_ci(endpoints = ...), setting the argument equal to whatever you called your object containing the confidence interval.
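A sketch of the plot, assuming the endpoints were saved as a one-row, two-column tibble. CI_endpoints and boot_plot are hypothetical names; shade_ci() should accept either a two-element vector or a 1 x 2 data frame.

```r
library(tidyverse)
library(broom)
library(infer)

# Endpoints of the 95% interval from the regression (Question 2, Part B)
CI_endpoints <- lm(price ~ carat, data = diamonds) %>%
  tidy(conf.int = TRUE) %>%
  filter(term == "carat") %>%
  select(conf.low, conf.high)

set.seed(42)  # assumption: added only for reproducibility

boot_plot <- diamonds %>%
  specify(price ~ carat) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "slope") %>%
  visualize() +                         # histogram of bootstrap slopes
  shade_ci(endpoints = CI_endpoints)    # overlay the saved interval

boot_plot
```

Because visualize() returns a ggplot object, further layers are added with + rather than the pipe.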

### Question 4

Now let’s test the following hypothesis:

\begin{align*} H_0: \beta_1 &= 0\\ H_1: \beta_1 &\neq 0 \end{align*}

#### Part A

What does the output of summary() or of tidy() from Question 2 tell you about this hypothesis?
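As a rough sketch, the relevant row of the tidy() output can be pulled out like this (carat_row is a hypothetical name). The statistic column is the t-statistic for the null that the coefficient is zero, and p.value is the matching two-sided p-value.

```r
library(tidyverse)
library(broom)

reg <- lm(price ~ carat, data = diamonds)

# Keep only the slope's row and the columns relevant to the test
carat_row <- tidy(reg) %>%
  filter(term == "carat") %>%
  select(term, estimate, std.error, statistic, p.value)

carat_row

# The t-statistic is simply the estimate divided by its standard error
carat_row$estimate / carat_row$std.error
```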

#### Part B

Let’s now do this with infer. First specify() the model relationship, then hypothesize(null = "independence") to declare $H_0: \beta_1 = 0$, then generate() reps = 1000 repetitions of the sample using type = "permute", then have it calculate(stat = "slope"). What does it show you?
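A sketch of the permutation pipeline described above. The seed and the object name null_dist are assumptions of mine, not part of the question.

```r
library(tidyverse)
library(infer)

set.seed(42)  # assumption: added only for reproducibility

null_dist <- diamonds %>%
  specify(price ~ carat) %>%
  hypothesize(null = "independence") %>%       # H0: no relationship, slope = 0
  generate(reps = 1000, type = "permute") %>%  # shuffle to break any relationship
  calculate(stat = "slope")                    # slope in each shuffled sample

null_dist
```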

#### Part C

Continue the pipeline from Part B: next have it get_p_value(). Inside this function, set obs_stat equal to the $\hat{\beta_1}$ we found, and set direction = "both" to run a two-sided test, since our alternative hypothesis is two-sided, $H_1: \beta_1 \neq 0$.
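A sketch of the p-value step, again taking the observed slope from coef(). The names beta_1_hat and p_val are hypothetical, and the seed is an assumption.

```r
library(tidyverse)
library(infer)

# Observed slope from the Question 2 regression
beta_1_hat <- coef(lm(price ~ carat, data = diamonds))[["carat"]]

set.seed(42)  # assumption: added only for reproducibility

p_val <- diamonds %>%
  specify(price ~ carat) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  # share of permuted slopes at least as extreme as the observed one,
  # counting both tails
  get_p_value(obs_stat = beta_1_hat, direction = "both")

p_val
```

If none of the permuted slopes is as extreme as the observed one, infer may warn about reporting a p-value of exactly 0; with only 1000 repetitions, that is better read as "less than about 1 in 1000".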

#### Part D

Instead of get_p_value(), pipe into visualize(obs_stat = ... , direction = "both"), where ... is our estimated $\hat{\beta_1}$.
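A sketch of the plot. One caveat: as far as I can tell, recent releases of infer no longer take obs_stat and direction inside visualize() and instead expect them in a separate shade_p_value() layer, so this sketch uses that form. If your installed version accepts the arguments inside visualize(), the call given in the question should work as written. The names beta_1_hat and null_plot are hypothetical.

```r
library(tidyverse)
library(infer)

# Observed slope from the Question 2 regression
beta_1_hat <- coef(lm(price ~ carat, data = diamonds))[["carat"]]

set.seed(42)  # assumption: added only for reproducibility

null_plot <- diamonds %>%
  specify(price ~ carat) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize() +                                  # null distribution of slopes
  shade_p_value(obs_stat = beta_1_hat,           # mark the observed slope
                direction = "both")              # shade both tails

null_plot
```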