---
title: "2.7 — Inference for Regression — R Practice"
author: "YOUR NAME HERE"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
    theme: simplex
    toc: true 
    toc_depth: 3
    toc_float: true
    code_folding: show
    highlight: tango
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
set.seed(20) # using this number means all "random" generated objects will be identical for all of us!
```

## Question 1
Let’s use the `diamonds` data built into `ggplot`. Simply load `tidyverse` and then to be clear, save this as a tibble (feel free to rename it) with `diamonds <- diamonds`.

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

## Question 2
Suppose we want to estimate the following relationship:

$$\text{price}_i = \beta_0 + \beta_1 \text{carat}_i + u_i$$

Run a regression of `price` on `carat` using `lm()` and get a `summary`.

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

### Part A

What is $\hat{\beta_1}$? Interpret it in the context of our regression.

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

### Part B
Use `broom`'s `tidy()` command, and calculate a confidence interval by including `conf.int = T` inside `tidy()`. What is the 95% confidence interval for $\hat{\beta_1}$, and what does it mean? Save these endpoints as an object.

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

## Question 3
Now let’s use `infer`. Install it if you don’t have it, then load it.

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

### Part A

Let’s generate a confidence interval. First `specify()` the model relationship, then `generate()` `reps = 1000` repetitions of the sample using a `type = bootstrap`, then have it `calculate(stat = "slope")`.^[Note this will take a few minutes, its doing a lot of calculations!] What does it show you?

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

### Part B

Continue the pipeline from part A, next have it `get_confidence_interval()`. Set `level = 0.95, type = "se"` and `point_estimate` equal to our estimated $\hat{\beta_1}$ from Question 2.

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

### Part C
Now instead of `get_confidence_interval()`, pipe into `visualize()` to see the distribution. If you saved the confidence interval endpoints from part 1B, you can finally add `+shade_ci(endpoints = ...)` setting the argument equal to whatever you called your object containing the confidence interval.

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

## Question 4

Now let’s test the following hypothesis:

$$\begin{align*}
H_0: \beta_1 &= 0\\
H_1: \beta_1 &\neq 0\\
\end{align*}$$

### Part A

What does the output of `summary` or of `tidy` from Question 2 tell you?

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

### Part B
Let’s now do this with `infer`. First `specify()` the model relationship, then `hypothesize(null = "independence")` to declare $H_0: \beta_1 = 0$, then `generate()` `reps = 1000` repetitions of the sample using a `type = permute`, then have it `calculate(stat = "slope")`. What does it show you?

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

### Part C
Continue the pipeline from part B, next have it `get_p_value()`. Inside this function, set `obs_stat` equal to our $\hat{\beta_1}$ we found, and set `direction = "both"` to run a two-sided test, since our alternative hypothesis is two-sided, $H_1: \beta_1 \neq 0$.

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```

---

### Part D

Instead of `get_p_value()`, pipe into `visualize(obs_stat = ... , direction = "both").` where `...` is our estimated $\hat{\beta_1}$.

---

<!--ANSWER BELOW HERE-->

```{r}
# PUT CODE HERE
```