---
title: "2.7 — Inference for Regression — R Practice"
author: "YOUR NAME HERE"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
    theme: simplex
    toc: true
    toc_depth: 3
    toc_float: true
    code_folding: show
    highlight: tango
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
set.seed(20) # using this number means all "random" generated objects will be identical for all of us!
```
## Question 1
Let’s use the `diamonds` data built into `ggplot2`. Load `tidyverse`, then, to be explicit, save the data as a tibble in your environment (feel free to rename it) with `diamonds <- diamonds`.
---
```{r}
# PUT CODE HERE
```
---
## Question 2
Suppose we want to estimate the following relationship:
$$\text{price}_i = \beta_0 + \beta_1 \text{carat}_i + u_i$$
Run a regression of `price` on `carat` using `lm()` and get a `summary`.
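
As a point of comparison, here is a minimal sketch of the same `lm()` and `summary()` workflow on R's built-in `mtcars` data rather than `diamonds`. The variables `mpg` and `wt` and the object name `reg_cars` are illustrative assumptions, not part of this exercise.

```r
# Illustration only: regress fuel economy (mpg) on weight (wt) in mtcars
reg_cars <- lm(mpg ~ wt, data = mtcars)

# Coefficients, standard errors, t-statistics, and p-values
summary(reg_cars)

# Pull out just the estimated slope
coef(reg_cars)[["wt"]]
```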
---
```{r}
# PUT CODE HERE
```
---
### Part A
What is $\hat{\beta_1}$? Interpret it in the context of our regression.
---
```{r}
# PUT CODE HERE
```
---
### Part B
Use `broom`'s `tidy()` command, and calculate a confidence interval by including `conf.int = TRUE` inside `tidy()`. What is the 95% confidence interval for $\beta_1$, and what does it mean? Save these endpoints as an object.
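
The sketch below shows one way this might look, again on the built-in `mtcars` data instead of `diamonds`. The regression of `mpg` on `wt` and the names `reg_cars` and `ci_cars` are assumptions made for illustration.

```r
library(broom)
library(dplyr)

# Illustration only: a regression on mtcars, not the exercise data
reg_cars <- lm(mpg ~ wt, data = mtcars)

# tidy() adds conf.low and conf.high columns when conf.int = TRUE
ci_cars <- tidy(reg_cars, conf.int = TRUE, conf.level = 0.95) %>%
  filter(term == "wt") %>%        # keep only the slope row
  select(conf.low, conf.high)     # keep only the two endpoints

ci_cars
```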
---
```{r}
# PUT CODE HERE
```
---
## Question 3
Now let’s use `infer`. Install it if you don’t have it, then load it.
---
```{r}
# PUT CODE HERE
```
---
### Part A
Let’s generate a confidence interval. First `specify()` the model relationship, then `generate()` `reps = 1000` bootstrap resamples of the sample using `type = "bootstrap"`, then have it `calculate(stat = "slope")`.^[Note that this may take a few minutes; it's doing a lot of calculations!] What does it show you?
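
For reference, the three steps can be sketched on the much smaller built-in `mtcars` data, where they run in seconds. The formula `mpg ~ wt` and the object name `boot_cars` are illustrative assumptions.

```r
library(infer)
library(dplyr)  # loaded for the %>% pipe

set.seed(20)  # make the resampling reproducible

boot_cars <- mtcars %>%
  specify(mpg ~ wt) %>%                          # response ~ explanatory
  generate(reps = 1000, type = "bootstrap") %>%  # 1000 resamples with replacement
  calculate(stat = "slope")                      # one slope per resample

# A tibble with one row per replicate and its simulated slope in `stat`
boot_cars
```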
---
```{r}
# PUT CODE HERE
```
---
### Part B
Continue the pipeline from Part A by piping into `get_confidence_interval()`. Set `level = 0.95`, `type = "se"`, and `point_estimate` equal to our estimated $\hat{\beta_1}$ from Question 2.
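
A hedged sketch of the full pipeline on the built-in `mtcars` data follows. The names `slope_cars` and `ci_boot` are hypothetical, and the point estimate is taken from an `lm()` fit on the same data.

```r
library(infer)
library(dplyr)  # loaded for the %>% pipe

set.seed(20)

# Observed slope from OLS, used as the center of the interval
slope_cars <- coef(lm(mpg ~ wt, data = mtcars))[["wt"]]

ci_boot <- mtcars %>%
  specify(mpg ~ wt) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "slope") %>%
  # type = "se" builds the interval as point estimate +/- a multiple of
  # the bootstrap standard error, so point_estimate must be supplied
  get_confidence_interval(level = 0.95, type = "se",
                          point_estimate = slope_cars)

ci_boot
```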
---
```{r}
# PUT CODE HERE
```
---
### Part C
Now, instead of `get_confidence_interval()`, pipe into `visualize()` to see the distribution. If you saved the confidence interval endpoints from Question 2, Part B, you can finally add `+ shade_ci(endpoints = ...)`, setting the argument equal to whatever you called the object containing the confidence interval.
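
One possible shape for this plot, sketched on the built-in `mtcars` data: the endpoints here come from `confint()` on an `lm()` fit, and the names `ci_ends` and `p_boot` are assumptions for illustration.

```r
library(infer)
library(dplyr)  # loaded for the %>% pipe

set.seed(20)

# Endpoints of the OLS 95% confidence interval for the slope,
# stored as a plain two-element numeric vector
ci_ends <- unname(confint(lm(mpg ~ wt, data = mtcars))["wt", ])

p_boot <- mtcars %>%
  specify(mpg ~ wt) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "slope") %>%
  visualize() +                   # histogram of the bootstrapped slopes
  shade_ci(endpoints = ci_ends)   # shade the region between the endpoints

p_boot
```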
---
```{r}
# PUT CODE HERE
```
---
## Question 4
Now let’s test the following hypothesis:
$$\begin{align*}
H_0: \beta_1 &= 0\\
H_1: \beta_1 &\neq 0\\
\end{align*}$$
### Part A
What does the output of `summary` or of `tidy` from Question 2 tell you about this hypothesis test?
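
The parts of the regression output relevant to this test can be isolated as in the sketch below, which uses the built-in `mtcars` data for illustration. The object names `reg_cars` and `test_cars` are assumptions.

```r
library(broom)
library(dplyr)

# Illustration only: a regression on mtcars, not the exercise data
reg_cars <- lm(mpg ~ wt, data = mtcars)

# `statistic` is the t-statistic for H0: slope = 0,
# and `p.value` is its two-sided p-value
test_cars <- tidy(reg_cars) %>%
  filter(term == "wt") %>%
  select(term, estimate, std.error, statistic, p.value)

test_cars
```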
---
```{r}
# PUT CODE HERE
```
---
### Part B
Let’s now do this with `infer`. First `specify()` the model relationship, then `hypothesize(null = "independence")` to declare $H_0: \beta_1 = 0$, then `generate()` `reps = 1000` permutations of the sample using `type = "permute"`, then have it `calculate(stat = "slope")`. What does it show you?
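
A minimal sketch of this null-distribution pipeline, run on the built-in `mtcars` data rather than `diamonds`; the formula `mpg ~ wt` and the name `null_cars` are illustrative assumptions.

```r
library(infer)
library(dplyr)  # loaded for the %>% pipe

set.seed(20)

null_cars <- mtcars %>%
  specify(mpg ~ wt) %>%
  hypothesize(null = "independence") %>%       # H0: no relationship (slope = 0)
  generate(reps = 1000, type = "permute") %>%  # shuffle the pairing of mpg and wt
  calculate(stat = "slope")                    # slope in each shuffled sample

# Simulated slopes under the null hypothesis, one per replicate
null_cars
```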
---
```{r}
# PUT CODE HERE
```
---
### Part C
Continue the pipeline from Part B by piping into `get_p_value()`. Inside this function, set `obs_stat` equal to the $\hat{\beta_1}$ we found in Question 2, and set `direction = "both"` to run a two-sided test, since our alternative hypothesis is two-sided: $H_1: \beta_1 \neq 0$.
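
The sketch below extends the permutation pipeline on the built-in `mtcars` data; `slope_cars` and `p_cars` are hypothetical names. When the observed slope is far from every permuted slope, the simulated p-value can come out as exactly 0, which only means it is smaller than the resolution that the chosen number of `reps` allows.

```r
library(infer)
library(dplyr)  # loaded for the %>% pipe

set.seed(20)

# Observed slope from OLS on the original (unshuffled) data
slope_cars <- coef(lm(mpg ~ wt, data = mtcars))[["wt"]]

p_cars <- mtcars %>%
  specify(mpg ~ wt) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  # share of permuted slopes at least as extreme as the observed one,
  # counting both tails
  get_p_value(obs_stat = slope_cars, direction = "both")

p_cars
```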
---
```{r}
# PUT CODE HERE
```
---
### Part D
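Instead of `get_p_value()`, pipe into `visualize()` and add `+ shade_p_value(obs_stat = ..., direction = "both")`, where `...` is our estimated $\hat{\beta_1}$. (Older versions of `infer` took `obs_stat` and `direction` directly inside `visualize()`.)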
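
As a rough illustration on the built-in `mtcars` data, the plot could be built as follows. The names `slope_cars` and `p_null` are assumptions, and the sketch assumes a version of `infer` that provides `shade_p_value()`.

```r
library(infer)
library(dplyr)  # loaded for the %>% pipe

set.seed(20)

# Observed slope from OLS on the original data
slope_cars <- coef(lm(mpg ~ wt, data = mtcars))[["wt"]]

p_null <- mtcars %>%
  specify(mpg ~ wt) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize() +                                            # null distribution
  shade_p_value(obs_stat = slope_cars, direction = "both") # mark observed slope

p_null
```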
---
```{r}
# PUT CODE HERE
```