Examples:
Note, we can test a lot of hypotheses about a lot of population parameters, e.g.
We will focus on hypotheses about the population regression slope (β1), i.e. the causal effect† of X on Y
† With a model this simple, it's almost certainly not causal, but this is the ultimate direction we are heading...
Null hypothesis assigns a value (or a range) to a population parameter
Alternative hypothesis must mathematically contradict the null hypothesis
A null hypothesis, H0
An alternative hypothesis, Ha
A test statistic to determine if we reject H0 when the statistic reaches a "critical value"
A conclusion whether or not to reject H0 in favor of Ha
Sample statistic (^β1) will rarely be exactly equal to the hypothesized parameter (β1)
Difference between observed statistic and true parameter could be because:
Parameter is not the hypothesized value
Parameter is truly hypothesized value but sampling variability gave us a different estimate
We cannot distinguish between these two possibilities with any certainty
Type I error (false positive): rejecting H0 when it is in fact true
Type II error (false negative): failing to reject H0 when it is in fact false
|  |  | **Truth** |  |
|---|---|---|---|
|  |  | Null is True | Null is False |
| **Judgment** | Reject Null | TYPE I ERROR (False +) | CORRECT (True +) |
|  | Don't Reject Null | CORRECT (True -) | TYPE II ERROR (False -) |
|  |  | **Truth** |  |
|---|---|---|---|
|  |  | Defendant is Innocent | Defendant is Guilty |
| **Judgment** | Convict | TYPE I ERROR (False +) | CORRECT (True +) |
|  | Acquit | CORRECT (True -) | TYPE II ERROR (False -) |
William Blackstone
(1723-1780)
"It is better that ten guilty persons escape than that one innocent suffer."
Blackstone, William, 1765-1770, Commentaries on the Laws of England
The probability of a Type I error is defined as α:

α = P(Reject H0 | H0 is true)
The confidence level is defined as (1−α)
The probability of a Type II error is defined as β:
β=P(Don't reject H0|H0 is false)
|  |  | **Truth** |  |
|---|---|---|---|
|  |  | Null is True | Null is False |
| **Judgment** | Reject Null | TYPE I ERROR (α) | CORRECT (1−β) |
|  | Don't Reject Null | CORRECT (1−α) | TYPE II ERROR (β) |
Power=1−β=P(Reject H0|H0 is false)
p(δ≥δi|H0 is true)
After running our test, we need to make a decision between the competing hypotheses
Compare p-value with pre-determined α (commonly, α=0.05, 95% confidence level)
If p<α: statistically significant evidence sufficient to reject H0 in favor of Ha
If p≥α: insufficient evidence to reject H0
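As a minimal sketch of this decision rule in R (the p-value here is a made-up number, purely for illustration):

```r
# Hypothetical p-value from some hypothesis test (made up for illustration)
p_value <- 0.003
# Pre-determined significance level (95% confidence level)
alpha <- 0.05

decision <- if (p_value < alpha) {
  "Reject H0 in favor of Ha"          # statistically significant evidence
} else {
  "Insufficient evidence to reject H0"
}
decision
```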
Sir Ronald A. Fisher
(1890—1962)
"The null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis."
Fisher, Ronald A., 1935, The Design of Experiments
Modern philosophy of science is largely based on hypothesis testing and falsifiability, which form the "Scientific Method"†
For something to be "scientific", it must be falsifiable, or at least testable
Hypotheses can be corroborated by evidence, but remain tentative until falsified by data suggesting an alternative hypothesis
"All swans are white" is a hypothesis rejected upon discovery of a single black swan
A rigorous statistics course (ECMG 212 or MATH 112) will spend weeks going through different types of tests:
See today's class notes page for more
An R package called infer
Calculate a statistic, δi†, from a sample of data
Simulate a world where δ is null (H0)
Examine the distribution of δ across the null world
Calculate the probability that δi could exist in the null world
Decide if δi is statistically significant
† δ can stand in for any test-statistic in any hypothesis test! For our purposes, δ is the slope of our regression sample, ˆβ1.
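The five steps above can be sketched by hand in base R. This is a hedged illustration on toy data (x, y, and all values here are invented, not the CASchool data):

```r
# A sketch of the five simulation steps using base R and toy data
set.seed(42)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2)

# 1. Calculate a statistic (our delta_i) from the sample: the OLS slope
delta_i <- coef(lm(y ~ x))[2]

# 2.-3. Simulate the null world (no relationship between x and y) by
#       shuffling y, and examine the distribution of slopes there
null_slopes <- replicate(1000, coef(lm(sample(y) ~ x))[2])

# 4. Probability that a slope as extreme as ours arises in the null world
p_value <- mean(abs(null_slopes) >= abs(delta_i))

# 5. Decide if delta_i is statistically significant at alpha = 0.05
p_value < 0.05
```

Shuffling y is exactly the "permutation" that infer's generate(type = "permute") automates below.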
Our hypothesis test, which lm() runs for us automatically:

H0: β1 = 0
H1: β1 ≠ 0

infer allows you to run through these steps manually to understand the process:

specify() a model
hypothesize() the null
generate() simulations of the null world
calculate() the p-value
visualize() with a histogram (optional)
Test statistic (δ): measures how far what we observed in our sample (^β1) is from what we would expect if the null hypothesis were true (β1=0)
Rejection region: if the test statistic reaches a "critical value" of δ, then we reject the null hypothesis
† Again, see today's class notes for more on the t-distribution. k is the number of independent variables our model has, in this case, with just one X, k=1. We use two degrees of freedom to calculate ^β0 and ^β1, hence we have n−2 df.
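A hedged sketch in R of where the critical value comes from, using n = 420 and k = 1 from the class size regression:

```r
# Critical value t* for a two-sided test at alpha = 0.05,
# from a t-distribution with n - k - 1 = 418 degrees of freedom
n <- 420
k <- 1
t_star <- qt(1 - 0.05 / 2, df = n - k - 1)
t_star  # close to the normal's 1.96, since df is large
```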
Our world, and a world where β1=0 by assumption.
| term | estimate |
|---|---|
| (Intercept) | 698.932952 |
| str | -2.279808 |
| term | estimate |
|---|---|
| (Intercept) | 647.8027952 |
| str | 0.3235038 |
# save as obs_slope
sample_slope <- school_reg_tidy %>% # this is the regression tidied with broom
  filter(term == "str") %>%
  pull(estimate)

# confirm what it is
sample_slope
## [1] -2.279808
data %>% specify(y ~ x)
The specify() function is essentially a lm() function for regression (for our purposes):

CASchool %>%
  specify(testscr ~ str)
| testscr | str |
|---|---|
| 690.8 | 17.88991 |
| 661.2 | 21.52466 |
| 643.6 | 18.69723 |
%>% hypothesize(null = "independence")
In infer's language, we are hypothesizing that str and testscr are independent (β1 = 0)†:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence")
| testscr | str |
|---|---|
| 690.8 | 17.88991 |
| 661.2 | 21.52466 |
| 643.6 | 18.69723 |
† type can be either "point" (for specific point estimates for a single variable, such as a sample mean, $\bar{x}$) or "independence" (for hypotheses about two samples or a relationship between variables).
%>% generate(reps = n, type = "permute")
We set the number of replicates (reps) and set the type equal to "permute"

Note: we use a permutation instead of a bootstrap for hypothesis testing!

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute")
%>% calculate(stat = "")
We calculate sample statistics for each of the 1,000 replicate samples

In our case, calculate the slope (^β1) for each replicate:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope")
Other stats available for calculation: "mean", "median", "prop", "diff in means", "diff in props", etc. (see the package information)
%>% get_p_value(obs_stat = "", direction = "both")
We can calculate the p-value of our sample_slope (-2.28) in our simulated null distribution

For the two-sided alternative Ha: β1 ≠ 0, we double the raw p-value:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  get_p_value(obs_stat = sample_slope, direction = "both")
| p_value |
|---|
| 0 |
%>% visualize()
CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize()
%>% visualize()
We can add our sample_slope to show our finding on the null distribution:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize(obs_stat = sample_slope)
%>% visualize()+shade_p_value()
Add shade_p_value to see what p is:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize(obs_stat = sample_slope) +
  shade_p_value(obs_stat = sample_slope, direction = "two_sided")
%>% visualize()+shade_ci()
If we save our confidence interval endpoints (the tibble of them from 4 slides ago) as ci_values:

simulations %>%
  visualize(obs_stat = sample_slope) +
  shade_confidence_interval(ci_values)
infer's visualize() function is just a wrapper function for ggplot()

You can take your saved simulations tibble and just ggplot a normal histogram:

simulations %>%
  ggplot(data = .) +
  aes(x = stat) +
  geom_histogram(color = "white", fill = "indianred") +
  geom_vline(xintercept = sample_slope, color = "blue",
             size = 2, linetype = "dashed") +
  labs(x = expression(paste("Distribution of ", hat(beta[1]), " under ", H[0],
                            " that ", beta[1] == 0)),
       y = "Samples") +
  theme_classic(base_family = "Fira Sans Condensed", base_size = 20)
R does things the old-fashioned way, using a theoretical null distribution instead of simulation
A t-distribution with n−k−1 df†
Calculate a t-statistic for ^β1:
$$\text{test statistic} = \frac{\text{estimate} - \text{null hypothesis}}{\text{standard error of estimate}}$$
† k is the number of X variables.
t has the same interpretation as Z, number of std. dev. away from the distribution's center†
Compares to a critical value of t∗ (determined by α & n−k−1)
† Think of our simulated distribution, the center was 0.
‡ The 68-95-99.7% empirical rule!
$$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{se(\hat{\beta}_1)} = \frac{-2.28 - 0}{0.48} = -4.75$$
Our sample slope is 4.75 standard deviations below the mean under H0
p-value: prob. of a test statistic at least as large (in magnitude) as ours if the null hypothesis were true†
† Think of our simulated distribution, the center was 0.
Ha:β1<0
p-value: Prob(t<ti)
Ha:β1>0
p-value: Prob(t>ti)
Ha:β1≠0
p-value: 2×Prob(t>|ti|)
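These formulas can be checked in R with pt(), using the t-statistic and degrees of freedom from our regression:

```r
# Two-sided p-value for our estimate (values from the regression output)
t_i <- (-2.2798 - 0) / 0.4798   # (estimate - null) / standard error, about -4.75
df  <- 420 - 1 - 1              # n - k - 1 = 418

# Ha: beta_1 != 0  =>  p = 2 x Prob(t > |t_i|)
p_two_sided <- 2 * pt(abs(t_i), df = df, lower.tail = FALSE)
p_two_sided  # on the order of 1e-06, matching the lm() output below
```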
summary(school_reg)
## 
## Call:
## lm(formula = testscr ~ str, data = CASchool)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.727 -14.251   0.483  12.822  48.540 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 698.9330     9.4675  73.825  < 2e-16 ***
## str          -2.2798     0.4798  -4.751 2.78e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.58 on 418 degrees of freedom
## Multiple R-squared:  0.05124, Adjusted R-squared:  0.04897 
## F-statistic: 22.58 on 1 and 418 DF,  p-value: 2.783e-06
We can also use broom's tidy() (with confidence intervals):

tidy(school_reg, conf.int = TRUE)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 698.932952 | 9.4674914 | 73.824514 | 6.569925e-242 |
| str | -2.279808 | 0.4798256 | -4.751327 | 2.783307e-06 |
H0: β1 = 0
Ha: β1 ≠ 0
Because the hypothesis test's p-value < α (0.05)...
We have sufficient evidence to reject H0 in favor of our alternative hypothesis. Our sample suggests that there is a relationship between class size and test scores.
Using the confidence intervals:
We are 95% confident that the true marginal effect of class size on test scores is between −3.22 and −1.34.
Confidence intervals are all two-sided by nature: $CI_{0.95} = \left( \hat{\beta}_1 - 2 \times se(\hat{\beta}_1),\; \hat{\beta}_1 + 2 \times se(\hat{\beta}_1) \right)$

A hypothesis test (t-test) of H0: β1 = 0 computes a t-value of¹ $t = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)}$
If a confidence interval contains the H0 value (i.e. 0, for our test), then we fail to reject H0.
1 Since our null hypothesis is that β1,0=0, the test statistic simplifies to this neat fraction.
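A small R sketch of this duality, using the estimate and standard error from the slides (the slides' factor of 2 approximates the exact critical value t* ≈ 1.966):

```r
# 95% CI for beta_1 from estimate +/- t* x se (values from tidy() above)
beta_hat <- -2.2798
se_hat   <- 0.4798
t_star   <- qt(0.975, df = 418)

ci <- c(beta_hat - t_star * se_hat, beta_hat + t_star * se_hat)
ci  # roughly (-3.22, -1.34), matching tidy(school_reg, conf.int = TRUE)

# The interval excludes 0 (our H0 value), so we reject H0: beta_1 = 0
!(ci[1] <= 0 & ci[2] >= 0)
```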
❌ p is the probability that the alternative hypothesis is false
❌ p is the probability that the null hypothesis is true
❌ p is the probability that our observed effects were produced purely by random chance
❌ p tells us how significant our finding is
“The widespread use of 'statistical significance' (generally interpreted as p ≤ 0.05) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.”
Wasserstein, Ronald L. and Nicole A. Lazar, 2016, "The ASA's Statement on p-Values: Context, Process, and Purpose," The American Statistician 70(2): 129-133
“No economist has achieved scientific success as a result of a statistically significant coefficient. Massed observations, clever common sense, elegant theorems, new policies, sagacious economic reasoning, historical perspective, relevant accounting, these have all led to scientific success. Statistical significance has not.”
McCloskey, Deirdre N. and Stephen Ziliak, 1996, The Cult of Statistical Significance, p. 112
Again, p-value is the probability that, if the null hypothesis were true, we obtain (by pure random chance) a test statistic at least as extreme as the one we estimated for our sample
A low p-value means either (and we can't distinguish which):

The null hypothesis is false (the parameter is not the hypothesized value)

The null hypothesis is true, but sampling variability gave us an unusual sample by pure random chance
|  | Test Score |
|---|---|
| Intercept | 698.93 *** |
|  | (9.47) |
| STR | -2.28 *** |
|  | (0.48) |
| N | 420 |
| R-Squared | 0.05 |
| SER | 18.58 |

*** p < 0.001; ** p < 0.01; * p < 0.05
Statistical significance is shown by asterisks, common (but not always!) standard:
Rare, but sometimes regression tables include p-values for estimates