2.7 — Inference for Regression

ECON 480 • Econometrics • Fall 2020

Ryan Safner
Assistant Professor of Economics
safner@hood.edu
ryansafner/metricsF20
metricsF20.classes.ryansafner.com

Hypothesis Testing

Estimation and Hypothesis Testing I

  • We want to test whether our estimates are statistically significant and whether they describe the population
    • This is the "bread and butter" of inferential statistics and the purpose of regression

Examples:

  • Does reducing class size actually improve test scores?
  • Do more years of education increase your wages?
  • Is the gender wage gap between men and women really $0.77?
  • All modern science is built upon statistical hypothesis testing, so understand it well!

Estimation and Hypothesis Testing II

  • Note, we can test a lot of hypotheses about a lot of population parameters, e.g.

    • A population mean μ
      • Example: average height of adults
    • A population proportion p
      • Example: percent of voters who voted for Trump
    • A difference in population means, μA - μB
      • Example: difference in average wages of men vs. women
    • A difference in population proportions, pA - pB
      • Example: difference in percent of patients reporting symptoms of drug A vs. B
  • We will focus on hypotheses about the population regression slope (β̂1), i.e. the causal effect of X on Y

With a model this simple, it's almost certainly not causal, but this is the ultimate direction we are heading...

Null and Alternative Hypotheses I

  • All scientific inquiries begin with a null hypothesis (H0) that proposes a specific value of a population parameter
    • Notation: add a subscript 0: β1,0 (or μ0, p0, etc.)
  • We suggest an alternative hypothesis (Ha), often the one we hope to verify
    • Note, there can be multiple alternative hypotheses: H1, H2, …, Hn
  • Ask: "Does our data (sample) give us sufficient evidence to reject H0 in favor of Ha?"
    • Note: the test is always about H0!
    • See if we have sufficient evidence to reject the status quo

Null and Alternative Hypotheses II

  • Null hypothesis assigns a value (or a range) to a population parameter

    • e.g. β1 = 2 or β1 ≤ 20
    • Most common is β1 = 0: X has no effect on Y (no slope for a line)
    • Note: always an equality!
  • Alternative hypothesis must mathematically contradict the null hypothesis

    • e.g. β1 ≠ 2 or β1 > 20 or β1 ≠ 0
    • Note: always an inequality!
  • Alternative hypotheses come in two forms:
    1. One-sided alternative: β1 > H0 value or β1 < H0 value
    2. Two-sided alternative: β1 ≠ H0 value
      • Note this means either β1 < H0 value or β1 > H0 value

Components of a Valid Hypothesis Test

  • All statistical hypothesis tests have the following components:
  1. A null hypothesis, H0

  2. An alternative hypothesis, Ha

  3. A test statistic to determine if we reject H0 when the statistic reaches a "critical value"

    • Beyond the critical value is the "rejection region", sufficient evidence to reject H0
  4. A conclusion whether or not to reject H0 in favor of Ha

Type I and Type II Errors I

  • Sample statistic (β̂1) will rarely be exactly equal to the hypothesized parameter (β1)

  • The difference between the observed statistic and the true parameter could be because:

    • The parameter is not the hypothesized value
      • H0 is false
    • The parameter truly is the hypothesized value, but sampling variability gave us a different estimate
      • H0 is true
  • We cannot distinguish between these two possibilities with any certainty

Type I and Type II Errors II

  • We can interpret our estimates probabilistically as committing one of two types of error:
  1. Type I error (false positive): rejecting H0 when it is in fact true

    • Believing we found an important result when there is truly no relationship
  2. Type II error (false negative): failing to reject H0 when it is in fact false

    • Believing we found nothing when there was truly a relationship to find

Type I and Type II Errors III

                            Truth
                      Null is True       Null is False
Judgment
  Reject Null         TYPE I ERROR       CORRECT
                      (False +)          (True +)
  Don't Reject Null   CORRECT            TYPE II ERROR
                      (True -)           (False -)
  • Depending on context, committing one type of error may be more serious than the other

Type I and Type II Errors IV

                      Truth
            Defendant is Innocent       Defendant is Guilty
Judgment
  Convict   TYPE I ERROR (False +)      CORRECT (True +)
  Acquit    CORRECT (True -)            TYPE II ERROR (False -)
  • Anglo-American common law presumes defendant is innocent: H0
  • Jury judges whether the evidence presented against the defendant is plausible assuming the defendant were in fact innocent
  • If highly improbable: sufficient evidence to reject H0 and convict
    • Beyond a “reasonable doubt” that the defendant is innocent

Type I and Type II Errors V

William Blackstone

(1723-1780)

"It is better that ten guilty persons escape than that one innocent suffer."

  • Type I error is worse than a Type II error in law!

Blackstone, William, 1765-1770, Commentaries on the Laws of England

Type I and Type II Errors VI

Significance Level, α, and Confidence Level, 1 - α

  • The significance level, α, is the probability of a Type I error

α = P(Reject H0 | H0 is true)

  • The confidence level is defined as (1 - α)

    • Specify in advance an α-level (0.10, 0.05, 0.01) with associated confidence level (90%, 95%, 99%)
  • The probability of a Type II error is defined as β:

β = P(Don't reject H0 | H0 is false)
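To see what α means in practice, here is a minimal simulation sketch (not from the slides): draw many samples from a world where H0 is true and test each at α = 0.05; about 5% of tests should falsely reject. The t.test() example on a population mean, the sample size of 50, and the 5,000 repetitions are illustrative choices.

# Minimal sketch: alpha as the Type I error rate (illustrative example)
set.seed(42)
alpha <- 0.05
p_values <- replicate(5000, {
  x <- rnorm(n = 50, mean = 0)   # a sample from a world where H0 (mu = 0) is true
  t.test(x, mu = 0)$p.value      # test H0: mu = 0
})
mean(p_values < alpha)           # share of (false) rejections; roughly 0.05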

α and β

                            Truth
                      Null is True         Null is False
Judgment
  Reject Null         TYPE I ERROR (α)     CORRECT (1 - β)
  Don't Reject Null   CORRECT (1 - α)      TYPE II ERROR (β)

Power and p-values

  • The statistical power of the test is (1 - β): the probability of correctly rejecting H0 when H0 is in fact false (e.g. convicting a guilty defendant)

Power = 1 - β = P(Reject H0 | H0 is false)

  • The p-value or significance probability is the probability that, if the null hypothesis were true, the test statistic from any sample will be at least as extreme as the test statistic from our sample

p = Prob(δ ≥ δi | H0 is true)

  • where δ represents some test statistic
  • δi is the test statistic we observe in our sample
  • More on this in a bit
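A companion sketch (again illustrative, not from the slides) approximates power by simulating a world where H0 is false and counting rejections; the true mean of 0.5, sd of 1, and n = 50 are assumptions chosen just for the example.

# Minimal sketch: power = P(reject H0 | H0 is false), by simulation
set.seed(42)
p_values <- replicate(5000, {
  x <- rnorm(n = 50, mean = 0.5, sd = 1)  # H0 (mu = 0) is false here: the true mean is 0.5
  t.test(x, mu = 0)$p.value
})
mean(p_values < 0.05)                     # simulated power
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.05,
             type = "one.sample")$power   # analytical check, about 0.93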

p-Values and Statistical Significance

  • After running our test, we need to make a decision between the competing hypotheses

  • Compare p-value with pre-determined α (commonly, α=0.05, 95% confidence level)

  • If p<α: statistically significant evidence sufficient to reject H0 in favor of Ha

    • Note this does not mean Ha is true! We merely have rejected H0!
  • If p ≥ α: insufficient evidence to reject H0

    • Note this does not mean H0 is true! We merely have failed to reject H0!
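A minimal sketch of this decision rule in R; the object p_value is hypothetical here, standing in for a p-value computed from some test.

alpha <- 0.05                 # pre-determined significance level
if (p_value < alpha) {
  "Statistically significant: reject H0 in favor of Ha"
} else {
  "Insufficient evidence: fail to reject H0"
}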

Digression: p-Values and the Philosophy of Science

Hypothesis Testing and the Philosophy of Science I

Sir Ronald A. Fisher

(1890—1962)

"The null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis."

1935, The Design of Experiments

Hypothesis Testing and the Philosophy of Science II

  • Modern philosophy of science is largely based on hypothesis testing and falsifiability, which form the "Scientific Method"

  • For something to be "scientific", it must be falsifiable, or at least testable

  • Hypotheses can be corroborated with evidence, but they remain tentative until falsified by data suggesting an alternative hypothesis

"All swans are white" is a hypothesis rejected upon discovery of a single black swan

Note: economics is a very different kind of "science" with a different methodology!

Hypothesis Testing and p-Values

  • Hypothesis testing, confidence intervals, and p-values are probably the hardest things to understand in statistics

Hypothesis Testing: Which Test? I

  • A rigorous statistics course (ECMG 212 or MATH 112) will spend weeks going through different types of tests:

    • Sample mean; difference of means
    • Proportion; difference of proportions
    • Z-test vs. t-test
    • 1 sample vs. 2 samples
    • χ² test
  • See today's class notes page for more

Hypothesis Testing: Which Test? II

There is Only One Test

  1. Calculate a statistic, δi, from a sample of data

  2. Simulate a world where δ is null (H0)

  3. Examine the distribution of δ across the null world

  4. Calculate the probability that δi could exist in the null world

  5. Decide if δi is statistically significant

δ can stand in for any test statistic in any hypothesis test! For our purposes, δ is the slope of our regression sample, β̂1. A base-R sketch of this recipe follows below.
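Here is a minimal base-R sketch of the five-step recipe applied to our slope, assuming the CASchool data (with testscr and str) is loaded as in the rest of these slides; the infer pipeline that follows automates the same logic.

# 1. Calculate the statistic from our sample: the OLS slope on str
sample_slope <- coef(lm(testscr ~ str, data = CASchool))["str"]

# 2-3. Simulate the null world (beta_1 = 0) by shuffling str, and examine the
#      distribution of the slope across those null samples
set.seed(256)
null_slopes <- replicate(1000, {
  shuffled <- transform(CASchool, str = sample(str))  # permuting str breaks any link with testscr
  coef(lm(testscr ~ str, data = shuffled))["str"]
})

# 4. Probability of a null-world slope at least as extreme as ours
p_value <- mean(abs(null_slopes) >= abs(sample_slope))

# 5. Decide: is the sample slope statistically significant (e.g. at alpha = 0.05)?
p_value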

Elements of a Hypothesis Test

Hypothesis Testing with the infer Package I

  • R naturally runs the following hypothesis test on any regression as part of lm():

H0: β1 = 0 vs. H1: β1 ≠ 0

  • infer allows you to run through these steps manually to understand the process:
  1. specify() a model

  2. hypothesize() the null

  3. generate() simulations of the null world

  4. calculate() the p-value

  5. visualize() with a histogram (optional)

Hypothesis Testing with the infer Package II

Classical Inference: Critical Values of Test Statistic

  • Test statistic (δ): measures how far what we observed in our sample (β̂1) is from what we would expect if the null hypothesis were true (β1 = 0)

    • Calculated from a sampling distribution of the estimator (i.e. β̂1)
    • In econometrics, we use t-distributions which have n - k - 1 degrees of freedom
  • Rejection region: if the test statistic reaches a "critical value" of δ, then we reject the null hypothesis

Again, see today's class notes for more on the t-distribution. k is the number of independent variables our model has; in this case, with just one X, k = 1. We use two degrees of freedom to calculate β̂0 and β̂1, hence we have n - 2 df.
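For reference, a minimal sketch of looking up that critical value in R, assuming our class-size regression with n = 420 observations and k = 1 regressor:

qt(1 - 0.05/2, df = 420 - 1 - 1)  # critical t with n - k - 1 = 418 df; about 1.97, i.e. roughly 2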

Hypothesis Testing by Simulation, with infer

Imagine a Null World, where H0 is True

Our world, and a world where β1=0 by assumption.

Comparing the Worlds I

  • From that null world where H0: β1 = 0 is true, we simulate another sample and calculate OLS estimators again

Our Sample

term          estimate
(Intercept)   698.932952
str            -2.279808

Another Sample

term          estimate
(Intercept)   647.8027952
str             0.3235038

Comparing the Worlds II

  • From that null world where H0: β1 = 0 is true, let's simulate 1,000 samples and calculate the slope (β̂1) for each

sample          slope
1       -0.3027333296
2       -0.3624481355
3        0.6448518690
4       -0.0745971847
5        0.5969444290
6        0.5505335318
7        0.5927466147
8        0.0572148658
9       -0.0989989073
10       0.8043957511

Prepping the infer Pipeline

  • Before I show you how to do this, let's first save our estimated slope from our actual sample
    • We'll want this later!
# save our actual sample's slope as sample_slope
sample_slope <- school_reg_tidy %>% # this is the regression tidied with broom
  filter(term == "str") %>%
  pull(estimate)
# confirm what it is
sample_slope
## [1] -2.279808

The infer Pipeline: Specify

Specify

data %>%
  specify(y ~ x)

  • Take our data and pipe it into the specify() function, which is essentially a lm() function for regression (for our purposes)

CASchool %>%
  specify(testscr ~ str)

testscr    str
690.8      17.88991
661.2      21.52466
643.6      18.69723
  • Note nothing happens yet

The infer Pipeline: Hypothesize

Specify → Hypothesize

%>% hypothesize(null = "independence")

  • Describe what the null hypothesis is here
  • In infer's language, we are hypothesizing that str and testscr are independent (β1 = 0)

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence")

testscr    str
690.8      17.88991
661.2      21.52466
643.6      18.69723

null can be either "point" (for hypotheses about a point estimate of a single variable, such as a sample mean, x̄) or "independence" (for hypotheses about two samples or a relationship between variables). See more here.

The infer Pipeline: Generate I

Specify → Hypothesize → Generate

%>% generate(reps = n, type = "permute")

  • Now the magic starts, as we run a number of simulated samples
  • Set the number of reps and set the type equal to "permute"
    • we want permutation instead of a bootstrap for hypothesis testing!
CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute")

The infer Pipeline: Generate II

Specify → Hypothesize → Generate

%>% generate(reps = n, type = "permute")

testscr    str         replicate
693.95     17.88991    1
642.40     21.52466    1
680.45     18.69723    1
672.70     17.35714    1
666.45     18.67133    1
654.20     21.40625    1
671.95     19.50000    1
671.75     20.89412    1
624.55     19.94737    1
699.10     20.80556    1

The infer Pipeline: Calculate I

Specify → Hypothesize → Generate → Calculate

%>% calculate(stat = "")

  • We calculate sample statistics for each of the 1,000 replicate samples

  • In our case, calculate the slope (β̂1) for each replicate

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope")
  • Other stats for calculation: "mean", "median", "prop", "diff in means", "diff in props", etc. (see package information)

The infer Pipeline: Calculate II

Specify → Hypothesize → Generate → Calculate

%>% calculate(stat = "")

replicate    stat
1        0.384783281
2        0.241700895
3        0.268799843
4       -0.189039951
5        1.215030315
6        0.511783627
7       -0.457378304
8        1.008206723
9        0.092043084
10       0.233837801

The infer Pipeline: Get p Value

Specify → Hypothesize → Generate → Calculate → Get p Value

%>% get_p_value(obs_stat = "", direction = "both")

  • We can calculate the p-value

    • the probability of seeing a value at least as extreme (in absolute value) as our sample_slope (-2.28) in our simulated null distribution
  • For the two-sided alternative Ha: β1 ≠ 0, we double the raw one-tail p-value

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope") %>%
  get_p_value(obs_stat = sample_slope,
              direction = "both")

p_value
0

The infer Pipeline: Visualize I

Specify → Hypothesize → Generate → Calculate → Visualize

%>% visualize()

  • Make a histogram of our null distribution of β̂1
    • Note it is centered at β1 = 0 because that's H0!

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize()

The infer Pipeline: Visualize II

Specify → Hypothesize → Generate → Calculate → Visualize

%>% visualize()

  • Add our sample_slope to show our finding on the null distr.
CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize(obs_stat = sample_slope)

The infer Pipeline: Visualize p-value

Specify → Hypothesize → Generate → Calculate → Visualize

%>% visualize() + shade_p_value()

  • Add shade_p_value to see what p is
CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize(obs_stat = sample_slope) +
  shade_p_value(obs_stat = sample_slope,
                direction = "two_sided")

The infer Pipeline: Visualize Confidence Intervals

Specify → Hypothesize → Generate → Calculate → Visualize

%>% visualize() + shade_ci()

  • To shade confidence interval, we first need a vector of what they are
    • I've saved the outputted tibble of them from 4 slides ago as ci_values
simulations %>%
  visualize(obs_stat = sample_slope) +
  shade_confidence_interval(ci_values)

The infer Pipeline: Visualize is a Wrapper of ggplot

  • infer's visualize() function is just a wrapper function for ggplot()
    • you can take your simulations tibble and just ggplot a normal histogram
simulations %>%
  ggplot(data = .) +
  aes(x = stat) +
  geom_histogram(color = "white", fill = "indianred") +
  geom_vline(xintercept = sample_slope,
             color = "blue",
             size = 2,
             linetype = "dashed") +
  labs(x = expression(paste("Distribution of ", hat(beta[1]), " under ", H[0], " that ", beta[1]==0)),
       y = "Samples") +
  theme_classic(base_family = "Fira Sans Condensed",
                base_size = 20)

What R Calculates (Classical Statistical Inference)

What R Does: Classical Statistical Inference I

  • R does things the old-fashioned way, using a theoretical null distribution instead of simulation

  • A t-distribution with n - k - 1 df

  • Calculate a t-statistic for β̂1:

test statistic = (estimate - null hypothesis value) / (standard error of the estimate)

k is the number of X variables.

What R Does: Classical Statistical Inference II

test statistic = (estimate - null hypothesis value) / (standard error of the estimate)

  • t has the same interpretation as Z, number of std. dev. away from the distribution's center

  • Compares to a critical value of t (determined by α & n - k - 1)

    • For 95% confidence (α = 0.05), the critical value of t ≈ 2

Think of our simulated distribution, the center was 0.

The 68-95-99.7% empirical rule!

What R Does: Classical Statistical Inference III

t = (β̂1 - β1,0) / se(β̂1) = (-2.280 - 0) / 0.48 ≈ -4.75

  • Our sample slope is 4.75 standard deviations below the mean under H0

  • p-value: prob. of a test statistic at least as large (in magnitude) as ours if the null hypothesis were true

    • p-value is 2-sided for Ha: β1 ≠ 0

Think of our simulated distribution, the center was 0.
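A minimal sketch reproducing these numbers by hand from the regression output (estimate -2.2798, standard error 0.4798, 418 df):

t_stat <- (-2.279808 - 0) / 0.4798256  # (estimate - null value) / standard error
t_stat                                  # about -4.75
2 * pt(-abs(t_stat), df = 418)          # two-sided p-value, about 2.78e-06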

1-Sided vs. 2-Sided p-values I

Ha: β1 < 0

p-value: Prob(t < ti)

Ha: β1 > 0

p-value: Prob(t > ti)

1-Sided vs. 2-Sided p-values II

Ha: β1 ≠ 0

p-value: 2 × Prob(t > |ti|)
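In R, these three p-values can be sketched directly from the t-distribution, using our t-statistic of about -4.75 and 418 degrees of freedom:

t_i <- -4.751327
pt(t_i, df = 418)                       # Ha: beta1 < 0, Prob(t < t_i)
pt(t_i, df = 418, lower.tail = FALSE)   # Ha: beta1 > 0, Prob(t > t_i)
2 * pt(-abs(t_i), df = 418)             # Ha: beta1 != 0, 2 x Prob(t > |t_i|)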

Hypothesis Tests in Regression Output I

summary(school_reg)
##
## Call:
## lm(formula = testscr ~ str, data = CASchool)
##
## Residuals:
## Min 1Q Median 3Q Max
## -47.727 -14.251 0.483 12.822 48.540
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 698.9330 9.4675 73.825 < 2e-16 ***
## str -2.2798 0.4798 -4.751 2.78e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.58 on 418 degrees of freedom
## Multiple R-squared: 0.05124, Adjusted R-squared: 0.04897
## F-statistic: 22.58 on 1 and 418 DF, p-value: 2.783e-06

Hypothesis Tests in Regression Output II

  • In broom's tidy() (with confidence intervals)
tidy(school_reg, conf.int=TRUE)
term          estimate     std.error   statistic    p.value
(Intercept)   698.932952   9.4674914   73.824514    6.569925e-242
str            -2.279808   0.4798256   -4.751327    2.783307e-06

Conclusions

H0: β1 = 0 vs. Ha: β1 ≠ 0

  • Because the hypothesis test's p-value < α (0.05)...

  • We have sufficient evidence to reject H0 in favor of our alternative hypothesis. Our sample suggests that there is a relationship between class size and test scores.

  • Using the confidence intervals:

  • We are 95% confident that the true marginal effect of class size on test scores is between -3.22 and -1.34.

Hypothesis Testing vs. Confidence Intervals

  • Confidence intervals are all two-sided by nature: CI0.95 = (β̂1 - 2 × se(β̂1), β̂1 + 2 × se(β̂1))

  • Hypothesis test (t-test) of H0: β1 = 0 computes a t-value of¹ t = β̂1 / se(β̂1)

    and p < 0.05 when |t| > 2

  • If a confidence interval contains the H0 value (i.e. 0, for our test), then we fail to reject H0.

¹ Since our null hypothesis is that β1,0 = 0, the test statistic simplifies to this neat fraction.
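A minimal sketch of that confidence interval computed by hand, with confint() on the fitted school_reg model as a check:

beta_hat <- -2.279808
se_beta  <- 0.4798256
t_crit   <- qt(0.975, df = 418)       # about 1.97, i.e. roughly 2
c(beta_hat - t_crit * se_beta,        # lower bound, about -3.22
  beta_hat + t_crit * se_beta)        # upper bound, about -1.34
confint(school_reg, level = 0.95)     # same interval from the regression object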

The Use and Abuse of p-values

Common Misconceptions about p-values

  • So how do we interpret p again?

p is the probability that the alternative hypothesis is false

  • We can never prove an alternative hypothesis, only tentatively reject a null hypothesis

p is the probability that the null hypothesis is true

  • We're not proving the H0 is false, only saying that it's very unlikely that if H0 were true, we'd obtain a slope as rare as our sample's slope

p is the probability that our observed effects were produced purely by random chance

  • p is computed under a specific model (think about our null world) that assumes H0 is true

p tells us how significant our finding is

  • p tells us nothing about the size or the real world significance of any effect deemed “statistically significant”
  • it only tells us that the slope is statistically significantly different from 0 (if H0 is β1=0)

Abusing p-Values I

Source: SMBC

Abusing p-Values II

“The widespread use of 'statistical significance' (generally interpreted as p ≤ 0.05) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.”

Wasserstein, Ronald L. and Nicole A. Lazar (2016), "The ASA's Statement on p-Values: Context, Process, and Purpose," The American Statistician 70(2): 129-133

Abusing p-Values III

“No economist has achieved scientific success as a result of a statistically significant coefficient. Massed observations, clever common sense, elegant theorems, new policies, sagacious economic reasoning, historical perspective, relevant accounting, these have all led to scientific success. Statistical significance has not.”

McCloskey, Deirdre N. and Stephen Ziliak, 1996, The Cult of Statistical Significance, p. 112

p-value Clarification

  • Again, p-value is the probability that, if the null hypothesis were true, we obtain (by pure random chance) a test statistic at least as extreme as the one we estimated for our sample

  • A low p-value means either (and we can't distinguish which):

    1. H0 is true and a highly improbable event has occurred OR
    2. H0 is false

Significance In Regression Tables

              Test Score
Intercept     698.93 ***
              (9.47)
STR            -2.28 ***
              (0.48)
N                420
R-Squared       0.05
SER            18.58
*** p < 0.001; ** p < 0.01; * p < 0.05.
  • Statistical significance is shown by asterisks, common (but not always!) standard:

    • 1 asterisk: significant at α=0.10
    • 2 asterisks: significant at α=0.05
    • 3 asterisks: significant at α=0.01
  • Rare, but sometimes regression tables include p-values for estimates
