3.9 — Logarithmic Regression

ECON 480 • Econometrics • Fall 2020

Ryan Safner
Assistant Professor of Economics
safner@hood.edu
ryansafner/metricsF20
metricsF20.classes.ryansafner.com

Nonlinearities

  • Consider the gapminder example

$$\widehat{\text{Life Expectancy}}_i = \hat{\beta}_0 + \hat{\beta}_1 \, \text{GDP per capita}_i$$

$$\widehat{\text{Life Expectancy}}_i = \hat{\beta}_0 + \hat{\beta}_1 \, \text{GDP per capita}_i + \hat{\beta}_2 \, \text{GDP per capita}_i^2$$

$$\widehat{\text{Life Expectancy}}_i = \hat{\beta}_0 + \hat{\beta}_1 \ln(\text{GDP per capita}_i)$$

Natural Logarithms

Logarithmic Models

  • Another useful model for nonlinear data is the logarithmic model

    • We transform either X, Y, or both by taking the (natural) logarithm
  • The logarithmic model has two additional advantages:

    1. We can easily interpret coefficients as percentage changes or elasticities
    2. Useful economic shape: diminishing returns (production functions, utility functions, etc.)

Don't confuse this with a logistic (logit) model for dependent dummy variables.

The Natural Logarithm

  • The exponential function, $Y = e^X$ or $Y = \exp(X)$, where the base $e = 2.71828...$

  • The natural logarithm is the inverse, $Y = \ln(X)$

The Natural Logarithm: Review I

  • Exponents are defined as $b^n = \underbrace{b \times b \times \cdots \times b}_{n \text{ times}}$
    • where base $b$ is multiplied by itself $n$ times
  • Example: $2^3 = \underbrace{2 \times 2 \times 2}_{n = 3} = 8$
  • Logarithms are the inverse, defined as the exponents in the expressions above: if $b^n = y$, then $\log_b(y) = n$
    • $n$ is the number you must raise $b$ to in order to get $y$
  • Example: $\log_2(8) = 3$

The Natural Logarithm: Review II

  • Logarithms can have any base, but it is most common to use the natural logarithm $(\ln)$ with base $e = 2.71828...$: if $e^n = y$, then $\ln(y) = n$

The Natural Logarithm: Properties

  • Natural logs have a lot of useful properties:

    1. $\ln\left(\frac{1}{x}\right) = -\ln(x)$

    2. $\ln(ab) = \ln(a) + \ln(b)$

    3. $\ln\left(\frac{x}{a}\right) = \ln(x) - \ln(a)$

    4. $\ln(x^a) = a \ln(x)$

    5. $\frac{d \ln x}{dx} = \frac{1}{x}$
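
These properties are easy to verify numerically in R (a quick sketch; the values x = 3 and a = 5 are arbitrary choices, not from the slides):

x <- 3; a <- 5
log(1/x); -log(x)                    # property 1
log(a*x); log(a) + log(x)            # property 2
log(x/a); log(x) - log(a)            # property 3
log(x^a); a*log(x)                   # property 4
(log(x + 1e-6) - log(x)) / 1e-6; 1/x # property 5: numerical derivative of ln(x)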

The Natural Logarithm: Example

  • Most useful property: for a small change in $x$, $\Delta x$:

$$\underbrace{\ln(x + \Delta x) - \ln(x)}_{\text{Difference in logs}} \approx \underbrace{\frac{\Delta x}{x}}_{\text{Relative change}}$$

Example: Let $x = 100$ and $\Delta x = 1$; the relative change is:

$$\frac{\Delta x}{x} = \frac{(101 - 100)}{100} = 0.01 \text{ or } 1\%$$

  • The logged difference: $\ln(101) - \ln(100) = 0.00995 \approx 1\%$
  • This allows us to very easily interpret coefficients as percent changes or elasticities
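
We can confirm this approximation in R (a minimal check):

log(101) - log(100) # logged difference: 0.00995
(101 - 100) / 100   # relative change: 0.01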

Elasticity

  • An elasticity between any two variables, $\epsilon_{Y,X}$, describes the responsiveness (in %) of one variable $(Y)$ to a change in another $(X)$

$$\epsilon_{Y,X} = \frac{\%\Delta Y}{\%\Delta X} = \frac{\left(\frac{\Delta Y}{Y}\right)}{\left(\frac{\Delta X}{X}\right)}$$

  • The numerator is the relative change in $Y$; the denominator is the relative change in $X$
  • Interpretation: a 1% change in $X$ will cause an $\epsilon_{Y,X}$% change in $Y$

Math FYI: Cobb-Douglas Functions and Logs

  • One of the (many) reasons why economists love Cobb-Douglas functions: $$Y = AL^{\alpha}K^{\beta}$$

  • Taking logs, the relationship becomes linear:

$$\ln(Y) = \ln(A) + \alpha \ln(L) + \beta \ln(K)$$

  • With data on $(Y, L, K)$ and linear regression, we can estimate $\alpha$ and $\beta$
    • $\alpha$: the elasticity of $Y$ with respect to $L$
      • A 1% change in $L$ will lead to an $\alpha$% change in $Y$
    • $\beta$: the elasticity of $Y$ with respect to $K$
      • A 1% change in $K$ will lead to a $\beta$% change in $Y$

Math FYI: Cobb-Douglas Functions and Logs

Example: Cobb-Douglas production function: $$Y = 2L^{0.75}K^{0.25}$$

  • Taking logs:

$$\ln Y = \ln 2 + 0.75 \ln L + 0.25 \ln K$$

  • A 1% change in $L$ will yield a 0.75% change in output $Y$

  • A 1% change in $K$ will yield a 0.25% change in output $Y$
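
To see this in action, here is a sketch with simulated data (illustrative values, not from the lecture) showing that a log-log regression recovers the Cobb-Douglas exponents:

set.seed(42)
L <- runif(500, 1, 100)      # simulated labor
K <- runif(500, 1, 100)      # simulated capital
Y <- 2 * L^0.75 * K^0.25     # Cobb-Douglas output with A = 2
lm(log(Y) ~ log(L) + log(K)) # slopes recover 0.75 and 0.25; intercept is ln(2)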

Logarithms in R I

  • The log() function can easily take the logarithm
gapminder <- gapminder %>%
  mutate(loggdp = log(gdpPercap)) # log GDP per capita
gapminder %>% head() # look at it

country      continent  year  lifeExp  pop       gdpPercap  loggdp
Afghanistan  Asia       1952  28.801    8425333  779.4453   6.658583
Afghanistan  Asia       1957  30.332    9240934  820.8530   6.710344
Afghanistan  Asia       1962  31.997   10267083  853.1007   6.748878
Afghanistan  Asia       1967  34.020   11537966  836.1971   6.728864
Afghanistan  Asia       1972  36.088   13079460  739.9811   6.606625
Afghanistan  Asia       1977  38.438   14880372  786.1134   6.667101

Logarithms in R II

  • Note, log() by default is the natural logarithm ln(), i.e. base e
    • Can change base with e.g. log(x, base = 5)
    • Some common built-in logs: log10, log2
log10(100)
## [1] 2
log2(16)
## [1] 4
log(19683, base=3)
## [1] 9

Logarithms in R III

  • Note when running a regression, you can pre-transform the data into logs (as I did above), or just add log() around a variable in the regression
lm(lifeExp ~ loggdp,
   data = gapminder) %>%
  tidy()

term         estimate   std.error  statistic
(Intercept)  -9.100889  1.227674   -7.413117
loggdp        8.405085  0.148762   56.500206

lm(lifeExp ~ log(gdpPercap),
   data = gapminder) %>%
  tidy()

term            estimate   std.error  statistic
(Intercept)     -9.100889  1.227674   -7.413117
log(gdpPercap)   8.405085  0.148762   56.500206

Types of Logarithmic Models

  • Three types of log regression models, depending on which variables we log:

  1. Linear-log model: $Y_i = \beta_0 + \beta_1 \ln X_i$

  2. Log-linear model: $\ln Y_i = \beta_0 + \beta_1 X_i$

  3. Log-log model: $\ln Y_i = \beta_0 + \beta_1 \ln X_i$

Linear-Log Model

Linear-Log Model

  • The linear-log model has an independent variable $(X)$ that is logged

$$Y_i = \beta_0 + \beta_1 \ln X_i$$

$$\beta_1 = \frac{\Delta Y}{\left(\frac{\Delta X}{X}\right)}$$

  • Marginal effect of $X \rightarrow Y$: a 1% change in $X \rightarrow$ a $\frac{\beta_1}{100}$ unit change in $Y$

Linear-Log Model in R

lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder)
library(broom)
lin_log_reg %>% tidy()

term         estimate   std.error  statistic
(Intercept)  -9.100889  1.227674   -7.413117
loggdp        8.405085  0.148762   56.500206

$$\widehat{\text{Life Expectancy}}_i = -9.10 + 8.41 \, \ln \text{GDP}_i$$

  • A 1% change in GDP $\rightarrow$ a $\frac{8.41}{100} = 0.0841$ year increase in Life Expectancy

  • A 25% fall in GDP $\rightarrow$ a $(25 \times 0.0841) = 2.10$ year decrease in Life Expectancy

  • A 100% rise in GDP $\rightarrow$ a $(100 \times 0.0841) = 8.41$ year increase in Life Expectancy
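
As a quick sanity check (a sketch; the $10,000 GDP level is a hypothetical choice, not from the slides), predict() can verify the 25%-fall interpretation:

# compare predicted life expectancy at $10,000 GDP per capita vs. 25% lower
predict(lin_log_reg, newdata = data.frame(loggdp = log(c(10000, 7500))))
# the exact difference is 8.41 * ln(0.75) = -2.42 years, close to the
# -2.10 years from the small-change approximation above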

Linear-Log Model Graph I

ggplot(data = gapminder)+
  aes(x = gdpPercap,
      y = lifeExp)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm",
              formula = y ~ log(x),
              color = "orange")+
  scale_x_continuous(labels = scales::dollar,
                     breaks = seq(0, 120000, 20000))+
  scale_y_continuous(breaks = seq(0, 100, 10),
                     limits = c(0, 100))+
  labs(x = "GDP per Capita",
       y = "Life Expectancy (Years)")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed",
                         base_size = 16)

Linear-Log Model Graph II

ggplot(data = gapminder)+
  aes(x = loggdp,
      y = lifeExp)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", color = "orange")+
  scale_y_continuous(breaks = seq(0, 100, 10),
                     limits = c(0, 100))+
  labs(x = "Log GDP per Capita",
       y = "Life Expectancy (Years)")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed",
                         base_size = 16)

Log-Linear Model

Log-Linear Model

  • The log-linear model has the dependent variable $(Y)$ logged

$$\ln Y_i = \beta_0 + \beta_1 X_i$$

$$\beta_1 = \frac{\left(\frac{\Delta Y}{Y}\right)}{\Delta X}$$

  • Marginal effect of $X \rightarrow Y$: a 1 unit change in $X \rightarrow$ a $\beta_1 \times 100$% change in $Y$

Log-Linear Model in R (Preliminaries)

  • We will again have very large/small coefficients if we deal with GDP directly, so let's again transform gdpPercap into $1,000s, calling it gdp_t

  • Then log lifeExp

gapminder <- gapminder %>%
  mutate(gdp_t = gdpPercap/1000, # first make GDP/capita in $1000s
         loglife = log(lifeExp)) # take the log of lifeExp
gapminder %>% head() # look at it

country      continent  year  lifeExp  pop       gdpPercap  loggdp    gdp_t      loglife
Afghanistan  Asia       1952  28.801    8425333  779.4453   6.658583  0.7794453  3.360410
Afghanistan  Asia       1957  30.332    9240934  820.8530   6.710344  0.8208530  3.412203
Afghanistan  Asia       1962  31.997   10267083  853.1007   6.748878  0.8531007  3.465642
Afghanistan  Asia       1967  34.020   11537966  836.1971   6.728864  0.8361971  3.526949
Afghanistan  Asia       1972  36.088   13079460  739.9811   6.606625  0.7399811  3.585960
Afghanistan  Asia       1977  38.438   14880372  786.1134   6.667101  0.7861134  3.649047

Log-Linear Model in R

log_lin_reg <- lm(loglife ~ gdp_t, data = gapminder)
log_lin_reg %>% tidy()

term         estimate  std.error     statistic
(Intercept)  3.966639  0.0058345501  679.85339
gdp_t        0.012917  0.0004777072   27.03958

$$\widehat{\ln(\text{Life Expectancy})}_i = 3.967 + 0.013 \, \text{GDP}_i$$

  • A $1 (thousand) change in GDP $\rightarrow$ a $0.013 \times 100\% = 1.3\%$ increase in Life Expectancy

  • A $25 (thousand) fall in GDP $\rightarrow$ a $(25 \times 1.3\%) = 32.5\%$ decrease in Life Expectancy

  • A $100 (thousand) rise in GDP $\rightarrow$ a $(100 \times 1.3\%) = 130\%$ increase in Life Expectancy

Log-Linear Model Graph

ggplot(data = gapminder)+
  aes(x = gdp_t,
      y = loglife)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", color = "orange")+
  scale_x_continuous(labels = scales::dollar,
                     breaks = seq(0, 120, 20))+
  labs(x = "GDP per Capita ($ Thousands)",
       y = "Log Life Expectancy")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed",
                         base_size = 16)

Log-Log Model

Log-Log Model

  • The log-log model has both variables $(X$ and $Y)$ logged

$$\ln Y_i = \beta_0 + \beta_1 \ln X_i$$

$$\beta_1 = \frac{\left(\frac{\Delta Y}{Y}\right)}{\left(\frac{\Delta X}{X}\right)}$$

  • Marginal effect of $X \rightarrow Y$: a 1% change in $X \rightarrow$ a $\beta_1$% change in $Y$

  • $\beta_1$ is the elasticity of $Y$ with respect to $X$!

Log-Log Model in R

log_log_reg <- lm(loglife ~ loggdp, data = gapminder)
log_log_reg %>% tidy()

term         estimate  std.error   statistic
(Intercept)  2.864177  0.02328274  123.01718
loggdp       0.146549  0.00282126   51.94452

$$\widehat{\ln(\text{Life Expectancy})}_i = 2.864 + 0.147 \, \ln \text{GDP}_i$$

  • A 1% change in GDP $\rightarrow$ a 0.147% increase in Life Expectancy

  • A 25% fall in GDP $\rightarrow$ a $(25 \times 0.147\%) = 3.675\%$ decrease in Life Expectancy

  • A 100% rise in GDP $\rightarrow$ a $(100 \times 0.147\%) = 14.7\%$ increase in Life Expectancy

Log-Log Model Graph I

ggplot(data = gapminder)+
  aes(x = loggdp,
      y = loglife)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", color = "orange")+
  labs(x = "Log GDP per Capita",
       y = "Log Life Expectancy")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed",
                         base_size = 16)

Comparing Models I

Model       Equation                           Interpretation
Linear-Log  $Y = \beta_0 + \beta_1 \ln X$      1% change in $X$ $\rightarrow$ $\frac{\hat{\beta}_1}{100}$ unit change in $Y$
Log-Linear  $\ln Y = \beta_0 + \beta_1 X$      1 unit change in $X$ $\rightarrow$ $\hat{\beta}_1 \times 100$% change in $Y$
Log-Log     $\ln Y = \beta_0 + \beta_1 \ln X$  1% change in $X$ $\rightarrow$ $\hat{\beta}_1$% change in $Y$

  • Hint: the variable that gets logged changes in percent terms; the variable not logged changes in unit terms

Comparing Models II

library(huxtable)
huxreg("Life Exp." = lin_log_reg,
       "Log Life Exp." = log_lin_reg,
       "Log Life Exp." = log_log_reg,
       coefs = c("Constant" = "(Intercept)",
                 "GDP ($1000s)" = "gdp_t",
                 "Log GDP" = "loggdp"),
       statistics = c("N" = "nobs",
                      "R-Squared" = "r.squared",
                      "SER" = "sigma"),
       number_format = 2)

  • The models are in very different units; how to choose?
    • Compare $R^2$'s
    • Compare graphs
    • Compare intuition

              Life Exp.  Log Life Exp.  Log Life Exp.
Constant      -9.10 ***   3.97 ***       2.86 ***
              (1.23)     (0.01)         (0.02)
GDP ($1000s)              0.01 ***
                         (0.00)
Log GDP        8.41 ***                  0.15 ***
              (0.15)                    (0.00)
N             1704       1704           1704
R-Squared     0.65       0.30           0.61
SER           7.62       0.19           0.14
*** p < 0.001; ** p < 0.01; * p < 0.05.

Comparing Models III

  • Linear-Log: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 \ln X_i$, with $R^2 = 0.65$
  • Log-Linear: $\widehat{\ln Y_i} = \hat{\beta}_0 + \hat{\beta}_1 X_i$, with $R^2 = 0.30$
  • Log-Log: $\widehat{\ln Y_i} = \hat{\beta}_0 + \hat{\beta}_1 \ln X_i$, with $R^2 = 0.61$

When to Log?

  • In practice, the following types of variables are logged:
    • Variables that must always be positive (prices, sales, market values)
    • Very large numbers (population, GDP)
    • Variables we want to talk about as percentage changes or growth rates (money supply, population, GDP)
    • Variables that have diminishing returns (output, utility)
    • Variables that have nonlinear scatterplots
  • Avoid logs for:
    • Variables that are less than one, decimals, 0, or negative
    • Categorical variables (season, gender, political party)
    • Time variables (year, week, day)

Comparing Across Units

Comparing Coefficients of Different Units I

$$\hat{Y}_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

  • We often want to compare coefficients to see which variable, $X_1$ or $X_2$, has a bigger effect on $Y$

  • What if $X_1$ and $X_2$ are in different units?

Example:

$$\widehat{\text{Salary}}_i = \beta_0 + \beta_1 \, \text{Batting average}_i + \beta_2 \, \text{Home runs}_i$$
$$\widehat{\text{Salary}}_i = -2{,}869{,}439.40 + 12{,}417{,}629.72 \, \text{Batting average}_i + 129{,}627.36 \, \text{Home runs}_i$$

Comparing Coefficients of Different Units II

  • An easy way is to standardize the variables (i.e. take the Z-score)

$$X^{std} = \frac{X - \bar{X}}{sd(X)}$$

Also called “centering” or “scaling.”

Comparing Coefficients of Different Units: Example

Variable         Mean        Std. Dev.
Salary           $2,024,616  $2,764,512
Batting Average  0.267       0.031
Home Runs        12.11       10.31

$$\widehat{\text{Salary}}_i = -2{,}869{,}439.40 + 12{,}417{,}629.72 \, \text{Batting average}_i + 129{,}627.36 \, \text{Home runs}_i$$
$$\widehat{\text{Salary}}^{std}_i = 0.00 + 0.14 \, \text{Batting average}^{std}_i + 0.48 \, \text{Home runs}^{std}_i$$

  • Marginal effects on $Y$ (in standard deviations of $Y$) from a 1 standard deviation change in $X$:

  • $\hat{\beta}_1$: a 1 standard deviation increase in Batting Average increases Salary by 0.14 standard deviations

$$0.14 \times \$2{,}764{,}512 = \$387{,}032$$

  • $\hat{\beta}_2$: a 1 standard deviation increase in Home Runs increases Salary by 0.48 standard deviations

$$0.48 \times \$2{,}764{,}512 = \$1{,}326{,}966$$

Standardizing in R

  • Use the scale() command inside the mutate() function to standardize a variable

gapminder <- gapminder %>%
  mutate(std_life = scale(lifeExp),
         std_gdp = scale(gdpPercap))
std_reg <- lm(std_life ~ std_gdp, data = gapminder)
tidy(std_reg)

term         estimate  std.error  statistic  p.value
(Intercept)  1.1e-16   0.0197     5.57e-15   1
std_gdp      0.584     0.0197     29.7       3.57e-156
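
A side note (an addition, not from the original slides): with a single regressor, the standardized slope is simply the sample correlation between $Y$ and $X$, which we can confirm:

# with both variables standardized, the OLS slope equals the correlation coefficient
cor(gapminder$lifeExp, gapminder$gdpPercap) # 0.584, matching std_gdp above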

Joint Hypothesis Testing

Joint Hypothesis Testing I

Example: Return again to:

$$\widehat{\text{Wage}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Male}_i + \hat{\beta}_2 \text{Northeast}_i + \hat{\beta}_3 \text{Midwest}_i + \hat{\beta}_4 \text{South}_i$$

  • Maybe region doesn't affect wages at all?

  • $H_0: \beta_2 = 0, \beta_3 = 0, \beta_4 = 0$

  • This is a joint hypothesis to test

Joint Hypothesis Testing II

  • A joint hypothesis tests against the null hypothesis of a value for multiple parameters: $$H_0: \beta_1 = \beta_2 = 0$$

    the hypotheses that multiple regressors are equal to zero (have no causal effect on the outcome)

  • Our alternative hypothesis is that: $$H_1: \text{either } \beta_1 \neq 0 \text{ or } \beta_2 \neq 0 \text{ or both}$$

    or simply, that $H_0$ is not true

Types of Joint Hypothesis Tests

  • Three main cases of joint hypothesis tests:

1) $H_0: \beta_1 = \beta_2 = 0$

  • Testing against the claim that multiple variables don't matter
  • Useful under high multicollinearity between variables
  • $H_a$: at least one parameter $\neq 0$

2) $H_0: \beta_1 = \beta_2$

  • Testing whether two variables matter the same
  • Variables must be in the same units
  • $H_a: \beta_1 \, (\neq, <, \text{ or } >) \, \beta_2$

3) $H_0$: ALL $\beta$'s $= 0$

  • The "Overall F-test"
  • Testing against the claim that the regression model explains NO variation in $Y$

Joint Hypothesis Tests: F-statistic

  • The F-statistic is the test statistic used to test joint hypotheses about regression coefficients with an F-test

  • This involves comparing two models:

    1. Unrestricted model: the regression with all coefficients
    2. Restricted model: the regression under the null hypothesis (coefficients equal their hypothesized values)
  • F is an analysis of variance (ANOVA)

    • essentially, it tests whether $R^2$ increases statistically significantly as we go from the restricted model → the unrestricted model
  • F has its own distribution, with two sets of degrees of freedom

Joint Hypothesis F-test: Example I

Example: Return again to:

$$\widehat{\text{Wage}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Male}_i + \hat{\beta}_2 \text{Northeast}_i + \hat{\beta}_3 \text{Midwest}_i + \hat{\beta}_4 \text{South}_i$$

  • $H_0: \beta_2 = \beta_3 = \beta_4 = 0$

  • $H_a$: $H_0$ is not true (at least one $\beta_i \neq 0$)

Joint Hypothesis F-test: Example II

Example: Return again to:

$$\widehat{\text{Wage}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Male}_i + \hat{\beta}_2 \text{Northeast}_i + \hat{\beta}_3 \text{Midwest}_i + \hat{\beta}_4 \text{South}_i$$

  • Unrestricted model:

$$\widehat{\text{Wage}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Male}_i + \hat{\beta}_2 \text{Northeast}_i + \hat{\beta}_3 \text{Midwest}_i + \hat{\beta}_4 \text{South}_i$$

  • Restricted model:

$$\widehat{\text{Wage}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Male}_i$$

  • F: does going from the restricted to the unrestricted model statistically significantly improve $R^2$?

Calculating the F-statistic

$$F_{q,(n-k-1)} = \frac{\left(\dfrac{R^2_u - R^2_r}{q}\right)}{\left(\dfrac{1 - R^2_u}{n - k - 1}\right)}$$

  • $R^2_u$: the $R^2$ from the unrestricted model (all variables)

  • $R^2_r$: the $R^2$ from the restricted model (null hypothesis)

  • $q$: the number of restrictions (number of $\beta$'s $= 0$ under the null hypothesis)

  • $k$: the number of $X$ variables in the unrestricted model (all variables)

  • F has two sets of degrees of freedom:
    • $q$ for the numerator, $(n - k - 1)$ for the denominator

Calculating the F-statistic II

$$F_{q,(n-k-1)} = \frac{\left(\dfrac{R^2_u - R^2_r}{q}\right)}{\left(\dfrac{1 - R^2_u}{n - k - 1}\right)}$$

  • Key takeaway: the bigger the difference $(R^2_u - R^2_r)$, the greater the improvement in fit from adding variables, and the larger the F!

  • This formula is (believe it or not) actually a simplified version (assuming homoskedasticity)

    • I give you this formula to build your intuition of what F is measuring

F-test Example I

  • We'll use the wooldridge package's wage1 data again
# load in data from wooldridge package
library(wooldridge)
wages <- wooldridge::wage1
# run regressions
unrestricted_reg <- lm(wage ~ female + northcen + west + south, data = wages)
restricted_reg <- lm(wage ~ female, data = wages)

F-test Example II

  • Unrestricted model:

$$\widehat{\text{Wage}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Female}_i + \hat{\beta}_2 \text{Northcen}_i + \hat{\beta}_3 \text{West}_i + \hat{\beta}_4 \text{South}_i$$

  • Restricted model:

$$\widehat{\text{Wage}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Female}_i$$

  • $H_0: \beta_2 = \beta_3 = \beta_4 = 0$

  • $q = 3$ restrictions (F numerator df)

  • $n - k - 1 = 526 - 4 - 1 = 521$ (F denominator df)

F-test Example III

  • We can use the car package's linearHypothesis() command to run an F-test:
    • first argument: name of the (unrestricted) regression
    • second argument: vector of variable names (in quotes) you are testing

# load car package for additional regression tools
library("car")
# F-test
linearHypothesis(unrestricted_reg, c("northcen", "west", "south"))

Res.Df  RSS       Df  Sum of Sq  F     Pr(>F)
524     6.33e+03
521     6.17e+03  3   157        4.43  0.00438
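
To connect this output to the formula above, here is a sketch computing the homoskedastic F-statistic by hand from the two models' $R^2$'s; it should come out near the 4.43 reported by linearHypothesis():

# compute F by hand from the two models' R-squareds
r2_u <- summary(unrestricted_reg)$r.squared # unrestricted R^2
r2_r <- summary(restricted_reg)$r.squared   # restricted R^2
q <- 3                                      # number of restrictions
n <- nobs(unrestricted_reg)                 # 526 observations
k <- 4                                      # regressors in the unrestricted model
((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k - 1))
# base R's anova() runs the same comparison of nested models:
anova(restricted_reg, unrestricted_reg)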

Second F-test Example: Are Two Coefficients Equal?

  • The second type of test is whether two coefficients equal one another

Example:

$$\widehat{\text{Wage}}_i = \beta_0 + \beta_1 \, \text{Adolescent height}_i + \beta_2 \, \text{Adult height}_i + \beta_3 \, \text{Male}_i$$

  • Does height as an adolescent have the same effect on wages as height as an adult?

$$H_0: \beta_1 = \beta_2$$

  • What is the restricted regression?

$$\widehat{\text{Wage}}_i = \beta_0 + \beta_1 (\text{Adolescent height}_i + \text{Adult height}_i) + \beta_3 \, \text{Male}_i$$

  • $q = 1$ restriction

Second F-test Example: Data

# load in data
heightwages <- read_csv("../data/heightwages.csv")
# make a "heights" variable as the sum of adolescent (height81) and adult (height85) height
heightwages <- heightwages %>%
  mutate(heights = height81 + height85)
height_reg <- lm(wage96 ~ height81 + height85 + male, data = heightwages)
height_restricted_reg <- lm(wage96 ~ heights + male, data = heightwages)

Second F-test Example: Data

  • For the second argument, set the two variables equal to each other, in quotes

linearHypothesis(height_reg, "height81=height85") # F-test

Res.Df    RSS       Df  Sum of Sq  F     Pr(>F)
6.59e+03  5.13e+06
6.59e+03  5.13e+06  1   959        1.23  0.267

  • Insufficient evidence to reject $H_0$!

  • The effect of adolescent and adult height on wages is the same

All F-test I

summary(unrestricted_reg)
##
## Call:
## lm(formula = wage ~ female + northcen + west + south, data = wages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.3269 -2.0105 -0.7871 1.1898 17.4146
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.5654 0.3466 21.827 <2e-16 ***
## female -2.5652 0.3011 -8.520 <2e-16 ***
## northcen -0.5918 0.4362 -1.357 0.1755
## west 0.4315 0.4838 0.892 0.3729
## south -1.0262 0.4048 -2.535 0.0115 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.443 on 521 degrees of freedom
## Multiple R-squared: 0.1376, Adjusted R-squared: 0.131
## F-statistic: 20.79 on 4 and 521 DF, p-value: 6.501e-16
  • The last line of the regression output from summary() is an All F-test
    • $H_0$: all $\beta$'s $= 0$
    • i.e. the regression explains no variation in $Y$
    • It calculates an F-statistic that, if large enough (p-value < 0.05), rejects $H_0$
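
Since the restricted model here is intercept-only ($R^2_r = 0$), the All F-test is the same formula as before; a quick sketch reproducing the 20.79 from the output above:

# overall F-test by hand: the restricted model is intercept-only, so R2_r = 0
r2 <- summary(unrestricted_reg)$r.squared
(r2 / 4) / ((1 - r2) / 521) # 20.79, matching summary()'s F-statistic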

All F-test II

  • Alternatively, if you use broom instead of summary():
    • the glance() command makes a table of regression summary statistics
    • tidy() only shows the coefficients

library(broom)
glance(unrestricted_reg)

r.squared  adj.r.squared  sigma  statistic  p.value  df  logLik     AIC      BIC       deviance  df.residual  nobs
0.138      0.131          3.44   20.8       6.5e-16  4   -1.39e+03  2.8e+03  2.83e+03  6.17e+03  521          526

  • "statistic" is the All F-test; the "p.value" next to it is the p-value from the F-test