class: center, middle, inverse, title-slide # 3.2 — Causal Inference and DAGs ## ECON 480 • Econometrics • Fall 2020 ### Ryan Safner
Assistant Professor of Economics
safner@hood.edu
ryansafner/metricsF20
metricsF20.classes.ryansafner.com
--- class: inverse # Outline ### [Correlation vs. Causation](#3) ### [Causal Diagrams](#3) ### [DAG Rules](#16) --- # You Don’t Need an RCT to Talk About Causality .pull-left[ - Statistics profession is adamant that we cannot say anything about causality - But you have to! It's how the human brain works! - We can’t conceive of (spurious) correlation without some causation ] .pull-right[ .center[ ![:scale 70%](../images/causation.jpg) ] ] --- # The Causal Revolution .center[ ![:scale 80%](../images/causaltwitter.png) ] --- # RCTs and Evidence-Based Policy - .hi-purple[Should we *ONLY* base policies on the evidence from Randomized Controlled Trials?] -- .pull-left[ <blockquote class="twitter-tweet"><p lang="en" dir="ltr">| ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄|<br> IF U DONT SMOKE,<br> U ALREADY <br> BELIEVE IN<br> CAUSAL INFERENCE<br> WITHOUT<br> RANDOMIZED TRIALS<br>|__________| <br> (\__/) ||<br> (•ㅅ•) ||<br> / づ<a href="https://twitter.com/hashtag/HistorianSignBunny?src=hash&ref_src=twsrc%5Etfw">#HistorianSignBunny</a> <a href="https://twitter.com/hashtag/Epidemiology?src=hash&ref_src=twsrc%5Etfw">#Epidemiology</a></p>&mdash; Ellie Murray (@EpiEllie) <a href="https://twitter.com/EpiEllie/status/1017622949799571456?ref_src=twsrc%5Etfw">July 13, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> ] -- .pull-right[ .center[ Source: [British Medical Journal](https://www.bmj.com/content/363/bmj.k5094) ![](https://www.dropbox.com/s/9mimwcoamiv54tf/rctparachutesstudy.png?raw=1) ] ] --- # RCTs and Evidence-Based Policy III .pull-left[ .center[ ![:scale 80%](https://www.dropbox.com/s/5ptmdhgeyerhr4a/rctparachutes1.jpg?raw=1) ] ] -- .pull-right[ .center[ ![:scale 80%](https://www.dropbox.com/s/3knd5wy8h4eyq1j/rctparachutes2.jpg?raw=1) ] ] --- class: inverse, center, middle # Correlation vs. 
Causation --- # Correlation and Causation I .center[ <blockquote class="twitter-tweet"><p lang="en" dir="ltr">"Correlation implies causation," the dean whispered as he handed me my PhD.<br><br>"But then why-"<br><br>"Because if they knew, they wouldn't need us."</p>&mdash; David Robinson (@drob) <a href="https://twitter.com/drob/status/877976063942512640?ref_src=twsrc%5Etfw">June 22, 2017</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> ] --- # What Does Causation Mean? .pull-left[ .smallest[ - “Correlation does not imply causation” - this is exactly backwards! - this is just pointing out that exogeneity is violated ] ] .pull-right[ .center[ ![](../images/causation.jpg) ] ] --- # What Does Causation Mean? .pull-left[ .smallest[ - “Correlation does not imply causation” - this is exactly backwards! - this is just pointing out that exogeneity is violated - “Correlation implies causation” - for an association, there must be *some* causal chain of variables that relate `\(X\)` and `\(Y\)` - but not necessarily merely `\(X \rightarrow Y\)` ] ] .pull-right[ .center[ ![](../images/causation.jpg) ] ] --- # What Does Causation Mean? .pull-left[ .smallest[ - “Correlation does not imply causation” - this is exactly backwards! - this is just pointing out that exogeneity is violated - “Correlation implies causation” - for an association, there must be *some* causal chain of variables that relate `\(X\)` and `\(Y\)` - but not necessarily merely `\(X \rightarrow Y\)` - “Correlation plus exogeneity is causation.” ] ] .pull-right[ .center[ ![](../images/causation.jpg) ] ] --- # Correlation and Causation .pull-left[ - .hi-purple[Correlation:] - Math & Statistics - Computers, AI, Machine learning can figure this out (even better than humans) - .hi-purple[Causation:] - Philosophy, Intuition, Theory - .hi[Counterfactual thinking], unique to humans (vs. 
animals or machines) - Computers *cannot* yet figure this out ] .pull-right[ .center[ ![](https://www.dropbox.com/s/c91a06o91rf3e5h/causation.jpg?raw=1) ] ] --- # The Causal Revolution .pull-left[ .center[ ![](https://www.dropbox.com/s/qp9m156rcxqp3nc/bookofwhy.jpg?raw=1) ] ] .pull-right[ .center[ ![](../images/judea-pearl.jpg) ] ] --- # Causation Requires Counterfactual Thinking .pull-left[ .center[ ![:scale 70%](https://www.dropbox.com/s/6n3dg3xpizsxpsj/ladderofcausation.png?raw=1) ] ] .pull-right[ .center[ ![:scale 70%](https://www.dropbox.com/s/qp9m156rcxqp3nc/bookofwhy.jpg?raw=1) ] ] --- background-image: url(https://www.dropbox.com/s/zsx3pa4m51p82dj/twopaths.jpg?raw=1) background-size: cover --- # Causal Inference .pull-left[ - We will seek to understand what causality *is* and how we can approach finding it - We will also explore the different common .hi[research designs] meant to .hi-purple[identify] causal relationships - **These skills**, more than supply & demand, constrained optimization models, ISLM, etc, **are the tools and comparative advantage of a modern research economist** - Why all big companies (especially in tech) have entire economics departments in them ] .pull-right[ .center[ ![](https://www.dropbox.com/s/yw8t5xsa8dgei71/coding.jpeg?raw=1) ] ] --- # Clever Research Designs Identify Causality <img src="3.2-slides_files/figure-html/unnamed-chunk-1-1.png" width="60%" style="display: block; margin: auto;" /> --- # Correlation and Causation .center[ <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Causality isn't achieved; it's approached.</p>— John B. Holbein (@JohnHolbein1) <a href="https://twitter.com/JohnHolbein1/status/982635508089180161?ref_src=twsrc%5Etfw">April 7, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> ] --- # What Then IS Causation? 
.pull-left[ .smallest[ - `\(X\)` causes `\(Y\)` if we can intervene and change `\(X\)` without changing anything else, and `\(Y\)` changes - `\(Y\)` “listens to” `\(X\)` - `\(X\)` may not be the only thing that causes `\(Y\)`! ] ] .pull-right[ .center[ ![:scale 80%](https://www.dropbox.com/s/6w66jcc9gmw3hw4/lightswitch.jpg?raw=1) ] ] --- # What Then IS Causation? .pull-left[ .smallest[ - `\(X\)` causes `\(Y\)` if we can intervene and change `\(X\)` without changing anything else, and `\(Y\)` changes - `\(Y\)` “listens to” `\(X\)` - `\(X\)` may not be the only thing that causes `\(Y\)`! .content-box-green[ .green[**Example**] If `\(X\)` is a light switch, and `\(Y\)` is a light: - Flipping the switch `\((X)\)` causes the light to go on `\((Y)\)` - But NOT if the light is burnt out (No `\(Y\)` despite `\(X\)`) - OR if the light was already on `\((Y\)` without `\(X\)`) ] ] ] .pull-right[ .center[ ![:scale 80%](https://www.dropbox.com/s/6w66jcc9gmw3hw4/lightswitch.jpg?raw=1) ] ] --- # Non-Causal Claims - All of the following have non-zero correlations. Are they *causal*? .content-box-green[ .green[**Example**] - Greater ice cream sales `\(\rightarrow\)` more violent crime - Rooster crows `\(\rightarrow\)` the sun rises in the morning - Taking Vitamin C `\(\rightarrow\)` Colds go away a few days later - Political party in power `\(\rightarrow\)` economy performs better or worse ] --- # Counterfactuals .pull-left[ - The *sine qua non* of causal claims are .hi[counterfactuals]: what would `\(Y\)` have been if `\(X\)` had been different? - It is **impossible** to make a counterfactual claim from data alone! - Need a (theoretical) .hi-purple[causal model] of the data-generating process! 
] .pull-right[ .center[ ![](https://www.dropbox.com/s/zsx3pa4m51p82dj/twopaths.jpg?raw=1) ] ] --- # Counterfactuals and RCTs .pull-left[ .smallest[ - Again, RCTs are invoked as the gold standard for their ability to make counterfactual claims: - Treatment/intervention `\((X)\)` is *randomly assigned* to individuals, and then outcome `\(Y\)` is measured > If person i who received treatment *had not received* the treatment, we can predict what his outcome *would have been* > If person j who did not receive treatment *had received treatment*, we can predict what her outcome *would have been* - We can say this because, on average, treatment and control groups are *the same before treatment* ] ] .pull-right[ .center[ ![](https://www.dropbox.com/s/zsx3pa4m51p82dj/twopaths.jpg?raw=1) ] ] --- # From RCTs to Causal Models .pull-left[ - RCTs are but the best-known method of a large, growing science of .hi[causal inference] - We need a .hi[causal model] to describe the .hi[data-generating process (DGP)] - Requires us to make some .hi-purple[assumptions] ] .pull-right[ .center[ ![](https://www.dropbox.com/s/zsx3pa4m51p82dj/twopaths.jpg?raw=1) ] ] --- class: inverse, center, middle # Causal Diagrams --- # Causal Diagrams/DAGs .pull-left[ - A surprisingly simple, yet rigorous and powerful method of modeling is using a .hi[causal diagram] or .hi[DAG]: - .hi[Directed]: Each arrow points in only one direction - .hi[Acyclic]: Following the arrows never loops back to the starting node - .hi[Graph] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-2-1.png)<!-- --> ] --- # Causal Diagrams/DAGs .pull-left[ - A visual model of the data-generating process, encodes our understanding of the causal relationships - Requires some common sense/economic intuition - Remember, all models are wrong, we just need them to be *useful*! 
] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-3-1.png)<!-- --> ] --- # Causal Diagrams/DAGs .pull-left[ - Our light switch example of causality .center[ ![:scale 60%](https://www.dropbox.com/s/6w66jcc9gmw3hw4/lightswitch.jpg?raw=1) ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-4-1.png)<!-- --> ] --- # Drawing a DAG: Example .pull-left[ .smallest[ - Suppose we have data on three variables - `IP`: how much a firm spends on IP lawsuits - `tech`: whether a firm is in tech industry - `profit`: firm profits - They are all correlated with each other, but what are the causal relationships? - We need our own .hi-purple[causal model] (from theory, intuition, etc) to sort them out - Data alone will not tell us! ] ] .pull-right[ <img src="3.2-slides_files/figure-html/unnamed-chunk-5-1.png" width="504" style="display: block; margin: auto;" /> ] --- # Drawing a DAG: .pull-left[ .smallest[ 1. Consider all the variables likely to be important to the data-generating process (including variables we can't observe!) 2. For simplicity, combine some similar ones together or prune those that aren't very important 3. Consider which variables are likely to affect others, and draw arrows connecting them 4. Test some testable implications of the model (to see if we have a correct one!) ] ] .pull-right[ .center[ ![](https://www.dropbox.com/s/v5vwsadw5vs448t/causality.jpg?raw=1) ] ] --- # Side Notes .pull-left[ .smallest[ - Drawing an arrow requires a direction - making a statement about causality! - *Omitting* an arrow makes an equally important statement too! - In fact, we will *need* omitted arrows to show causality! - If two variables are correlated, but neither causes the other, likely they are both caused by another (perhaps **unobserved**) variable - add it! 
- There should be no *cycles* or *loops* (if so, there’s probably another missing variable, such as time) ] ] .pull-right[ .center[ ![](https://www.dropbox.com/s/v5vwsadw5vs448t/causality.jpg?raw=1) ] ] --- # DAG Example I .pull-left[ .content-box-green[ .green[**Example**]: what is the effect of education on wages? ] - Education `\((X\)`, “treatment” or “exposure”) - Wages `\((Y\)`, “outcome” or “response”) ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-6-1.png)<!-- --> ] --- # DAG Example I .pull-left[ - What other variables are important? - Ability - Socioeconomic status - Demographics - Phys. Ed. requirements - Year of birth - Location - Schooling laws - Job connections ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-7-1.png)<!-- --> ] --- # DAG Example I .pull-left[ .smallest[ - In social science and complex systems, 1000s of variables could plausibly be in DAG! - So simplify: - Ignore trivial things (Phys. Ed. requirement) - Combine similar variables (Socioeconomic status, Demographics, Location) `\(\rightarrow\)` Background ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-8-1.png)<!-- --> ] --- # DAG Example II .pull-left[ - Background, Year of birth, Location, Compulsory schooling, all cause education - Background, year of birth, location, job connections probably cause wages ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-9-1.png)<!-- --> ] --- # DAG Example III .pull-left[ - Background, Year of birth, Location, Compulsory schooling, all cause education - Background, year of birth, location, job connections probably cause wages - Job connections in fact is probably caused by education! - Location and background probably both caused by unobserved factor (`u1`) ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-10-1.png)<!-- --> ] --- # DAG Example IV .pull-left[ - This is messy, but we have a causal model! 
- Makes our assumptions **explicit**, and many of them are **testable** - DAG suggests certain relationships that will *not* exist: - all relationships between `laws` and `conx` go through `educ` - so if we controlled for `educ`, then `cor(laws,conx)` should be zero! ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-11-1.png)<!-- --> ] --- # Let the Computer Do It: Dagitty.net I .pull-left[ .center[ ![](https://www.dropbox.com/s/zhqgfyvk2x4z863/dagitty.png?raw=1) ] ] .pull-right[ - [Dagitty.net](http://dagitty.net) is a great tool to make these and give you testable implications - Click `Model -> New Model` - Name your "exposure" variable `\((X\)` of interest) and "outcome" variable `\((Y)\)` ] --- # Let the Computer Do It: Dagitty.net II .pull-left[ .center[ ![](https://www.dropbox.com/s/qu6839emzoitb1c/dagittyex1.png?raw=1) ] ] .pull-right[ - Click and drag to move nodes around - Add a new variable by double-clicking - Add an arrow by double-clicking one variable and then double-clicking on the target (do again to remove arrow) ] --- # Let the Computer Do It: Dagitty.net III .pull-left[ .center[ ![](https://www.dropbox.com/s/a55b66p8i4znjto/dagittyex2.png?raw=1) ] ] .pull-right[ .smallest[ - Tells you .hi-purple[how to identify your effect]! (upper right) > .hi-purple[Minimal sufficient adjustment sets] containing background, location, year for estimating the total effect of educ on wage: background, location, year - Tells you some .hi-turquoise[testable implications] (middle right) - .hi-turquoise[conditional independencies], for example (last): - `job_connections` `\(\perp\)` `year` | `educ` - means: holding constant `educ`, there should be no correlation between `job_connections` and `year` - can test this with data! 
] ] --- # Causal Effect .pull-left[ .center[ ![](https://www.dropbox.com/s/o7h0zbpvej0zuf5/dagittycontrolled.png?raw=1) ] ] .pull-right[ - If we control for `background`, `location`, and `year`, we can .hi-purple[identify the causal effect] of `educ` `\(\rightarrow\)` `wage`. ] --- # You Can Draw DAGs In R .pull-left[ - New package: `ggdag` - Arrows are made with formula notation: `Y~X+Z` means "`Y` is caused by `X` and `Z`"

```r
# install.packages("ggdag")
library(ggdag)

dagify(wage~educ+conx+year+bckg+loc,
       educ~bckg+year+loc+laws,
       conx~educ,
       bckg~u1,
       loc~u1,
       exposure = "educ", # optional: define X
       outcome = "wage"   # optional: define Y
       ) %>%
  ggdag()+
  theme_dag()
```

] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-13-1.png)<!-- --> ] --- # ggdag: Additional Tools .pull-left[ - If you have defined `X` (`exposure`) and `Y` (`outcome`), you can use `ggdag_paths()` to have it show all possible paths between `\(X\)` and `\(Y\)`!

```r
library(ggdag)

dagify(wage~educ+conx+year+bckg+loc,
       educ~bckg+year+loc+laws,
       conx~educ,
       bckg~u1,
       loc~u1,
       exposure = "educ",
       outcome = "wage"
       ) %>%
  tidy_dagitty(seed = 2) %>% # fix the random layout so the plot is reproducible
  ggdag_paths()+
  theme_dag()
```

] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-15-1.png)<!-- --> ] --- # You Can Draw DAGs In R .pull-left[ - If you have defined `X` (`exposure`) and `Y` (`outcome`), you can use `ggdag_adjustment_set()` to have it show you what you need to control for in order to identify `\(X \rightarrow Y\)`!

```r
library(ggdag)

dagify(wage~educ+conx+year+bckg+loc,
       educ~bckg+year+loc+laws,
       conx~educ,
       bckg~u1,
       loc~u1,
       exposure = "educ",
       outcome = "wage"
       ) %>%
  ggdag_adjustment_set(shadow = TRUE)+ # shadow shows adjusted arrows
  theme_dag()
```

] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-17-1.png)<!-- --> ] --- class: inverse, center, middle # DAG Rules --- # DAG Rules .pull-left[ .smaller[ - How do dagitty.net and `ggdag` know how to identify effects, or what to control for, or what implications are testable? 
- Comes from fancy math called “do-calculus” .center[ ![](../images/do-calculus.png) ] - Fortunately, these amount to a few simple rules that we can see on a DAG ] ] .pull-right[ .center[ ![](https://www.dropbox.com/s/qp9m156rcxqp3nc/bookofwhy.jpg?raw=1) ] ] --- # DAGs I .pull-left[ .smaller[ - Typical notation: - `\(X\)` is independent variable of interest - Epidemiology: .hi-purple["intervention"] or .hi-purple[“exposure”] - `\(Y\)` is dependent or .hi-purple["response"] variable - Other variables use other letters - You can of course use words instead of letters! ] ] .pull-right[ <img src="3.2-slides_files/figure-html/unnamed-chunk-18-1.png" width="504" style="display: block; margin: auto;" /> ] --- # DAGs and Causal Effects .pull-left[ - Arrows indicate causal effect (& direction) - Two types of causal effect: 1. Direct effects: `\(X \rightarrow Y\)` ] .pull-right[ <img src="3.2-slides_files/figure-html/unnamed-chunk-19-1.png" width="504" style="display: block; margin: auto;" /> ] --- # DAGs and Causal Effects .pull-left[ - Arrows indicate causal effect (& direction) - Two types of causal effect: 1. Direct effects: `\(X \rightarrow Y\)` 2. Indirect effects: `\(X \rightarrow M \rightarrow Y\)` - `\(M\)` is a .hi-purple[“mediator”] variable, the .hi-purple[mechanism] by which `\(X\)` affects `\(Y\)` ] .pull-right[ <img src="3.2-slides_files/figure-html/unnamed-chunk-20-1.png" width="504" style="display: block; margin: auto;" /> ] --- # DAGs and Causal Effects .pull-left[ - Arrows indicate causal effect (& direction) - Two types of causal effect: 1. Direct effects: `\(X \rightarrow Y\)` 2. Indirect effects: `\(X \rightarrow M \rightarrow Y\)` - `\(M\)` is a .hi-purple[“mediator”] variable, the .hi-purple[mechanism] by which `\(X\)` affects `\(Y\)` 3. You of course might have both! 
] .pull-right[ <img src="3.2-slides_files/figure-html/unnamed-chunk-21-1.png" width="504" style="display: block; margin: auto;" /> ] --- # Confounders .pull-left[ .smaller[ - `\(Z\)` is a .hi[“confounder”] of `\(X \rightarrow Y\)` because it causes *both* `\(X\)` and `\(Y\)` - `\(cor(X,Y)\)` is made up of two parts: 1. Causal effect of `\((X \rightarrow Y)\)` 👍 2. `\(Z\)` causing both the values of `\(X\)` and `\(Y\)` 👎 - Failing to control for `\(Z\)` will .hi-purple[bias] our estimate of the causal effect of `\(X \rightarrow Y\)`! ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-22-1.png)<!-- --> ] --- # Confounders .pull-left[ - Confounders are the DAG-equivalent of .hi[omitted variable bias] (next class) `$$Y_i=\beta_0+\beta_1 X_i+u_i$$` - By leaving out `\(Z_i\)`, this regression is .hi-purple[biased] - `\(\hat{\beta}_1\)` picks up *both*: - `\(X \rightarrow Y\)` - `\(X \leftarrow Z \rightarrow Y\)` ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-23-1.png)<!-- --> ] --- # “Front Doors” and “Back Doors” .pull-left[ .smallest[ - With this DAG, there are 2 paths that connect `\(X\)` and `\(Y\)`<sup>.magenta[†]</sup>: 1. A .hi[causal “front-door” path]: `\(X \rightarrow Y\)` - 👍 what we want to measure 2. A .hi[non-causal “back-door” path]: `\(X \leftarrow Z \rightarrow Y\)` - At least one causal arrow runs in the opposite direction - 👎 adds a confounding bias ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-24-1.png)<!-- --> ] .footnote[<sup>.magenta[†]</sup> Regardless of the *directions* of the arrows!] 
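---

# Confounders: A Worked Sketch

A sketch of why the regression of `\(Y\)` on `\(X\)` alone picks up both paths. It assumes, purely for illustration, that the true data-generating process is linear, with a hypothetical coefficient `\(\beta_2\)` on `\(Z\)` and an error `\(u\)` unrelated to both `\(X\)` and `\(Z\)`:

`$$Y_i=\beta_0+\beta_1 X_i+\beta_2 Z_i+u_i$$`

If we regress `\(Y\)` on `\(X\)` only, the slope converges to

`$$\frac{cov(X,Y)}{var(X)}=\underbrace{\beta_1}_{X \rightarrow Y}+\underbrace{\beta_2 \frac{cov(X,Z)}{var(X)}}_{X \leftarrow Z \rightarrow Y}$$`

- Under these assumptions, the second term (the back-door path) vanishes only if `\(Z\)` does not affect `\(Y\)` `\((\beta_2=0)\)` or `\(Z\)` is unrelated to `\(X\)` `\((cov(X,Z)=0)\)`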
--- # Controlling I .pull-left[ .smaller[ - Ideally, if we ran a .hi[randomized controlled trial] and randomly assigned different values of `\(X\)` to different individuals, this would delete the arrow between `\(Z\)` and `\(X\)` - Individuals’ values of `\(Z\)` do not affect whether or not they are treated `\((X)\)` - This would only leave the front-door, `\(X \rightarrow Y\)` - But we can rarely run an ideal RCT ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-25-1.png)<!-- --> ] --- # Controlling I .pull-left[ - Instead of an RCT, if we can just .hi-purple[“adjust for”] or .hi-purple[“control for”] `\(Z\)`, we can *block* the back-door path `\(X \leftarrow Z \rightarrow Y\)` - This would only leave the front-door path open, `\(X \rightarrow Y\)` - “As good as” an RCT! ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-26-1.png)<!-- --> ] --- # Controlling I .pull-left[ - Using our terminology from last class, we have an outcome `\((Y)\)`, and some treatment - But there are .b[unobserved factors] `\((u)\)` `$$Y_i = \beta_0 + \beta_1 Treatment_i + u_i$$` ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-27-1.png)<!-- --> ] --- # Controlling I .pull-left[ - Using our terminology from last class, we have an outcome `\((Y)\)`, and some treatment - But there are .b[unobserved factors] `\((u)\)` `$$Y_i = \beta_0 + \beta_1 Treatment_i + u_i$$` - If we can *randomly* assign treatment, this makes treatment exogenous: `$$cor(Treatment,u)=0$$` ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-28-1.png)<!-- --> ] --- # Controlling I .pull-left[ - Using our terminology from last class, we have an outcome `\((Y)\)`, and some treatment - But there are other .b[unobserved factors] `\((u)\)` `$$Y_i = \beta_0 + \beta_1 Treatment_i + u_i$$` - When we (often) can’t randomly assign treatment, we have to find another way to control for measurable things in `\(u\)` ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-29-1.png)<!-- --> ] 
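---

# Controlling: A Simulation Sketch

A minimal sketch of the logic on the previous slides, using simulated data in base R. The sample size, the coefficients, and the variable names (`x`, `z`, `y`, `x_rct`, `y_rct`) are all assumptions made up for illustration. It compares three estimates of the same true effect: a naive regression, one that controls for `\(Z\)`, and one where `\(X\)` is randomly assigned.

```r
# Simulation sketch: every number and variable name here is an assumption
# chosen for illustration, not something taken from real data.
set.seed(480)
n <- 10000

# Confounded world: Z causes both X and Y; the true effect of X on Y is 2
z <- rnorm(n)
x <- 1.5 * z + rnorm(n)
y <- 2 * x + 3 * z + rnorm(n)

naive      <- unname(coef(lm(y ~ x))["x"])      # back door X <- Z -> Y left open
controlled <- unname(coef(lm(y ~ x + z))["x"])  # back door blocked by adjusting for Z

# "RCT" world: X is assigned at random, so the arrow Z -> X is deleted
x_rct <- rnorm(n)
y_rct <- 2 * x_rct + 3 * z + rnorm(n)
randomized <- unname(coef(lm(y_rct ~ x_rct))["x_rct"])

# Compare the three estimates with the true effect of 2
round(c(naive = naive, controlled = controlled, randomized = randomized), 2)
```

With these assumed values, the naive slope should land well above 2, while the controlled and randomized slopes should both be close to 2: adjusting for `\(Z\)` does the same job here as random assignment.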
--- # Controlling II .pull-left[ - Controlling for a single variable along a long causal path is sufficient to block that path! - Causal path: `\(X \rightarrow Y\)` - Backdoor path: `\(X \leftarrow A \rightarrow B \rightarrow C \rightarrow Y\)` - It is sufficient to block this backdoor by controlling **either** `\(A\)` **or** `\(B\)` **or** `\(C\)`! ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-30-1.png)<!-- --> ] --- # Controlling II .pull-left[ - Controlling for a single variable along a long causal path is sufficient to block that path! - Causal path: `\(X \rightarrow Y\)` - Backdoor path: `\(X \leftarrow A \rightarrow B \rightarrow C \rightarrow Y\)` - It is sufficient to block this backdoor by controlling **either** `\(A\)` **or** `\(B\)` **or** `\(C\)`! ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-31-1.png)<!-- --> ] --- # The Back Door Criterion .pull-left[ - To .hi[identify] the causal effect of `\(X \rightarrow Y\)`: - .hi-purple[“Back-door criterion”]: control for the minimal amount of variables sufficient to ensure that .b[no open back-door exists] between `\(X\)` and `\(Y\)` - .hi-green[Example]: in this DAG, control for `\(Z\)` ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-32-1.png)<!-- --> ] --- # The Back Door Criterion .pull-left[ .quitesmall[ - Implications of the Back-door criterion: 1) You *only* need to control for the variables that keep a back-door open, *not all other variables!* .content-box-green[ .green[**Example**]: - `\(X \rightarrow Y\)` (front-door) - `\(X \leftarrow A \rightarrow B \rightarrow Y\)` (back-door) ] ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-33-1.png)<!-- --> ] --- # The Back Door Criterion .pull-left[ .quitesmall[ - Implications of the Back-door criterion: 1) You *only* need to control for the variables that keep a back-door open, *not all other variables!* .content-box-green[ .green[**Example**]: - `\(X \rightarrow Y\)` (front-door) - `\(X \leftarrow 
A \rightarrow B \rightarrow Y\)` (back-door) - Need only control for `\(A\)` *or* `\(B\)` to block the back-door path - `\(C\)` and `\(Z\)` have no effect on `\(X\)`, and therefore we don’t need to control for them! ] ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-34-1.png)<!-- --> ] --- # The Back Door Criterion: Colliders .pull-left[ .smallest[ 2) Exception: the case of a .hi[“collider”] - If arrows “collide” at a node, **that node is automatically blocking the pathway**, .hi-purple[do not control for it!] - Controlling for a collider would *open* the path and .b[add bias!] .content-box-green[ .green[**Example**]: - `\(X \rightarrow Y\)` (front-door) - `\(X \leftarrow A \rightarrow B \leftarrow C \rightarrow Y\)` (back-door, but **blocked by B!**) ] ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-35-1.png)<!-- --> ] --- # The Back Door Criterion: Colliders .pull-left[ .smallest[ 2) Exception: the case of a .hi[“collider”] - If arrows “collide” at a node, **that node is automatically blocking the pathway**, .hi-purple[do not control for it!] - Controlling for a collider would *open* the path and .b[add bias!] .content-box-green[ .green[**Example**]: - `\(X \rightarrow Y\)` (front-door) - `\(X \leftarrow A \rightarrow B \leftarrow C \rightarrow Y\)` (back-door, but **blocked by B!**) - Don’t need to control for anything here! ] ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-36-1.png)<!-- --> ] --- # The Back Door Criterion: Colliders .pull-left[ .smallest[ .content-box-green[ .green[**Example**]: - Are you less likely to get the flu if you are hit by a bus? - `\(Flu\)`: getting the flu - `\(Bus\)`: being hit by a bus - `\(Hos\)`: being in the hospital - Both `\(Flu\)` and `\(Bus\)` send you to `\(Hos\)` (arrows) - Conditional on being in `\(Hos\)`, negative correlation between `\(Flu\)` and `\(Bus\)` (spurious!) 
] ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-37-1.png)<!-- --> ] --- # The Back Door Criterion: Colliders .pull-left[ - In the NBA, apparently players’ height has no relationship to points scored? ] .pull-right[ .center[ ![](../images/bulls-scores-1.png) ] ] --- # The Back Door Criterion: Colliders .pull-left[ - **In the NBA**, players’ height has no relationship to points scored - Naturally, taller people score more points in a basketball game, but if you *only* look at NBA players, that relationship goes away - A person being in the NBA is a collider! Colliders are another way to see .hi-orange[selection bias] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-38-1.png)<!-- --> ] --- # The Front Door Criterion: Mediators I .pull-left[ .smallest[ - Another case where controlling for a variable actually *adds bias* is if that variable is known as a .hi[“mediator”]. .content-box-green[ .green[**Example**]: - `\(X \rightarrow M \rightarrow Y\)` (front-door) - `\(X \leftarrow A \rightarrow Y\)` (back-door) - `\(X \leftarrow B \rightarrow Y\)` (back-door) - Should we control for `\(M\)`? - If we did, this would block the front-door! ] ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-39-1.png)<!-- --> ] --- # The Front Door Criterion: Mediators II .pull-left[ .smallest[ - Another case where controlling for a variable actually *adds bias* is if that variable is known as a .hi[“mediator”]. .content-box-green[ .green[**Example**]: - If we control for `\(M\)`, would block the front-door! 
- If we can estimate `\(X \rightarrow M\)` and `\(M \rightarrow Y\)` (note, no back-doors to either of these!), we can estimate `\(X \rightarrow Y\)` ] - This is the .hi-purple[front door method] ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-40-1.png)<!-- --> ] --- # The Front Door Criterion: Mediators III .pull-left[ .smallest[ - Tobacco industry claimed that `\(cor(smoking, cancer)\)` could be spurious due to a confounding `gene` that affects both! - Smoking `gene` is unobservable - Suppose smoking causes `tar` buildup in lungs, which causes `cancer` - We should *not* control for `tar`, it's on the **front-door path** - This is how scientific studies can relate smoking to cancer ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-41-1.png)<!-- --> ] --- # Summary: DAG Rules for Causal Identification .pull-left[ .smallest[ Thus, to achieve .hi-purple[causal identification], control for the minimal amount of variables needed to: 1. Ensure **no back-door path remains open** - Close back-door paths by *controlling* for any one variable along that path - Colliders along a path *automatically* close that path 2. Ensure **no front-door path is closed** - Do not control for mediators ] ] .pull-right[ ![](3.2-slides_files/figure-html/unnamed-chunk-42-1.png)<!-- --> ]
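---

# Colliders: A Simulation Sketch

A minimal sketch of the collider rule, loosely following the flu/bus/hospital example. The probabilities and variable names (`flu`, `bus`, `hos`, `p_hos`) are assumptions invented for illustration. Flu and bus accidents are generated independently, and both raise the chance of being in the hospital, so `hos` is a collider.

```r
# Simulation sketch: all probabilities below are assumed for illustration only.
set.seed(480)
n <- 100000

# Flu and Bus are independent in the full population
flu <- rbinom(n, 1, 0.10)
bus <- rbinom(n, 1, 0.01)

# Hos is a collider: Flu -> Hos <- Bus
# Assumed chance of hospitalization: high after a bus accident,
# moderate with the flu, low otherwise
p_hos <- ifelse(bus == 1, 0.9, ifelse(flu == 1, 0.3, 0.02))
hos <- rbinom(n, 1, p_hos)

# Full population: the path through the collider is closed
cor_all <- cor(flu, bus)

# Only hospital patients: conditioning on the collider opens the path
cor_hos <- cor(flu[hos == 1], bus[hos == 1])

round(c(everyone = cor_all, hospital_only = cor_hos), 3)
```

In the full sample the correlation should be close to zero; among hospital patients only, it should turn clearly negative even though neither variable causes the other. Selecting on `hos` is what creates the spurious relationship.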