Workflow·4 steps·branched

Draw the DAG, find the adjustment set (ggdag & dagitty)

Source ggdag / dagitty — Barrett; Textor et al.
Summary by StatsDoge

Before estimation, encode the causal story as a DAG, enumerate the open paths from treatment to outcome, and let the graph hand you a minimal adjustment set that blocks every backdoor path while flagging the colliders you must not condition on. The conditional independencies the DAG implies can then be tested against the data.

1

Input · what goes in

Subject-matter assumptions about what causes what: nodes (treatment, outcome, confounders, mediators, colliders) and the directed edges between them.


Format — declare the edges; mark the exposure and outcome.

library(ggdag)
dag <- dagify(y ~ x + a + b,
              x ~ a + b,
              exposure = 'x', outcome = 'y')
ggdag_adjustment_set(dag)   # minimal sets that block the backdoors
2

Pipeline · the recipe ⑂ has parallel branches


1
Data prep

Encode your assumptions as a DAG

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Nodes and directed edges make the causal story explicit and falsifiable — the modelling happens here, not in the regression.

Formula
G=(V,E);\quad \tau=\mathbb E[Y\mid do(X{=}1)]-\mathbb E[Y\mid do(X{=}0)]
Key code
dag <- dagify(y ~ x + a + b, x ~ a + b,
              exposure='x', outcome='y')
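
The running example contains only confounders. As a minimal sketch of the other node roles (a mediator and a collider), a DAG could be declared as follows. The object name `dag2`, the nodes `m` and `s`, and every edge are hypothetical and chosen purely for illustration.

```r
library(ggdag)    # dagify()
library(dagitty)  # exposures(), outcomes()

# Hypothetical causal story (all edges are assumptions for illustration):
#   a confounds x and y, m mediates x -> y, s is a collider of x and y.
dag2 <- dagify(
  y ~ m + a,
  m ~ x,
  x ~ a,
  s ~ x + y,
  exposure = "x", outcome = "y"
)

exposures(dag2)    # the node marked as treatment
outcomes(dag2)     # the node marked as outcome
sort(names(dag2))  # every node declared in the graph
```

Each formula reads "child ~ parents", so the graph can be checked against the written causal story one line at a time.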

2
Diagnostic / pre-tests

Enumerate paths; spot backdoors & colliders

A pre-flight check — run this before trusting any estimate downstream.

What happens here

Open non-causal paths bias the effect; a collider is the trap — conditioning on it OPENS a path rather than closing it.

Formula
\text{backdoor: } X\leftarrow\cdots\to Y;\quad \text{collider: } X\to C\leftarrow Y\ (\text{do not condition})
Key code
ggdag_paths(dag)
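
To see the collider trap in text form rather than in a plot, the sketch below uses `paths()` from dagitty on a small hypothetical graph: `a` is a confounder and `s` is a collider of `x` and `y`. The graph and the names `g`, `p0`, and `ps` are assumptions made for this illustration.

```r
library(dagitty)

# Hypothetical graph: one causal path, one backdoor path, one collider path.
g <- dagitty("dag { x -> y ; a -> x ; a -> y ; x -> s ; y -> s }")

# Without conditioning: the collider path x -> s <- y is reported as closed.
p0 <- paths(g, from = "x", to = "y")
data.frame(path = p0$paths, open = p0$open)

# Conditioning on the collider s opens that path.
ps <- paths(g, from = "x", to = "y", Z = "s")
data.frame(path = ps$paths, open = ps$open)
```

Comparing the two tables shows the path through `s` switching from closed to open once `s` is conditioned on, while the backdoor through `a` stays open in both.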

3
Estimation

Minimal sufficient adjustment set

The core estimate — where the causal quantity itself is computed.

What happens here

Any covariate set that blocks every backdoor path and contains no descendant of the treatment identifies the effect; a minimal set is one from which no covariate can be dropped.

Formula
Z\ \text{blocks all backdoors}\Rightarrow \mathbb E[Y\mid do(x)]=\textstyle\sum_z \mathbb E[Y\mid x,z]\,P(z)
Key code
ggdag_adjustment_set(dag)
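
Once the set is known, it can be passed to an estimator. The sketch below is one possible illustration, not the source's method: it simulates data from the example DAG with an assumed true effect of 0.5 and uses a linear regression as a simple parametric stand-in for the adjustment formula. The coefficients, the sample size, and the names `g`, `sets`, `d`, `naive`, and `adjusted` are all assumptions.

```r
library(dagitty)

# Same structure as the example DAG: a and b confound x -> y.
g <- dagitty("dag { x -> y ; a -> x ; a -> y ; b -> x ; b -> y }")
sets <- adjustmentSets(g, exposure = "x", outcome = "y")
z <- sets[[1]]  # first (here: only) minimal adjustment set

# Simulated data with an assumed true effect of x on y equal to 0.5.
set.seed(1)
n <- 5000
a <- rnorm(n)
b <- rnorm(n)
x <- 0.8 * a + 0.8 * b + rnorm(n)
y <- 0.5 * x + a + b + rnorm(n)
d <- data.frame(a, b, x, y)

# Unadjusted estimate: confounded by the open backdoor paths.
naive <- coef(lm(y ~ x, data = d))["x"]

# Adjusted estimate: condition on the set the graph returned.
adjusted <- coef(lm(reformulate(c("x", z), response = "y"), data = d))["x"]

c(naive = unname(naive), adjusted = unname(adjusted))
```

The adjusted coefficient should land close to the simulated effect, while the naive one is inflated by the open backdoor paths through `a` and `b`.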

4
Robustness check

Test the DAG's implications

A robustness check — does the headline result survive a different lens?

What happens here

A DAG implies conditional independencies that you can check in the data. Passing those checks does not confirm the DAG, because several DAGs can be observationally equivalent and imply the same independencies.

Formula
d\text{-sep}(A,B\mid C)\ \Rightarrow\ A\perp\!\!\!\perp B\mid C\ \text{(testable)}
Key code
library(dagitty)
impliedConditionalIndependencies(dag)   # list the testable d-separation claims
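
Listing the implied independencies is only half of the check; they still have to be confronted with data. A minimal sketch, assuming linear-Gaussian data simulated from the example DAG (the coefficients, seed, and the names `g`, `d`, `res`, and `eq` are hypothetical), could use dagitty's `localTests()` and `equivalentDAGs()`:

```r
library(dagitty)

g <- dagitty("dag { x -> y ; a -> x ; a -> y ; b -> x ; b -> y }")

# Simulated data consistent with the DAG (a and b are generated independently).
set.seed(2)
n <- 2000
a <- rnorm(n)
b <- rnorm(n)
x <- 0.8 * a + 0.8 * b + rnorm(n)
y <- 0.5 * x + a + b + rnorm(n)
d <- data.frame(a, b, x, y)

# One row per implied independence; estimates near zero are
# compatible with the DAG, clearly non-zero ones contradict it.
res <- localTests(g, data = d, type = "cis")
res

# DAGs that imply exactly the same independencies as g.
eq <- equivalentDAGs(g)
length(eq)
```

If `equivalentDAGs()` returns more than one graph, the tests above cannot distinguish between them, so the direction of those edges rests on subject-matter knowledge rather than on the data.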

3

Output · what you get 2 figures

Fig 1. The minimal adjustment set {a, b}: conditioning on those nodes blocks the backdoor paths (greyed) and leaves the direct x → y effect identified.
Fig 2. Every open path from treatment x to outcome y: the direct causal path (red) plus the backdoor paths (teal) that confound it until they're blocked.

Figures reproduced from ggdag / dagitty — Barrett; Textor et al. — unofficial community showcase; all credit to the original authors.


Before any estimation: encode your assumptions as a causal graph, enumerate the backdoor paths from treatment to outcome, and let the graph hand you the minimal set of covariates to adjust for.

Discussion (2)

  • Doing this BEFORE you touch the data is the discipline that prevents collider bias. The 'do not condition on a collider' warning can't be repeated enough.

  • Testable implied conditional independencies are underused; they are the closest thing to a falsification test for your DAG.

    ◦ This pairs perfectly with a DoWhy/propensity pipeline: ggdag picks the adjustment set, then you estimate and refute.