Workflow·4 steps

Matching for causal inference (MatchIt)

Source MatchIt — Ho, Imai, King & Stuart
Summary by StatsDoge

Preprocess by matching so groups are comparable, check balance, then estimate the effect on the matched sample — design before analysis.

1

Input · what goes in

A binary treatment and the covariates to match on (Lalonde job-training data).


Format: one row per unit, with treatment W ∈ {0,1} (the treat column in lalonde) and covariates X.

  W  age educ   race   re74  re75
  1   37   11  black     0     0
  0   30   12  white  4100  3800
2

Pipeline · the recipe


1
Data prep

Treatment W + covariates X

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Binary treatment plus the covariates to match on.

Reads from the input data · Feeds into #2
Key code
data("lalonde", package = "MatchIt")
2
Estimation

[MatchIt] Matching for causal inference — matchit()

The core estimate — where the causal quantity itself is computed.

What happens here

Nearest-neighbor matching on the estimated propensity score.
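As a rough illustration of what "estimated propensity score" means, the sketch below fits a logistic regression of treatment on covariates and keeps the fitted probabilities. The data are simulated, and the names toy, W, and pscore are assumptions made for this example. matchit() estimates its scores internally, so this only shows the idea and is not MatchIt's own code.

```r
# Simulated data for illustration only (not the Lalonde data).
set.seed(1)
n <- 200
toy <- data.frame(
  age  = rnorm(n, mean = 35, sd = 8),
  educ = rnorm(n, mean = 11, sd = 2)
)

# Hypothetical assignment mechanism: treatment is more likely
# for older and more educated units.
p_true <- plogis(-0.5 + 0.03 * (toy$age - 35) + 0.1 * (toy$educ - 11))
toy$W  <- rbinom(n, size = 1, prob = p_true)

# Propensity score model: logistic regression of W on the covariates.
ps_model <- glm(W ~ age + educ, data = toy, family = binomial())

# Estimated propensity score = fitted probability of treatment.
toy$pscore <- fitted(ps_model)

summary(toy$pscore)
```

Matching on this single score, rather than on every covariate at once, is what makes nearest-neighbor search tractable when there are many covariates.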

Formula
\hat\tau_{\mathrm{ATT}} = \frac{1}{n_1} \textstyle\sum_{i:W_i=1} \big(Y_i - Y_{j(i)}\big)
The estimator

matchit() supports nearest-neighbor, optimal, full, and genetic matching, all used to preprocess the data so that treated and control groups are comparable before effects are estimated.

Reads from #1 · Feeds into #3
Key code
m <- matchit(treat ~ age + educ + race + re74 + re75, lalonde, method = "nearest")
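To make the ATT formula concrete, here is a minimal base-R sketch of greedy 1:1 nearest-neighbor matching without replacement, followed by the mean of the paired differences. In the formula, n_1 is the number of treated units and j(i) is the control matched to treated unit i. The propensity scores and outcomes are made up, and nn_match() is a hypothetical helper written for this example. matchit() handles processing order, ties, and many other options differently, so treat this as an illustration of the idea only.

```r
# Greedy 1:1 nearest-neighbour matching without replacement.
# Returns, for each treated unit i, the index j(i) of its matched control.
# Assumes there are at least as many controls as treated units.
nn_match <- function(ps_treated, ps_control) {
  available <- rep(TRUE, length(ps_control))
  j <- integer(length(ps_treated))
  for (i in seq_along(ps_treated)) {
    d <- abs(ps_control - ps_treated[i])  # distance on the score
    d[!available] <- Inf                  # skip controls already used
    j[i] <- which.min(d)
    available[j[i]] <- FALSE
  }
  j
}

# Made-up propensity scores and outcomes.
ps_treated <- c(0.80, 0.60)
ps_control <- c(0.75, 0.55, 0.20)
y_treated  <- c(10, 8)
y_control  <- c(7, 6, 1)

j <- nn_match(ps_treated, ps_control)

# ATT estimate: average of Y_i - Y_{j(i)} over the treated units.
att_hat <- mean(y_treated - y_control[j])
att_hat
```

The third control (score 0.20) is never used: it has no treated counterpart nearby, and discarding such units is how matching improves comparability.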
3
Diagnostic / pre-tests

Assess balance (summary / plot)

A pre-flight check — run this before trusting any estimate downstream.

What happens here

Check standardized mean differences with summary(), then inspect propensity-score overlap (jitter plot) and eCDF/QQ plots on the matched sample.

Reads from #2 · Feeds into #4
Key code
summary(m); plot(m, type = "jitter")
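For readers who want to see what a standardized mean difference measures, the sketch below computes one by hand for a single covariate, before and after matching. The ages are invented, and scaling by the treated-group standard deviation is an assumption chosen to suit an ATT analysis. summary(m) reports its own version, which may use a different standardization depending on the estimand.

```r
# Standardized mean difference for one covariate,
# scaled here by the treated-group standard deviation.
smd <- function(x_treated, x_control) {
  (mean(x_treated) - mean(x_control)) / sd(x_treated)
}

# Made-up ages for illustration only.
age_treated         <- c(30, 40, 50)
age_control_all     <- c(20, 25, 30, 35, 40, 42)  # all controls
age_control_matched <- c(35, 40, 42)              # controls kept by matching

smd_before <- smd(age_treated, age_control_all)
smd_after  <- smd(age_treated, age_control_matched)

c(before = smd_before, after = smd_after)
```

A common rule of thumb treats absolute values below roughly 0.1 as adequate balance, though that cutoff is a convention and not a test.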

4
Reporting

Estimate the effect on matched data

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

Fit the outcome model on match.data(), passing the matching weights, then compute cluster-robust SEs with pair membership (subclass) as the cluster.

Reads from #3 · Feeds into the final output
Key code
fit <- lm(re78 ~ treat, data = match.data(m), weights = weights)
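The lm() call gives the point estimate, but its default standard errors ignore the pairing. As a hedged sketch of what "cluster-robust" means here, the code below computes a basic cluster-robust covariance by hand (the CR0 form, with no small-sample correction) on a toy matched sample, clustering on the pair identifier. The data frame md and its values are invented. In practice a packaged estimator such as sandwich::vcovCL() with cluster = ~subclass would normally be used, and it applies finite-sample adjustments, so its numbers will differ from this sketch.

```r
# Toy matched sample: two treated units, each paired with one control.
# Values are made up; `subclass` identifies the matched pair.
md <- data.frame(
  y        = c(10, 8, 7, 6),
  treat    = c(1, 1, 0, 0),
  subclass = c(1, 2, 1, 2),
  weights  = c(1, 1, 1, 1)
)

fit <- lm(y ~ treat, data = md, weights = weights)

# Cluster-robust covariance (CR0) built from its three pieces.
X <- model.matrix(fit)
w <- md$weights
e <- residuals(fit)                          # raw residuals y - fitted

bread  <- solve(crossprod(X, w * X))         # (X' W X)^{-1}
scores <- rowsum(X * (w * e), md$subclass)   # per-pair sums of x_i * w_i * e_i
meat   <- crossprod(scores)                  # sum over pairs of s_g s_g'

vcov_cl  <- bread %*% meat %*% bread
se_treat <- sqrt(vcov_cl["treat", "treat"])

c(estimate = unname(coef(fit)["treat"]), se = se_treat)
```

With 1:1 pairs and unit weights, the coefficient on treat is the mean paired difference, and the clustered variance depends only on how much the paired differences vary.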
3

Output · what you get · 3 figures

Fig 1. Balance after nearest-neighbour matching: standardized mean differences per covariate.
Fig 2. Propensity-score overlap before and after matching.
Fig 3. eCDF / distributional comparison of a covariate on the matched sample.

Figures reproduced from MatchIt — Ho, Imai, King & Stuart — unofficial community showcase; all credit to the original authors.

The MatchIt vignette (Lalonde). Match, assess balance, and only then estimate. Unofficial summary.

Discussion (2)

  • Matching as the design stage, outcome-free, is the discipline people skip. MatchIt makes it the path of least resistance.

      ◦ And match.data() → any outcome model. Pairs perfectly with cobalt for the balance plots.

  • Nearest, optimal, full, genetic: all behind one matchit() call. Great teaching tool.