Workflow·4 steps

Matching for causal inference (MatchIt)

@matchit · Jun 3, 2026 · 915 views

Source MatchIt — Ho, Imai, King & Stuart

@misc{matchit,
  title        = {MatchIt},
  author       = {Ho and Imai and King and Stuart},
  howpublished = {\url{https://kosukeimai.github.io/MatchIt/}},
  note         = {Software / documentation}
}

Summary by StatsDoge

Preprocess by matching so groups are comparable, check balance, then estimate the effect on the matched sample — design before analysis.

Input · what goes in

A binary treatment and the covariates to match on (Lalonde job-training data).

Format: one row per unit, with a binary treatment W ∈ {0,1} (the treat column in lalonde) and covariates X.

  W  age educ   race   re74  re75
  1   37   11  black     0     0
  0   30   12  white  4100  3800

Pipeline · the recipe

Data prep

Treatment W + covariates X

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Binary treatment plus the covariates to match on.

Reads from the input data · Feeds into #2

Key code

data("lalonde", package = "MatchIt")
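Before matching, it can help to confirm that the data has the shape described in the input format. The following is a minimal sketch in base R, assuming a hypothetical toy data frame (toy, with illustrative values that are not taken from lalonde) in the same column layout. The variable names are assumptions made for illustration only.

```r
# Hypothetical toy data in the same layout as the input format
# (values are illustrative, not taken from lalonde).
toy <- data.frame(
  treat = c(1, 1, 0, 0, 0),
  age   = c(37, 22, 30, 45, 26),
  educ  = c(11, 9, 12, 10, 12),
  race  = factor(c("black", "hispan", "white", "black", "white")),
  re74  = c(0, 0, 4100, 2500, 0),
  re75  = c(0, 0, 3800, 1900, 600)
)

covariates <- c("age", "educ", "race", "re74", "re75")

# The treatment should be binary and fully observed.
treat_is_binary <- all(toy$treat %in% c(0, 1))

# Count missing values among the treatment and the matching covariates.
n_missing <- sum(is.na(toy[, c("treat", covariates)]))

# Group sizes: matching needs both treated and control units.
group_sizes <- table(toy$treat)
```

The same three checks can be pointed at lalonde once it is loaded, by swapping toy for lalonde.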

Estimation

[MatchIt] Matching for causal inference — matchit()

The core estimate — where the causal quantity itself is computed.

What happens here

Nearest-neighbor matching on the estimated propensity score.

Formula

\hat\tau_{\mathrm{ATT}} = \frac{1}{n_1} \textstyle\sum_{i:W_i=1} \big(Y_i - Y_{j(i)}\big)
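As a worked instance of the formula, here is a small sketch with hypothetical outcomes for three treated units and the controls matched to them. The numbers are made up for illustration.

```r
# Hypothetical outcomes for three treated units and their matched controls.
y_treated <- c(10, 8, 12)   # Y_i for units with W_i = 1
y_matched <- c(7, 9, 8)     # Y_{j(i)}: outcome of the control matched to unit i

n1 <- length(y_treated)     # number of treated units

# Average of the within-pair differences: (3 + (-1) + 4) / 3
att_hat <- sum(y_treated - y_matched) / n1
att_hat  # 2
```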

The estimator

Matching for causal inference — matchit() — Nearest-neighbor, optimal, full, and genetic matching to preprocess data so treated and control groups are comparable before estimating effects.

Estimand: ATT

Reads from #1 · Feeds into #3

Key code

library(MatchIt)
m <- matchit(treat ~ age + educ + race + re74 + re75, data = lalonde, method = "nearest")
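To make the idea behind this step concrete, the following base R sketch performs greedy 1:1 nearest-neighbor matching on an estimated propensity score, using simulated data. It is an illustration under simplifying assumptions, not MatchIt's implementation: it processes treated units in data order, matches without replacement, and uses no caliper. All names (dat, ps, pairs) are hypothetical.

```r
set.seed(1)

# Simulated data (hypothetical): treatment probability depends on x.
n <- 200
x <- rnorm(n)
w <- rbinom(n, 1, plogis(-1 + x))
dat <- data.frame(id = seq_len(n), w = w, x = x)

# 1. Estimate the propensity score with logistic regression.
ps_model <- glm(w ~ x, data = dat, family = binomial())
dat$ps <- fitted(ps_model)

# 2. Greedy 1:1 matching without replacement: each treated unit takes
#    the closest still-unused control on the propensity score.
treated   <- dat[dat$w == 1, ]
controls  <- dat[dat$w == 0, ]
available <- rep(TRUE, nrow(controls))
match_id  <- rep(NA_integer_, nrow(treated))

for (i in seq_len(nrow(treated))) {
  if (!any(available)) break          # ran out of controls
  d <- abs(controls$ps - treated$ps[i])
  d[!available] <- Inf                # exclude controls already used
  j <- which.min(d)
  match_id[i] <- controls$id[j]
  available[j] <- FALSE
}

pairs <- data.frame(treated_id = treated$id, control_id = match_id)
```

Each row of pairs links a treated unit i to its matched control j(i), which is the pairing the ATT formula averages over.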

Diagnostic / pre-tests

Assess balance (summary / plot)

A pre-flight check — run this before trusting any estimate downstream.

What happens here

Check standardized mean differences and eCDF/QQ plots on the matched sample.

Reads from #2 · Feeds into #4

Key code

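summary(m)                                        # balance table: standardized mean differences, eCDF statistics
plot(m, type = "jitter"); plot(m, type = "ecdf")  # propensity-score overlap, then covariate eCDFs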
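For intuition about what the balance check measures, here is a hand-rolled standardized mean difference in base R on hypothetical toy data. It scales by the treated-group standard deviation, a common choice when targeting the ATT. It is a sketch of the quantity, not a reproduction of the summary() output, and the retained control indices are made up rather than produced by a real match.

```r
# Standardized mean difference for one covariate, scaled by the
# treated-group standard deviation.
smd <- function(x, w) {
  (mean(x[w == 1]) - mean(x[w == 0])) / sd(x[w == 1])
}

# Hypothetical toy data: three treated units, five controls.
w_all   <- c(1, 1, 1, 0, 0, 0, 0, 0)
age_all <- c(25, 30, 35, 28, 40, 45, 50, 33)

# Suppose matching kept the controls in positions 4, 5, and 8
# (illustrative indices, not a real match).
keep <- c(1, 2, 3, 4, 5, 8)

smd_before <- smd(age_all, w_all)
smd_after  <- smd(age_all[keep], w_all[keep])
```

Values closer to zero indicate better balance on that covariate. An absolute value of about 0.1 is a commonly cited rule of thumb, not a hard threshold.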

Reporting

Estimate the effect on matched data

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

Fit the outcome model on match.data() with cluster-robust SEs.

Reads from #3 · Feeds into the final output

Key code

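fit <- lm(re78 ~ treat, data = match.data(m), weights = weights)
lmtest::coeftest(fit, vcov. = sandwich::vcovCL, cluster = ~subclass)  # cluster-robust SEs by matched pair (needs sandwich, lmtest)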
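To tie the regression back to the ATT formula from the estimation step: with 1:1 matching and unit weights, the coefficient on the treatment indicator in a regression of the outcome on treatment alone is the treated-minus-control difference in means on the matched sample. A minimal check in base R, assuming a hypothetical matched data frame (matched) with made-up outcomes:

```r
# Hypothetical matched sample: three treated units and their matched
# controls, with unit weights as in 1:1 matching without replacement.
matched <- data.frame(
  y       = c(10, 8, 12, 7, 9, 8),
  treat   = c(1, 1, 1, 0, 0, 0),
  weights = 1
)

fit_toy <- lm(y ~ treat, data = matched, weights = weights)

# The treat coefficient equals the difference in group means.
coef_treat <- unname(coef(fit_toy)["treat"])
diff_means <- mean(matched$y[matched$treat == 1]) -
  mean(matched$y[matched$treat == 0])
```

This covers only the point estimate. Standard errors still need to account for the pairing.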

Output · what you get · 3 figures

Fig 1. Balance after nearest-neighbor matching: standardized mean differences per covariate.

Fig 2. Propensity-score overlap before and after matching.

Fig 3. eCDF / distributional comparison of a covariate on the matched sample.

Figures reproduced from MatchIt — Ho, Imai, King & Stuart — unofficial community showcase; all credit to the original authors.

The MatchIt vignette (Lalonde). Match, assess balance, and only then estimate. Unofficial summary.

Discussion (2)

@calibrator_cleo · Jun 3, 2026

Matching as the design stage — outcome-free — is the discipline people skip. MatchIt makes it the path of least resistance.

@aipw_amir · Jun 3, 2026

And match.data() → any outcome model. Pairs perfectly with cobalt for the balance plots.

@targeting_tara · Jun 3, 2026

Nearest, optimal, full, genetic — all behind one matchit() call. Great teaching tool.