Workflow·4 steps·branched

Instrumental variables & 2SLS for an endogenous treatment (ivreg)

Source ivreg — Fox, Kleiber & Zeileis
Summary by StatsDoge

Two-stage least squares uses the instrument as the only source of variation in the treatment: project the endogenous treatment D onto the instrument Z (and covariates X), then regress Y on the fitted treatment. Under monotonicity the estimand is the LATE for compliers, not the ATE, and a first-stage F below about 10 signals a weak instrument whose 2SLS estimate should not be trusted.

1

Input · what goes in

Outcome Y, an endogenous treatment D, an instrument Z (relevant + excludable), and covariates X.


Format — one row per unit: y, d (endogenous), z (instrument), covariates x.

  y     d   z    x1
 3.1    1   1   0.4
 1.8    0   0  -0.1
 2.4    0   1   1.2
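
A minimal sketch of this layout in R, using only the three example rows shown above. The name `df` is an assumption, chosen to match the `data = df` argument in the estimation step; the formulas in the steps write `x` as a stand-in for covariates such as `x1`.

```r
# The three example rows from the table above, one row per unit.
df <- data.frame(
  y  = c(3.1, 1.8, 2.4),    # outcome
  d  = c(1, 0, 0),          # endogenous treatment
  z  = c(1, 0, 1),          # instrument
  x1 = c(0.4, -0.1, 1.2)    # covariate
)
str(df)
```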
2

Pipeline · the recipe ⑂ has parallel branches


1
Data prep

Outcome, endogenous treatment, instrument

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Assemble Y, D, Z, X. The instrument must affect D (relevance) and affect Y only through D (exclusion).

Formula
\tau_{\mathrm{LATE}}=\dfrac{\mathrm{Cov}(Y,Z)}{\mathrm{Cov}(D,Z)}
Reads from the input data Feeds into the final output
Key code
library(ivreg)
# y ~ d + x | z + x   (endogenous d, instrument z)
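
The covariance ratio in the formula can be sketched on simulated data with base R. Everything here is an illustrative assumption rather than part of ivreg: the data-generating process, the confounder `u`, and the true effect of 2. With a binary instrument and no covariates, the ratio is the same number as the difference in mean outcomes divided by the difference in treatment rates across the two instrument arms.

```r
set.seed(1)
n <- 5000
z <- rbinom(n, 1, 0.5)                          # binary instrument
u <- rnorm(n)                                   # unobserved confounder
d <- as.numeric(0.8 * z + u + rnorm(n) > 0.5)   # treatment depends on z and u
y <- 1 + 2 * d + u + rnorm(n)                   # z affects y only through d

# Cov(Y, Z) / Cov(D, Z)
wald <- cov(y, z) / cov(d, z)

# Same quantity as a ratio of differences in means, because z is binary.
wald_means <- (mean(y[z == 1]) - mean(y[z == 0])) /
              (mean(d[z == 1]) - mean(d[z == 0]))

# Naive OLS slope for comparison; it is confounded by u.
ols <- unname(coef(lm(y ~ d))["d"])

c(wald = wald, wald_means = wald_means, ols = ols)
```

The two Wald versions agree up to floating-point error, while the OLS slope absorbs the confounding through `u`.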

2
Diagnostic / pre-tests

Check instrument strength (first stage)

A pre-flight check — run this before trusting any estimate downstream.

What happens here

A weak instrument (first-stage F ≲ 10) leaves 2SLS biased toward the OLS estimate and makes its standard errors unreliable.

Formula
F=\dfrac{\hat\pi^2}{\widehat{\mathrm{Var}}(\hat\pi)}\quad(\text{weak if }F\lesssim 10)
Reads from the input data Feeds into the final output
Key code
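# partial F for the excluded instrument z (the overall regression F would also credit x)
anova(lm(d ~ x, data = df), lm(d ~ z + x, data = df))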
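
A sketch of the first-stage check on simulated data, base R only. The coefficients and sample size are illustrative assumptions. The quantity that matters is the partial F for the instrument, which with a single instrument equals the squared t-statistic on `z`, matching the formula above. The overall F from `summary(lm(...))$fstatistic` tests `z` and `x` jointly, so it can look large even when the instrument itself is weak.

```r
set.seed(2)
n <- 2000
x <- rnorm(n)
z <- rnorm(n)
d <- 0.3 * z + 0.5 * x + rnorm(n)
df <- data.frame(d, z, x)

first_stage <- lm(d ~ z + x, data = df)   # includes the instrument
restricted  <- lm(d ~ x, data = df)       # drops the instrument

# Partial F: does z explain d beyond what the covariates already explain?
partial_F <- anova(restricted, first_stage)$F[2]

# With one instrument, the partial F equals the squared t-statistic on z.
t_z <- coef(summary(first_stage))["z", "t value"]

# Overall regression F, which tests z and x jointly.
overall_F <- unname(summary(first_stage)$fstatistic["value"])

c(partial_F = partial_F, t_squared = t_z^2, overall_F = overall_F)
```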

3
Estimation

Two-stage least squares (ivreg)

The core estimate — where the causal quantity itself is computed.

What happens here

Project D on the instrument and covariates, then regress Y on the fitted treatment and the same covariates.

Formula
\hat\beta_{2SLS}=(\hat D^\top \hat D)^{-1}\hat D^\top Y,\quad \hat D=Z(Z^\top Z)^{-1}Z^\top D
Reads from the input data Feeds into the final output
Key code
fit <- ivreg(y ~ d + x | z + x, data = df)
summary(fit)
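
To make the two stages concrete, here is a sketch that reproduces the 2SLS point estimate by hand in base R, once with two `lm()` calls and once with the matrix formula above. The simulated data and coefficients are illustrative assumptions. In the matrix version, `X` plays the role of D in the formula (intercept, treatment, covariate) and `Z` holds the intercept, the instrument, and the covariate.

```r
set.seed(3)
n <- 2000
x <- rnorm(n)
z <- rnorm(n)
u <- rnorm(n)                             # unobserved confounder
d <- 0.6 * z + 0.4 * x + u + rnorm(n)     # endogenous treatment
y <- 1 + 2 * d + 0.5 * x + u + rnorm(n)

# Stage 1: project d on the instrument and the exogenous covariate.
d_hat <- fitted(lm(d ~ z + x))
# Stage 2: regress y on the fitted treatment and the same covariate.
beta_two_step <- coef(lm(y ~ d_hat + x))

# Same estimate from the projection formula.
Z <- cbind(1, z, x)                                   # instrument matrix
X <- cbind(1, d, x)                                   # regressor matrix
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))   # Z (Z'Z)^{-1} Z'X
beta_matrix <- drop(solve(crossprod(X_hat), crossprod(X_hat, y)))

rbind(two_step = unname(beta_two_step), matrix = unname(beta_matrix))
```

The point estimates match, but the standard errors printed by the second `lm()` are not valid because its residuals are computed with `d_hat` instead of `d`. That is one reason to fit the model with `ivreg()` rather than by hand.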

4
Inference

Interpret as a complier effect (LATE)

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

Under monotonicity the estimand is the effect for compliers, not everyone.

Formula
\tau_{\mathrm{LATE}}=\mathbb{E}[\,Y(1)-Y(0)\mid \text{complier}\,]
Reads from the input data Feeds into the final output
Key code
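# heteroskedasticity-robust SEs; under monotonicity the coefficient on d is the LATE
lmtest::coeftest(fit, vcov. = sandwich::vcovHC(fit, type = "HC1"))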
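
A simulation sketch of why the estimand is the complier effect and not the average effect, base R only. The population shares (50% compliers, 25% always-takers, 25% never-takers) and the type-specific effects (2, 5, 0) are illustrative assumptions. There are no defiers, so monotonicity holds by construction.

```r
set.seed(4)
n <- 100000
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.5, 0.25, 0.25))
z <- rbinom(n, 1, 0.5)                                  # randomized instrument

# Always-takers take treatment regardless; compliers only when z = 1.
d <- as.numeric(type == "always" | (type == "complier" & z == 1))

# Heterogeneous treatment effects by type.
tau <- c(complier = 2, always = 5, never = 0)[type]
y <- rnorm(n) + tau * d

wald <- cov(y, z) / cov(d, z)   # IV estimate
ate  <- mean(tau)               # average effect over everyone

c(iv_estimate = wald, complier_effect = 2, ate = ate)
```

In this setup the IV estimate should land near the complier effect of 2, while the population average effect is near 2.25, because the instrument never changes the treatment status of always-takers or never-takers.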

3

Output · what you get

⚠️ Unofficial community showcase of ivreg. Not affiliated with the authors; all credit to them.

When treatment is endogenous, an instrument identifies the complier (LATE) effect via two-stage least squares — after you check the instrument is strong.
