Workflow·6 steps·branched

Evaluating a causal forest fit

Source grf — Athey, Tibshirani & Wager
Summary by StatsDoge

Did the forest actually capture treatment-effect heterogeneity? Calibration → variable importance → BLP → omnibus tests.

1

Input · what goes in

A fitted causal forest (n=2000, p=10) you want to validate before trusting.


Format — one row per unit. A covariate matrix X (numeric), a binary treatment W ∈ {0,1}, and an outcome Y.

  X1     X2    X3    W    Y
 0.42  -1.1   0     1   3.10
-0.07   0.6   1     0   1.85
 1.20   0.3   0     1   4.02
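
To make the format concrete, here is a minimal simulated data set with the dimensions quoted above (n = 2000, p = 10). The data-generating process, including the effect 1 + X1 and the propensity plogis(0.5 * X2), is an assumption chosen for illustration and is not taken from the grf tutorial.

```r
set.seed(1)
n <- 2000
p <- 10

# Covariate matrix: one row per unit, numeric columns X1..X10
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("X", seq_len(p))

# Binary treatment whose probability depends on X2 (assumed propensity)
e <- plogis(0.5 * X[, "X2"])
W <- rbinom(n, size = 1, prob = e)

# Outcome with a heterogeneous effect tau(x) = 1 + X1 (assumed)
tau <- 1 + X[, "X1"]
Y <- X[, "X3"] + W * tau + rnorm(n)

# First rows in the same layout as the table above
head(cbind(X[, 1:3], W = W, Y = Y), 3)
```

Objects built this way (a numeric matrix X and numeric vectors W and Y of equal length) match the shapes that the causal_forest(X, Y, W) call in step 1 expects.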
2

Pipeline · the recipe ⑂ has parallel branches


1
Estimation

[GRF] Causal forest

The core estimate — where the causal quantity itself is computed.

What happens here

Fit the forest, then take out-of-bag (OOB) CATE predictions and their variance estimates.

Formula
\tau(x)=\mathbb{E}\!\left[\,Y(1)-Y(0)\mid X=x\,\right]
The estimator

Causal forest: an honest random forest for heterogeneous treatment effects. It estimates the CATE for a binary treatment by solving GRF moment conditions.
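
One way to write the moment condition for a binary treatment, assuming the usual residual-on-residual (Robinson-style) form. The nuisance functions m(x) = E[Y | X = x] and e(x) = E[W | X = x], their estimates (hats), and the forest weights α_i(x) are notation introduced here rather than taken from the summary; the estimates correspond to Y.hat and W.hat in the code.

```latex
\mathbb{E}\!\left[\,\bigl(W-e(X)\bigr)\bigl(Y-m(X)-\tau(x)\,(W-e(X))\bigr)\;\middle|\;X=x\,\right]=0,
\qquad
\hat\tau(x)=\frac{\sum_{i}\alpha_i(x)\,\bigl(W_i-\hat e(X_i)\bigr)\bigl(Y_i-\hat m(X_i)\bigr)}{\sum_{i}\alpha_i(x)\,\bigl(W_i-\hat e(X_i)\bigr)^{2}}
```

Read this way, the forest's job is to supply the weights α_i(x), which say how relevant training unit i is for estimating the effect at x.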

Reads from the input data Feeds into #2, #3, #4, #5
Key code
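library(grf)
cf <- causal_forest(X, Y, W)                   # Y.hat, W.hat estimated internally (OOB)
pred <- predict(cf, estimate.variance = TRUE)  # OOB CATEs plus variance estimates
tau.hat <- pred$predictions                    # OOB CATEs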
2
Diagnostic / pre-tests

test_calibration()

A pre-flight check — run this before trusting any estimate downstream.

What happens here

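A mean-forest coefficient ≈ 1 suggests the average forest prediction is correct; a differential-forest coefficient ≈ 1 suggests the heterogeneity estimates are well calibrated. A differential coefficient significantly greater than 0 is evidence that heterogeneity is present.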

Formula
\tau(X)=\alpha\,\bar\tau+\beta\,(\hat\tau(X)-\bar\tau)+\varepsilon\quad(\alpha,\beta\approx1\ \text{if calibrated})
Reads from #1 Feeds into #6
Key code
test_calibration(cf)   # mean & differential forest coefficients ≈ 1?
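
To see what the two coefficients measure, the regression behind this kind of calibration test can be sketched in base R. This is a sketch under stated assumptions, not grf's implementation: the data are simulated, the nuisance estimates are oracle values, and tau.hat is the true effect plus noise, standing in for OOB forest predictions. With a fitted forest the analogous inputs would be cf$Y.orig, cf$W.orig, cf$Y.hat, cf$W.hat and predict(cf)$predictions.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)
tau <- 1 + x                          # true CATE (assumed)
W <- rbinom(n, 1, 0.5)                # randomized treatment, e(x) = 0.5
Y <- 0.5 * x + W * tau + rnorm(n)

# Stand-ins for forest quantities (oracle nuisances, noisy CATE predictions)
W.hat <- rep(0.5, n)
Y.hat <- 0.5 * x + 0.5 * tau          # E[Y | X] under this simulation
tau.hat <- tau + rnorm(n, sd = 0.2)

tau.bar <- mean(tau.hat)
calib <- data.frame(
  target = Y - Y.hat,
  mean.pred = (W - W.hat) * tau.bar,
  diff.pred = (W - W.hat) * (tau.hat - tau.bar)
)

# No intercept: the two coefficients play the roles of alpha and beta
fit <- lm(target ~ mean.pred + diff.pred + 0, data = calib)
coef(fit)
```

Plain lm() standard errors are not robust to heteroskedasticity, so this sketch is for intuition about the coefficients only; use test_calibration(cf) for inference.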

3
Diagnostic / pre-tests

variable_importance()

A pre-flight check — run this before trusting any estimate downstream.

What happens here

Shows which covariates the forest split on most often, with shallow splits weighted more heavily. Sanity-check the ranking against domain knowledge.

Reads from #1 Feeds into #6
Key code
variable_importance(cf)
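
A minimal sketch of a depth-weighted split frequency, the idea behind this importance score, using a made-up split-count matrix (rows are tree depths, columns are covariates). The decay exponent of 2 and the depth cap of 4 are assumed here to mirror grf's defaults for decay.exponent and max.depth; the counts are purely hypothetical.

```r
# Hypothetical split counts: rows = depth 1..4, columns = covariates
split.counts <- matrix(
  c( 60,  30,  10,
     90,  70,  40,
    120, 150, 130,
    200, 260, 340),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("depth", 1:4), c("X1", "X2", "X3"))
)

decay.exponent <- 2                                  # assumed default
split.freq <- split.counts / rowSums(split.counts)   # share of splits per depth
depth.weight <- seq_len(nrow(split.freq))^(-decay.exponent)

# Weighted average of per-depth split shares; sums to 1 across covariates
importance <- drop(t(split.freq) %*% depth.weight) / sum(depth.weight)
sort(importance, decreasing = TRUE)
```

In this toy example X3 receives the most splits in total, but X1 ranks first because it dominates the shallow splits, which carry the largest weights.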

4
Heterogeneity

best_linear_projection()

Heterogeneity — who is affected, and by how much, not just on average.

What happens here

Projects the CATE linearly onto a hand-picked subset of covariates A, giving a human-readable summary of heterogeneity.

Formula
\mathbb{E}\!\left[\,\tau(X)\mid A\,\right]=A^\top\beta
Reads from #1 Feeds into #6
Key code
best_linear_projection(cf, X[, c("age", "prior")])
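
Written out, and assuming the projection includes an intercept β0 (notation introduced here), the coefficients solve a least-squares problem on the true CATE. Since τ(X_i) is never observed, one common approach is to regress a doubly robust score on A_i instead. The score below is the standard AIPW form, with m̂ and ê denoting the estimated outcome and propensity (Y.hat and W.hat); it is stated as an assumption, not as a quotation of grf's code.

```latex
(\beta_0,\beta)=\arg\min_{b_0,\,b}\ \mathbb{E}\!\left[\bigl(\tau(X)-b_0-A^\top b\bigr)^{2}\right],
\qquad
\hat\Gamma_i=\hat\tau(X_i)+\frac{W_i-\hat e(X_i)}{\hat e(X_i)\,\bigl(1-\hat e(X_i)\bigr)}\Bigl(Y_i-\hat m(X_i)-\bigl(W_i-\hat e(X_i)\bigr)\hat\tau(X_i)\Bigr)
```

Because the score divides by ê(1 − ê), units with propensities close to 0 or 1 can dominate the regression, which is one reason to check overlap before reading the BLP table.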

5
Diagnostic / pre-tests

OOB residual checks

A pre-flight check — run this before trusting any estimate downstream.

What happens here

Plot the OOB CATEs against the estimated propensity scores and key covariates. Strong structure left in these plots suggests under-fitting.
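
This step has no key code in the summary, so here is one possible sketch in base R. The helper binned_means() is hypothetical (not a grf function): it averages the OOB CATEs within quantile bins of a second variable, which can make leftover structure easier to see than a raw scatter plot. The inputs are simulated stand-ins; with a fitted forest one would pass predict(cf)$predictions and cf$W.hat.

```r
# Hypothetical helper: mean of `score` within quantile bins of `by`
binned_means <- function(score, by, bins = 10) {
  breaks <- quantile(by, probs = seq(0, 1, length.out = bins + 1))
  groups <- cut(by, breaks = unique(breaks), include.lowest = TRUE)
  tapply(score, groups, mean)
}

# Simulated stand-ins for OOB CATEs and estimated propensities
set.seed(7)
n <- 2000
x1 <- rnorm(n)
W.hat <- plogis(0.5 * rnorm(n))          # independent of the CATE by construction
tau.hat <- 1 + x1 + rnorm(n, sd = 0.3)   # varies with x1 by construction

by.propensity <- binned_means(tau.hat, W.hat)
by.x1 <- binned_means(tau.hat, x1)

# One line per scan: circles = binned by x1, triangles = binned by propensity
plot(seq_along(by.x1), as.numeric(by.x1), type = "b",
     ylim = range(by.x1, by.propensity),
     xlab = "decile of conditioning variable", ylab = "mean OOB CATE")
points(seq_along(by.propensity), as.numeric(by.propensity), type = "b", pch = 2)
```

The same helper can be pointed at any covariate column of X to scan for structure one variable at a time.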

Reads from #1 Feeds into #6
6
Reporting

Fit-evaluation report

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

Calibration table + importance bar chart + BLP table + residual scan, in one place.

Reads from #2, #3, #4, #5 Feeds into the final output
3

Output · what you get · 3 figures

Fig 1. Estimated propensity scores by arm: a quick check on overlap / positivity.
Fig 2. Covariate balance before vs after IPW weighting.
Fig 3. Distribution of the per-unit bias diagnostics from test calibration.

Figures reproduced from grf — Athey, Tibshirani & Wager — unofficial community showcase; all credit to the original authors.

The GRF 'Evaluating a causal forest fit' tutorial. Before reading anything off a fitted forest, check that it's well-calibrated and that the heterogeneity isn't an artifact of noise. Then look at what's driving the splits and how the CATE relates to interpretable covariates. Unofficial summary.
