Workflow·6 steps·branched

Evaluating a causal forest fit

@grf ✓ P D · Jun 2, 2026 · 854 views

@misc{grf,
  title        = {grf},
  author       = {Athey and Tibshirani and Wager},
  howpublished = {\url{https://grf-labs.github.io/grf/}},
  note         = {Software / documentation}
}

Summary by StatsDoge

Did the forest actually capture treatment-effect heterogeneity? Calibration → variable importance → BLP → omnibus tests.

Input · what goes in

A fitted causal forest (n=2000, p=10) you want to validate before trusting.

Format — one row per unit. A covariate matrix X (numeric), a binary treatment W ∈ {0,1}, and an outcome Y.

  X1     X2    X3    W    Y
 0.42  -1.1   0     1   3.10
-0.07   0.6   1     0   1.85
 1.20   0.3   0     1   4.02
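
As a hedged illustration of this format, the sketch below simulates a data set with the dimensions quoted above (n = 2000, p = 10). The data-generating process, the seed, and the column names are assumptions made for the example; they are not part of the tutorial.

```r
set.seed(42)
n <- 2000
p <- 10

# Covariate matrix X: numeric, one row per unit.
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("X", seq_len(p))
X[, "X3"] <- rbinom(n, 1, 0.5)          # a binary covariate, as in the table

# Binary treatment W, here randomized with probability 0.5 (assumption).
W <- rbinom(n, 1, 0.5)

# Outcome Y with a treatment effect that varies with X1 (hypothetical).
tau <- pmax(X[, "X1"], 0)
Y <- tau * W + X[, "X2"] + rnorm(n)

# First rows in the layout shown above.
head(cbind(X[, 1:3], W = W, Y = Y), 3)
```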

Pipeline · the recipe ⑂ has parallel branches

Estimation

[GRF] Causal forest

The core estimate — where the causal quantity itself is computed.

What happens here

Fit with OOB predictions and variance.

Formula

\tau(x)=\mathbb{E}\!\left[\,Y(1)-Y(0)\mid X=x\,\right]

The estimator

Causal forest — Honest random forest for heterogeneous treatment effects — CATE for a binary treatment via GRF moment conditions.

CATE Code @ v2.6.1 ↗ Paper ↗

Reads from the input data · Feeds into #2, #3, #4, #5

Key code

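cf <- causal_forest(X, Y, W)                    # Y.hat and W.hat are estimated internally
pred <- predict(cf, estimate.variance = TRUE)   # no newdata, so predictions are out-of-bag
tau.hat <- pred$predictions                     # OOB CATE estimates
tau.var <- pred$variance.estimates              # pointwise variance estimates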
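
To make "OOB predictions and variance" concrete, here is a minimal sketch that turns the variance estimates into approximate pointwise 95% intervals. The simulated data, the seed, and the `num.trees` value are assumptions chosen for the example, and the normal-approximation interval is a common convention rather than something the tutorial prescribes.

```r
library(grf)

set.seed(1)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)   # hypothetical data-generating process

cf <- causal_forest(X, Y, W, num.trees = 1000)

# With no newdata, predict() returns out-of-bag estimates.
pred    <- predict(cf, estimate.variance = TRUE)
tau.hat <- pred$predictions
se.hat  <- sqrt(pred$variance.estimates)

# Approximate pointwise 95% intervals for the CATE.
ci <- cbind(lower = tau.hat - 1.96 * se.hat,
            upper = tau.hat + 1.96 * se.hat)

# Doubly robust average treatment effect as a one-number summary.
ate <- average_treatment_effect(cf, target.sample = "all")
print(ate)
```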

Diagnostic / pre-tests

test_calibration()

A pre-flight check — run this before trusting any estimate downstream.

What happens here

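A mean-forest coefficient ≈ 1 suggests the forest's average prediction (the ATE) is right; a differential-forest coefficient ≈ 1 suggests the estimated heterogeneity is well calibrated, and a differential coefficient significantly greater than 0 is evidence that heterogeneity is present at all.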

Formula

\tau(X)=\alpha\,\bar\tau+\beta\,(\hat\tau(X)-\bar\tau)+\varepsilon\quad(\alpha,\beta\approx1\ \text{if calibrated})
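
Because the true CATE is never observed, the check has to be run on residualized outcomes. A sketch of the regression that `test_calibration()` appears to fit is given below; the notation is an assumption introduced here, with \hat m(X) standing for the forest's `Y.hat`, \hat e(X) for its `W.hat`, and \bar\tau for the mean of the out-of-bag CATE estimates.

```latex
Y_i-\hat m(X_i)
  =\alpha\,\bar\tau\,\bigl(W_i-\hat e(X_i)\bigr)
  +\beta\,\bigl(\hat\tau(X_i)-\bar\tau\bigr)\bigl(W_i-\hat e(X_i)\bigr)
  +\varepsilon_i
```

The two regressors correspond to the mean-forest and differential-forest coefficients reported by the function.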

Reads from #1 · Feeds into #6

Key code

test_calibration(cf)   # mean & differential forest coefficients ≈ 1?
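
The sketch below shows one way to pull the two coefficients out of the result for a report. The simulated data and seed are assumptions; the row names and the column positions (estimates in column 1, one-sided p-values in column 4) reflect how the returned coefficient table is usually laid out, so check them against your installed version.

```r
library(grf)

set.seed(1)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)   # hypothetical data-generating process

cf <- causal_forest(X, Y, W, num.trees = 1000)

tc <- test_calibration(cf)
print(tc)

# Column 1: estimate; column 4: one-sided p-value.
mean.coef <- tc["mean.forest.prediction", 1]
diff.coef <- tc["differential.forest.prediction", 1]
diff.pval <- tc["differential.forest.prediction", 4]

cat(sprintf("mean coef = %.2f, differential coef = %.2f (p = %.3g)\n",
            mean.coef, diff.coef, diff.pval))
```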

Reference / docs ↗

Diagnostic / pre-tests

variable_importance()

A pre-flight check — run this before trusting any estimate downstream.

What happens here

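Which covariates the forest split on most often, with splits near the root weighted more heavily. Sanity-check the ranking against domain knowledge; it measures split frequency, not causal importance.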

Reads from #1 · Feeds into #6

Key code

variable_importance(cf)
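
The raw output is an unlabeled column of scores, so a small sketch for turning it into a ranked table may help. The simulated data, the seed, and the column names `X1` to `X10` are assumptions for illustration.

```r
library(grf)

set.seed(1)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", seq_len(p))
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)   # effect heterogeneity driven by X1

cf <- causal_forest(X, Y, W, num.trees = 1000)

# One score per covariate, in the column order of X.
vi <- variable_importance(cf)

ranking <- data.frame(variable = colnames(X), importance = as.numeric(vi))
ranking <- ranking[order(ranking$importance, decreasing = TRUE), ]
head(ranking, 5)
```

In this simulated setup one would expect `X1` near the top, since it is the only driver of effect heterogeneity.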

Reference / docs ↗

Heterogeneity

best_linear_projection()

Heterogeneity — who is affected, and by how much, not just on average.

What happens here

Doubly robust linear projection of the CATE τ(x) onto a hand-picked subset of covariates A, for human-readable heterogeneity.

Formula

\mathbb{E}\!\left[\,\tau(X)\mid A\,\right]=A^\top\beta

Reads from #1 · Feeds into #6

Key code

best_linear_projection(cf, X[, c("age", "prior")])
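
A runnable version of this call needs named columns in `X`. The sketch below assumes simulated data in which the first two covariates are labeled `age` and `prior`; those names, the data-generating process, and the seed are hypothetical and only serve to make the example self-contained.

```r
library(grf)

set.seed(1)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- c("age", "prior", paste0("X", 3:p))
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, "age"], 0) * W + X[, "prior"] + rnorm(n)   # effect varies with "age" only

cf <- causal_forest(X, Y, W, num.trees = 1000)

# Project the CATE onto two interpretable covariates.
A <- X[, c("age", "prior")]
blp <- best_linear_projection(cf, A)
print(blp)
```

The printed table has an intercept row plus one row per column of `A`, each with a robust standard error.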

Reference / docs ↗

Diagnostic / pre-tests

OOB residual checks

A pre-flight check — run this before trusting any estimate downstream.

What happens here

Plot the OOB CATE estimates against the estimated propensity score and key covariates; strong structure left in these plots suggests under-fitting.
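
This step has no key code in the tutorial summary, so the following is only a sketch of what such plots could look like. The simulated data, the confounded treatment assignment, and the use of base-R `plot()` with a `lowess()` smoother are all assumptions.

```r
library(grf)

set.seed(1)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, plogis(X[, 2]))              # treatment depends on X2 (assumption)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

cf <- causal_forest(X, Y, W, num.trees = 1000)

tau.hat <- predict(cf)$predictions   # OOB CATE estimates
e.hat   <- cf$W.hat                  # estimated propensity scores

op <- par(mfrow = c(1, 2))
plot(e.hat, tau.hat, pch = 16, cex = 0.4,
     xlab = "Estimated propensity", ylab = "OOB CATE")
lines(lowess(e.hat, tau.hat), col = "red", lwd = 2)
plot(X[, 1], tau.hat, pch = 16, cex = 0.4,
     xlab = "X1", ylab = "OOB CATE")
lines(lowess(X[, 1], tau.hat), col = "red", lwd = 2)
par(op)
```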

Reads from #1 · Feeds into #6

Reporting

Fit-evaluation report

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

Calibration table + importance bar chart + BLP table + residual scan, in one place.
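
One possible way to gather these four pieces is a plain named list, sketched below. The list layout, the simulated data, and the choice of `X1` and `X2` for the projection are assumptions; the tutorial does not prescribe a report structure.

```r
library(grf)

set.seed(1)
n <- 1000; p <- 5
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", seq_len(p))
W <- rbinom(n, 1, plogis(X[, 2]))
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)   # hypothetical data-generating process

cf <- causal_forest(X, Y, W, num.trees = 500)

report <- list(
  calibration = test_calibration(cf),
  importance  = setNames(as.numeric(variable_importance(cf)), colnames(X)),
  blp         = best_linear_projection(cf, X[, c("X1", "X2")]),
  residuals   = data.frame(tau.hat = predict(cf)$predictions, e.hat = cf$W.hat)
)

print(report$calibration)
barplot(sort(report$importance, decreasing = TRUE), las = 2,
        main = "Variable importance")
print(report$blp)
summary(report$residuals)
```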

Reads from #2, #3, #4, #5 · Feeds into the final output

Output · what you get 3 figures

Fig 1 · Estimated propensity scores by arm: a quick check on overlap / positivity.

Fig 2 · Covariate balance before vs after IPW weighting.

Fig 3 · Distribution of the per-unit bias diagnostics from test_calibration().

Figures reproduced from grf — Athey, Tibshirani & Wager — unofficial community showcase; all credit to the original authors.

The GRF 'Evaluating a causal forest fit' tutorial. Before reading anything off a fitted forest, check that it's well-calibrated and that the heterogeneity isn't an artifact of noise. Then look at what's driving the splits and how the CATE relates to interpretable covariates. Unofficial summary.
