Workflow·8 steps·branched

Heterogeneous treatment effects with a causal forest (GRF recipe)

@grf ✓ P D · Jun 2, 2026 · 1.1k views

@misc{grf,
  title        = {grf},
  author       = {Athey and Tibshirani and Wager},
  howpublished = {\url{https://grf-labs.github.io/grf/}},
  note         = {Software / documentation}
}

Summary by StatsDoge

The full GRF HTE playbook: cross-fit nuisances → causal forest → calibration → AIPW ATE → BLP → RATE → policy.

Input · what goes in

A randomized/observational study: covariate matrix X, binary treatment W, outcome Y.

Format — one row per unit. A covariate matrix X (numeric), a binary treatment W ∈ {0,1}, and an outcome Y.

  X1     X2    X3    W    Y
 0.42  -1.1   0     1   3.10
-0.07   0.6   1     0   1.85
 1.20   0.3   0     1   4.02
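
To try the recipe without real data, inputs in this format can be simulated. The sketch below is illustrative only: the sample size, the propensity model and the effect function tau(x) = 1 + X3 are made-up assumptions, not part of the GRF tutorials.

```r
# Simulated inputs in the format above (all modelling choices are illustrative assumptions)
set.seed(42)
n <- 500

# Numeric covariate matrix X, one row per unit
X <- cbind(X1 = rnorm(n), X2 = rnorm(n), X3 = rbinom(n, 1, 0.5))

# Binary treatment W in {0, 1}; the propensity depends on X1 (observational-style assignment)
e <- plogis(0.5 * X[, "X1"])
W <- rbinom(n, 1, e)

# Outcome Y with a heterogeneous treatment effect tau(x) = 1 + X3
tau <- 1 + X[, "X3"]
Y <- X[, "X1"] + tau * W + rnorm(n)

# First rows, laid out like the example table
head(cbind(X, W = W, Y = round(Y, 2)), 3)
```

Because tau(x) is known here, any later CATE estimate can be compared against it, which is not possible with real data.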

Pipeline · the recipe ⑂ has parallel branches

Data prep

[GRF] Regression forest

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Cross-fit Y.hat = E[Y|X] and W.hat = E[W|X] so the causal forest splits on orthogonalized residuals.

Formula

\hat\mu(x)=\mathbb{E}\!\left[\,Y\mid X=x\,\right]

The estimator

Regression forest — Honest non-parametric regression for E[Y|X], with out-of-bag predictions and pointwise CIs.

OTHER Code @ v2.6.1 ↗ Paper ↗

Reads from the input data Feeds into #2

Key code

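rf.Y  <- regression_forest(X, Y)
Y.hat <- predict(rf.Y)$predictions    # out-of-bag estimate of E[Y|X]
rf.W  <- regression_forest(X, W)
W.hat <- predict(rf.W)$predictions    # out-of-bag estimate of E[W|X] (propensity score)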
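
To make "orthogonalized residuals" concrete, the sketch below fits both nuisance forests on simulated data and runs a residual-on-residual regression, which recovers a constant treatment effect. The data-generating process (true effect of 2, propensity driven by the first covariate) and the choice of 500 trees are assumptions made for illustration; out-of-bag predictions stand in for cross-fitting because each unit is predicted only by trees that did not use it.

```r
library(grf)

set.seed(1)
n <- 1000
X <- matrix(rnorm(n * 3), n, 3)
W <- rbinom(n, 1, plogis(X[, 1]))      # confounded treatment assignment (assumed DGP)
Y <- X[, 1] + 2 * W + rnorm(n)         # true constant effect = 2 (assumed DGP)

# Nuisance estimates: out-of-bag predictions from two regression forests
Y.hat <- predict(regression_forest(X, Y, num.trees = 500))$predictions
W.hat <- predict(regression_forest(X, W, num.trees = 500))$predictions

# Orthogonalized residuals
Y.res <- Y - Y.hat
W.res <- W - W.hat

# Residual-on-residual regression (no intercept) estimates the constant effect
theta <- unname(coef(lm(Y.res ~ 0 + W.res)))
theta
```

The causal forest in the next step applies the same residualization, but lets the coefficient vary with x instead of forcing a single number.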

Estimation

[GRF] Causal forest

The core estimate — where the causal quantity itself is computed.

What happens here

Grow the causal forest on the residualized moments; get OOB CATEs.

Formula

\tau(x)=\mathbb{E}\!\left[\,Y(1)-Y(0)\mid X=x\,\right]

The estimator

Causal forest — Honest random forest for heterogeneous treatment effects — CATE for a binary treatment via GRF moment conditions.

CATE Code @ v2.6.1 ↗ Paper ↗

Reads from #1 Feeds into #3, #4, #5, #6

Key code

cf <- causal_forest(X, Y, W)          # Y.hat and W.hat are fit internally (out-of-bag) unless supplied
tau.hat <- predict(cf)$predictions    # out-of-bag CATE estimates, one per unit
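
A slightly fuller sketch, on simulated data, passes explicitly fitted nuisances to the forest and adds pointwise confidence intervals. The data-generating process (effect 1 + X2, randomized treatment) is an assumption for illustration; the 1.96 multiplier relies on a normal approximation.

```r
library(grf)

set.seed(1)
n <- 1000
X <- matrix(rnorm(n * 3), n, 3)
W <- rbinom(n, 1, 0.5)                         # randomized treatment (assumed DGP)
tau <- 1 + X[, 2]                              # true CATE (assumed DGP)
Y <- X[, 1] + tau * W + rnorm(n)

# Nuisances from step 1, passed in explicitly
Y.hat <- predict(regression_forest(X, Y, num.trees = 500))$predictions
W.hat <- predict(regression_forest(X, W, num.trees = 500))$predictions
cf <- causal_forest(X, Y, W, Y.hat = Y.hat, W.hat = W.hat)

# Out-of-bag CATEs with variance estimates
pred <- predict(cf, estimate.variance = TRUE)
tau.hat <- pred$predictions
se <- sqrt(pred$variance.estimates)
ci <- cbind(lower = tau.hat - 1.96 * se, upper = tau.hat + 1.96 * se)

# Because the truth is known in a simulation, the fit can be checked directly
cor(tau.hat, tau)
```

Supplying Y.hat and W.hat is optional; it mainly helps when the nuisances come from a different learner or need to be inspected first.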

Diagnostic / pre-tests

test_calibration()

A pre-flight check — run this before trusting any estimate downstream.

What happens here

Regress AIPW scores on mean-forest and differential-forest predictions; the two coefficients should be ≈ 1.

Formula

\hat\tau(X)=\alpha\,\bar\tau+\beta\,(\hat\tau(X)-\bar\tau)+\varepsilon\quad(\alpha,\beta\approx1\ \text{if calibrated})

Reads from #2 Feeds into #8

Key code

test_calibration(cf)   # mean & differential forest coefficients ≈ 1?
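
The sketch below shows how the two coefficients might be pulled out of the result on simulated data. The data-generating process is an assumption for illustration. Per the grf documentation, a coefficient near 1 on `mean.forest.prediction` suggests the average prediction is right, and a coefficient near 1 on `differential.forest.prediction` suggests the heterogeneity is well calibrated.

```r
library(grf)

set.seed(1)
n <- 1000
X <- matrix(rnorm(n * 3), n, 3)
W <- rbinom(n, 1, 0.5)
Y <- X[, 1] + (1 + X[, 2]) * W + rnorm(n)      # heterogeneous effect (assumed DGP)

cf <- causal_forest(X, Y, W)

# Calibration regression: one row per coefficient
cal <- test_calibration(cf)
print(cal)

mean.coef <- cal["mean.forest.prediction", "Estimate"]
diff.coef <- cal["differential.forest.prediction", "Estimate"]
c(mean = mean.coef, differential = diff.coef)
```

A differential coefficient that is clearly positive also serves as evidence that the forest has found some heterogeneity at all.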

Inference

[GRF] AIPW average treatment effect

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

Doubly-robust ATE (and ATT) as the headline number.

Formula

\hat\tau=\frac1n\textstyle\sum_i\hat\Gamma_i,\ \ \hat\Gamma_i=\hat\mu_1(X_i)-\hat\mu_0(X_i)+\frac{W_i(Y_i-\hat\mu_1(X_i))}{\hat e(X_i)}-\frac{(1-W_i)(Y_i-\hat\mu_0(X_i))}{1-\hat e(X_i)}

The estimator

AIPW average treatment effect — Doubly-robust ATE / ATT / ATC / overlap-weighted effect from a trained causal forest, via augmented IPW.

ATE Code @ v2.6.1 ↗ Paper ↗

Reads from #2 Feeds into #8

Key code

average_treatment_effect(cf, target.sample = "all")      # ATE
average_treatment_effect(cf, target.sample = "treated")  # ATT
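
The AIPW formula above can also be evaluated by hand from the pieces stored in the forest, which is a useful sanity check. In this sketch the simulated data are an assumption, and the reconstruction of the two conditional means from `Y.hat`, `W.hat` and the CATE predictions reflects my understanding of how grf builds its scores rather than a documented guarantee.

```r
library(grf)

set.seed(1)
n <- 1000
X <- matrix(rnorm(n * 3), n, 3)
W <- rbinom(n, 1, plogis(0.5 * X[, 1]))        # mild confounding (assumed DGP)
Y <- X[, 1] + (1 + X[, 2]) * W + rnorm(n)      # true ATE = 1 (assumed DGP)

cf <- causal_forest(X, Y, W)
tau.hat <- predict(cf)$predictions
e.hat <- cf$W.hat

# Conditional means implied by Y.hat, W.hat and tau.hat
mu.hat.0 <- cf$Y.hat - e.hat * tau.hat
mu.hat.1 <- cf$Y.hat + (1 - e.hat) * tau.hat

# AIPW scores, term by term as in the formula
Gamma <- mu.hat.1 - mu.hat.0 +
  W * (Y - mu.hat.1) / e.hat -
  (1 - W) * (Y - mu.hat.0) / (1 - e.hat)

ate.manual <- c(estimate = mean(Gamma), std.err = sd(Gamma) / sqrt(n))
ate.grf <- average_treatment_effect(cf, target.sample = "all")
rbind(manual = ate.manual, grf = ate.grf)
```

The two rows should agree closely; a large gap would point to extreme propensity estimates or a mistake in the manual scores.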

Heterogeneity

best_linear_projection()

Heterogeneity — who is affected, and by how much, not just on average.

What happens here

Project the CATE onto a few interpretable covariates to describe who benefits.

Formula

\mathbb{E}\!\left[\,\tau(X)\mid A\,\right]=A^\top\beta

Reads from #2 Feeds into #8

Key code

best_linear_projection(cf, X[, c("age", "prior")])
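
The call above needs `X` to carry column names. A minimal end-to-end sketch on simulated data follows; the covariate names `age`, `prior` and `other` and the effect function 1 + 0.8 * age are hypothetical, chosen so that the true projection coefficients are known.

```r
library(grf)

set.seed(1)
n <- 1000
# Hypothetical standardized covariates with names, so columns can be selected by name
X <- cbind(age = rnorm(n), prior = rnorm(n), other = rnorm(n))
W <- rbinom(n, 1, 0.5)
tau <- 1 + 0.8 * X[, "age"]                    # effect grows with age only (assumed DGP)
Y <- X[, "prior"] + tau * W + rnorm(n)

cf <- causal_forest(X, Y, W)

# Project the CATE onto two interpretable covariates
blp <- best_linear_projection(cf, X[, c("age", "prior")])
print(blp)

# Slope on age; the truth in this simulation is 0.8, and 0 for prior
blp["age", "Estimate"]
```

The projection describes a linear summary of the CATE, so a near-zero slope does not rule out nonlinear dependence on that covariate.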

Heterogeneity

[GRF] Rank-weighted ATE — RATE / AUTOC / Qini

Heterogeneity — who is affected, and by how much, not just on average.

What happens here

AUTOC on a held-out evaluation forest: is the heterogeneity real and useful for targeting?

Formula

\mathrm{AUTOC}=\int_0^1\!\mathrm{TOC}(q)\,dq,\quad \mathrm{TOC}(q)=\mathbb{E}\!\left[\tau(X)\mid \hat S(X)\ge \hat F^{-1}(1-q)\right]-\mathbb{E}[\tau(X)]

The estimator

Rank-weighted ATE — RATE / AUTOC / Qini — Evaluate how well a CATE estimate prioritizes treatment: TOC curve, AUTOC and Qini with confidence intervals.

OTHER Code @ v2.6.1 ↗ Paper ↗

Reads from #2 Feeds into #7

Key code

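# eval.forest: causal forest fit on a held-out part of the data.
# tau.hat here must hold one priority score per unit in eval.forest,
# e.g. CATEs predicted for those units by a forest trained on the other part.
rate <- rank_average_treatment_effect(eval.forest, priorities = tau.hat)
plot(rate)                          # TOC curve
rate$estimate / rate$std.err        # AUTOC z-stat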
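
The snippet above assumes `eval.forest` and matching priorities already exist. One way to build them is a train/evaluation split, sketched below on simulated data. The 50/50 split and the data-generating process are assumptions for illustration.

```r
library(grf)

set.seed(1)
n <- 1000
X <- matrix(rnorm(n * 3), n, 3)
W <- rbinom(n, 1, 0.5)
Y <- X[, 1] + (1 + 2 * X[, 2]) * W + rnorm(n)  # strong heterogeneity (assumed DGP)

# Split: one half learns the priority rule, the other half evaluates it
train <- sample(n, n / 2)
train.forest <- causal_forest(X[train, ], Y[train], W[train])
eval.forest  <- causal_forest(X[-train, ], Y[-train], W[-train])

# Priorities for the evaluation units come from the training forest only
priorities <- predict(train.forest, X[-train, ])$predictions

rate <- rank_average_treatment_effect(eval.forest, priorities, target = "AUTOC")
z <- rate$estimate / rate$std.err
c(AUTOC = rate$estimate, std.err = rate$std.err, z = z)
```

Keeping the priority rule and the evaluation forest on disjoint halves avoids rewarding a rule for fitting noise in the same data used to score it.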

Robustness check

Policy learning (policytree)

A robustness check — does the headline result survive a different lens?

What happens here

Learn a depth-2 optimal assignment tree from the doubly-robust scores.

Formula

\hat\pi=\arg\max_{\pi\in\Pi}\ \frac1n\textstyle\sum_i\Gamma_i\big(\pi(X_i)\big)

Reads from #6 Feeds into #8

Key code

library(policytree)
dr.scores <- double_robust_scores(cf)   # n x 2 matrix of doubly-robust scores (control, treated)
tree <- policy_tree(X, dr.scores, depth = 2)
plot(tree)
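
A self-contained sketch of the same step on simulated data, including how the learned rule assigns units. The data-generating process (an effect that changes sign with the first covariate) and the small sample size are assumptions for illustration.

```r
library(grf)
library(policytree)

set.seed(1)
n <- 500
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
W <- rbinom(n, 1, 0.5)
Y <- X[, "x2"] + X[, "x1"] * W + rnorm(n)      # treatment helps only when x1 > 0 (assumed DGP)

cf <- causal_forest(X, Y, W)

# One column of doubly-robust scores per action: control, treated
dr.scores <- double_robust_scores(cf)

# Depth-2 tree maximizing the average score of the assigned action
tree <- policy_tree(X, dr.scores, depth = 2)
print(tree)

# Assigned action per unit: 1 = control, 2 = treated
action <- predict(tree, X)
table(action)

# In-sample value of the learned policy (optimistic, since the same data chose the tree)
mean(dr.scores[cbind(seq_len(n), action)])
```

For an honest value estimate, the tree would need to be evaluated on scores from units it was not fit on.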

Reporting

CATE histogram + targeting report

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

Distribution of τ̂(x), BLP table, TOC curve and the learned policy, side by side.

Formula

\hat\tau_{\text{target}}=\textstyle\sum_i w_i\hat\Gamma_i,\quad w_i\propto\frac{p_{\text{target}}(X_i)}{p_{\text{source}}(X_i)}

Reads from #3, #4, #5, #7 Feeds into the final output

Key code

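# w_target: user-supplied weights proportional to p_target(X_i) / p_source(X_i)
weighted.mean(get_scores(cf), w_target)   # reweighted AIPW estimate for the target population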
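
One way to realize the reweighting formula above is a weighted mean of the forest's doubly-robust scores. In the sketch below the density ratio is known because both populations are simulated; the target population (first covariate shifted to mean 1) and the effect function are assumptions for illustration, and with real data the weights would have to be estimated.

```r
library(grf)

set.seed(1)
n <- 2000
X <- matrix(rnorm(n * 2), n, 2)                # source population: X1 ~ N(0, 1)
W <- rbinom(n, 1, 0.5)
Y <- X[, 2] + (1 + X[, 1]) * W + rnorm(n)      # tau(x) = 1 + x1 (assumed DGP)

cf <- causal_forest(X, Y, W)
Gamma <- as.numeric(get_scores(cf))            # doubly-robust scores, one per unit

# Hypothetical target population with X1 ~ N(1, 1): the density ratio is known here
w_target <- dnorm(X[, 1], mean = 1) / dnorm(X[, 1], mean = 0)

tau.source <- mean(Gamma)                      # truth in this simulation: 1
tau.target <- weighted.mean(Gamma, w_target)   # truth in this simulation: 2
c(source = tau.source, target = tau.target)
```

`weighted.mean` normalizes the weights, so they only need to be proportional to the density ratio. This sketch gives a point estimate only and does not produce a standard error.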

Output · what you get 4 figures

Fig 1. Out-of-bag CATE estimates τ̂(x) across the sample; the spread is the heterogeneity you're after.

Fig 2. How a causal forest splits: it carves out neighbourhoods that maximise the contrast in treatment effect.

Fig 3. Targeting operator characteristic (TOC) when units are ranked by the financial-autonomy effect.

Fig 4. TOC for the cognitive-impairment outcome; benefit concentrates in the top-ranked units.

Figures reproduced from grf — Athey, Tibshirani & Wager — unofficial community showcase; all credit to the original authors.

What the GRF docs recommend end-to-end for credible heterogeneous-effect analysis. Nuisances are cross-fit first; the causal forest is then validated (calibration), summarized (AIPW ATE, best linear projection), stress-tested for useful heterogeneity (RATE/AUTOC), and finally turned into a targeting policy. Unofficial summary of the public GRF tutorials.

Discussion (3)

@calibrator_cleo · Jun 2, 2026

This is the canonical order and I wish more papers followed it. Calibration BEFORE you start interpreting subgroups, every time.

@hte_hannah · Jun 2, 2026

And RATE before policy learning — no point optimizing a rule on heterogeneity that isn't there.

@targeting_tara · Jun 2, 2026

Bookmarking this as the onboarding doc for new analysts. The fan-out from the forest into ATE / BLP / RATE / policy is exactly how I teach it.

@aipw_amir · Jun 2, 2026

The fact that one causal_forest object feeds all four downstream branches is the whole pitch for GRF. Compose, don't re-fit.