[GRF] Causal forest

Figure: Out-of-bag CATE distribution from a causal forest. Source — grf-labs docs.

⚠️ Unofficial community write-up of a method from grf-labs/grf (pinned at v2.6.1). Not affiliated with the grf-labs authors — this summarizes the public documentation for demonstration. All credit & copyright belong to the original authors (Athey, Tibshirani, Wager, et al.).

What it does

Estimates the conditional average treatment effect τ(x) = E[Y(1) − Y(0) | X = x] for a binary treatment, by growing a forest whose splits maximize heterogeneity in treatment effect rather than in the outcome.

How GRF makes it work

Orthogonalization (R-learner style): first fit Y.hat = E[Y|X] and W.hat = E[W|X] (e.g. with regression_forest), then split on the residualized moment condition built from Y − Y.hat and W − W.hat. Residualizing strips out the variation in Y and W that X already explains, which makes the estimate less sensitive to confounding through X and to error in the nuisance fits.
Honesty: one subsample picks the splits and a disjoint subsample fills the leaves, so predictions are (nearly) unbiased and pointwise confidence intervals are asymptotically valid.
Adaptive neighborhoods: the forest defines data-driven weights α_i(x), reflecting how often training point i lands in the same leaf as x; τ(x) solves a locally weighted estimating equation.
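
For reference, the three ingredients above can be written as a single estimating equation. This is a sketch in the residual-on-residual form, using the write-up's own symbols (Y.hat and W.hat as \hat{Y}_i and \hat{W}_i, taken to be out-of-bag predictions); the package's implementation may differ in details such as local centering within the weighted neighborhood.

```latex
\[
\sum_{i=1}^{n} \alpha_i(x)\,\bigl(W_i - \hat{W}_i\bigr)
\Bigl[\bigl(Y_i - \hat{Y}_i\bigr) - \tau(x)\,\bigl(W_i - \hat{W}_i\bigr)\Bigr] = 0
\quad\Longrightarrow\quad
\hat{\tau}(x) =
\frac{\sum_{i=1}^{n} \alpha_i(x)\,\bigl(W_i - \hat{W}_i\bigr)\bigl(Y_i - \hat{Y}_i\bigr)}
     {\sum_{i=1}^{n} \alpha_i(x)\,\bigl(W_i - \hat{W}_i\bigr)^{2}}
\]
```

Read this way, the forest only supplies the weights α_i(x); the estimate itself is a weighted regression of outcome residuals on treatment residuals.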

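library(grf)
# X: numeric covariate matrix, Y: outcome vector, W: binary (0/1) treatment vector
cf <- causal_forest(X, Y, W)            # Y.hat, W.hat fit internally (out-of-bag regression forests)
tau.hat <- predict(cf)$predictions       # OOB CATE on the training data
pred <- predict(cf, X.test, estimate.variance = TRUE)   # X.test: new covariate matrix
se <- sqrt(pred$variance.estimates)      # pointwise standard errors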
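
A minimal end-to-end sketch of the calls above on simulated data, with the nuisance estimates fit explicitly and passed in rather than left to the defaults. The data-generating process, seed, sample size, and the X.test grid are all assumptions made for illustration; the 1.96 multiplier gives approximate 95% pointwise intervals.

```r
library(grf)

set.seed(1)
n <- 1000; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)              # randomized binary treatment (illustrative)
tau <- pmax(X[, 1], 0)              # true CATE, known only because the data are simulated
Y <- tau * W + X[, 2] + rnorm(n)

# Nuisance estimates: out-of-bag predictions from separate regression forests
Y.hat <- predict(regression_forest(X, Y))$predictions
W.hat <- predict(regression_forest(X, W))$predictions

# Causal forest on the same data, reusing the nuisance estimates
cf <- causal_forest(X, Y, W, Y.hat = Y.hat, W.hat = W.hat)

# Predict along a grid in the first covariate, other covariates held at 0
X.test <- matrix(0, 50, p)
X.test[, 1] <- seq(-2, 2, length.out = 50)
pred <- predict(cf, X.test, estimate.variance = TRUE)

# Approximate 95% pointwise confidence intervals
se <- sqrt(pred$variance.estimates)
ci <- cbind(lower = pred$predictions - 1.96 * se,
            upper = pred$predictions + 1.96 * se)
```

Because tau is known in this simulation, plotting pred$predictions and ci against X.test[, 1] alongside pmax(X.test[, 1], 0) gives a quick visual check of the fit.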

Pairs with

average_treatment_effect() (AIPW ATE), test_calibration(), best_linear_projection(), rank_average_treatment_effect() and policytree for targeting.
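
As a rough sketch of how three of these companions are called on a fitted forest, the block below uses simulated data (an assumption for illustration, as are the seed and the choice of the first two covariates for the projection). rank_average_treatment_effect() and policytree are left out because they are usually run on a separate evaluation sample.

```r
library(grf)

set.seed(2)
n <- 1000; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)   # simulated outcome (illustrative)
cf <- causal_forest(X, Y, W)

# Doubly robust (AIPW) average treatment effect with a standard error
ate <- average_treatment_effect(cf, target.sample = "all")
ate.ci <- ate[["estimate"]] + c(-1.96, 1.96) * ate[["std.err"]]

# Calibration test: the "differential.forest.prediction" row speaks to heterogeneity
calib <- test_calibration(cf)

# Best linear projection of the CATE onto a chosen set of covariates
blp <- best_linear_projection(cf, X[, 1:2])

print(ate)
print(calib)
print(blp)
```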

Assumptions

Unconfoundedness given X · Overlap · SUTVA. Confidence intervals from honest forests are asymptotic, so treat them with caution in small samples.
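
Of these assumptions, overlap is the one that can be partly inspected from the data: the estimated propensities stored on the fitted forest should stay away from 0 and 1. The sketch below assumes simulated data with a confounded treatment, and the 0.05/0.95 cutoffs are an illustrative convention rather than a rule from the package.

```r
library(grf)

set.seed(3)
n <- 1000; p <- 5
X <- matrix(rnorm(n * p), n, p)
e <- plogis(X[, 1])                 # true propensity depends on X1 (illustrative)
W <- rbinom(n, 1, e)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)
cf <- causal_forest(X, Y, W)

# Estimated propensity scores stored on the fitted forest
W.hat <- cf$W.hat
hist(W.hat, main = "Estimated propensity scores", xlab = "W.hat")

# Share of units with extreme estimated propensities
extreme.share <- mean(W.hat < 0.05 | W.hat > 0.95)
print(summary(W.hat))
print(extreme.share)
```

A large extreme.share suggests that CATE estimates in those regions lean heavily on extrapolation; unconfoundedness and SUTVA cannot be checked this way and have to be argued from the study design.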

Used in these workflows (8)

Qini curves: automatic cost-benefit analysis

From CATEs to a budgeted treatment policy: causal forest → DR scores → cost matrix → maq Qini curve → pick the budget.

@grf

Cross-fold validation of heterogeneity

K-fold cross-fitted CATEs → RATE on out-of-fold priorities → honest verdict on heterogeneity strength.

@grf

Evaluating a causal forest fit

Did the forest actually capture treatment-effect heterogeneity? Calibration → variable importance → BLP → omnibus tests.

@grf

An introduction to GRF (getting started)

A minimal first-contact recipe: regression forest, quantile forest, and a causal forest on the same data.

@grf

Estimating ATEs on a new target population

Train a causal forest on the source sample → reweight AIPW to a target population → report transported ATE.

@grf

Policy learning via optimal decision trees

Causal forest → doubly-robust scores → policytree → evaluate policy value → plot the tree.

@grf

Assessing heterogeneity with RATE (AUTOC & Qini)

Causal forest → train/eval split → RATE with both AUTOC and Qini → TOC plot.

@grf

Heterogeneous treatment effects with a causal forest (GRF recipe)

The full GRF HTE playbook: cross-fit nuisances → causal forest → calibration → AIPW ATE → BLP → RATE → policy.

@grf

Discussion (4)

@hte_hannah · Jun 2, 2026

The orthogonalization step is what sold me — residualizing on Y.hat/W.hat before splitting kills so much confounding bias. Textbook R-learner.

@nuisance_nina · Jun 2, 2026

Exactly. And cross-fitting the nuisances means you're not paying an overfitting tax on the moment condition.

@calibrator_cleo · Jun 2, 2026

Reminder for newcomers: always run test_calibration() before you believe the heterogeneity. A great-looking CATE histogram can still fail the differential-forest coefficient.

@forest_fox · Jun 2, 2026

Honesty costs you some efficiency but the valid CIs are worth it. Don't turn it off just to make the intervals tighter.

@honest_hugo · Jun 2, 2026

Co-signed. Honesty is a feature, not a bug.

@aipw_amir · Jun 2, 2026

Underrated that the same object gives you doubly-robust ATEs via average_treatment_effect(). One fit, point + interval.

