Workflow · 5 steps

Policy learning via optimal decision trees

@grf · Jun 2, 2026

@misc{grf,
  title        = {grf},
  author       = {Athey and Tibshirani and Wager},
  howpublished = {\url{https://grf-labs.github.io/grf/}},
  note         = {Software / documentation}
}

Summary by StatsDoge

Causal forest → doubly-robust scores → policytree → evaluate policy value → plot the tree.

Input · what goes in

Per-unit CATEs and doubly-robust scores from a causal forest fit on the raw data below, plus a treatment cost.

Format — one row per unit. A covariate matrix X (numeric), a binary treatment W ∈ {0,1}, and an outcome Y.

  X1     X2    X3    W    Y
 0.42  -1.1   0     1   3.10
-0.07   0.6   1     0   1.85
 1.20   0.3   0     1   4.02
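
For trying the later steps without real data, a minimal base-R sketch of a dataset in this layout follows. Everything in it is an assumption made for illustration: the sample size, the randomized treatment with probability 0.5, and the effect `tau` that depends only on `X3`.

```r
# Hypothetical simulated data in the X / W / Y layout described above.
set.seed(1)
n <- 500
X <- cbind(X1 = rnorm(n), X2 = rnorm(n), X3 = rbinom(n, 1, 0.5))  # numeric covariate matrix
W <- rbinom(n, 1, 0.5)                # binary treatment, randomized with probability 0.5
tau <- 1 + 2 * X[, "X3"]              # assumed true effect: larger when X3 = 1
Y <- X[, "X1"] + tau * W + rnorm(n)   # observed outcome
head(data.frame(X, W, Y), 3)          # one row per unit
```

Because `tau` is known here, the CATEs and the learned rule from the later steps can be checked against it.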

Pipeline · the recipe

Estimation

[GRF] Causal forest

The core estimate — where the causal quantity itself is computed.

What happens here

Estimate CATEs and the AIPW score for each unit.

Formula

\tau(x)=\mathbb{E}\!\left[\,Y(1)-Y(0)\mid X=x\,\right]

The estimator

Causal forest: an honest random forest for heterogeneous treatment effects. It estimates the CATE for a binary treatment via GRF moment conditions.

CATE · Code @ v2.6.1

Reads from the input data · Feeds into #2

Key code

library(grf)
cf <- causal_forest(X, Y, W)          # Y.hat and W.hat are fit internally (out-of-bag) when not supplied
tau.hat <- predict(cf)$predictions    # out-of-bag CATE estimates

Data prep

double_robust_scores()

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Build the doubly-robust reward matrix that policytree maximizes over.

Formula

\Gamma_i=\hat\mu_1(X_i)-\hat\mu_0(X_i)+\frac{W_i\,(Y_i-\hat\mu_1(X_i))}{\hat e(X_i)}-\frac{(1-W_i)\,(Y_i-\hat\mu_0(X_i))}{1-\hat e(X_i)}

Reads from #1 · Feeds into #3

Key code

library(policytree)
dr.scores <- double_robust_scores(cf)   # n × K reward matrix (K = 2 here: control, treated)
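
To make the link between the formula and the reward matrix concrete, here is a small base-R sketch that computes one AIPW score per arm and then takes the treated-minus-control contrast, which reproduces Γ_i above. The helper `aipw_scores` and the nuisance values `mu0`, `mu1`, `e` are hypothetical and supplied by hand; this illustrates the formula only and is not policytree's internal implementation.

```r
# Per-arm AIPW scores for a binary treatment, given nuisance estimates.
aipw_scores <- function(Y, W, mu0, mu1, e) {
  control <- mu0 + (1 - W) * (Y - mu0) / (1 - e)  # reward of not treating
  treated <- mu1 + W * (Y - mu1) / e              # reward of treating
  cbind(control = control, treated = treated)
}

# Tiny worked example with hypothetical nuisance estimates
Y   <- c(3.10, 1.85, 4.02)
W   <- c(1, 0, 1)
mu0 <- c(2.0, 1.5, 2.5)   # assumed estimates of E[Y | X, W = 0]
mu1 <- c(3.0, 2.5, 4.0)   # assumed estimates of E[Y | X, W = 1]
e   <- c(0.5, 0.5, 0.5)   # assumed propensity scores

scores <- aipw_scores(Y, W, mu0, mu1, e)
gamma  <- scores[, "treated"] - scores[, "control"]  # the contrast in the formula above
```

Keeping one column per arm, rather than only the contrast, is what lets a policy pick a column for each unit.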

Estimation

policytree: depth-2 optimal tree

The core estimate — where the causal quantity itself is computed.

What happens here

Learn an interpretable assignment rule that maximizes expected welfare.

Formula

\hat\pi=\arg\max_{\pi\in\Pi}\ \frac1n\textstyle\sum_i\Gamma_i\big(\pi(X_i)\big)

Reads from #2 · Feeds into #4, #5

Key code

library(policytree)
tree <- policy_tree(X, dr.scores, depth = 2)   # exhaustive search over depth-2 trees
plot(tree)                                     # plotting requires the DiagrammeR package
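
The argmax in the formula can be illustrated with a much simpler search. The base-R sketch below finds the best depth-1 rule (a single split) by brute force over every variable and split point; `best_stump` is a hypothetical helper written for this illustration, and the real `policy_tree` search is deeper and far more efficient.

```r
# Exhaustive search for the best depth-1 rule:
# "if X[, j] <= s take action `left`, otherwise take action `right`".
best_stump <- function(X, scores) {
  best <- list(value = -Inf)
  for (j in seq_len(ncol(X))) {
    # candidate splits: every observed value except the largest
    for (s in head(sort(unique(X[, j])), -1)) {
      is_left <- X[, j] <= s
      # best single action on each side = column with the largest total reward
      a <- which.max(colSums(scores[is_left, , drop = FALSE]))
      b <- which.max(colSums(scores[!is_left, , drop = FALSE]))
      value <- (sum(scores[is_left, a]) + sum(scores[!is_left, b])) / nrow(X)
      if (value > best$value) {
        best <- list(value = value, var = j, split = s, left = a, right = b)
      }
    }
  }
  best
}

# Toy reward matrix: units with x1 <= 2 do better untreated, the rest do better treated
X <- cbind(x1 = c(1, 2, 3, 4), x2 = c(0, 1, 0, 1))
scores <- cbind(control = c(1, 1, 0, 0), treated = c(0, 0, 2, 2))
fit <- best_stump(X, scores)
```

On this toy input the search splits on `x1` at 2, assigning control on the left and treatment on the right, for an average reward of 1.5.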

Inference

Evaluate policy value (held-out)

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

Estimate the value of the learned rule vs treat-all / treat-none on held-out data.

Formula

V(\pi)=\mathbb{E}\!\left[\,\Gamma(\pi(X))\,\right]

Reads from #3 · Feeds into #5

Key code

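# Value of the learned rule; for a held-out estimate, X and dr.scores must be rows not used to fit `tree`
n <- nrow(X)
pi.hat <- predict(tree, X)                    # action index per unit: 1 = control, 2 = treated
mean(dr.scores[cbind(seq_len(n), pi.hat)])    # compare with colMeans(dr.scores): treat-none, treat-all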
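
Since this step is about uncertainty, a point estimate alone is thin. The base-R sketch below adds a plain standard error and the two baselines. `policy_value` is a hypothetical helper, the reward matrix and action vector are made-up stand-ins for held-out rows, and the standard error assumes the rule was fit on separate data.

```r
# Value of a rule from a reward matrix, with a simple standard error
policy_value <- function(scores, actions) {
  g <- scores[cbind(seq_len(nrow(scores)), actions)]  # reward of the chosen action per unit
  c(estimate = mean(g), std.err = sd(g) / sqrt(length(g)))
}

# Hypothetical held-out rewards and predicted actions (1 = control, 2 = treated)
scores  <- cbind(control = c(2.0, 2.2, 2.5, 1.0), treated = c(3.2, 2.5, 4.04, 0.5))
actions <- c(2, 1, 2, 1)

learned    <- policy_value(scores, actions)
treat_none <- policy_value(scores, rep(1, nrow(scores)))
treat_all  <- policy_value(scores, rep(2, nrow(scores)))
rbind(learned, treat_none, treat_all)
```

When comparing the learned rule against a baseline, a standard error on the per-unit difference in rewards is usually the more relevant quantity, because the two estimates come from the same units.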

Reporting

Plot the learned decision tree

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

Show the tree and the value comparison.

Reads from #3, #4 · Feeds into the final output

Output · what you get

Fig 1. The learned treatment rule: a shallow decision boundary splitting treat vs do-not-treat in covariate space.

Figures reproduced from grf — Athey, Tibshirani & Wager — unofficial community showcase; all credit to the original authors.

The GRF + policytree tutorial: turn CATEs into an interpretable, near-optimal treatment rule and honestly evaluate its value. Unofficial summary; policytree is a separate grf-labs package.

Discussion (2)

4

@policy_pia · Jun 2, 2026

Depth-2 trees hit the sweet spot: near-optimal but you can actually explain the rule to ops. The held-out value evaluation keeps you honest.

5

@targeting_tara · Jun 2, 2026

The 'evaluate vs treat-all/treat-none' baseline is what convinces stakeholders. Always include it.

5

@aipw_amir · Jun 2, 2026

Doubly-robust scores as the policytree reward is the key link. Garbage scores → garbage policy.