Workflow·5 steps·branched

Causal forest with time-to-event data (survival)

Source grf — Athey, Tibshirani & Wager
Summary by StatsDoge

Censoring check → causal survival forest → RMST-scale AIPW ATE → calibration → report.

1

Input · what goes in

Right-censored survival time Y, event indicator D, treatment W, covariates X.

Format — one row per unit. Covariates X, binary treatment W, a (possibly censored) time Y, and an event indicator D (1 = event, 0 = censored).

  X1    X2   W     Y     D
 0.4  -1.1  1   12.3   1
-0.1   0.6  0   30.0   0   # censored
 1.2   0.3  1    8.7   1
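To try the pipeline without real data, a toy dataset in this format can be simulated in base R. This is a minimal sketch: the exponential event and censoring distributions, the effect size and the helper names `T.event` and `C` are hypothetical choices made for illustration, not part of the grf tutorial.

```r
set.seed(1)
n <- 500
X <- matrix(rnorm(n * 2), n, 2, dimnames = list(NULL, c("X1", "X2")))
W <- rbinom(n, 1, 0.5)                       # randomized binary treatment

# Hypothetical latent event times: treatment doubles mean survival when X1 > 0
T.event <- rexp(n, rate = 1 / (20 * (1 + W * (X[, "X1"] > 0))))
C <- rexp(n, rate = 1 / 40)                  # hypothetical independent censoring times

Y <- pmin(T.event, C)                        # observed, possibly censored, time
D <- as.integer(T.event <= C)                # 1 = event observed, 0 = censored

dat <- data.frame(X, W, Y, D)
head(dat)
mean(D)                                      # share of units with an observed event
```

Only `X`, `W`, `Y` and `D` would be passed to the forests; `T.event` and `C` are latent and exist here only to show how right-censoring produces `Y` and `D`.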
2

Pipeline · the recipe ⑂ has parallel branches

1
Diagnostic / pre-tests

[GRF] Survival forest

A pre-flight check — run this before trusting any estimate downstream.

What happens here

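Estimate the conditional survival curve of the event time T and the conditional censoring curve of the censoring time C (so Y = min(T, C) and D = 1{T ≤ C}), then pick a horizon h before which enough events are observed.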

Formula
S(t\mid x)=\mathbb{P}(T>t\mid X=x),\qquad S^{C}(t\mid x)=\mathbb{P}(C>t\mid X=x)
The estimator

Survival forest — Non-parametric conditional survival function S(t | X) under right-censoring.

Reads from: the input data · Feeds into: #2
Key code
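library(grf)
sf <- survival_forest(X, Y, D)              # event survival curve S(t | x)
predict(sf)$predictions                     # OOB curves: one row per unit, one column per failure time
sf.cens <- survival_forest(X, Y, 1 - D)     # censoring curve: censoring treated as the "event"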
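Before trusting a horizon, a quick marginal sanity check can be done in base R: count observed events up to each candidate h and estimate the censoring survival P(C > h) with a Kaplan-Meier curve in which censoring plays the role of the event. This is a simplified, unconditional stand-in for the conditional curves from `survival_forest()`; the helpers `km_curve()`, `surv_at()` and `horizon_table()` are hypothetical, and ties between event and censoring times are handled naively.

```r
# Kaplan-Meier estimate of P(time > t) for the units flagged by `event`
km_curve <- function(time, event) {
  t <- sort(unique(time[event == 1]))
  n.risk  <- vapply(t, function(s) sum(time >= s), integer(1))
  n.event <- vapply(t, function(s) sum(time == s & event == 1), integer(1))
  data.frame(time = t, surv = cumprod(1 - n.event / n.risk))
}

# Evaluate the right-continuous KM step function at horizon h
surv_at <- function(km, h) {
  idx <- findInterval(h, km$time)
  if (idx == 0) 1 else km$surv[idx]
}

# One row per candidate horizon: events observed by h and censoring survival at h
horizon_table <- function(Y, D, horizons) {
  cens <- km_curve(Y, 1 - D)               # censoring is the "event" here
  data.frame(
    h = horizons,
    events = vapply(horizons, function(h) sum(Y <= h & D == 1), integer(1)),
    censoring.surv = vapply(horizons, function(h) surv_at(cens, h), numeric(1))
  )
}

Y <- c(1, 2, 3, 4, 5, 6)                   # toy observed times (hypothetical)
D <- c(1, 0, 1, 1, 0, 1)                   # toy event indicators (hypothetical)
horizon_table(Y, D, horizons = c(2.5, 4, 5.5))
```

A usable h has a reasonable number of events before it and a censoring survival that is not close to zero; the conditional curves from the forests should then confirm this holds across covariate values, not just on average.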
2
Estimation

[GRF] Causal survival forest

The core estimate — where the causal quantity itself is computed.

What happens here

Estimate the CATE τ(x) as the treated-minus-control difference in restricted mean survival time (RMST) up to the horizon h.

Formula
\tau(x)=\mathbb{E}\!\left[\,\min(T(1),h)-\min(T(0),h)\mid X=x\,\right]
The estimator

Causal survival forest — CATE with right-censored, time-to-event outcomes (RMST or survival probability).

Reads from: #1 · Feeds into: #3, #4
Key code
csf <- causal_survival_forest(X, Y, W, D, horizon = h)
predict(csf)$predictions            # CATE on the RMST scale
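The RMST up to h is the area under the survival curve on [0, h], which equals E[min(T, h)]. The base-R sketch below computes it for a right-continuous step curve (for example a Kaplan-Meier curve, or one row of `predict(sf)$predictions` paired with its failure times) and checks the identity on a small uncensored sample. The helper `rmst()` and the toy curves are illustrative assumptions; grf computes the RMST contrast internally, so this only makes the estimand concrete.

```r
# RMST(h) = integral over [0, h] of S(t) dt for a step survival curve.
# `time` must be sorted increasingly; S(t) = 1 before time[1] and surv[k] on [time[k], time[k+1]).
rmst <- function(time, surv, h) {
  keep <- time < h
  knots <- c(0, time[keep], h)        # interval endpoints on [0, h]
  heights <- c(1, surv[keep])         # value of S(t) on each interval
  sum(diff(knots) * heights)
}

# Uncensored toy sample: the empirical survival drops by 1/3 at each event time
T.toy <- c(2, 4, 7)
S.toy <- c(2/3, 1/3, 0)
h <- 5
rmst(T.toy, S.toy, h)                 # area under the step curve
mean(pmin(T.toy, h))                  # same number: E[min(T, h)]

# A CATE on the RMST scale is a difference of two such areas, here for
# hypothetical treated (S1) and control (S0) curves at one covariate value x
S1 <- c(0.9, 0.6, 0.3)
S0 <- c(0.7, 0.4, 0.1)
rmst(T.toy, S1, h) - rmst(T.toy, S0, h)
```

Anything that happens after h never enters the area, which is why the estimate can change substantially when h is moved.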
3
Inference

[GRF] AIPW average treatment effect

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

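Aggregate the per-unit doubly robust scores into an ATE on the RMST-difference scale, with a standard error. For a causal survival forest, grf's scores also include a correction for censoring, which the uncensored AIPW formula below omits.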

Formula
\hat\tau=\frac{1}{n}\textstyle\sum_i\hat\Gamma_i,\quad \hat\Gamma_i=\hat\mu_1(X_i)-\hat\mu_0(X_i)+\frac{W_i\,(Y_i-\hat\mu_1(X_i))}{\hat e(X_i)}-\frac{(1-W_i)\,(Y_i-\hat\mu_0(X_i))}{1-\hat e(X_i)}
The estimator

AIPW average treatment effect — Doubly-robust ATE / ATT / ATC / overlap-weighted effect from a trained causal forest, via augmented IPW.

Reads from: #2 · Feeds into: #5
Key code
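average_treatment_effect(csf, target.sample = "all")   # ATE on the RMST-difference scale
# target.sample = "treated" (ATT), "control" (ATC) and "overlap" are documented for causal_forest;
# check that your grf version supports them for a causal_survival_forest before relying on them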
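The aggregation itself is an average of per-unit scores. The base-R sketch below implements the AIPW score from the formula above with plug-in nuisance estimates `mu1`, `mu0` and `e` supplied by hand (hypothetical values; in practice they come from fitted models), plus a normal-approximation standard error. It uses fully observed toy outcomes, so it leaves out the censoring correction that grf builds into a causal survival forest's scores.

```r
# AIPW score for each unit, following Gamma_i in the formula above
aipw_scores <- function(Y, W, mu1, mu0, e) {
  mu1 - mu0 + W * (Y - mu1) / e - (1 - W) * (Y - mu0) / (1 - e)
}

# Point estimate = mean score; standard error from the score variance
aipw_ate <- function(gamma) {
  c(estimate = mean(gamma), std.err = sd(gamma) / sqrt(length(gamma)))
}

Y   <- c(3, 1, 4, 2)        # toy uncensored outcomes, already on the RMST scale
W   <- c(1, 0, 1, 0)
mu1 <- rep(3, 4)            # hypothetical outcome-model predictions under treatment
mu0 <- rep(1.5, 4)          # hypothetical outcome-model predictions under control
e   <- rep(0.5, 4)          # hypothetical propensity scores

gamma <- aipw_scores(Y, W, mu1, mu0, e)
gamma                       # 1.5 2.5 3.5 0.5
aipw_ate(gamma)
```

With constant nuisance estimates and e = 0.5, the estimate (2) coincides with the simple difference in means, (3 + 4)/2 - (1 + 2)/2 = 2; the augmentation terms start to matter once the nuisance models vary with X.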
4
Diagnostic / pre-tests

test_calibration()

A pre-flight check — run this before trusting any estimate downstream.

What happens here

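Check that the forest's CATE estimates are calibrated on the RMST scale: their average should match the average effect, and their spread should track real heterogeneity.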

Formula
\hat\Gamma_i=\alpha\,\bar\tau+\beta\,(\hat\tau(X_i)-\bar\tau)+\varepsilon_i\quad(\alpha,\beta\approx1\ \text{if calibrated}),\qquad \bar\tau=\frac{1}{n}\textstyle\sum_i\hat\tau(X_i)
Reads from: #2 · Feeds into: #5
Key code
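test_calibration(csf)   # mean & differential forest coefficients ≈ 1?
# if your grf version's test_calibration() rejects a causal_survival_forest, regress get_scores(csf) on the centred predictions instead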
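The calibration idea can also be checked by hand: regress per-unit doubly robust scores on the mean prediction and the centred predictions, without an intercept. The sketch below is an assumption-laden stand-in, not grf's implementation: it simulates hypothetical predictions `tau.hat` and scores `gamma` that are calibrated by construction. With a fitted forest one would instead use `predict(csf)$predictions` and, assuming your grf version provides it for survival forests, `get_scores(csf)`.

```r
set.seed(42)
n <- 2000
tau.hat <- rnorm(n, mean = 1, sd = 0.5)     # hypothetical CATE predictions (RMST scale)
gamma   <- tau.hat + rnorm(n, sd = 2)       # hypothetical DR scores, calibrated by construction

tau.bar <- mean(tau.hat)
calib <- data.frame(
  gamma,
  mean.forest.prediction = tau.bar,                   # constant column: checks the average effect
  differential.forest.prediction = tau.hat - tau.bar  # centred predictions: check the heterogeneity
)

# No intercept: alpha multiplies tau.bar, beta multiplies the centred predictions
fit <- lm(gamma ~ 0 + mean.forest.prediction + differential.forest.prediction, data = calib)
summary(fit)$coefficients                    # both estimates should be near 1 here
```

A differential coefficient clearly above 0 is evidence of heterogeneity, and a value near 1 suggests the forest gets its scale about right.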

5
Reporting

RMST difference by subgroup

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

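Report the RMST-difference effect overall and for key subgroups, with two caveats: the effect is defined only up to the horizon h, and it relies on censoring being independent of event times given covariates and treatment.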

Reads from: #3, #4 · Feeds into: the final output
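Subgroup RMST differences can be formed the same way as the overall effect, by averaging the doubly robust scores within each subgroup. The base-R sketch below does this for hypothetical scores and a hypothetical grouping; in grf, `average_treatment_effect(csf, subset = ...)` serves the same purpose one subgroup at a time. The helper `subgroup_rmst()` is illustrative only.

```r
# Mean DR score, standard error and size for each subgroup
subgroup_rmst <- function(gamma, group) {
  out <- lapply(split(gamma, group), function(g)
    c(estimate = mean(g), std.err = sd(g) / sqrt(length(g)), n = length(g)))
  do.call(rbind, out)
}

gamma <- c(1, 3, 2, 4, 6, 5)                   # hypothetical per-unit scores (RMST scale)
group <- c("X1 <= 0", "X1 <= 0", "X1 <= 0",
           "X1 > 0", "X1 > 0", "X1 > 0")        # hypothetical subgroup labels
tab <- subgroup_rmst(gamma, group)
tab

# Normal-approximation 95% intervals
cbind(tab,
      lower = tab[, "estimate"] - 1.96 * tab[, "std.err"],
      upper = tab[, "estimate"] + 1.96 * tab[, "std.err"])
```

Every number in such a table is an RMST difference up to the chosen h; re-running with a different h changes the estimand, not just the estimate.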
3

Output · what you get · 4 figures

Fig 1. True vs observed event times: censoring shifts the distribution you actually get to see.
Fig 2. A right-censoring timeline: some units are observed only up to a cutoff, so their event is never seen.
Fig 3. Event-time density truncated at the analysis horizon h, the scale on which the RMST effect is defined.
Fig 4. TOC (targeting operator characteristic) curve, with units ranked by their estimated survival benefit.
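A TOC curve compares the average effect among the top-q fraction of units, ranked by predicted benefit, with the overall average effect. A minimal base-R sketch, assuming per-unit doubly robust scores and a priority score (such as the CATE predictions) are already available; the function `toc_curve()` and the toy numbers are hypothetical, and grf's `rank_average_treatment_effect()` additionally provides bootstrap standard errors.

```r
# TOC(q) = mean score among the top q fraction (by priority) minus the overall mean score
toc_curve <- function(priority, gamma, q = seq(0.1, 1, by = 0.1)) {
  ord <- order(priority, decreasing = TRUE)    # most promising units first
  ate <- mean(gamma)
  vapply(q, function(qq) {
    k <- max(1, round(qq * length(gamma)))     # size of the top-q group (rounded, at least 1)
    mean(gamma[ord][seq_len(k)]) - ate
  }, numeric(1))
}

priority <- c(4, 3, 2, 1)     # hypothetical predicted benefit, e.g. CATE estimates
gamma    <- c(8, 6, 2, 0)     # hypothetical DR scores for the same units
toc_curve(priority, gamma, q = c(0.25, 0.5, 1))
```

TOC(1) is always 0; a curve that sits well above 0 for small q means the ranking does pick out units with larger benefit.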

Figures reproduced from grf — Athey, Tibshirani & Wager — unofficial community showcase; all credit to the original authors.

The GRF survival tutorial. Model censoring first, estimate CATE on the RMST scale, then aggregate and validate. Unofficial summary.

Discussion (2)

  • Modelling censoring as step 1 instead of an afterthought is exactly right. So many survival analyses get this backwards.

  • The horizon-sensitivity caveat in the report step is a nice touch. RMST is horizon-dependent and people forget.