σ StatsDoge Causal inference workflows
Workflow · 4 steps · branched

Design & diagnose a randomized experiment (DeclareDesign)

Source DeclareDesign — Blair, Cooper, Coppock & Humphreys
Summary by StatsDoge

Write the study as a four-tuple — model, inquiry, data strategy, answer strategy — simulate it thousands of times, and read its operating characteristics (bias, power, coverage) before you spend a single observation. The estimand is named, not implied.

1

Input · what goes in

A planned experiment: sample size, treatment assignment, potential-outcome model, and the estimand you care about.


Format — declare units, potential outcomes, assignment, and the inquiry.

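library(DeclareDesign)
des <- declare_model(N = 200, U = rnorm(N),                      # U: unit-level noise
                     potential_outcomes(Y ~ 0.2 * Z + U)) +      # creates Y_Z_0 and Y_Z_1
       declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +              # the named estimand
       declare_assignment(Z = complete_ra(N)) +                  # complete random assignment
       declare_measurement(Y = reveal_outcomes(Y ~ Z)) +         # reveal the observed outcome Y
       declare_estimator(Y ~ Z, inquiry = 'ATE')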
2

Pipeline · the recipe ⑂ has parallel branches


1
Data prep

Declare the model & potential outcomes

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Define units and the science: each unit's outcomes under treatment and control.

Formula
Y_i = Z_i\,Y_i(1) + (1-Z_i)\,Y_i(0),\qquad \tau=\mathbb{E}[\,Y_i(1)-Y_i(0)\,]
Reads from the input data Feeds into the final output
Key code
declare_model(N = 200, U = rnorm(N), potential_outcomes(Y ~ 0.2 * Z + U))
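To make the formula concrete, here is a minimal base-R sketch of the same model written out by hand. It is an illustration under stated assumptions, not DeclareDesign's internals: the noise `U` is assumed standard normal, the treatment effect is the constant 0.2 from the declaration, and half the units are treated.

```r
set.seed(42)
N <- 200
U <- rnorm(N)                            # unit-level noise (assumed standard normal)

# Potential outcomes: what each unit would show under control and under treatment
Y_Z_0 <- U
Y_Z_1 <- 0.2 + U

# Complete randomization: exactly N / 2 units are treated
Z <- sample(rep(c(0, 1), each = N / 2))

# Switching equation: only one potential outcome per unit is ever observed
Y <- Z * Y_Z_1 + (1 - Z) * Y_Z_0

# The estimand: average of the unit-level effects
tau <- mean(Y_Z_1 - Y_Z_0)
```

Because the effect is constant in this sketch, `tau` equals 0.2 up to floating-point error, while `Y` alone never shows both outcomes for the same unit.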

2
Estimation

Difference-in-means estimator

The core estimate — where the causal quantity itself is computed.

What happens here

Under complete randomization, the simple difference in group means is unbiased for the ATE.

Formula
\hat\tau=\bar Y_1-\bar Y_0
Reads from the input data Feeds into the final output
Key code
declare_estimator(Y ~ Z, .method = difference_in_means, inquiry = 'ATE')
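The unbiasedness claim can be checked by hand. The base-R sketch below holds one set of potential outcomes fixed, re-randomizes treatment many times, and compares the average estimate with the true ATE. The helper `diff_in_means`, the normal noise, and the number of re-randomizations are assumptions made for illustration.

```r
set.seed(1)
N <- 200
U <- rnorm(N)                            # assumed standard normal noise
Y_Z_0 <- U
Y_Z_1 <- 0.2 + U
true_ate <- mean(Y_Z_1 - Y_Z_0)

# Hypothetical helper: difference in observed group means
diff_in_means <- function(Y, Z) mean(Y[Z == 1]) - mean(Y[Z == 0])

# Re-randomize 5000 times, keeping the potential outcomes fixed
estimates <- replicate(5000, {
  Z <- sample(rep(c(0, 1), each = N / 2))   # complete randomization
  Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)         # reveal the observed outcome
  diff_in_means(Y, Z)
})

bias <- mean(estimates) - true_ate       # should be close to zero
```

Any single estimate can miss the target by a fair margin; it is the average over assignments that sits on the ATE.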

3
Inference

Neyman variance & confidence intervals

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

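Estimate the repeated-sampling variance from the sample variances of the two arms. The Neyman estimator is conservative: on average it does not understate the true variance, and it overstates it when treatment effects vary across units.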

Formula
\widehat{\mathrm{Var}}(\hat\tau)=\dfrac{s_1^2}{n_1}+\dfrac{s_0^2}{n_0}
Reads from the input data Feeds into the final output
Key code
lm_robust(Y ~ Z, data = dat)  # dat: one realized dataset, e.g. draw_data(des); HC2 standard errors by default
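The variance formula can also be computed directly. This base-R sketch, on simulated data with an assumed effect of 0.2 and standard normal noise, builds the Neyman standard error by hand and a normal-approximation 95% interval from it. The variable names are illustrative.

```r
set.seed(7)
N <- 200
Z <- sample(rep(c(0, 1), each = N / 2))  # complete randomization
Y <- 0.2 * Z + rnorm(N)                  # assumed data-generating process

y1 <- Y[Z == 1]                          # treated arm
y0 <- Y[Z == 0]                          # control arm

tau_hat <- mean(y1) - mean(y0)

# Neyman variance: s_1^2 / n_1 + s_0^2 / n_0
var_hat <- var(y1) / length(y1) + var(y0) / length(y0)
se_hat  <- sqrt(var_hat)

# Normal-approximation 95% confidence interval
ci <- tau_hat + c(-1, 1) * qnorm(0.975) * se_hat
```

The same standard error appears in the unequal-variance (Welch) t-test, so `t.test(y1, y0)$stderr` offers a quick cross-check of `se_hat`.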

4
Diagnostic / pre-tests

Diagnose: bias, power, coverage

A pre-flight check — run this before trusting any estimate downstream.

What happens here

Simulate the whole design many times to read its operating characteristics.

Formula
\text{power}=\mathbb{P}\big(\text{reject }H_0:\tau=0 \mid \tau\big)
Reads from the input data Feeds into the final output
Key code
diagnose_design(des)
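What a diagnosis computes can be sketched as a plain Monte Carlo loop in base R. This is a simplified illustration of the idea, not how `diagnose_design` is implemented: the function `simulate_once`, the 2000 simulations, the normal noise, and the 5% two-sided z-test are all assumptions.

```r
set.seed(123)

# Hypothetical helper: draw one dataset from the design and analyze it
simulate_once <- function(N = 200, tau = 0.2) {
  U <- rnorm(N)                              # fresh noise for each simulated study
  Z <- sample(rep(c(0, 1), each = N / 2))    # complete randomization
  Y <- tau * Z + U
  y1 <- Y[Z == 1]
  y0 <- Y[Z == 0]
  est <- mean(y1) - mean(y0)                 # difference in means
  se  <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))  # Neyman SE
  ci  <- est + c(-1, 1) * qnorm(0.975) * se
  c(estimate = est,
    reject   = as.numeric(abs(est / se) > qnorm(0.975)),    # reject H0: tau = 0?
    covered  = as.numeric(ci[1] <= tau && tau <= ci[2]))    # does the CI contain tau?
}

sims <- t(replicate(2000, simulate_once()))  # one row per simulated study

diagnosands <- c(
  bias     = mean(sims[, "estimate"]) - 0.2,
  power    = mean(sims[, "reject"]),
  coverage = mean(sims[, "covered"])
)
```

Under these assumed parameters the estimator is essentially unbiased and coverage sits near the nominal 95%, but power falls well short of the conventional 0.8 target, which is the kind of finding a diagnosis is meant to surface before data collection.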

3

Output · what you get

⚠️ Unofficial community showcase of DeclareDesign. Not affiliated with the authors; all credit to them.

Specify a study as model–inquiry–data–answer, simulate it, and read its diagnosands — bias, power, coverage — before you run it.
