Workflow·4 steps·branched

Goodman-Bacon decomposition: what your TWFE estimate is averaging (bacondecomp)

Source bacondecomp — Flack; Goodman-Bacon (2021)
Summary by StatsDoge

Your headline TWFE coefficient is a weighted average of every 2×2 DiD in the panel, with weights driven by group size and treatment variance. That includes the dangerous 'later vs. earlier treated' comparisons, which use already-treated units as controls. The decomposition lists each 2×2 estimate alongside its weight, so you can see when the average is trustworthy.

1

Input · what goes in

A staggered-adoption panel: units adopting treatment at different times, with a binary treatment indicator.


Format: long panel with one row per unit-year and columns unit, year, y (outcome), and treated (0/1, absorbing: once it turns on, it stays on).

library(bacondecomp)
df_bacon <- bacon(y ~ treated, data = panel,
                  id_var = 'unit', time_var = 'year')
2

Pipeline · the recipe ⑂ has parallel branches


1
Data prep

A staggered-adoption panel

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Units switch treatment on at different dates. The single TWFE coefficient you'd normally report pools comparisons between every pair of timing groups, so it hides which comparisons drive the result.

Formula
y_{it}=\alpha_i+\gamma_t+\beta^{TWFE}D_{it}+\varepsilon_{it}
Reads from the input data Feeds into the final output
Key code
# unit, year, y, treated (0/1, turns on and stays on)
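For readers without a panel at hand, the format can be sketched with a small simulated data set. Everything here is hypothetical: the unit count, the adoption years, the helper column first_treat, and the outcome equation are illustrative assumptions, not part of bacondecomp. The closing checks reflect the expectation, as far as I know, that bacon() wants a balanced panel with an absorbing treatment.

```r
# Hypothetical toy panel: 6 units, years 2000-2009, three timing groups.
set.seed(1)
panel <- expand.grid(unit = 1:6, year = 2000:2009)
panel <- panel[order(panel$unit, panel$year), ]

# Illustrative adoption year per unit (Inf = never treated)
adoption <- c(2003, 2003, 2006, 2006, Inf, Inf)
panel$first_treat <- adoption[panel$unit]

# Absorbing treatment: switches on at first_treat and stays on
panel$treated <- as.integer(panel$year >= panel$first_treat)

# Assumed outcome: unit effect + common trend + constant effect of 2 + noise
panel$y <- panel$unit + 0.5 * (panel$year - 2000) + 2 * panel$treated +
  rnorm(nrow(panel), sd = 0.1)

# Check 1: balanced panel (every unit observed in every year)
stopifnot(all(table(panel$unit) == length(unique(panel$year))))

# Check 2: treatment never switches off within a unit
never_off <- tapply(panel$treated, panel$unit,
                    function(d) all(diff(d) >= 0))
stopifnot(all(never_off))
```

The resulting data frame has the unit, year, y, and treated columns used by the bacon() call in this workflow.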

2
Diagnostic / pre-tests

Decompose into 2×2 comparisons

A pre-flight check — run this before trusting any estimate downstream.

What happens here

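Each timing group is compared with the never-treated group (if there is one), and every pair of timing groups yields two 2×2 DiDs: one in the window before the later group adopts, and one in the window after the earlier group has adopted. The TWFE coefficient is the variance-weighted average of all of them.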

Formula
\hat\beta^{TWFE}=\textstyle\sum_k s_k\,\hat\beta_k^{2\times2},\qquad\textstyle\sum_k s_k=1
Reads from the input data Feeds into the final output
Key code
df_bacon <- bacon(y ~ treated, data = panel,
                  id_var = 'unit', time_var = 'year')
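To make a single row of the decomposition concrete, here is a minimal base-R sketch that computes a few 2×2 DiDs by hand. It does not call bacondecomp. The panel, the helper did_2x2(), and the constant effect of 2 are hypothetical choices, and the data are noise-free so the arithmetic is easy to follow.

```r
# Hypothetical noise-free panel with a constant treatment effect of 2
panel <- expand.grid(unit = 1:6, year = 2000:2009)
first_treat <- c(2003, 2003, 2006, 2006, Inf, Inf)[panel$unit]
panel$treated <- as.integer(panel$year >= first_treat)
panel$y <- panel$unit + 0.5 * (panel$year - 2000) + 2 * panel$treated

# Illustrative helper: difference-in-differences of four cell means
did_2x2 <- function(df, treat_units, ctrl_units, pre_years, post_years) {
  cell <- function(units, years) {
    mean(df$y[df$unit %in% units & df$year %in% years])
  }
  (cell(treat_units, post_years) - cell(treat_units, pre_years)) -
    (cell(ctrl_units, post_years) - cell(ctrl_units, pre_years))
}

# Early adopters (units 1-2) vs never treated (units 5-6)
early_vs_never <- did_2x2(panel, 1:2, 5:6, 2000:2002, 2003:2009)

# Late adopters (units 3-4) vs never treated
late_vs_never <- did_2x2(panel, 3:4, 5:6, 2000:2005, 2006:2009)

# Earlier vs later treated: window ends before the late group adopts
early_vs_late <- did_2x2(panel, 1:2, 3:4, 2000:2002, 2003:2005)

print(c(early_vs_never, late_vs_never, early_vs_late))
```

With a constant effect, each of these comparisons recovers the assumed effect of 2, which is the benign case in which averaging them is harmless.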

3
Robustness check

Spot the forbidden comparisons

A robustness check — does the headline result survive a different lens?

What happens here

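'Later vs earlier treated' comparisons use already-treated units as controls. If treatment effects change over time, the change in the earlier group's effect is subtracted from the estimate, so this 2×2 is biased and can even take the wrong sign. The decomposition weights s_k themselves stay positive; the problem is equivalent to some underlying treatment effects entering β with negative weight.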
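A small worked case shows how this goes wrong. The sketch below is base R on a hypothetical noise-free panel in which the effect starts at 1 in the adoption year and grows by 1 each year. That growth pattern is an assumption chosen for illustration, not something taken from bacondecomp or the paper.

```r
# Hypothetical panel: units 1-2 adopt in 2003, units 3-4 adopt in 2006
panel <- expand.grid(unit = 1:4, year = 2000:2009)
first_treat <- c(2003, 2003, 2006, 2006)[panel$unit]
panel$treated <- as.integer(panel$year >= first_treat)

# Assumed dynamic effect: 1 in the adoption year, +1 per year afterwards
effect <- ifelse(panel$treated == 1, panel$year - first_treat + 1, 0)
panel$y <- panel$unit + 0.5 * (panel$year - 2000) + effect

cell_mean <- function(units, years) {
  mean(panel$y[panel$unit %in% units & panel$year %in% years])
}

# Later group is "treated"; earlier group serves as the control.
# Window 2003-2009: the earlier group is already treated throughout.
late_change  <- cell_mean(3:4, 2006:2009) - cell_mean(3:4, 2003:2005)
early_change <- cell_mean(1:2, 2006:2009) - cell_mean(1:2, 2003:2005)
did_later_vs_earlier <- late_change - early_change

# True average effect on the later group after it adopts
true_late_att <- mean(effect[panel$unit %in% 3:4 & panel$year >= 2006])

print(c(did = did_later_vs_earlier, truth = true_late_att))
# did = -1, truth = 2.5
```

The control group's own effect keeps growing inside the window, and that growth is differenced out of the estimate. A positive effect of 2.5 shows up as -1.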

Formula
s_k\ \propto\ n_k\,\bar D_k\,(1-\bar D_k)\quad(\text{treatment-variance weights})
Reads from the input data Feeds into the final output
Key code
library(ggplot2)
# One point per 2x2 comparison; color and shape both mark the comparison type
ggplot(df_bacon, aes(weight, estimate, color = type, shape = type)) +
  geom_point()

4
Reporting

Read β as a weighted average

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

If the 'later vs earlier treated' comparisons carry a substantial share of the total weight, prefer a heterogeneity-robust estimator such as Callaway-Sant'Anna or two-stage DiD (did2s) over plain TWFE.
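One way to put a number on how much weight the risky comparisons carry is to sum the weights by comparison type. The sketch below uses a stand-in data frame with made-up numbers; only the column names (estimate, weight, type) follow the bacon() output used in this workflow. The type labels are how I recall bacondecomp naming them, so check unique(df_bacon$type) on your own result. The helper weight_by_type() is hypothetical.

```r
# Stand-in for bacon() output. All values are invented for illustration.
df_bacon <- data.frame(
  type = c("Treated vs Untreated", "Treated vs Untreated",
           "Earlier vs Later Treated", "Later vs Earlier Treated"),
  estimate = c(4.0, 2.5, 2.0, -1.0),
  weight   = c(0.32, 0.36, 0.14, 0.18)
)

# Illustrative helper: total weight and weighted mean estimate per type
weight_by_type <- function(df) {
  out <- aggregate(weight ~ type, data = df, FUN = sum)
  out$avg_estimate <- sapply(as.character(out$type), function(tp) {
    rows <- df$type == tp
    weighted.mean(df$estimate[rows], df$weight[rows])
  })
  out[order(-out$weight), ]
}

shares <- weight_by_type(df_bacon)
print(shares)

# Share of total weight on the comparisons that use treated units as controls
risky_share <- sum(df_bacon$weight[df_bacon$type == "Later vs Earlier Treated"])
print(risky_share)
```

There is no universal cutoff for this share. It is best read alongside how far the risky estimates sit from the clean 'treated vs untreated' ones.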

Formula
\hat\beta^{TWFE}=\mathbb E_S[\hat\beta_k]\ \Rightarrow\ \text{trustworthy only if the weights are}
Reads from the input data Feeds into the final output
Key code
weighted.mean(df_bacon$estimate, df_bacon$weight)
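The weighted mean reproduces the TWFE coefficient, but whether that number summarizes the treatment effect well depends on effects being stable over time. As a rough check of that intuition, the base-R sketch below fits TWFE with lm() on a hypothetical noise-free panel with growing effects and compares it to the true average effect on treated observations. The panel design and the effect pattern are assumptions made for illustration.

```r
# Hypothetical panel: early adopters, late adopters, never treated
panel <- expand.grid(unit = 1:6, year = 2000:2009)
first_treat <- c(2003, 2003, 2006, 2006, Inf, Inf)[panel$unit]
panel$treated <- as.integer(panel$year >= first_treat)

# Assumed dynamic effect: 1 in the adoption year, +1 per year afterwards
effect <- ifelse(panel$treated == 1, panel$year - first_treat + 1, 0)
panel$y <- panel$unit + 0.5 * (panel$year - 2000) + effect

# Plain TWFE: unit and year fixed effects plus the treatment dummy
twfe <- lm(y ~ treated + factor(unit) + factor(year), data = panel)
beta_twfe <- unname(coef(twfe)["treated"])

# True average effect across all treated unit-years
true_att <- mean(effect[panel$treated == 1])

print(c(twfe = beta_twfe, truth = true_att))
```

In this setup the TWFE coefficient comes out below the true average effect, which is the kind of gap the decomposition helps you anticipate before reporting β.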

3

Output · what you get

Fig 1. Goodman-Bacon decomposition: each 2×2 DiD comparison's estimate vs. its TWFE weight, split by type. The 'later vs earlier treated' triangles are the risky ones.

Figures reproduced from bacondecomp — Flack; Goodman-Bacon (2021) — unofficial community showcase; all credit to the original authors.

⚠️ Unofficial community showcase of bacondecomp. Not affiliated with the authors; all credit to them.

A two-way fixed-effects DiD is a weighted average of all possible 2×2 comparisons — including 'forbidden' ones that use already-treated units as controls. This shows you the weights.

Discussion (2)

  • 2

    Every staggered-DiD paper should print this plot. Once you see the 'later vs earlier treated' weight, you can't unsee the TWFE problem.

  • 1

    Great diagnostic, but it's a diagnosis not a cure — pair it with Callaway-Sant'Anna or did2s when the bad comparisons carry weight.