Workflow·4 steps·branched

Mendelian randomization: genes as instruments for a causal effect (TwoSampleMR)

Source TwoSampleMR — MRC IEU (Hemani, Davey Smith et al.)
Summary by StatsDoge

Use genetic variants as instruments to estimate an exposure → outcome effect from GWAS summary statistics. IVW pools per-SNP Wald ratios by precision; MR-Egger tests for directional pleiotropy via its intercept; the weighted median and weighted mode remain consistent when some instruments are invalid (the median needs at least half of the weight to come from valid SNPs).

1

Input · what goes in

GWAS summary statistics: for each SNP, its effect on the exposure and on the outcome (with standard errors), harmonised to a common effect allele.


Format — per-SNP effects on exposure and outcome, harmonised.

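library(TwoSampleMR)
# The extract_* calls query the IEU OpenGWAS API, so they need network access
exp <- extract_instruments('ieu-a-2')            # BMI instruments
out <- extract_outcome_data(exp$SNP, 'ieu-a-7')  # coronary heart disease
dat <- harmonise_data(exp, out)                  # align effect alleles across the two GWAS
res <- mr(dat)                                   # run the default set of MR methods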
2

Pipeline · the recipe ⑂ has parallel branches


1
Data prep

Harmonise SNP–exposure & SNP–outcome effects

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Each SNP is a candidate instrument. Align effect alleles across the two GWAS so the signs are comparable.

Formula
Z_j \to X,\qquad Z_j \perp\!\!\!\perp U,\qquad Z_j \to Y\ \text{only via}\ X
Reads from the input data Feeds into the final output
Key code
dat <- harmonise_data(exp, out)
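
The alignment idea can be illustrated on a toy table in base R. This is only a sketch of the sign-flip logic, not what `harmonise_data()` does internally (the real function also deals with strand issues and palindromic SNPs); the column names and all numbers below are made up for illustration.

```r
# Toy summary statistics (made-up numbers, hypothetical column names).
toy <- data.frame(
  SNP      = c("rs1", "rs2", "rs3"),
  ea_exp   = c("A", "C", "G"),   # effect allele in the exposure GWAS
  oa_exp   = c("G", "T", "A"),   # other allele in the exposure GWAS
  beta_exp = c(0.10, 0.08, 0.05),
  ea_out   = c("A", "T", "G"),   # effect allele in the outcome GWAS
  oa_out   = c("G", "C", "A"),   # other allele in the outcome GWAS
  beta_out = c(0.020, -0.016, 0.010),
  stringsAsFactors = FALSE
)

# A SNP is "swapped" when the outcome GWAS reports its effect for the
# allele that the exposure GWAS treats as the other allele.
swapped <- toy$ea_out == toy$oa_exp & toy$oa_out == toy$ea_exp

# Flip the sign of the outcome effect for swapped SNPs so that both betas
# refer to the same effect allele.
toy$beta_out_aligned <- ifelse(swapped, -toy$beta_out, toy$beta_out)

print(toy[, c("SNP", "beta_exp", "beta_out_aligned")])
```

Here only `rs2` is reported on opposite alleles, so only its outcome effect changes sign.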

2
Diagnostic / pre-tests

Check instrument strength

A pre-flight check — run this before trusting any estimate downstream.

What happens here

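Weak instruments bias MR: in two-sample settings with non-overlapping samples the bias is toward the null, and with overlapping samples it is toward the confounded observational estimate. Check the variance explained (R²) and the F-statistic before trusting any estimate.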

Formula
F=\dfrac{R^2\,(N-2)}{1-R^2}\qquad(\text{weak if }F\lesssim 10)
Reads from the input data Feeds into the final output
Key code
# per-SNP F = (beta/se)^2; overall F from R^2
dat$F <- (dat$beta.exposure / dat$se.exposure)^2
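
A small base-R sketch of both quantities, using made-up numbers. The helper `f_from_r2()` is hypothetical (not part of TwoSampleMR); with `k = 1` it reduces to the single-instrument formula in this step, and for `k` instruments it uses the standard form R²(N − k − 1) / (k(1 − R²)).

```r
# Made-up SNP-exposure effects and standard errors for three instruments.
beta_x <- c(0.10, 0.08, 0.05)
se_x   <- c(0.010, 0.012, 0.015)

# Per-SNP F-statistic: the squared z-score of the SNP-exposure association.
f_snp <- (beta_x / se_x)^2

# Flag instruments below the conventional rule-of-thumb threshold of 10.
weak <- f_snp < 10

# Hypothetical helper: F from variance explained (r2), sample size (n)
# and number of instruments (k).
f_from_r2 <- function(r2, n, k = 1) {
  r2 * (n - k - 1) / (k * (1 - r2))
}

f_single <- f_from_r2(r2 = 0.01, n = 10000)        # one instrument
f_multi  <- f_from_r2(r2 = 0.01, n = 10000, k = 3) # three instruments

print(data.frame(f_snp = f_snp, weak = weak))
print(c(f_single = f_single, f_multi = f_multi))
```

The same R² spread over more instruments gives a smaller overall F, which is why adding many weak SNPs does not rescue instrument strength.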

3
Estimation

Inverse-variance weighted estimate

The core estimate — where the causal quantity itself is computed.

What happens here

Combine the per-SNP Wald ratios, weighting by precision — the IVW estimate.

Formula
\hat\beta_{\mathrm{IVW}}=\dfrac{\sum_j \hat\beta_{Yj}\hat\beta_{Xj}\,\sigma_{Yj}^{-2}}{\sum_j \hat\beta_{Xj}^{2}\,\sigma_{Yj}^{-2}},\qquad \hat\beta_j=\dfrac{\hat\beta_{Yj}}{\hat\beta_{Xj}}
Reads from the input data Feeds into the final output
Key code
mr(dat, method_list = c('mr_ivw'))
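
To make the formula concrete, here is a base-R sketch on made-up numbers that computes the IVW point estimate three equivalent ways: directly from the formula, as a precision-weighted average of the Wald ratios, and as a weighted regression through the origin. It reproduces the point estimate only and is not a substitute for `mr_ivw`, which also reports a standard error and p-value.

```r
# Made-up harmonised effects for three SNPs.
beta_x <- c(0.10, 0.08, 0.05)     # SNP-exposure effects
beta_y <- c(0.020, 0.018, 0.008)  # SNP-outcome effects
se_y   <- c(0.005, 0.006, 0.008)  # SNP-outcome standard errors

w    <- 1 / se_y^2       # inverse-variance weights, sigma_Yj^-2
wald <- beta_y / beta_x  # per-SNP Wald ratios

# 1. Directly from the IVW formula.
ivw_formula <- sum(beta_y * beta_x * w) / sum(beta_x^2 * w)

# 2. Weighted mean of Wald ratios, each weighted by beta_x^2 / se_y^2
#    (the inverse of its first-order variance).
ivw_pooled <- sum(wald * beta_x^2 * w) / sum(beta_x^2 * w)

# 3. Weighted regression of beta_y on beta_x with no intercept.
ivw_lm <- unname(coef(lm(beta_y ~ 0 + beta_x, weights = w)))

print(c(formula = ivw_formula, pooled = ivw_pooled, lm = ivw_lm))
```

All three agree, and the pooled value lies between the smallest and largest Wald ratio, as a weighted average must.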

4
Robustness check

Pleiotropy-robust: MR-Egger, weighted median

A robustness check — does the headline result survive a different lens?

What happens here

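If instruments affect the outcome through paths other than the exposure (horizontal pleiotropy), IVW can be biased. The MR-Egger intercept tests for directional pleiotropy, assuming pleiotropic effects are independent of instrument strength (the InSIDE assumption). The weighted median tolerates invalid instruments as long as at least half of the weight comes from valid ones; the weighted mode needs only the largest group of SNPs with similar estimates to be valid.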

Formula
\hat\beta_{Yj}=\beta_0+\beta_{\mathrm{Egger}}\hat\beta_{Xj}+\epsilon_j;\quad \beta_0\ne 0\Rightarrow\text{pleiotropy}
Reads from the input data Feeds into the final output
Key code
mr(dat, method_list = c('mr_egger_regression',
                         'mr_weighted_median'))
mr_pleiotropy_test(dat)  # reports the MR-Egger intercept and its p-value
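
The Egger regression in the formula can be sketched in base R as the same weighted regression used for IVW, but with the intercept left free. The toy data below are constructed (not real) so that every SNP-outcome effect carries a constant pleiotropic shift of 0.005 on top of a true slope of 0.2; the sketch gives point estimates only and does not reproduce the package's standard errors.

```r
# Constructed toy data: beta_y = 0.005 + 0.2 * beta_x for every SNP.
beta_x <- c(0.10, 0.08, 0.05, 0.12, 0.07)
beta_y <- c(0.025, 0.021, 0.015, 0.029, 0.019)
se_y   <- c(0.005, 0.006, 0.008, 0.005, 0.007)
w      <- 1 / se_y^2

# MR-Egger convention: orient every SNP so its exposure effect is positive.
flip   <- sign(beta_x)
beta_y <- beta_y * flip
beta_x <- abs(beta_x)

# Egger: weighted regression with a free intercept.
egger     <- lm(beta_y ~ beta_x, weights = w)
intercept <- coef(egger)[["(Intercept)"]]  # average directional pleiotropy
slope     <- coef(egger)[["beta_x"]]       # pleiotropy-adjusted causal estimate

# IVW for comparison: same regression forced through the origin.
ivw <- coef(lm(beta_y ~ 0 + beta_x, weights = w))[["beta_x"]]

print(c(egger_intercept = intercept, egger_slope = slope, ivw = ivw))
```

With a positive intercept, forcing the line through the origin pushes the IVW slope above the Egger slope, which is the pattern a non-zero intercept is meant to expose.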

3

Output · what you get 2 figures

Fig 1. The MR scatter: each point is a genetic instrument (SNP effect on outcome vs. on exposure); the slope of each method line is the estimated causal effect of the exposure.
Fig 2. Single-SNP forest plot: each instrument's Wald-ratio estimate and the pooled effect at the bottom; heterogeneity flags pleiotropy.

Figures reproduced from TwoSampleMR — MRC IEU (Hemani, Davey Smith et al.) — unofficial community showcase; all credit to the original authors.


Use genetic variants as instruments to estimate the causal effect of an exposure on an outcome from GWAS summary data — with IVW plus pleiotropy-robust MR-Egger and weighted-median checks.

Discussion (2)

  • Genes as instruments is the cleanest natural experiment we get, but the MR-Egger intercept and weighted-median checks are non-negotiable. Pleiotropy is everywhere.

  • The scatter with all five estimator slopes on one plot is the right way to show robustness at a glance. Saved.

    Agreed: if IVW and Egger disagree wildly, that's the figure that makes the pleiotropy story obvious to a reviewer.