Non-linear Mendelian randomization (or stratified Mendelian randomization) is an extension to standard Mendelian randomization that performs separate instrumental variable analyses in strata of the population. The motivation is to understand the causal dose—response curve relating the exposure to the outcome – how does the outcome vary on average when we intervene on the exposure at different levels of the exposure?

The initial proposal for this method was to stratify the population on the “residual exposure” – that is, residual values from regression of the exposure on the genetic variants. This can be performed fairly simply in R, but here are some coding shortcuts. We first load the SUMnlmr package, and use this to generate synthetic data:

devtools::install_github("amymariemason/SUMnlmr")
library(SUMnlmr)
set.seed(31415)
test_data<-create_ind_data(N=10000, beta2=2, beta1=1)
outcome = test_data$quadratic.Y
exposure = test_data$X
grs = test_data$g

We calculate the residual exposure by regressing the exposure on the genetic score, subtracting the fitted value from the measured value, and then adding the overall mean (to ensure that the values of the residual exposure are interpretable):

exposure0 = exposure - lm(exposure ~grs)$fit + mean(exposure)

We can divide into a fixed number of equally-sized quantiles:

quant = 10
qs = quantile(exposure0, prob=seq(0, 1-1/quant, by=1/quant))
quantx0 = as.numeric(lapply(exposure0, function(x) { return(sum(x>=qs)) }))

Or into strata at fixed values of the residual exposure (here <0.5, 0.5-1.0, 1.0-1.5, 1.5-2.0, and >2.0):

quantx0 = as.numeric(cut(exposure0, breaks=c(-Inf,0.5,1,1.5,2,Inf), include.lowest=TRUE))

And then calculate genetic associations with the exposure and outcome within each stratum:

bx=NULL; bxse=NULL; by=NULL; byse=NULL; xmean=NULL

for (j in 1:length(unique(quantx0))) {
bx[j] = lm(exposure[quantx0==j] ~ grs[quantx0==j])$coef[2]
bxse[j] = summary(lm(exposure[quantx0==j] ~ grs[quantx0==j]))$coef[2,2]
by[j]= summary(lm(outcome[quantx0==j] ~ grs[quantx0==j]))$coef[2,1]
byse[j] = summary(lm(outcome[quantx0==j] ~ grs[quantx0==j]))$coef[2,2]
xmean[j] = mean(exposure[quantx0==j])
} # add covariates to the regression models to taste

Alternatively, we can use the built-in function create_nlmr_summary, which does this for equally-sized quantiles in one step:

summ_data <-create_nlmr_summary(y = outcome,
                                x = exposure,
                                g = grs,
                                covar = NULL, # add covariates here
                                family = "gaussian",
 # gaussian for continuous outcome, binomial for binary outcome
                                strata_method = "residual",
                                controlsonly = FALSE,
 # only relevant for binary outcomes
                                q = 10) # number of quantiles

Once we have the summarized data, we could calculate IV estimates for each stratum and stop there. Alternatively, we could use one of the fancy plotting functions in the SUMnlmr package to plot the data:

frac_poly_summ_mr(bx=bx,
                  by=by,
                  bxse=bxse,
                  byse=byse,
                  xmean=xmean,
                  family="gaussian",
                  fig=TRUE)

Similarly, there is the piecewise_summ_mr function. There are lots of options in both functions here to explore.

One point to clarify is that the IV estimates represent the change in the outcome per unit difference in the genetically-predicted values of the exposure (or with a causal interpretation, the change in the outcome per unit increase in the exposure). Whereas the plotting functions give the estimated relationship between the exposure and outcome. In these plots, the IV estimate is the gradient (slope) of the curve. So these options represent two ways of expressing the same information.

An assumption made in the “residual method” for non-linear Mendelian randomization is that the genetic effect on the exposure is constant in the population. It turns out that this assumption is often violated. We have recently developed a non-parametric method (the “doubly-ranked method”) for stratifying the population that is less sensitive to violations of this “constant genetic effect” assumption. This can be implemented in the SUMnlmr package using the strata_method = "ranked" option:

summ_data2<-create_nlmr_summary(y = outcome,
                                x = exposure,
                                g = grs,
                                covar = NULL,
                                family = "gaussian",
                                strata_method = "ranked",
                                controlsonly = FALSE,
                                q = 10)

Our current advice is to assess the constant genetic effect assumption using the doubly-ranked method – are the bx estimates from the doubly-ranked method similar? (It turns out that the bx estimates from the residual method are similar whether this assumption is satisfied or not – something we didn’t initially realise.)

If the assumption is violated, then: 1) compare non-linear MR results from both methods (residual and doubly-ranked), with more weight on results from the doubly-ranked method, as this method is less sensitive to violations of the assumption; and 2) consider transforming the exposure – if the bx estimates are increasing, then a log transformation may be appropriate. Transformation will not change the by estimates from the doubly-ranked method, and so transformation will not affect reliability of the doubly-ranked method, but it will change the bx estimates – are these now more similar? If so, results from the residual method following transformation will be more reliable. We note that transformation of the exposure does change the interpretation of the bx estimates and consequently the interpretation of the IV estimates.

> summ_data2
$summary
          bx       by       bxse      byse    xmean     xmin     xmax
1  0.2679257 3.057584 0.01066202 0.1345234 2.569048 2.327217 2.957322
2  0.2554275 3.174083 0.01166755 0.1602636 2.801911 2.437782 3.187628
3  0.2602369 3.532592 0.01243878 0.1748998 2.980195 2.605649 3.382229
4  0.2521341 3.577799 0.01326634 0.1946739 3.149608 2.766214 3.554172
5  0.2471285 3.615373 0.01482321 0.2272206 3.336171 2.932134 3.804060
6  0.2442914 3.703393 0.01775254 0.2868735 3.537327 3.070300 4.031026
7  0.2574438 4.226760 0.02138597 0.3647961 3.798588 3.243232 4.411147
8  0.2354753 4.227439 0.02758202 0.5159806 4.146887 3.467366 4.939591
9  0.2512287 5.066621 0.03813687 0.8212202 4.638575 3.750960 5.641951
10 0.2830375 6.905185 0.05862062 1.5639190 5.606438 4.260455 6.477177

In this case, the bx estimates vary, but not strongly and with no evident pattern – so I would be fairly confident in the reliability of the methods (which in this case both suggest a positive and increasing causal effect – evidenced by the positive and increasing values of by/bx).

Estimates from the doubly-ranked method can be fed into the fractional polynomial or piecewise linear plotting functions similarly to those from the residual method:

with(summ_data2$summary, piecewise_summ_mr(by, bx, byse, bxse, xmean, xmin,xmax,
                  ci="bootstrap_se",
                  nboot=1000,
                  fig=TRUE,
                  family="gaussian",
                  ci_fig="ribbon"))

We hope this tutorial helps introduce you to the doubly-ranked method. More details on the software can be found here: https://github.com/amymariemason/SUMnlmr.