Mendelian Randomization - Mendelian randomization analysis in a single line of R code

Written by Steve Burgess. Published 15 October 2018.

The PhenoScanner webtool was recently updated by James Staley and colleagues (in particular Mihir Kamat). The update brought several improvements, including more genetic associations and the ability to search associations by gene and by risk factor. Another important change is that PhenoScanner can now be called directly from R. Here we present some code demonstrating how to call PhenoScanner from R, how to pass output from the phenoscanner package into the MendelianRandomization package, and how to perform a Mendelian randomization analysis using a single line of R code.

Please note that this code is currently inefficient, as the PhenoScanner code queries the whole database rather than just the desired variables. This means that the code may run slowly, particularly if many people across the world are trying to access the resource simultaneously. Watch this space for developments!

The first step is to install the PhenoScanner package (and the MendelianRandomization package if you haven't done this previously):

install.packages("devtools")
library(devtools)
install_github("phenoscanner/phenoscanner")
library(phenoscanner)

install.packages("MendelianRandomization")
library(MendelianRandomization)

The code below takes as inputs a list of rsIDs together with two triples that identify the genetic association datasets. The first is the name of the exposure (risk factor), the PubMed ID of the study that published the association estimates for the exposure, and the ancestry group for those estimates (e.g. "Mixed", "European", or "African"). The second is the corresponding name, PubMed ID and ancestry group for the genetic associations with the outcome. The triple of name, PubMed ID and ancestry is required to uniquely specify a dataset of genetic associations. The function creates an MRInput object, which can then be used as an input for the functions in the MendelianRandomization package.

pheno_input <- function(snps, exposure, pmidE, ancestryE, outcome, pmidO, ancestryO) {

  # Query PhenoScanner for all associations with the given variants (no p-value cut-off)
  dataTable <- phenoscanner(snpquery = snps, pvalue = 1)$results

  # Rows of the exposure and outcome datasets that have a beta-coefficient and a standard error
  is.exp <- dataTable$trait == exposure & dataTable$pmid == pmidE &
    dataTable$ancestry == ancestryE & !is.na(dataTable$beta) & !is.na(dataTable$se)
  is.out <- dataTable$trait == outcome & dataTable$pmid == pmidO &
    dataTable$ancestry == ancestryO & !is.na(dataTable$beta) & !is.na(dataTable$se)

  # Variants available in both datasets (column 1 of the results holds the variant name)
  snp.list.exposure <- unique(dataTable[which(is.exp), 1])
  snp.list.outcome  <- unique(dataTable[which(is.out), 1])
  snp.list <- intersect(snp.list.exposure, snp.list.outcome)
  if (length(snp.list) == 0) {
    cat("No variants found with beta-coefficients and standard errors for given risk factor and outcome combination. Please check spelling and PMIDs.\n")
    return()
  }

  # For each variant, take the first matching row in each dataset
  row.exp <- NULL; row.out <- NULL
  for (j in seq_along(snp.list)) {
    row.exp[j] <- which(is.exp & dataTable$snp == snp.list[j])[1]
    row.out[j] <- which(is.out & dataTable$snp == snp.list[j])[1]
  }

  Bx. <- dataTable[row.exp, c("snp", "beta", "se")]
  By. <- dataTable[row.out, c("snp", "beta", "se")]

  # Columns after merging: snp, then beta and se for the exposure, then beta and se for the outcome
  dataSet <- merge(Bx., By., "snp")

  return(mr_input(exposure = exposure, outcome = outcome,
                  snps = as.character(dataSet[,1]),
                  bx = as.numeric(dataSet[,2]), bxse = as.numeric(dataSet[,3]),
                  by = as.numeric(dataSet[,4]), byse = as.numeric(dataSet[,5]),
                  correlation = matrix()))
}

pheno_obj = pheno_input(snps = c("rs12916", "rs2479409", "rs217434", "rs1367117",
                                 "rs4299376", "rs629301", "rs4420638", "rs6511720"),
                        exposure = "Low density lipoprotein", pmidE = "24097068", ancestryE = "European",
                        outcome = "Coronary artery disease", pmidO = "26343387", ancestryO = "Mixed")
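To see what the selection and merging steps inside pheno_input do without waiting for a PhenoScanner query, here is a minimal sketch in base R. The table `toy` and the helper `select_dataset` are hypothetical names introduced for illustration, and every value in the table is invented. The table only mimics the columns that pheno_input reads from the PhenoScanner results.

```r
# Invented stand-in for phenoscanner(...)$results; none of these values are real
toy <- data.frame(
  snp      = c("rsA", "rsA", "rsB", "rsB", "rsC"),
  trait    = c("Exposure X", "Outcome Y", "Exposure X", "Outcome Y", "Exposure X"),
  pmid     = c("111", "222", "111", "222", "111"),
  ancestry = c("European", "Mixed", "European", "Mixed", "European"),
  beta     = c(0.10, 0.05, 0.20, 0.08, 0.15),
  se       = c(0.01, 0.02, 0.02, 0.03, 0.02),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows for one trait / PubMed ID / ancestry triple
# that have both a beta-coefficient and a standard error, one row per variant
select_dataset <- function(tab, trait, pmid, ancestry) {
  keep <- tab$trait == trait & tab$pmid == pmid & tab$ancestry == ancestry &
    !is.na(tab$beta) & !is.na(tab$se)
  sel <- tab[which(keep), c("snp", "beta", "se")]
  sel[!duplicated(sel$snp), ]
}

bx_tab <- select_dataset(toy, "Exposure X", "111", "European")
by_tab <- select_dataset(toy, "Outcome Y", "222", "Mixed")

# Only variants present in both datasets survive the merge (rsC is dropped)
merged <- merge(bx_tab, by_tab, by = "snp", suffixes = c(".exposure", ".outcome"))
print(merged)
```

The merged table has one row per shared variant, with the exposure associations followed by the outcome associations. This is the layout that pheno_input passes on to mr_input.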

This code can then be combined with any of the functions from the MendelianRandomization package to perform a Mendelian randomization analysis. Here, we use the mr_ivw function to perform a Mendelian randomization analysis in a single line of R code:

mr_obj = mr_ivw(pheno_input(snps = c("rs12916", "rs2479409", "rs217434", "rs1367117", "rs4299376", "rs629301", "rs4420638", "rs6511720"), exposure = "Low density lipoprotein", pmidE = "24097068", ancestryE = "European", outcome = "Coronary artery disease", pmidO = "26343387", ancestryO = "Mixed"))
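For readers wondering what the inverse-variance weighted method computes, the following is a rough sketch of the standard fixed-effect formula in base R. The three sets of summary statistics are invented for illustration, and the variable names are assumptions rather than anything taken from the MendelianRandomization package. Note that mr_ivw offers further options (for example random-effects models), so its reported standard error need not equal the one computed here.

```r
# Invented summary statistics for three variants (illustration only)
bx   <- c(0.10, 0.20, 0.15)   # variant-exposure associations
by   <- c(0.05, 0.08, 0.09)   # variant-outcome associations (log odds ratios)
byse <- c(0.02, 0.03, 0.03)   # standard errors of the variant-outcome associations

# Fixed-effect IVW estimate: weighted mean of the per-variant ratio estimates by/bx,
# with weights bx^2 / byse^2
weights <- bx^2 / byse^2
ratio   <- by / bx
ivw_est <- sum(weights * ratio) / sum(weights)
ivw_se  <- sqrt(1 / sum(weights))

# The same point estimate from a weighted regression through the origin
fit <- lm(by ~ bx - 1, weights = 1 / byse^2)

print(c(estimate = ivw_est, std_error = ivw_se))
print(coef(fit))
```

The agreement between `ivw_est` and the regression coefficient shows why the method is often described as a weighted regression of the outcome associations on the exposure associations with no intercept.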

This analysis shows that LDL-cholesterol is a causal risk factor for coronary artery disease. The estimate of 0.498 is a log odds ratio, corresponding to an odds ratio of exp(0.498) = 1.65 per 1 unit (here, one standard deviation) increase in LDL-cholesterol.
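The conversion from the log odds ratio to the odds ratio can be checked directly in R. The sketch below uses only the estimate quoted above. The variable names are illustrative, and the rescaling to other changes in LDL-cholesterol assumes that the log odds of disease is linear in the exposure.

```r
log_or_per_sd <- 0.498                      # estimate quoted in the text
odds_ratio_per_sd <- exp(log_or_per_sd)
print(round(odds_ratio_per_sd, 2))          # 1.65

# Under a log-linear model, the odds ratio for a change of k standard deviations
# is exp(k * estimate)
sd_change   <- c(0.5, 1, 2)
odds_ratios <- exp(log_or_per_sd * sd_change)
print(round(odds_ratios, 2))
```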

Comments are welcome!